08/22/2022, West Palm Beach – Nelson Correa, Founder & CEO of Andinum, will be presenting “Enterprise Semantic Search with Python Large Language Models” at PyData Miami 2022 on September 22, 2022.
Enterprise search is a key use case in big data and business computing. The talk introduces enterprise semantic search, the BM25 ranking function, and dense vector search with pre-trained large language models (LLMs). It then presents a working demonstration in the financial domain, built with the recent Hugging Face transformers library and the UMAP dimensionality-reduction library for data visualization.
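To make the BM25 baseline concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is not the code from the talk's demo. It assumes whitespace tokenization, the common defaults k1 = 1.5 and b = 0.75, and the Lucene-style IDF variant (which stays non-negative). The three "financial" documents are made-up toy sentences.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequencies within this document
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF (an assumed variant; other BM25 variants differ here).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Saturate term frequency and normalize by document length.
            norm = freq + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy documents, invented for illustration only.
docs = [
    "quarterly revenue grew on strong loan demand".split(),
    "the bank reported higher net interest income".split(),
    "new branch opening in west palm beach".split(),
]
query = "net interest income".split()
scores = bm25_scores(query, docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Only the second document shares terms with the query, so it ranks first. The other two score zero and keep their original order because Python's sort is stable.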
Dense vector search with pre-trained models must be used with caution and evaluated for each use case. Despite the impressive performance of LLMs on many natural language processing tasks, that performance requires fine-tuning the model to the task and dataset at hand: in this case, document search and ranking. BM25 remains a strong baseline and often outperforms pre-trained models used zero-shot, that is, without such fine-tuning.
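For comparison, dense vector search ranks documents by the similarity of their embedding vectors to the query's embedding. The sketch below shows only the ranking step, using cosine similarity with NumPy. The 4-dimensional vectors are made-up placeholders. A real system would obtain them from a pre-trained encoder, for example a model loaded with the Hugging Face transformers library. As noted above, such zero-shot embeddings should be checked against a BM25 baseline before being relied on.

```python
import numpy as np


def cosine_top_k(query_vec, doc_vecs, k=2):
    """Return indices and cosine similarities of the k documents closest to the query."""
    # L2-normalize so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q
    top = np.argsort(-sims)[:k]  # highest similarity first
    return top, sims[top]


# Placeholder "embeddings", invented for illustration; not produced by any model.
doc_vecs = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.9, 0.2],
])
query_vec = np.array([0.2, 0.9, 0.0, 0.0])
top, sims = cosine_top_k(query_vec, doc_vecs, k=2)
print(top, sims)
```

Normalizing once and taking dot products keeps the ranking step a single matrix-vector product. That is also why large-scale systems typically store normalized vectors in an index.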
The talk will be of interest to developers working on text search and new unstructured data applications. Slides and a demo notebook will be available after the talk.
Update 09/15/2022: The talk slides and GitHub repository have been posted.