Based on our records, txtai appears to be more popular than Annoy. It has been mentioned 76 times since March 2021. We track product recommendations and mentions on various public social media platforms and blogs. They can help you identify which product is more popular and what people think of it.
Txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows. - Source: dev.to / 4 months ago
Ideal For: Projects requiring quick setup and robust search capabilities. - Source: dev.to / 4 months ago
Excellent project. As mentioned in another comment, I've put together an embeddings database using the arxiv dataset (https://huggingface.co/NeuML/txtai-arxiv) recently. For those interested in the literature search space, a couple other projects I've worked on that may be of interest. Annotateai (https://github.com/neuml/annotateai) - Semantic search and workflows for medical/scientific papers. Built on txtai... - Source: Hacker News / 4 months ago
If you're looking for a lightweight open-source framework designed to handle the patterns mentioned in this article: https://github.com/neuml/txtai Disclaimer: I'm the author of the framework. - Source: Hacker News / 4 months ago
I fully agree. Postgres has solved many of the problems that many are re-solving with GenAI-related databases. With txtai (https://github.com/neuml/txtai), I've gone all in with Postgres + pgvector. Projects can start small with a SQLite backend then switch the persistence to Postgres. With this, you get all the years of battle-tested production experience from Postgres... - Source: Hacker News / 5 months ago
The focus on the top 10 in vector search is a product of wanting to prove value over keyword search. Keyword search is going to miss some conceptual matches. You can try to work around that with tokenization and complex queries with all variations but it's not easy. Vector search isn't all that new a concept. For example, the annoy library (https://github.com/spotify/annoy), an open source embeddings database. - Source: Hacker News / over 1 year ago
If you want to go larger you could still use some simple setup in conjunction with faiss, annoy or hnsw. Source: almost 2 years ago
I then use annoy to compare them. Annoy can use different measures for distance, like cosine, euclidean and more. Source: almost 2 years ago
Yes, you can do this for equality predicates if your row groups are sorted. This blog post (that I didn't write) might add more color. You can't do this for any kind of text searching. If you need to do this with file-based storage, I'd recommend using a vector-based text search and utilizing an ANN index library like Annoy. Source: almost 2 years ago
If you need large scale (1000+ dimension, millions+ source points, >1000 queries per second) and accept imperfect results / approximate nearest neighbors, then other people have already mentioned some of the best libraries (FAISS, Annoy). Source: almost 2 years ago
Milvus - Vector database built for scalable similarity search Open-source, highly scalable, and blazing fast.
Weaviate - Welcome to Weaviate
Scikit-learn - scikit-learn (formerly scikits.learn) is an open source machine learning library for the Python programming language.
Vespa.ai - Store, search, rank and organize big data
Qdrant - Qdrant is a high-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
Typesense - Typo tolerant, delightfully simple, open source search 🔍