English-only embedding models for multilingual docs
Text Embeddings: Speaking languages without learning them?
tl;dr: some *-en models perform well on other languages too.
Let’s process 1,369,841 individual English XML/HTML files and index them with BAAI/bge-base-en-v1.5 embeddings for semantic search! Sounds fun? Let’s go!
A full semantic search tutorial about:
Let’s set up a public Zulip instance for our geospatial community! It’s live: zulip.gis.chat
Semantic search right in your browser! It calculates the embeddings and cosine similarity client-side, without server-side inference, using transformers.js and a quantized version of sentence-transformers/all-MiniLM-L6-v2.
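For readers new to the idea, here is a minimal sketch of the cosine-similarity ranking step such a search relies on. It is written in plain Python rather than the browser JavaScript the post uses, and the function names and the toy 3-dimensional vectors are made up for illustration (real all-MiniLM-L6-v2 embeddings have 384 dimensions).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as "no similarity"
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings, standing in for real model output.
docs = {
    "roads": [1.0, 0.0, 0.0],
    "rivers": [0.0, 1.0, 0.0],
    "highways": [0.9, 0.1, 0.0],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In a real setup the vectors would come from the embedding model, and the ranking loop is the only part that has to run on every query.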
Create a fully working semantic search stack using only Qdrant as the vector database, with its built-in API, and transformers.js as a frontend-only embedding generator that can run any Hugging Face model. No additional inference server needed!
Image courtesy Qdrant & Hugging Face.
Using Qdrant to query text data with vector search and geospatial filters, on CPU only (no GPU)
Image courtesy Qdrant transformed with Stable Diffusion v2 by stability-ai.