Embedding Service

Local Testing UI

Embedding Service Architecture

CPU-deployable RAG pipeline with multilingual embeddings, hybrid search, and AI-powered metadata extraction

384 Embedding Dimensions
3 Search Modes
100+ Languages Supported

System Architecture

API Layer
  Embeddings: POST /v1/embeddings
  Search: POST /v1/search
  PDF Upload: POST /v1/pdf-upload
  Extract: POST /v1/extract

Processing Layer
  ONNX Embedder: multilingual-e5-small (CPU-optimized)
  BM25 Encoder: sparse vectors (keyword search)
  Reranker: cross-encoder (relevance boost)
  PDF Processor: Docling + chunking (document parsing)
  LLM Extractor: OpenRouter API (metadata AI)

Storage Layer
  Qdrant Vector Database v1.16.2: high-performance vector search engine
  Cosine Similarity, HNSW Index

Core Capabilities

Text Embeddings

Generate 384-dimensional semantic vectors for 100+ languages using the multilingual-e5-small model

Multilingual · CPU-Fast · ONNX
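E5-family models are trained with "query: " and "passage: " text prefixes, so an embedding request usually needs the right one prepended. The sketch below assumes the service's input_type value "document" maps to the "passage: " prefix and that a "query" value also exists; both the mapping and the helper name apply_e5_prefix are illustrative assumptions, not the service's confirmed behavior.

```python
# Hypothetical mapping from an API-level input_type to the E5 text prefix.
# "document" appears in this page's request example; "query" is an assumption.
E5_PREFIXES = {"query": "query: ", "document": "passage: "}


def apply_e5_prefix(text: str, input_type: str = "document") -> str:
    """Return the text with the E5 prefix that matches its role."""
    try:
        prefix = E5_PREFIXES[input_type]
    except KeyError:
        raise ValueError(f"unknown input_type: {input_type!r}") from None
    return prefix + text.strip()


print(apply_e5_prefix("Hello world"))
print(apply_e5_prefix("How do I reset my password?", "query"))
```

Keeping the prefix logic in one place makes it harder to embed a query with the passage prefix by mistake, which tends to hurt retrieval quality.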

Hybrid Search

Combine semantic (dense) and keyword (sparse) search with Reciprocal Rank Fusion

Dense + Sparse · RRF · BM25
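Reciprocal Rank Fusion can be sketched in a few lines: each document earns 1 / (k + rank) from every ranked list it appears in, and the sums decide the merged order. The constant k = 60 is the value commonly used for RRF and is an assumption here, as are the function name and the sample document IDs; the service's actual fusion settings are not shown on this page.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs into one list of (id, score)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical semantic results
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical BM25 results
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print([doc_id for doc_id, _ in fused])
```

Because RRF only looks at ranks, it needs no score normalization between the dense and sparse result lists.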

Cross-Encoder Reranking

Improve search relevance by reranking the top results with a cross-encoder model

Precision Boost · Optional
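The reranking step itself is simple once a scoring function exists: score every (query, candidate) pair, sort, and keep the best few. In the sketch below the scorer is a token-overlap stand-in so the example runs without a model; it is not a cross-encoder, and the names rerank and overlap_score are hypothetical.

```python
def rerank(query, candidates, score_fn, top_k=5):
    """Score each candidate against the query and return the top_k pairs."""
    scored = [(score_fn(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(text, score) for score, text in scored[:top_k]]


def overlap_score(query, text):
    """Stand-in scorer: fraction of query tokens that occur in the text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    return len(query_tokens & text_tokens) / len(query_tokens) if query_tokens else 0.0


candidates = [
    "Billing and invoices",
    "How to reset your password",
    "Password policy",
]
print(rerank("reset password", candidates, overlap_score, top_k=2))
```

Swapping overlap_score for a real cross-encoder only changes score_fn; the sort-and-truncate logic stays the same.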

PDF Processing

Automatic PDF parsing, chunking, and embedding using Docling document understanding

Docling · Auto-Chunk · Batch

LLM Extraction

Extract structured metadata from documents using LLM analysis (Romanian and English)

OpenRouter · Qwen · Bilingual

Vector Search

Fast semantic search with metadata filtering, score thresholds, and pagination support

HNSW · Filtering · Scalable
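To make the three search controls concrete, here is a brute-force sketch of cosine search with an exact-match metadata filter, a score threshold, and offset-based pagination. It scans every point instead of using an HNSW index, and the parameter names (must, score_threshold, limit, offset) and the point layout are assumptions chosen for illustration, not the service's request schema.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, points, must=None, score_threshold=0.0, limit=5, offset=0):
    """Return (id, score) pairs that pass the filter and the threshold."""
    must = must or {}
    hits = []
    for point in points:
        payload = point["payload"]
        # Metadata filter: every required key must match exactly.
        if any(payload.get(key) != value for key, value in must.items()):
            continue
        score = cosine(query_vec, point["vector"])
        if score >= score_threshold:
            hits.append((point["id"], score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[offset:offset + limit]  # pagination over the sorted hits


points = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"lang": "ro"}},
    {"id": 2, "vector": [0.8, 0.6], "payload": {"lang": "ro"}},
    {"id": 3, "vector": [0.0, 1.0], "payload": {"lang": "ro"}},
    {"id": 4, "vector": [1.0, 0.0], "payload": {"lang": "en"}},
]
print(search([1.0, 0.0], points, must={"lang": "ro"}, score_threshold=0.5))
```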

Specs Management

View, search, and filter extracted specifications across all documents in a collection

Searchable · Filterable · Organized

Example Workflows

1. Simple Semantic Search

Query: "How do I reset my password?"
POST /v1/embeddings → Generate 384-dim vector
Search Qdrant with cosine similarity
POST /v1/collections/docs/search → Top 5 results
2. Hybrid Search + Rerank

Dense (semantic) + Sparse (BM25)
Parallel search, merged with RRF
Fetch top 50 candidates
rerank_top_k: 50 → Cross-encoder scores
Return top 5 reranked results
Precision-optimized ranking
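The sparse half of this workflow relies on BM25 term weights. A minimal sketch of how such weights can be computed per document follows, using the standard Okapi BM25 formula with a smoothed IDF. The tokenizer (lowercase whitespace split), the defaults k1 = 1.2 and b = 0.75, and the function name are assumptions; how the service's BM25 encoder tokenizes text or maps terms to sparse vector indices is not shown on this page.

```python
import math
from collections import Counter


def bm25_sparse_vectors(docs, k1=1.2, b=0.75):
    """Return one {term: weight} dict per document using Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vector = {}
        for term, freq in term_freq.items():
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization: long documents are penalized via b.
            denom = freq + k1 * (1 - b + b * len(tokens) / avg_len)
            vector[term] = idf * freq * (k1 + 1) / denom
        vectors.append(vector)
    return vectors


vectors = bm25_sparse_vectors(["password reset guide", "billing guide"])
print(vectors[0])
```

Terms that occur in fewer documents get a higher weight, which is why exact keywords such as product codes tend to surface well in the sparse results.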
3. PDF Ingestion with AI Extraction

Upload product_manual.pdf
POST /v1/collections/products/pdf-upload
LLM extracts metadata (Romanian)
Embed all chunks → Store in Qdrant
47 vectors + metadata stored
4. Cross-Language Retrieval

Query in English: "warranty information"
E5 model understands semantic intent
Finds Romanian documents
Matches "garanție 2 ani" in stored docs

Quick API Examples

1. Generate Embeddings

POST /v1/embeddings
{
  "input": "Hello world",
  "input_type": "document"
}
→ [0.023, -0.145, ..., 0.091]
384 dimensions, ~50ms
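The same request can be built from Python with only the standard library. This is a sketch under stated assumptions: the base URL http://localhost:8000 and the X-API-Key header name are placeholders not given on this page (only the default key value test-api-key is), so adjust both to match the actual deployment.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local service address
API_KEY = "test-api-key"            # default key shown in the Configuration panel


def build_embeddings_request(text, input_type="document"):
    """Build (but do not send) a POST /v1/embeddings request."""
    body = json.dumps({"input": text, "input_type": input_type}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/embeddings",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-API-Key": API_KEY,  # assumption: header name is hypothetical
        },
    )


request = build_embeddings_request("Hello world")
print(request.get_method(), request.full_url)

# To send it against a running service:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```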
2. Hybrid Search

POST /v1/collections/docs/search
{
  "query": "password reset",
  "search_mode": "hybrid",
  "rerank": true,
  "top_k": 5
}
→ 5 results, reranked
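A small payload builder can catch invalid combinations before the request leaves the client. The sketch below reuses the field names from the request above plus rerank_top_k from the workflow section, and enforces the minimum of 10 rerank candidates stated in the search panel. Sending rerank_top_k in the same JSON body, and the check that it is at least top_k, are assumptions.

```python
def build_search_payload(query, top_k=5, rerank=True, rerank_top_k=50,
                         search_mode="hybrid"):
    """Return a JSON-serializable body for a collection search request."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    payload = {
        "query": query,
        "search_mode": search_mode,
        "rerank": rerank,
        "top_k": top_k,
    }
    if rerank:
        # The UI states a minimum of 10 candidates before reranking.
        if rerank_top_k < 10:
            raise ValueError("rerank_top_k must be at least 10")
        # Assumption: fetching fewer candidates than top_k makes no sense.
        if rerank_top_k < top_k:
            raise ValueError("rerank_top_k must be >= top_k")
        payload["rerank_top_k"] = rerank_top_k
    return payload


print(build_search_payload("password reset"))
```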
3. Upload PDFs

POST /v1/collections/docs/pdf-upload
FormData:
  files: manual.pdf
  chunk_size: 400
  extract_specs: true
→ 47 chunks, metadata extracted
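To give a feel for what a chunk_size of 400 means, here is a simple fixed-size chunker with overlap. It is a stand-in, not Docling's chunking: the unit of chunk_size (taken here as whitespace-separated words) and the overlap parameter are assumptions, since this page does not say how the service measures or overlaps chunks.

```python
def chunk_words(text, chunk_size=400, overlap=50):
    """Split text into chunks of up to chunk_size words with shared overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
```

Overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing a few more vectors.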
4. Extract Metadata

POST /v1/extract
{
  "text": "Router TP-Link...",
  "language_hint": "ro"
}
→ product, specs, confidence

Technology Stack

🐍
Python 3.11+
Runtime
⚑
FastAPI
Web Framework
πŸ”’
ONNX Runtime
Inference
πŸ—„οΈ
Qdrant
Vector DB
πŸ“„
Docling
PDF Parser
πŸ€–
OpenRouter
LLM API
🐳
Docker
Deployment
πŸ“¦
uv
Package Manager

Configuration

Configure your API key and service endpoint

Default: test-api-key

The embedding service endpoint

Health Dashboard

Monitor service and dependency status

Service Status

Generate Embeddings

Convert text into 384-dimensional vector embeddings


Collections

Manage your vector collections

New Collection

Similarity Search

Find similar vectors using text queries

More Keyword | Balanced | More Semantic

Rerank results using a cross-encoder model for improved relevance

Number of candidates to fetch before reranking (min: 10)

Search Results

Extracted Specifications

View and search all extracted specifications from uploaded PDFs

Metadata Extraction

Extract structured metadata from text using an LLM (OpenRouter API)

Text will be truncated to 14,000 characters
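A client can mirror this limit so it is explicit how much of a long document actually reaches the LLM. The sketch below builds an extraction payload with the text and language_hint fields used elsewhere on this page; the helper name is hypothetical, and whether the service truncates on the server, in the UI, or both is not stated here.

```python
MAX_EXTRACT_CHARS = 14_000  # limit stated in the Metadata Extraction panel


def build_extract_payload(text, language_hint=None):
    """Return a body for POST /v1/extract, truncating over-long text."""
    payload = {"text": text[:MAX_EXTRACT_CHARS]}
    if language_hint is not None:
        payload["language_hint"] = language_hint
    return payload


long_text = "a" * 20_000
payload = build_extract_payload(long_text, language_hint="ro")
print(len(payload["text"]), payload["language_hint"])
```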

Optional: Define custom fields to extract

Extracted Metadata

Result fields: Confidence, field count, Product, Manufacturer, Model Number, Specifications, Categories, Language, Tokens used

PDF Upload

Upload PDFs to automatically chunk, embed, and store

Configuration

Extract product specs using AI (requires API key)

Optional: Define custom extraction fields

Converting file(s), chunking with Docling, generating embeddings, and storing vectors

Upload Results

Total PDFs | Successful | Chunks Created