---
name: vector-search
description: "Vector database setup and semantic search implementation with Pinecone, pgvector, Qdrant, and embedding pipelines."
version: 1.0.0
category: data
parent: ccc-data
tags: [ccc-data, vector-search, embeddings, semantic-search]
disable-model-invocation: true
---
# Vector Search
## What This Does
Sets up vector databases and implements semantic search — enabling search by meaning rather than keywords. Covers embedding generation, vector database selection and configuration, indexing pipelines, hybrid search (vector + keyword), and relevance tuning. Supports Pinecone, pgvector, Qdrant, Weaviate, and ChromaDB.
## Instructions
1. **Assess the use case.** Clarify:
   - What content is being searched? (documents, products, code, images)
   - How much content? (thousands, millions, billions of vectors)
   - What's the query pattern? (natural language, similar items, recommendations)
   - Latency requirements? (real-time search, batch processing)
   - Do you need filtering? (search within a category, date range, etc.)
2. **Choose the vector database.**
| Database | Best For | Hosting | Filtering | Cost |
|----------|----------|---------|-----------|------|
| pgvector | Already on Postgres, < 1M vectors | Self-managed | SQL WHERE clauses | Free (extension) |
| Pinecone | Managed service, production scale | Cloud | Metadata filtering | Pay per vector |
| Qdrant | Self-hosted, high performance | Self or cloud | Rich filtering | Free (open source) |
| Weaviate | Multi-modal, GraphQL API | Self or cloud | GraphQL filters | Free (open source) |
| ChromaDB | Prototyping, Python-native | In-process | Basic filtering | Free (open source) |
**Recommendation:** pgvector if you already have Postgres, Qdrant for self-hosted production, Pinecone for fully managed.
3. **Choose the embedding model.**
| Model | Dimensions | Best For | Cost |
|-------|-----------|----------|------|
| OpenAI text-embedding-3-small | 1536 | General text, good quality/cost balance | $0.02/1M tokens |
| OpenAI text-embedding-3-large | 3072 | Highest quality text embeddings | $0.13/1M tokens |
| Cohere embed-v3 | 1024 | Multilingual, search-optimized | $0.10/1M tokens |
| Voyage AI voyage-3 | 1024 | Code and technical content | $0.06/1M tokens |
| sentence-transformers (local) | 384-768 | Free, private, offline | Free (CPU/GPU) |
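For the free local row, sentence-transformer checkpoints can run in-process via transformers.js. A minimal sketch, assuming the `@xenova/transformers` package and the 384-dimension `Xenova/all-MiniLM-L6-v2` model (the model choice here is just an example):
```typescript
import { pipeline } from '@xenova/transformers';

// Load once and reuse; the model is downloaded and cached on first call
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

async function embedLocal(text: string): Promise<number[]> {
  // Mean-pool token embeddings and L2-normalize, matching sentence-transformers defaults
  const output = await extractor(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data as Float32Array);
}
```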
4. **Set up pgvector.**
```sql
-- Enable extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create table with vector column
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content TEXT NOT NULL,
  metadata JSONB DEFAULT '{}',
  embedding vector(1536), -- match your model's dimensions
  created_at TIMESTAMPTZ DEFAULT now()
);
-- Create HNSW index (recommended for production)
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
-- Semantic search query
SELECT id, content, metadata,
  1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE metadata->>'category' = 'tech' -- pre-filter
ORDER BY embedding <=> $1::vector
LIMIT 10;
```
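Calling this from Node is mostly a matter of serializing the embedding, since pgvector accepts vectors as a bracketed string literal. A minimal sketch using the `pg` client, assuming a configured `Pool` and a query embedding produced by the same model used at index time (e.g. the `embed()` helper in step 6):
```typescript
import { Pool } from 'pg';

const pool = new Pool(); // reads PG* environment variables

async function semanticSearch(queryEmbedding: number[], category: string) {
  // pgvector parses '[0.1,0.2,...]' into a vector when cast with ::vector
  const vectorLiteral = `[${queryEmbedding.join(',')}]`;
  const { rows } = await pool.query(
    `SELECT id, content, metadata,
            1 - (embedding <=> $1::vector) AS similarity
     FROM documents
     WHERE metadata->>'category' = $2
     ORDER BY embedding <=> $1::vector
     LIMIT 10`,
    [vectorLiteral, category]
  );
  return rows;
}
```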
5. **Set up Qdrant.**
```typescript
import { QdrantClient } from '@qdrant/js-client-rest';

const client = new QdrantClient({ url: 'http://localhost:6333' });

// Create collection
await client.createCollection('documents', {
  vectors: {
    size: 1536,
    distance: 'Cosine',
  },
  optimizers_config: {
    default_segment_number: 2,
  },
});

// Create payload index for filtering
await client.createPayloadIndex('documents', {
  field_name: 'category',
  field_schema: 'keyword',
});

// Upsert vectors (assumes `documents` were embedded upstream; see step 6)
await client.upsert('documents', {
  points: documents.map((doc) => ({
    id: doc.id,
    vector: doc.embedding,
    payload: {
      content: doc.content,
      category: doc.category,
      created_at: doc.createdAt,
    },
  })),
});

// Search with filter (use the same embedding model as at index time)
const results = await client.search('documents', {
  vector: queryEmbedding,
  filter: {
    must: [{ key: 'category', match: { value: 'tech' } }],
  },
  limit: 10,
  with_payload: true,
});
```
6. **Build the embedding pipeline.**
```typescript
import OpenAI from 'openai';

const openai = new OpenAI();

// Generate embeddings
async function embed(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts,
  });
  return response.data.map(d => d.embedding);
}

// Rough token estimate (~4 characters per token for English text)
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Chunking strategy for long documents
function chunkDocument(text: string, maxTokens: number = 500): string[] {
  const sentences = text.split(/[.!?]+\s/);
  const chunks: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    if (estimateTokens(current + sentence) > maxTokens) {
      if (current) chunks.push(current.trim());
      current = sentence;
    } else {
      current += (current ? '. ' : '') + sentence;
    }
  }
  if (current) chunks.push(current.trim());
  return chunks;
}

interface Document { id: string; content: string; }

// Index a document (vectorDB is whichever client you chose in step 2)
async function indexDocument(doc: Document) {
  const chunks = chunkDocument(doc.content);
  const embeddings = await embed(chunks);
  await vectorDB.upsert(
    chunks.map((chunk, i) => ({
      id: `${doc.id}-${i}`,
      vector: embeddings[i],
      payload: {
        content: chunk,
        document_id: doc.id,
        chunk_index: i,
        total_chunks: chunks.length,
      },
    }))
  );
}
```
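For large corpora, batch the `embed` calls rather than sending one request per chunk; the OpenAI embeddings endpoint limits how many inputs a single request may carry (2,048 items at the time of writing, so the batch size below is a conservative assumption). A minimal sketch reusing the `embed` helper above:
```typescript
// Embed a large corpus in fixed-size batches to respect per-request input limits
async function embedAll(texts: string[], batchSize: number = 256): Promise<number[][]> {
  const all: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    all.push(...(await embed(texts.slice(i, i + batchSize))));
  }
  return all;
}
```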
7. **Implement hybrid search (vector + keyword).**
```sql
-- pgvector + Postgres full-text search hybrid query
WITH semantic AS (
  SELECT id, content, 1 - (embedding <=> $1::vector) AS semantic_score
  FROM documents
  ORDER BY embedding <=> $1::vector
  LIMIT 20
),
keyword AS (
  SELECT id, content, ts_rank(to_tsvector(content), plainto_tsquery($2)) AS keyword_score
  FROM documents
  WHERE to_tsvector(content) @@ plainto_tsquery($2)
  ORDER BY keyword_score DESC
  LIMIT 20
)
-- Note: ts_rank and cosine similarity use different scales; tune the weights on real queries
SELECT
  COALESCE(s.id, k.id) AS id,
  COALESCE(s.content, k.content) AS content,
  COALESCE(s.semantic_score, 0) * 0.7 + COALESCE(k.keyword_score, 0) * 0.3 AS combined_score
FROM semantic s
FULL OUTER JOIN keyword k ON s.id = k.id
ORDER BY combined_score DESC
LIMIT 10;
```
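Because the weighted blend above mixes scores on different scales, reciprocal rank fusion (RRF) is a common alternative: it combines result positions rather than raw scores. A minimal sketch, assuming each input list is already sorted by its own relevance; `k = 60` is the conventional damping constant:
```typescript
// Reciprocal rank fusion: score each id by 1/(k + rank), summed across result lists
function reciprocalRankFusion(resultLists: string[][], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of resultLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Usage: fuse the ids returned by the semantic and keyword queries
// const fused = reciprocalRankFusion([semanticIds, keywordIds]).slice(0, 10);
```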
## Output Format
```markdown
# Vector Search Setup: {Application}
## Architecture
| Component | Choice | Reason |
|-----------|--------|--------|
| Vector DB | {database} | {why} |
| Embedding model | {model} | {why} |
| Dimensions | {n} | {matches model} |
| Index type | {HNSW/IVFFlat} | {why} |
## Schema
{Table/collection definition}
## Indexing Pipeline
{How documents are chunked, embedded, and stored}
## Search API
{Query interface with example requests/responses}
## Performance
- Index size: {estimated}
- Query latency: {p50, p99}
- Recall@10: {estimated}
```
## Tips
- pgvector is good enough for most applications under 1M vectors — don't over-engineer with a dedicated vector DB
- HNSW index is slower to build but faster to query than IVFFlat — use HNSW for production
- Chunking strategy matters more than embedding model choice — experiment with chunk sizes (200-500 tokens)
- Hybrid search (vector + keyword) almost always outperforms either alone
- Include document metadata in vector payload for post-retrieval context
- For RAG applications, return the top 5-10 chunks, not just top 1 — let the LLM synthesize
- Re-index when you change embedding models — vectors from different models are incompatible
- Monitor search quality with user feedback (thumbs up/down) and relevance scoring; a recall@k check against a small hand-labeled query set (sketched below) is a simple starting point
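
A hypothetical recall@k harness for that last tip, assuming a `search(query, k)` function that wraps whichever database you chose (all names here are illustrative, not part of any library):
```typescript
// Hypothetical evaluation harness; `search` wraps your own query endpoint
interface EvalCase {
  query: string;
  relevantIds: string[]; // hand-labeled relevant document ids
}

async function recallAtK(
  cases: EvalCase[],
  search: (query: string, k: number) => Promise<{ id: string }[]>,
  k: number = 10,
): Promise<number> {
  let hits = 0;
  let total = 0;
  for (const c of cases) {
    const returned = new Set((await search(c.query, k)).map(r => r.id));
    hits += c.relevantIds.filter(id => returned.has(id)).length;
    total += c.relevantIds.length;
  }
  return total === 0 ? 0 : hits / total;
}
```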