Other · mkurman · Free

clip

OpenAI CLIP: contrastive language-image pre-training. Zero-shot image classification, image-text similarity, concept search, and cross-modal retrieval. Embeds images and text into a shared space.

§ 01 — Stats

Stars: 308

Forks: 22

Prior: 1815

Quality: 68.0

Score: n/a

§ 02 — Install

Get clip.

Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.

One-line install · Claude Code

$ npx versuz@latest install clip

Or clone the repo

$ git clone https://github.com/mkurman/zorai.git

Or copy the SKILL.md manually (create the target directory first)

$ mkdir -p ~/.claude/skills/clip && cp zorai/skills/scientific-skills/clip/SKILL.md ~/.claude/skills/clip/SKILL.md

---
name: clip
description: "OpenAI CLIP: contrastive language-image pre-training. Zero-shot image classification, image-text similarity, concept search, and cross-modal retrieval. Embeds images and text into a shared space."
tags: [clip, multimodal, image-text, zero-shot, embeddings, openai, zorai]
---
## Overview

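OpenAI CLIP (Contrastive Language-Image Pre-training) learns joint text-image representations by training an image encoder and a text encoder so that matching image-caption pairs land close together in a shared embedding space. This enables zero-shot image classification, image-text similarity scoring, and cross-modal search without task-specific training. CLIP does not generate captions itself, but it can rank candidate captions for an image.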

## Installation

```bash
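# Install from the official repository, as documented in the CLIP README
uv pip install git+https://github.com/openai/CLIP.git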
```

## Zero-Shot Classification

```python
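import clip
import torch
from PIL import Image

# Keep the model and its inputs on the same device
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

labels = ["a dog", "a cat", "a bird"]
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)  # add batch dimension
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    # logits_per_image has shape (1, len(labels)): one score per candidate label
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

best = probs.argmax().item()
print(f"Predicted: {labels[best]} with {probs.max().item():.2%} confidence")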
```
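
To make the scoring step less opaque, here is a minimal pure-Python sketch of what the forward pass computes once the image and each prompt have been embedded: cosine similarity between L2-normalized vectors, multiplied by a temperature, then a softmax over the prompts. The toy 3-dimensional vectors, the helper names, and the scale of 100.0 (roughly what the released checkpoints are understood to use) are illustrative assumptions, not values read from the model.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Return a probability per text prompt for one image embedding.

    logit_scale is an assumed temperature; the real model learns its own value.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    # Numerically stable softmax over the prompts
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings (hypothetical): the image points mostly along the first axis
image_emb = [0.9, 0.1, 0.0]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

probs = zero_shot_probs(image_emb, text_embs)
best = max(range(len(probs)), key=probs.__getitem__)
print(best, [round(p, 4) for p in probs])
```

Because the temperature is large, small differences in cosine similarity turn into very confident probabilities, which is worth keeping in mind when reading the "confidence" printed by the classification example.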

## Text-Image Similarity

```python
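from PIL import Image

# Reuses clip, torch, model, and preprocess from the previous example
device = next(model.parameters()).device
paths = ["a.jpg", "b.jpg"]
images = torch.stack([preprocess(Image.open(p)) for p in paths]).to(device)
texts = clip.tokenize(["sunset", "ocean", "mountain"]).to(device)

with torch.no_grad():
    # Row i is image i's softmax distribution over the three texts, shape (2, 3)
    similarity = model(images, texts)[0].softmax(dim=-1)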
```
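
For concept search and cross-modal retrieval, a common pattern is to embed the gallery once and rank it against each query by cosine similarity. The sketch below shows only that ranking step in pure Python; `rank_by_cosine` and the toy vectors are hypothetical, and in practice the embeddings would come from `model.encode_image` and `model.encode_text`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_cosine(query_emb, gallery_embs, top_k=2):
    """Return (gallery_index, score) pairs for the top_k most similar items."""
    scored = [(cosine(query_emb, emb), idx) for idx, emb in enumerate(gallery_embs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:top_k]]

# Toy stand-ins (hypothetical) for three image embeddings and one text query embedding
gallery = [[0.1, 0.9, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.9]]
query = [1.0, 0.0, 0.0]

results = rank_by_cosine(query, gallery, top_k=2)
for idx, score in results:
    print(f"gallery item {idx}: cosine similarity {score:.3f}")
```

Unlike the softmax in the example above, raw cosine scores are not normalized across the gallery, so they stay comparable when items are added or removed.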

## References
- [CLIP GitHub](https://github.com/openai/CLIP)
- [CLIP paper](https://arxiv.org/abs/2103.00020)