---
name: standalone-experimental-test
description: Run standalone experimental tests that never touch the existing codebase. Creates isolated test files, runs them, captures results, then cleans up. Zero risk to working code.
triggers:
- "test this experimentally"
- "standalone test"
- "experimental test"
- "try this without touching the code"
- "test in isolation"
- "can we verify this works"
- "/standalone-experimental-test"
---
# Standalone Experimental Test — Zero-Risk Experimentation
Run experiments to verify assumptions, test APIs, or explore behavior WITHOUT modifying any existing project files. Everything happens in an isolated scratch space.
## Philosophy
The codebase is sacred. Experiments are disposable. Never mix the two.
## Workflow
### Step 1: Define the experiment
Before writing anything, state clearly:
```
Experiment: [what we're testing]
Hypothesis: [what we expect to happen]
Success criteria: [how we'll know it worked]
Files needed: [what test files we'll create]
Dependencies: [what packages/tools are needed]
```
Wait for user approval before proceeding.
### Step 2: Create the scratch space
Create an isolated directory for the experiment:
```bash
mkdir -p /tmp/experiment-<descriptive-name>-$(date +%s)
```
Rules:
- **NEVER create test files inside the project directory**
- **NEVER modify existing project files** — not even "temporarily"
- **NEVER add dependencies to the project's package.json or pyproject.toml**
- If the experiment needs project dependencies, install them separately in the scratch space
- If the experiment needs to import from the project, use absolute paths or symlinks — don't copy project files
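For example, a scratch-space script can make project code importable by absolute path without copying or touching it. The paths and module name below are purely illustrative:
```python
# test_import.py — lives in /tmp/experiment-.../, never inside the project.
# All paths below are examples; substitute your actual project location.
import sys
from pathlib import Path

PROJECT_ROOT = Path("/home/me/code/my-project")  # hypothetical project root

# Make project code importable WITHOUT copying or modifying it.
sys.path.insert(0, str(PROJECT_ROOT / "src"))

import my_module  # hypothetical project module, imported read-only

print("Imported:", my_module.__file__)
```
A symlink created inside the scratch directory (for example `ln -s /abs/path/to/project/src project_src`) achieves the same thing at the filesystem level.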
### Step 3: Write the test
Create minimal, focused test files in the scratch space (a skeleton example follows at the end of this step). The test should:
- Be as small as possible — test ONE thing
- Be self-contained — runnable without the project
- Print clear output — what happened, what was expected
- Handle errors gracefully — catch and report, don't crash silently
If the test needs project context (e.g., testing how a library interacts with project code):
- Read project files but don't write to them
- Import from the project using absolute paths
- Use the project's virtual environment if needed, but don't modify it
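A minimal skeleton that satisfies these points might look like the sketch below; the function under test and the expected value are placeholders, not real project code:
```python
# test_one_thing.py — tests exactly ONE hypothesis, self-contained, in /tmp.
import json
import traceback

EXPECTED = {"ok": True}  # placeholder success criterion


def run_experiment():
    """Placeholder for the single behavior under test."""
    return json.loads('{"ok": true}')


def main():
    print("Hypothesis: parsing returns", EXPECTED)
    try:
        actual = run_experiment()
    except Exception:
        # Report failures loudly instead of crashing silently.
        print("Result: INCONCLUSIVE (experiment raised an exception)")
        traceback.print_exc()
        return
    print("Actual:  ", actual)
    print("Result:  ", "CONFIRMED" if actual == EXPECTED else "DENIED")


if __name__ == "__main__":
    main()
```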
### Step 4: Run and capture
Run the test and capture ALL output:
```bash
cd /tmp/experiment-<name>-<timestamp>
# Run the test, capture stdout and stderr
python test.py 2>&1 | tee results.txt
```
### Step 5: Report findings
Present results clearly:
```
Experiment: [name]
Result: [CONFIRMED / DENIED / INCONCLUSIVE]
Evidence: [what the output showed]
Implication: [what this means for the project]
```
If the hypothesis was confirmed, explain what the next step would be to integrate it into the project (but don't do it — that's a separate task).
If denied, explain what we learned and what alternative approaches exist.
### Step 6: Clean up
Ask the user if they want to keep the experiment files:
```
Experiment complete. Keep the scratch files at /tmp/experiment-<name>-<timestamp>?
```
- If yes: leave them
- If no: `rm -rf /tmp/experiment-<name>-<timestamp>`
## Rules
1. **NEVER modify project files.** Not even a single line. Not even "just a log statement." If you need to test a modified version of a project file, copy it to the scratch space and modify the copy.
2. **NEVER install packages into the project.** If the experiment needs a package, install it in the scratch space or a temporary venv (see the sketch after these rules).
3. **NEVER commit experiment files.** They live in /tmp and die there.
4. **One experiment, one question.** Don't combine multiple hypotheses into one test. Run separate experiments.
5. **The scratch space is disposable.** Don't build anything in /tmp that you'd be sad to lose. If results are valuable, save them to the project as a deliberate, separate step.
6. **Read-only access to the project.** The experiment can READ project files, USE the project's installed packages (via their paths), and REFERENCE project configuration. It cannot WRITE, MODIFY, or ADD anything to the project.
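As one way to satisfy Rule 2, the experiment can build a throwaway venv inside the scratch space. The sketch below uses only the Python standard library; the directory name and package list are examples:
```python
# make_scratch_venv.py — hypothetical helper for a throwaway experiment venv.
import subprocess
import venv
from pathlib import Path

SCRATCH = Path("/tmp/experiment-example-1700000000")  # example scratch directory
ENV_DIR = SCRATCH / "venv"

SCRATCH.mkdir(parents=True, exist_ok=True)
venv.EnvBuilder(with_pip=True).create(ENV_DIR)  # stdlib venv module, pip included

# Install experiment-only packages into the scratch venv, never the project.
pip = ENV_DIR / "bin" / "pip"
subprocess.run([str(pip), "install", "websockets", "python-dotenv"], check=True)

# Run the experiment with the scratch interpreter, never the project's.
subprocess.run([str(ENV_DIR / "bin" / "python"), str(SCRATCH / "test.py")], check=True)
```
Deleting the scratch directory in Step 6 removes the venv along with everything else.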
## Examples
### Testing if a library handles partial input
```
Experiment: Does Streamdown render incomplete LaTeX gracefully?
Hypothesis: Passing partial LaTeX like "$$x = \frac{" renders without crashing
Success criteria: Component renders without error, shows partial equation
Files needed: test_partial_latex.html in scratch space
Dependencies: streamdown (use project's node_modules)
```
### Testing API response format
```
Experiment: What does the GitHub API return for repo tree requests?
Hypothesis: Returns flat array of file paths with types
Success criteria: We see the exact JSON structure
Files needed: test_gh_api.sh in scratch space
Dependencies: gh CLI (already installed)
```
## Proven Example: Testing OpenAI Realtime API Delta Streaming
This experiment was run on 2026-03-26 and CONFIRMED that function call arguments stream token by token.
**Definition:**
```
Experiment: Does OpenAI Realtime API stream function call arguments via delta events?
Hypothesis: Delta events fire token by token during function calls
Success criteria: Multiple delta events with partial JSON for a single function call
Result: CONFIRMED — 147 delta events over 1.4 seconds, each containing 1-6 chars
```
**Test file** (`test_deltas.py`):
```python
"""
Tests whether OpenAI Realtime API streams function call arguments.
Connects via WebSocket, sends a prompt that forces a function call,
captures all response.function_call_arguments.delta events.
Requirements: websockets, python-dotenv
Env: OPENAI_API_KEY must be set (or loaded from a .env.local)
"""
import asyncio
import json
import os
import time
# Load API key — point this at your project's .env.local (READ-ONLY)
from dotenv import load_dotenv
load_dotenv("/path/to/your/project/.env.local")
import websockets
RESULTS_FILE = os.path.join(os.path.dirname(__file__), "results.txt")
def log(msg):
timestamp = time.time()
line = f"[{timestamp:.3f}] {msg}"
print(line, flush=True)
with open(RESULTS_FILE, "a") as f:
f.write(line + "\n")
async def test_realtime_delta_streaming():
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
log("ERROR: OPENAI_API_KEY not found")
return
url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
headers = {
"Authorization": f"Bearer {api_key}",
"OpenAI-Beta": "realtime=v1",
}
log("Connecting to OpenAI Realtime API...")
async with websockets.connect(url, additional_headers=headers) as ws:
# Configure session with a tool, text-only mode
await ws.send(json.dumps({
"type": "session.update",
"session": {
"modalities": ["text"],
"tools": [{
"type": "function",
"name": "show_content",
"description": "Show visual content",
"parameters": {
"type": "object",
"properties": {
"content": {
"type": "string",
"description": "Markdown/LaTeX content. Use $$ for display math."
}
},
"required": ["content"]
}
}],
"tool_choice": "required",
}
}))
resp = json.loads(await ws.recv())
log(f"Session configured: {resp['type']}")
# Prompt that forces a long function call argument
await ws.send(json.dumps({
"type": "conversation.item.create",
"item": {
"type": "message",
"role": "user",
"content": [{
"type": "input_text",
"text": (
"Show me on the blackboard: the quadratic formula, "
"Euler's identity, the Pythagorean theorem, "
"and the definition of a derivative. "
"Use full LaTeX formatting with display math."
)
}]
}
}))
await ws.send(json.dumps({"type": "response.create"}))
log("Prompt sent. Waiting for function call events...")
delta_events = []
done = False
start_time = time.time()
while not done:
try:
msg = await asyncio.wait_for(ws.recv(), timeout=30)
except asyncio.TimeoutError:
log("TIMEOUT")
break
event = json.loads(msg)
event_type = event.get("type", "unknown")
if "function_call" in event_type:
elapsed = time.time() - start_time
if event_type == "response.function_call_arguments.delta":
delta_events.append({
"elapsed": round(elapsed, 3),
"delta": event.get("delta", ""),
})
log(f" DELTA #{len(delta_events):3d} @ {elapsed:.3f}s: '{event.get('delta', '')}'")
elif event_type == "response.function_call_arguments.done":
log(f" DONE @ {elapsed:.3f}s: ({len(event.get('arguments', ''))} chars)")
if event_type == "response.done":
done = True
# Report
log("")
log("=" * 60)
log(f"Total delta events: {len(delta_events)}")
if len(delta_events) > 1:
first = delta_events[0]["elapsed"]
last = delta_events[-1]["elapsed"]
log(f"CONFIRMED: {len(delta_events)} deltas over {last - first:.3f}s")
log(f"Reconstructed: {''.join(d['delta'] for d in delta_events)}")
elif len(delta_events) == 1:
log("DENIED: Only one delta event")
else:
log("INCONCLUSIVE: No delta events")
if __name__ == "__main__":
asyncio.run(test_realtime_delta_streaming())
```
**How to run:**
```bash
mkdir -p /tmp/experiment-deltas
cp test_deltas.py /tmp/experiment-deltas/
cd /tmp/experiment-deltas
# Use a venv with websockets and python-dotenv installed
python test_deltas.py 2>&1 | tee results.txt
```
**Expected output (CONFIRMED):**
```
DELTA # 1 @ 0.819s: '{"'
DELTA # 2 @ 0.819s: 'content'
DELTA # 3 @ 0.819s: '":"'
DELTA # 4 @ 0.819s: 'Here'
...
DELTA #147 @ 2.186s: '"}'
DONE @ 2.208s: (349 chars)
Total delta events: 147
CONFIRMED: 147 deltas over 1.367s
```