azure-databricks

Databricks on Azure: cluster lifecycle, Delta Lake operations, MLflow tracking, and Unity Catalog patterns for this repo.

§ 01 — Stats

Stars: 4

Prior: 1728

Quality: 75.0

Score: n/a

Tasks: n/a

§ 02 — Install

Get azure-databricks.

Free SKILL.md scraped from GitHub. Clone the repo or copy the file directly into your Claude Code skills directory.

One-line install · Claude Code

$ npx versuz@latest install durinwinter-harkonnen-labs-factory-skill-registry-domain-azure-databricks

Or clone the repo

$ git clone https://github.com/durinwinter/Harkonnen-Labs.git

Or copy the SKILL.md manually

cp Harkonnen-Labs/SKILL.MD ~/.claude/skills/durinwinter-harkonnen-labs-factory-skill-registry-domain-azure-databricks/SKILL.md

SKILL.md content (~515 tokens)

---
name: azure-databricks
description: "Databricks on Azure: cluster lifecycle, Delta Lake operations, MLflow tracking, and Unity Catalog patterns for this repo."
user-invocable: false
allowed-tools:
- Bash(databricks *)
- WebFetch(docs.databricks.com)
---

# Azure Databricks Domain Guide

This repo uses Azure Databricks. Apply these patterns.

## Cluster Lifecycle

- Never create interactive clusters when job clusters suffice: job clusters are billed at the lower jobs-compute rate and terminate when the run finishes.
- Pin cluster runtime versions explicitly; do not rely on `latest`.
- Check cluster state before submitting jobs: `databricks clusters get --cluster-id <id>` (legacy CLI syntax; newer CLI versions take the ID as a positional argument).
- Auto-termination: always set `autotermination_minutes` on interactive clusters.
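
The pinning and auto-termination rules above can be checked before a cluster spec is submitted. The following is a minimal sketch, not this repo's actual tooling: `check_cluster_spec` is a hypothetical helper, and the runtime version and node type in the sample spec are placeholder values. The field names `spark_version`, `node_type_id`, `num_workers`, and `autotermination_minutes` follow the Databricks cluster spec.

```python
from typing import Any


def check_cluster_spec(spec: dict[str, Any], interactive: bool) -> list[str]:
    """Return a list of rule violations for a cluster spec (empty means OK)."""
    problems = []
    version = str(spec.get("spark_version", ""))
    # Rule: pin the runtime explicitly; reject missing or "latest"-style values.
    if not version or "latest" in version.lower():
        problems.append("spark_version must be pinned to an explicit runtime")
    # Rule: interactive clusters must always set autotermination_minutes.
    if interactive and int(spec.get("autotermination_minutes") or 0) <= 0:
        problems.append("interactive clusters must set autotermination_minutes")
    return problems


# Illustrative job-cluster spec; the version and node type are placeholders.
job_cluster = {
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
}

print(check_cluster_spec(job_cluster, interactive=False))  # prints []
```

A job cluster passes without `autotermination_minutes` because it ends with the run; the same spec flagged as interactive would be rejected until the field is set.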

## Delta Lake

- Always read with `spark.read.format("delta")` — not `parquet` — for managed Delta tables.
- Schema evolution: use `.option("mergeSchema", "true")` on writes only when schema drift is intentional.
- Vacuuming: `VACUUM tablename RETAIN 168 HOURS` — never vacuum below the 7-day default without explicit approval.
- Z-ORDER by high-cardinality columns that appear in query filters, not by partition columns.
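
The vacuum and Z-ORDER rules above can be enforced where the SQL is built. This is a hedged sketch: `vacuum_sql` and `optimize_sql` are hypothetical helpers, the `approved` flag stands in for whatever approval process the repo uses, and `main.sales.orders` is a made-up table name. The returned strings are meant to be passed to `spark.sql(...)` on a cluster.

```python
from typing import Sequence

DEFAULT_RETENTION_HOURS = 168  # the 7-day default retention


def vacuum_sql(table: str, retain_hours: int = DEFAULT_RETENTION_HOURS,
               approved: bool = False) -> str:
    """Build a VACUUM statement, refusing short retention without approval."""
    if retain_hours < DEFAULT_RETENTION_HOURS and not approved:
        raise ValueError(
            f"retention of {retain_hours}h is below the 7-day default; "
            "explicit approval is required"
        )
    return f"VACUUM {table} RETAIN {retain_hours} HOURS"


def optimize_sql(table: str, zorder_cols: Sequence[str],
                 partition_cols: Sequence[str] = ()) -> str:
    """Build an OPTIMIZE ... ZORDER BY statement over non-partition columns."""
    if not zorder_cols:
        raise ValueError("at least one Z-ORDER column is required")
    overlap = sorted(set(zorder_cols) & set(partition_cols))
    if overlap:
        raise ValueError(f"do not Z-ORDER by partition columns: {overlap}")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_cols)})"


print(vacuum_sql("main.sales.orders"))
# prints VACUUM main.sales.orders RETAIN 168 HOURS
print(optimize_sql("main.sales.orders", ["customer_id"], partition_cols=["order_date"]))
# prints OPTIMIZE main.sales.orders ZORDER BY (customer_id)
```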

## MLflow

- Log every experiment run: `mlflow.set_experiment("name")` before `mlflow.start_run()`.
- Log parameters, metrics, and artifacts explicitly — autolog misses edge cases.
- Registered models go in Unity Catalog: `models:/CatalogName.SchemaName.ModelName/version`.
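
The logging order and the Unity Catalog model URI above might look like the sketch below. `uc_model_uri` and `log_run` are hypothetical helper names, and the catalog, schema, and model names in the example are placeholders. The MLflow calls used (`set_experiment`, `start_run`, `log_params`, `log_metrics`, `log_artifact`) are standard tracking functions; MLflow is imported inside `log_run` so the URI helper can be used without it installed.

```python
from typing import Mapping, Optional


def uc_model_uri(catalog: str, schema: str, model: str, version: int) -> str:
    """Build a Unity Catalog model URI: models:/catalog.schema.model/version."""
    return f"models:/{catalog}.{schema}.{model}/{version}"


def log_run(experiment: str, params: Mapping[str, object],
            metrics: Mapping[str, float],
            artifact_path: Optional[str] = None) -> str:
    """Log one run explicitly and return its run ID."""
    import mlflow  # lazy import: only needed when a run is actually logged

    # Select the experiment before starting the run.
    mlflow.set_experiment(experiment)
    with mlflow.start_run() as run:
        # Log parameters, metrics, and artifacts explicitly instead of
        # relying on autolog alone.
        mlflow.log_params(dict(params))
        mlflow.log_metrics(dict(metrics))
        if artifact_path is not None:
            mlflow.log_artifact(artifact_path)
        return run.info.run_id


print(uc_model_uri("main", "ml", "churn_model", 3))
# prints models:/main.ml.churn_model/3
```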

## Unity Catalog

- Use three-part names: `catalog.schema.table`.
- Never use `hive_metastore` for new tables — that's the legacy path.
- Check grants before assuming read/write access: `SHOW GRANTS ON TABLE catalog.schema.table`.
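
The naming rules above can be applied when a table name is assembled. This is a sketch under assumptions: `new_table_name` and `show_grants_sql` are hypothetical helpers, and the identifier pattern only accepts simple unquoted names (letters, digits, underscores), which is stricter than what Unity Catalog itself allows.

```python
import re

# Assumption: only simple unquoted identifiers are accepted in this sketch.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def new_table_name(catalog: str, schema: str, table: str) -> str:
    """Return catalog.schema.table for a new table, rejecting hive_metastore."""
    for part in (catalog, schema, table):
        if not _IDENT.match(part):
            raise ValueError(f"invalid identifier: {part!r}")
    if catalog.lower() == "hive_metastore":
        raise ValueError("hive_metastore is the legacy path; use a Unity Catalog catalog")
    return f"{catalog}.{schema}.{table}"


def show_grants_sql(full_name: str) -> str:
    """Build the grants check for a three-part table name."""
    if full_name.count(".") != 2:
        raise ValueError("expected a three-part name: catalog.schema.table")
    return f"SHOW GRANTS ON TABLE {full_name}"


name = new_table_name("main", "sales", "orders")
print(show_grants_sql(name))  # prints SHOW GRANTS ON TABLE main.sales.orders
```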

## Safety

- Treat production clusters as read-first. Never write to production Delta tables without explicit approval.
- Databricks secrets must be stored in secret scopes, not in notebook cells or job configs.
- Use service principals for automation — not personal access tokens committed to source.
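
The secret-scope and approval rules above could be wrapped in small guards like these. Both helpers are hypothetical: `read_secret` takes the notebook's `dbutils` object as a parameter so it can be exercised outside Databricks, and `guard_prod_write` assumes production tables live in catalogs named in `prod_catalogs` (the name `prod` is a placeholder). `dbutils.secrets.get(scope=..., key=...)` is the notebook utility for reading from a secret scope.

```python
from typing import Any, Iterable


def read_secret(dbutils: Any, scope: str, key: str) -> str:
    """Fetch a secret from a Databricks secret scope instead of hardcoding it."""
    value = dbutils.secrets.get(scope=scope, key=key)
    if not value:
        raise KeyError(f"secret {scope}/{key} is empty or missing")
    return value


def guard_prod_write(table: str, approved: bool,
                     prod_catalogs: Iterable[str] = ("prod",)) -> None:
    """Raise unless a write to a production table was explicitly approved."""
    catalog = table.split(".", 1)[0]
    if catalog in set(prod_catalogs) and not approved:
        raise PermissionError(f"write to {table} requires explicit approval")


# In a notebook, the real dbutils object would be passed in, for example:
#   token = read_secret(dbutils, "etl-scope", "storage-key")
guard_prod_write("dev.sales.orders", approved=False)  # non-production: allowed
```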
