---
name: castai-core-workflow-a
description: 'Configure CAST AI autoscaler policies and node templates for cost optimization.
Use when enabling Phase 2 automation, setting spot instance policies,
or configuring node downscaler and evictor settings.
Trigger with phrases like "cast ai autoscaler", "cast ai policies",
"cast ai spot instances", "cast ai node optimization".
'
allowed-tools: Read, Write, Edit, Bash(curl:*), Bash(kubectl:*), Grep
version: 1.0.0
license: MIT
author: Jeremy Longshore <jeremy@intentsolutions.io>
tags:
- saas
- kubernetes
- cost-optimization
- castai
compatibility: Designed for Claude Code
---
# CAST AI Core Workflow: Autoscaler & Policies
## Overview
Primary workflow for CAST AI: configure autoscaler policies to reduce cluster cost. Covers enabling spot instances, configuring the node downscaler and evictor, setting cluster CPU limits, and creating node templates for workload-specific requirements.
## Prerequisites
- Completed `castai-install-auth` with Phase 2 (cluster controller + evictor)
- `CASTAI_API_KEY` and `CASTAI_CLUSTER_ID` exported in your shell
- `curl` and `jq` installed (the API examples below use both)
- Cluster in "ready" status
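The prerequisites above can be checked before touching any policy. The following is a minimal preflight sketch. `require_env`, `cluster_status`, and `preflight` are hypothetical helper names, and reading the cluster state from a top-level `.status` field of the external-clusters endpoint is an assumption to confirm against the CAST AI API reference.

```bash
#!/usr/bin/env bash
# Preflight checks for the CAST AI autoscaler workflow (sketch; source this
# file or paste it into your shell, then run `preflight`).

# Print a message for each variable that is unset or empty; return non-zero
# if any are missing.
require_env() {
  local missing=0 name
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing required env var: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Print the cluster status string.
# ASSUMPTION: the response carries a top-level "status" field (e.g. "ready").
cluster_status() {
  curl -sS --fail -H "X-API-Key: ${CASTAI_API_KEY}" \
    "https://api.cast.ai/v1/kubernetes/external-clusters/${CASTAI_CLUSTER_ID}" \
    | jq -r '.status'
}

# Succeed only when both variables are set and the cluster reports "ready".
preflight() {
  require_env CASTAI_API_KEY CASTAI_CLUSTER_ID || return 1
  local status
  status="$(cluster_status)"
  if [ "${status}" != "ready" ]; then
    echo "cluster status is '${status}', expected 'ready'" >&2
    return 1
  fi
  echo "preflight OK"
}
```

Run `preflight` once per shell session; if it fails on the status check, finish `castai-install-auth` before continuing.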
## Instructions
### Step 1: Read Current Policies
```bash
curl -s -H "X-API-Key: ${CASTAI_API_KEY}" \
  "https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/policies" \
  | jq .
```
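Before changing anything, it helps to keep a copy of the current policy and glance at the fields that Step 2 modifies. This is a sketch; `backup_policies` and `summarize_policies` are hypothetical helper names, and the summary only reads the field paths that appear in the Step 2 payload.

```bash
# Save the current policy to a timestamped file and print the filename.
# Removes the file again if the request fails.
backup_policies() {
  local out
  out="castai-policies-$(date +%Y%m%d-%H%M%S).json"
  curl -sS --fail -H "X-API-Key: ${CASTAI_API_KEY}" \
    "https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/policies" \
    > "${out}" || { rm -f "${out}"; return 1; }
  echo "${out}"
}

# Read policy JSON on stdin and print one line per setting that Step 2 touches.
# Fields that are absent print as "null".
summarize_policies() {
  jq -r '
    "enabled=\(.enabled)",
    "unschedulablePods.enabled=\(.unschedulablePods.enabled)",
    "nodeDownscaler.enabled=\(.nodeDownscaler.enabled)",
    "spotInstances.enabled=\(.spotInstances.enabled)",
    "clusterLimits.cpu=\(.clusterLimits.cpu.minCores)-\(.clusterLimits.cpu.maxCores)"
  '
}
```

Usage: `summarize_policies < "$(backup_policies)"` saves a backup and prints the summary from it in one step.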
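### Step 2: Enable Cost-Optimized Autoscaling
The payload below turns the autoscaler on, keeps 10% spare CPU and memory headroom when scaling for unschedulable pods, removes empty nodes after 180 seconds, allows diversified AWS spot instances priced up to 20% above the cheapest option, and bounds the cluster between 4 and 100 CPU cores.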
```bash
curl -X PUT -H "X-API-Key: ${CASTAI_API_KEY}" \
  -H "Content-Type: application/json" \
  "https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/policies" \
  -d '{
    "enabled": true,
    "unschedulablePods": {
      "enabled": true,
      "headroom": {
        "cpuPercentage": 10,
        "memoryPercentage": 10,
        "enabled": true
      }
    },
    "nodeDownscaler": {
      "enabled": true,
      "emptyNodes": {
        "enabled": true,
        "delaySeconds": 180
      }
    },
    "spotInstances": {
      "enabled": true,
      "clouds": ["aws"],
      "spotDiversityEnabled": true,
      "spotDiversityPriceIncreaseLimitPercent": 20
    },
    "clusterLimits": {
      "enabled": true,
      "cpu": {
        "minCores": 4,
        "maxCores": 100
      }
    }
  }'
```
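Step 2 sends a complete policy document. If you only want to change a few settings, a more cautious pattern is read-modify-write: fetch the live policy, deep-merge a small patch over it, and PUT the result. This is a sketch under two assumptions: that the PUT endpoint accepts the same object shape GET returns, and that preserving unrelated fields is what you want. `merge_policy` and `apply_policy_patch` are hypothetical names; the merge relies on jq's `*` operator, which merges objects recursively while arrays and scalars from the patch replace those in the base.

```bash
# Deep-merge the patch file ($2) over the base policy file ($1); print result.
merge_policy() {
  jq -s '.[0] * .[1]' "$1" "$2"
}

# Fetch the live policy, merge the patch file ($1) over it, and PUT the result.
# ASSUMPTION: PUT accepts the same JSON shape that GET returns.
apply_policy_patch() {
  local patch_file="$1" base merged
  base="$(mktemp)"
  merged="$(mktemp)"
  curl -sS --fail -H "X-API-Key: ${CASTAI_API_KEY}" \
    "https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/policies" \
    > "${base}" || { rm -f "${base}" "${merged}"; return 1; }
  merge_policy "${base}" "${patch_file}" > "${merged}" \
    || { rm -f "${base}" "${merged}"; return 1; }
  curl -sS --fail -X PUT -H "X-API-Key: ${CASTAI_API_KEY}" \
    -H "Content-Type: application/json" \
    "https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/policies" \
    --data-binary @"${merged}"
  local rc=$?
  rm -f "${base}" "${merged}"
  return "${rc}"
}
```

For example, a patch file containing `{"nodeDownscaler": {"emptyNodes": {"delaySeconds": 300}}}` changes only the empty-node delay and leaves every other policy field as it was.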
### Step 3: Configure Node Templates via Terraform
```hcl
# Spot-first general-purpose workers for batch workloads.
resource "castai_node_template" "spot_workers" {
  cluster_id = castai_eks_cluster.this.id
  name       = "spot-workers"
  is_default = false
  is_enabled = true

  constraints {
    min_cpu    = 2
    max_cpu    = 16
    min_memory = 4096  # MiB (4 GiB)
    max_memory = 65536 # MiB (64 GiB)

    spot                          = true
    use_spot_fallbacks            = true # use on-demand when spot capacity is unavailable
    fallback_restore_rate_seconds = 600  # wait before moving fallback nodes back to spot

    instance_families {
      include = ["m5", "m6i", "c5", "c6i", "r5", "r6i"]
    }

    architectures = ["amd64"]
  }

  custom_labels = {
    "workload-type" = "batch"
  }
}

# On-demand NVIDIA GPU nodes.
resource "castai_node_template" "gpu_ondemand" {
  cluster_id = castai_eks_cluster.this.id
  name       = "gpu-ondemand"
  is_default = false
  is_enabled = true

  constraints {
    spot              = false
    gpu_manufacturers = ["NVIDIA"]

    instance_families {
      include = ["p3", "p4d", "g4dn", "g5"]
    }
  }

  custom_labels = {
    "workload-type" = "gpu"
  }
}
```
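Once the templates exist, workloads need to ask for them. A common approach is a `nodeSelector` on the template's custom label (`workload-type: batch` for `spot-workers`). The sketch below prints such a Deployment manifest. `batch_deployment_manifest` is a hypothetical helper, the name and image arguments are placeholders, and the sketch assumes the template does not taint its nodes; if yours does, add the matching toleration as described in the CAST AI node template documentation. The resource requests are illustrative values that give the autoscaler something concrete to size nodes against.

```bash
# Print a Deployment manifest ($1 = name, $2 = image) whose pods select nodes
# carrying the "workload-type: batch" custom label from the spot-workers template.
batch_deployment_manifest() {
  local name="$1" image="$2"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      nodeSelector:
        workload-type: batch
      containers:
        - name: ${name}
          image: ${image}
          resources:
            requests:
              cpu: "500m"
              memory: 512Mi
EOF
}
```

Usage: `batch_deployment_manifest nightly-report busybox:1.36 | kubectl apply -f -`, then `kubectl get nodes -l workload-type=batch` to see which nodes the pods landed on.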
### Step 4: Verify Autoscaler is Working
```bash
# Check if the autoscaler is processing nodes
curl -s -H "X-API-Key: ${CASTAI_API_KEY}" \
  "https://api.cast.ai/v1/kubernetes/external-clusters/${CASTAI_CLUSTER_ID}/nodes" \
  | jq '[.items[] | {name, instanceType, lifecycle, castaiManaged}]
        | group_by(.lifecycle)
        | map({lifecycle: .[0].lifecycle, count: length})'
# Expected: mix of spot and on-demand nodes
```
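To track the result over time, it can be handier to reduce the node list to a single spot percentage. This sketch reuses the `lifecycle` field from the query above and assumes spot nodes report the literal value `"spot"`; adjust the path and value if your API response differs. `fetch_nodes` and `spot_share` are hypothetical helper names.

```bash
# Fetch the raw node list for the cluster.
fetch_nodes() {
  curl -sS --fail -H "X-API-Key: ${CASTAI_API_KEY}" \
    "https://api.cast.ai/v1/kubernetes/external-clusters/${CASTAI_CLUSTER_ID}/nodes"
}

# Read node-list JSON on stdin; print the percentage of nodes whose lifecycle
# is "spot", rounded down. Prints 0 for an empty list.
spot_share() {
  jq -r '
    [.items[].lifecycle] as $l
    | ($l | length) as $total
    | ($l | map(select(. == "spot")) | length) as $spot
    | if $total == 0 then 0 else ($spot * 100 / $total | floor) end
  '
}
```

Usage: `fetch_nodes | spot_share`. A value that stays at 0 after workloads scale up points to the "Spot instances not used" row in the table below.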
## Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| Policy update returns 400 | Invalid policy JSON | Validate the body with `jq .` before sending |
| Nodes not scaling | Autoscaler or unschedulable-pods policy disabled | Verify top-level `enabled` and `unschedulablePods.enabled` are both `true` |
| Spot instances not used | Provider not configured | Add cloud provider to `spotInstances.clouds` |
| Empty nodes removed too quickly | `nodeDownscaler.emptyNodes.delaySeconds` too low | Increase `nodeDownscaler.emptyNodes.delaySeconds` |
| Cluster limit hit | `maxCores` too low | Increase `clusterLimits.cpu.maxCores` |
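The table's first row recommends validating the policy before sending it. A slightly stricter check than `jq .` can also catch shape mistakes such as a non-boolean `enabled` or inverted CPU limits. The rules in this sketch are illustrative and are not CAST AI's official schema; `validate_policy_json` is a hypothetical helper name.

```bash
# Return 0 only if the file ($1) parses as JSON, has a boolean top-level
# "enabled", and, when clusterLimits.cpu is present, has numeric
# minCores/maxCores with minCores <= maxCores. jq -e sets the exit status
# from the final boolean; a parse error also returns non-zero.
validate_policy_json() {
  jq -e '
    (.enabled | type == "boolean")
    and (if .clusterLimits.cpu then
           (.clusterLimits.cpu.minCores | type == "number")
           and (.clusterLimits.cpu.maxCores | type == "number")
           and .clusterLimits.cpu.minCores <= .clusterLimits.cpu.maxCores
         else true end)
  ' "$1" > /dev/null
}
```

Usage: save the Step 2 body to `policy.json`, then run `validate_policy_json policy.json && curl -X PUT ... --data-binary @policy.json`.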
## Resources
- [Autoscaler Policies](https://docs.cast.ai/docs/autoscaler-settings)
- [Node Configuration](https://docs.cast.ai/docs/node-configuration)
- [Terraform Node Templates](https://registry.terraform.io/providers/castai/castai/latest/docs/resources/node_template)
## Next Steps
For workload-level autoscaling, see `castai-core-workflow-b`.