# MCP Service - LLM Agent Guide

This guide helps LLM agents understand the Superset MCP (Model Context Protocol) service architecture and development conventions.

## CRITICAL: Apache License Headers

**EVERY Python file in the MCP service MUST have the Apache Software Foundation license header.**

This applies to all `.py` files (tool files, schemas, `__init__.py` files, etc.). When working with them:
- **NEVER remove existing license headers during refactoring or edits**
- **ALWAYS add license headers when creating new files**
- **ALWAYS verify license headers are present after editing files**

If you see a file without a license header, ADD IT IMMEDIATELY. If you accidentally remove one during editing, ADD IT BACK.

Use this exact template at the top of EVERY Python file:

```python
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
```

**Note**: LLM instruction files like `CLAUDE.md`, `AGENTS.md`, etc. are excluded from this requirement (listed in `.rat-excludes`) to avoid token overhead. Every Python file still requires the header.

## Architecture Overview

The MCP service provides programmatic access to Superset via the Model Context Protocol, allowing AI assistants to interact with dashboards, charts, datasets, databases, SQL Lab, and instance metadata.

### Key Components

```
superset/mcp_service/
├── app.py                      # FastMCP app factory and tool registration
├── auth.py                     # Authentication, authorization, and RBAC
├── mcp_config.py              # Default configuration
├── mcp_core.py                # Reusable core classes for tools
├── flask_singleton.py         # Flask app singleton for MCP context
├── middleware.py              # FastMCP middleware (logging, errors, size guards)
├── server.py                  # Server startup (streamable-http, multi-pod)
├── jwt_verifier.py            # JWT token validation
├── chart/                     # Chart tools, schemas, prompts, resources
│   ├── schemas.py
│   ├── chart_utils.py
│   ├── preview_utils.py
│   ├── validation.py
│   ├── tool/
│   ├── prompts/
│   └── resources/
├── dashboard/                 # Dashboard tools and schemas
│   ├── schemas.py
│   └── tool/
├── dataset/                   # Dataset tools and schemas
│   ├── schemas.py
│   └── tool/
├── explore/                   # Explore link generation
│   ├── schemas.py
│   └── tool/
├── sql_lab/                   # SQL Lab tools (execute, save, open)
│   ├── schemas.py
│   └── tool/
├── system/                    # System tools (health, instance info, schema)
│   ├── schemas.py
│   ├── tool/
│   ├── prompts/
│   └── resources/
├── common/                    # Shared error schemas
├── commands/                  # MCP-specific command classes
└── utils/                     # Utilities (URL, schema parsing, error builders)
```

### Dependency Injection Architecture

The `@tool` and `@prompt` decorators are defined as stubs in the `superset-core` package (`superset_core.mcp.decorators`). At startup, `app.py` calls `initialize_core_mcp_dependencies()` which replaces these stubs with concrete implementations that register tools/prompts with the FastMCP instance. This avoids circular imports between `superset_core` and `superset`.

**Startup flow**:
1. `app.py` creates the FastMCP `mcp` instance
2. `initialize_core_mcp_dependencies()` injects the real decorator implementations
3. Tool/prompt/resource imports at the bottom of `app.py` trigger registration
4. `server.py` adds middleware and starts the transport
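
The stub-replacement idea behind this startup flow can be sketched in a few lines. This is a minimal illustration, not the project's implementation: `core_decorators`, `registry`, and `initialize_dependencies` are hypothetical stand-ins for `superset_core.mcp.decorators`, the FastMCP instance, and `initialize_core_mcp_dependencies()`.

```python
from types import SimpleNamespace
from typing import Any, Callable


def _stub_tool(**_options: Any) -> Callable:
    """Placeholder that fails loudly until the host injects an implementation."""
    raise NotImplementedError("tool decorator has not been initialized")


# Hypothetical stand-in for the stub module (the role played by
# superset_core.mcp.decorators in the real project).
core_decorators = SimpleNamespace(tool=_stub_tool)

# Hypothetical stand-in for the FastMCP instance: a name -> function registry.
registry: dict[str, Callable] = {}


def initialize_dependencies() -> None:
    """Swap the stub for a decorator that registers functions when applied."""

    def real_tool(**_options: Any) -> Callable:
        def decorator(func: Callable) -> Callable:
            registry[func.__name__] = func
            return func

        return decorator

    core_decorators.tool = real_tool


# Step 2 of the startup flow: inject the real implementation.
initialize_dependencies()


# Step 3: defining (importing) a tool after injection registers it.
@core_decorators.tool(tags=["core"])
def my_new_tool() -> str:
    return "ok"
```

In this sketch, applying the decorator before `initialize_dependencies()` runs would raise `NotImplementedError`, which mirrors why the ordering of the startup steps matters.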

## Critical Convention: Tool, Prompt, and Resource Registration

**IMPORTANT**: When creating new MCP tools, prompts, or resources, you MUST add their imports to `app.py` for auto-registration. Do NOT add them to `server.py` - that approach doesn't work properly.

### How to Add a New Tool

1. **Create the tool file** in the appropriate directory (e.g., `chart/tool/my_new_tool.py`)
2. **Decorate with `@tool`** using the decorator from `superset_core.mcp.decorators`
3. **Export from the module's `__init__.py`** (e.g., `chart/tool/__init__.py`)
4. **Add import to `app.py`** at the bottom of the file where other tools are imported

**Example (read-only tool)**:
```python
# superset/mcp_service/chart/tool/my_new_tool.py
from fastmcp import Context
from superset_core.mcp.decorators import tool, ToolAnnotations

from superset.extensions import event_logger

@tool(
    tags=["core"],
    class_permission_name="Chart",
    annotations=ToolAnnotations(
        title="My new tool",
        readOnlyHint=True,
        destructiveHint=False,
    ),
)
async def my_new_tool(request: MyRequest, ctx: Context) -> MyResponse:
    """Tool description for LLMs."""
    await ctx.info("Doing something: param=%s" % (request.param,))
    with event_logger.log_context(action="mcp.my_new_tool"):
        result = do_something()
    return MyResponse(data=result)
```

**Example (mutating tool)**:
```python
@tool(
    tags=["mutate"],
    class_permission_name="Chart",
    method_permission_name="write",
    annotations=ToolAnnotations(
        title="Create something",
        readOnlyHint=False,
        destructiveHint=False,
    ),
)
async def create_something(request: CreateRequest, ctx: Context) -> CreateResponse:
    """Creates a new resource."""
    ...
```

**Then add to app.py**:
```python
# superset/mcp_service/app.py (at the bottom, after initialize_core_mcp_dependencies())
from superset.mcp_service.chart.tool import (  # noqa: F401, E402
    get_chart_info,
    list_charts,
    my_new_tool,  # ADD YOUR TOOL HERE
)
```

**Why this matters**: Tools register automatically on import via the `@tool` decorator. The import MUST be in `app.py` at the bottom (after `initialize_core_mcp_dependencies()` is called). DO NOT add imports to `server.py`.

### How to Add a New Prompt

1. **Create the prompt file** in the appropriate directory (e.g., `chart/prompts/my_new_prompt.py`)
2. **Decorate with `@prompt`** from `superset_core.mcp.decorators`
3. **Add import to module's `__init__.py`** (e.g., `chart/prompts/__init__.py`)
4. **Ensure module is imported in `app.py`**

**Example**:
```python
# superset/mcp_service/chart/prompts/my_new_prompt.py
from superset_core.mcp.decorators import prompt

@prompt("my_new_prompt")
async def my_new_prompt_handler(
    chart_type: str = "auto", business_goal: str = "exploration"
) -> str:
    """Interactive prompt for doing something."""
    return "Prompt instructions here..."
```

### How to Add a New Resource

Resources use direct FastMCP decorators and **must include `@mcp_auth_hook`** for authentication:

```python
# superset/mcp_service/chart/resources/my_new_resource.py
from superset.mcp_service.app import mcp
from superset.mcp_service.auth import mcp_auth_hook  # REQUIRED for resources

@mcp.resource("superset://chart/my_resource")
@mcp_auth_hook  # Always add this decorator to resources
def get_my_resource() -> str:
    """Resource description for LLMs."""
    return "Resource data here..."
```

## Tool Development Patterns

### 1. Tool Decorator Parameters

The `@tool` decorator from `superset_core.mcp.decorators` accepts:

- **`tags`**: List of tags (e.g., `["core"]`, `["mutate"]`). Default: `[]`
- **`class_permission_name`**: FAB permission class (e.g., `"Chart"`, `"Dashboard"`). Default: `None`
- **`method_permission_name`**: Permission action (e.g., `"read"`, `"write"`). Default: Auto — `"write"` if `"mutate"` in tags, else `"read"`
- **`protect`**: Enable authentication wrapping. Default: `True`
- **`annotations`**: MCP `ToolAnnotations` object. Default: `None`

**ToolAnnotations** (from `superset_core.mcp.decorators`):
```python
annotations=ToolAnnotations(
    title="Human-readable title",
    readOnlyHint=True,   # Whether tool only reads data
    destructiveHint=False, # Whether tool has destructive side effects
)
```

### 2. Use Core Classes for Reusability

The `mcp_core.py` module provides reusable patterns:

- **`ModelListCore`**: For listing resources with filtering, search, and pagination
  - Used by: `list_charts`, `list_dashboards`, `list_datasets`, `list_databases`
- **`ModelGetInfoCore`**: For getting resource details by ID, UUID, or slug
  - Used by: `get_chart_info`, `get_dashboard_info`, `get_dataset_info`, `get_database_info`
- **`ModelGetSchemaCore`**: For schema discovery (columns, filters, sortable columns)
  - Used by: `get_schema`
- **`InstanceInfoCore`**: For instance statistics and metadata
  - Used by: `get_instance_info`

### 3. Authentication and RBAC

Authentication is handled automatically by the `@tool` decorator (via `mcp_auth_hook` internally). RBAC permission checking uses `class_permission_name` and `method_permission_name`.

```python
from fastmcp import Context
from superset_core.mcp.decorators import tool, ToolAnnotations

# Authentication + RBAC enabled (default)
@tool(
    class_permission_name="Chart",  # Checks user has Chart access
)
async def my_tool(request: MyRequest, ctx: Context) -> MyResponse:
    # g.user is set automatically before this runs
    ...

# Public tool (no auth) - use sparingly
@tool(protect=False)
async def health_check(ctx: Context) -> dict:
    return {"status": "healthy"}
```

**Authentication priority order** (in `auth.py`):
1. JWT context (per-request ContextVar from FastMCP)
2. API Key authentication (via FAB SecurityManager)
3. `MCP_DEV_USERNAME` config (development only)
4. `g.user` fallback (set by external middleware)

**`@mcp_auth_hook`** is only used directly on **resources** — tools get auth wrapping from `@tool(protect=True)`.

### 4. Use Pydantic Schemas

**All tool inputs and outputs must be Pydantic models**. Place schemas in `{module}/schemas.py`.

```python
from pydantic import BaseModel, ConfigDict, Field

class MyToolRequest(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    param: str = Field(..., description="Parameter description for LLMs")
    optional_param: str | None = Field(None, description="Optional parameter")

class MyToolResponse(BaseModel):
    result: str = Field(..., description="Result description")
    error: str | None = Field(None, description="Error message if failed")
```

### 5. Follow the DAO Pattern

**Use Superset's DAO (Data Access Object) layer** instead of direct database queries:

```python
from superset.daos.dashboard import DashboardDAO

# GOOD: Use DAO
dashboard = DashboardDAO.find_by_id(dashboard_id)

# BAD: Don't query directly
dashboard = db.session.query(Dashboard).filter_by(id=dashboard_id).first()
```

### 6. Python Type Hints (Python 3.10+ Style)

**CRITICAL**: Always use modern Python 3.10+ union syntax for type hints.

```python
# GOOD - Modern Python 3.10+ syntax
from typing import Any

from pydantic import BaseModel, Field

class MySchema(BaseModel):
    name: str | None = Field(None, description="Optional name")
    tags: list[str] = Field(default_factory=list)
    metadata: dict[str, Any] = Field(default_factory=dict)

def my_function(
    id: int,
    filters: list[str] | None = None,
) -> MySchema | None:
    pass

# BAD - Old-style (DO NOT USE)
from typing import Optional, List, Dict
name: Optional[str]  # Wrong! Use str | None
tags: List[str]      # Wrong! Use list[str]
```

### 7. Event Logger Instrumentation

**All tool operations should use `event_logger`** for observability:

```python
from superset.extensions import event_logger

@tool(...)
async def my_tool(request: MyRequest, ctx: Context) -> MyResponse:
    with event_logger.log_context(action="mcp.my_tool.step_name"):
        result = do_something()
    return MyResponse(data=result)
```

### 8. Context Logging

Use the FastMCP `Context` object for structured logging within tools:

```python
async def my_tool(request: MyRequest, ctx: Context) -> MyResponse:
    await ctx.info("Starting: param=%s" % (request.param,))
    await ctx.debug("Details: keys=%s" % (sorted(request.model_dump().keys()),))
    await ctx.warning("Something unexpected: %s" % (warning_msg,))
    await ctx.error("Failed: %s" % (str(exc),))
    await ctx.report_progress(1, 5, "Step 1 of 5")
```

### 9. Error Handling

**Pattern**: Catch specific exceptions for known failure modes, and use a broad `except Exception` only as the outermost safety net, which logs and re-raises:

```python
from fastmcp import Context
from superset_core.mcp.decorators import tool

from superset.commands.dataset.exceptions import (
    DatasetCreateFailedError,
    DatasetInvalidError,
)
from superset.extensions import event_logger

@tool(...)
async def my_tool(request: MyRequest, ctx: Context) -> MyResponse:
    try:
        # Happy path: run the command inside an event_logger context
        with event_logger.log_context(action="mcp.my_tool"):
            result = SomeCommand(properties).run()
        return MyResponse(data=result)

    except DatasetInvalidError as exc:
        # Return structured error response (don't raise)
        await ctx.error("Validation failed: %s" % (exc.normalized_messages(),))
        return MyResponse(error=str(exc.normalized_messages()))

    except DatasetCreateFailedError as exc:
        await ctx.error("Creation failed: %s" % (str(exc),))
        return MyResponse(error=f"Failed: {exc}")

    except Exception as exc:
        # Outermost safety net: log and re-raise (middleware handles it)
        await ctx.error("Unexpected: %s: %s" % (type(exc).__name__, str(exc)))
        raise
```

### 10. Dataset Validation for Chart Tools

All chart-related tools must validate that the chart's dataset is accessible:

```python
from superset.mcp_service.chart.chart_utils import validate_chart_dataset

validation_result = validate_chart_dataset(chart, check_access=True)
if not validation_result.is_valid:
    await ctx.warning("Dataset not accessible: %s" % (validation_result.error,))
    return ChartError(
        error=validation_result.error or "Chart's dataset is not accessible",
        error_type="DatasetNotAccessible",
    )
```

Used by: `get_chart_info`, `get_chart_preview`, `get_chart_data`, `generate_chart`

### 11. Compile Check for Chart Creation

When creating, saving, or previewing charts, run schema validation (Tier 1) and optionally a compile check (Tier 2) before persisting or caching. `validate_and_compile` glues both together; tools with tight SLAs (`generate_explore_link`, `update_chart_preview`) opt out of Tier 2.

```python
from superset.mcp_service.chart.compile import validate_and_compile

result = validate_and_compile(
    config, form_data, dataset, run_compile_check=True
)
if not result.success:
    # ``result.error_obj`` is a ``ChartGenerationError`` with fuzzy-match
    # suggestions ("did you mean sum_boys?") so the LLM can self-correct.
    ...
```

The lower-level `_compile_chart(form_data, dataset_id)` is still exported for callers that have already done their own schema validation.

### 12. Flexible Input Parsing

`ModelListCore` handles JSON string vs. native object parsing automatically via utilities in `superset.mcp_service.utils.schema_utils`:

- `parse_json_or_passthrough(value, param_name)` - JSON string or dict
- `parse_json_or_list(value, param_name)` - JSON array, list, or comma-separated string
- `parse_json_or_model(value, model_class, param_name)` - JSON string or dict to Pydantic model
- `parse_json_or_model_list(value, model_class, param_name)` - JSON array to list of Pydantic models

These are used internally by `ModelListCore` for `filters` and `select_columns`. Individual tools using core classes do NOT need to add parsing logic.

## Middleware

The MCP service uses FastMCP middleware (registered in `server.py`):

- **`LoggingMiddleware`**: Logs tool calls with duration and entity IDs, and sanitizes sensitive data
- **`GlobalErrorHandlerMiddleware`**: Catches unhandled exceptions and converts them to `ToolError`
- **`StructuredContentStripperMiddleware`**: Strips `structuredContent` from responses (Claude.ai compatibility)
- **`ResponseSizeGuardMiddleware`**: Prevents oversized responses from crashing clients
- **`ResponseCachingMiddleware`**: Optional response caching (in-memory by default, Redis when `MCP_STORE_CONFIG` is enabled)

Middleware is applied in `server.py` and should NOT be modified in individual tools.
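
As a rough illustration of the idea behind a response-size guard, the sketch below measures a serialized payload and substitutes a compact error when it is too large. It is a generic example only: the limit, the error shape, and `guard_response_size` are hypothetical, and the real `ResponseSizeGuardMiddleware` is a FastMCP middleware whose thresholds and behavior are not described here.

```python
import json
from typing import Any

MAX_RESPONSE_BYTES = 1_000  # hypothetical limit, chosen only for this example


def guard_response_size(payload: Any, limit: int = MAX_RESPONSE_BYTES) -> Any:
    """Return the payload unchanged if small enough, else a compact error dict."""
    size = len(json.dumps(payload, default=str).encode("utf-8"))
    if size <= limit:
        return payload
    return {
        "error": f"Response too large: {size} bytes exceeds limit of {limit} bytes",
        "error_type": "ResponseTooLarge",  # hypothetical error type name
    }
```

Centralizing a check like this in middleware is what lets individual tools stay free of size-handling logic.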

## Configuration

Default configuration is in `mcp_config.py`. Override in `superset_config.py`:

```python
# Authentication
MCP_DEV_USERNAME = None          # Fallback username for dev mode
MCP_AUTH_ENABLED = False         # Enable JWT/API key auth
MCP_AUTH_FACTORY = None          # Custom auth factory function
MCP_JWT_PUBLIC_KEY = None
MCP_JWT_SECRET = None
MCP_JWKS_URI = None
MCP_USER_RESOLVER = None         # Custom function to extract username from JWT

# RBAC
MCP_RBAC_ENABLED = True          # Enable permission checking (default: True)


# Response Caching (optional, uses in-memory store by default; Redis when MCP_STORE_CONFIG enabled)
MCP_CACHE_CONFIG = {
    "enabled": False,
    "list_tools_ttl": 300,
    "call_tool_ttl": 3600,
    "excluded_tools": ["execute_sql", "generate_dashboard"],  # add tools to exclude
}

# Multi-pod Storage (optional, requires Redis)
MCP_STORE_CONFIG = {
    "enabled": False,
    "CACHE_REDIS_URL": None,
    "event_store_ttl": 3600,
}
```
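
For example, a local development override in `superset_config.py` might look like the fragment below. The values are illustrative assumptions; the full `MCP_CACHE_CONFIG` dictionary is repeated because this guide does not say whether partial overrides are merged with the defaults.

```python
# superset_config.py (illustrative values only)

# Development: resolve MCP requests to this Superset user.
MCP_DEV_USERNAME = "admin"  # assumed username; use one that exists locally

# Enable response caching, keeping the documented keys and TTL defaults.
MCP_CACHE_CONFIG = {
    "enabled": True,
    "list_tools_ttl": 300,
    "call_tool_ttl": 3600,
    "excluded_tools": ["execute_sql", "generate_dashboard"],
}
```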

## Testing Conventions

### Test Organization

Tests mirror the MCP service module structure:
```
tests/unit_tests/mcp_service/
├── conftest.py                    # Global fixtures (disable_mcp_rbac)
├── chart/
│   ├── test_chart_utils.py
│   ├── test_chart_schemas.py
│   └── tool/
│       ├── test_list_charts.py
│       ├── test_generate_chart.py
│       └── ...
├── dashboard/tool/
├── dataset/tool/
├── sql_lab/tool/
├── system/tool/
├── test_auth_*.py                 # Auth/RBAC tests
└── test_middleware*.py            # Middleware tests
```

### Async Tool Tests (primary pattern)

```python
from unittest.mock import MagicMock, patch
import pytest
from fastmcp import Client

from superset.mcp_service.app import mcp
from superset.utils import json

@pytest.fixture
def mcp_server():
    return mcp

@pytest.mark.asyncio
async def test_my_tool_success(mcp_server):
    mock_obj = MagicMock()
    mock_obj.id = 1
    mock_obj.name = "test"

    with patch("superset.daos.chart.ChartDAO.find_by_id", return_value=mock_obj):
        async with Client(mcp_server) as client:
            result = await client.call_tool(
                "my_tool", {"request": {"id": 1}}
            )
            data = json.loads(result.content[0].text)

    assert data["id"] == 1
```

### Key Testing Patterns

- **RBAC is disabled globally** via `conftest.py` autouse fixture (`MCP_RBAC_ENABLED = False`)
- **RBAC tests** are separate in `test_auth_rbac.py` with their own `enable_mcp_rbac` fixture
- **Auth is mocked** via `mock_auth` fixture that patches `get_user_from_request`
- **Mock objects** must have all attributes set explicitly (no auto-generation)
- **Patch at the DAO level**: `patch("superset.daos.chart.ChartDAO.find_by_id", ...)`
- **Schema validation tests** are synchronous (no Client needed)
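
One likely reason for setting mock attributes explicitly is that `MagicMock` never raises on an unknown attribute; it silently returns another `MagicMock`, which then fails at serialization time. The sketch below demonstrates that behavior with the standard library only; the attribute names are hypothetical.

```python
import json
from unittest.mock import MagicMock

mock_chart = MagicMock()
mock_chart.id = 1
mock_chart.slice_name = "Sales"  # hypothetical attribute name for this example

# Explicitly set attributes behave like plain values.
explicit = json.dumps({"id": mock_chart.id, "name": mock_chart.slice_name})

# An attribute that was never set does not raise; it is another MagicMock,
# which the JSON encoder cannot serialize.
auto_attr = mock_chart.description
try:
    json.dumps({"description": auto_attr})
    serialization_failed = False
except TypeError:
    serialization_failed = True
```

Setting every attribute that the tool's response schema reads keeps such failures out of the test and makes the expected output explicit.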

## Common Pitfalls to Avoid

### 1. Forgetting Tool Import in app.py
**Problem**: Tool exists but isn't available to MCP clients.
**Solution**: Add tool import to `app.py` at the bottom (after `initialize_core_mcp_dependencies()`).

### 2. Adding Tool Imports to server.py
**Problem**: Tools won't register properly.
**Solution**: Tool imports MUST be in `app.py`, not `server.py`.

### 3. Wrong Decorator Import Path
**Problem**: Using a stale import path.
**Solution**: Use `from superset_core.mcp.decorators import tool, ToolAnnotations` (NOT `superset_core.api.mcp`).

### 4. Missing ToolAnnotations
**Problem**: Tool lacks MCP directory compliance metadata.
**Solution**: Always include `annotations=ToolAnnotations(title=..., readOnlyHint=..., destructiveHint=...)`.

### 5. Using `Optional` Instead of Union Syntax
**Problem**: Old-style `Optional[T]` is not Python 3.10+ style.
**Solution**: Use `T | None` and `list[str]` instead of `Optional[T]` and `List[str]`.

### 6. Direct Database Queries
**Problem**: Bypasses Superset's security and caching layers.
**Solution**: Use DAO classes (ChartDAO, DashboardDAO, DatasetDAO, DatabaseDAO).

### 7. Not Using Core Classes
**Problem**: Duplicating list/get_info logic across tools.
**Solution**: Use `ModelListCore`, `ModelGetInfoCore`, `ModelGetSchemaCore`.

### 8. Missing Apache License Headers
**Problem**: CI fails on license check.
**Solution**: Add ASF license header to all new `.py` files (see template at top of this doc).

### 9. Circular Imports
**Problem**: Importing from `app.py` in tool files causes circular dependencies.
**Solution**: Use `from superset_core.mcp.decorators import tool` for tools/prompts. Only import `from superset.mcp_service.app import mcp` in resource files.

### 10. Missing event_logger Instrumentation
**Problem**: Tool operations are invisible to observability.
**Solution**: Wrap key operations with `event_logger.log_context(action="mcp.tool_name.step")`.

## Quick Checklist for New Tools

- [ ] Created tool file in `{module}/tool/{tool_name}.py`
- [ ] Added ASF license header
- [ ] Used `@tool(tags=[...], class_permission_name="...", annotations=ToolAnnotations(...))` decorator
- [ ] Import: `from superset_core.mcp.decorators import tool, ToolAnnotations`
- [ ] Created Pydantic request/response schemas in `{module}/schemas.py`
- [ ] Used DAO classes instead of direct queries
- [ ] Added `event_logger.log_context()` instrumentation
- [ ] Used `await ctx.info/error/debug()` for context logging
- [ ] Exported from `{module}/tool/__init__.py`
- [ ] Added tool import to `app.py` at the bottom
- [ ] Created async unit tests in `tests/unit_tests/mcp_service/{module}/tool/`
- [ ] Updated `DEFAULT_INSTRUCTIONS` in `app.py` if adding new capability

## Quick Checklist for New Prompts

- [ ] Created prompt file in `{module}/prompts/{prompt_name}.py`
- [ ] Added ASF license header
- [ ] Used `@prompt("prompt_name")` from `superset_core.mcp.decorators`
- [ ] Made function async: `async def prompt_handler(...) -> str`
- [ ] Added import to `{module}/prompts/__init__.py`
- [ ] Verified module import exists in `app.py`

## Quick Checklist for New Resources

- [ ] Created resource file in `{module}/resources/{resource_name}.py`
- [ ] Added ASF license header
- [ ] Used `@mcp.resource("superset://{path}")` decorator
- [ ] Added `@mcp_auth_hook` decorator
- [ ] Added import to `{module}/resources/__init__.py`
- [ ] Verified module import exists in `app.py`

## Getting Help

- Check existing tool implementations for patterns (chart/tool/, dashboard/tool/)
- Review core classes in `mcp_core.py` for reusable functionality
- See `CLAUDE.md` in project root for general Superset development guidelines
- Consult Superset documentation: https://superset.apache.org/docs/