A comprehensive Python-based MCP (Model Context Protocol) server for interacting with Microsoft Fabric APIs, featuring advanced PySpark notebook development, testing, and optimization capabilities with LLM integration.
- Workspace, lakehouse, warehouse, and table management
- Delta table schema and metadata retrieval
- SQL query execution and data loading
- Report and semantic model operations
- Intelligent notebook creation with 6 specialized templates
- Smart code generation for common PySpark operations
- Comprehensive validation with syntax and best-practices checking
- Fabric-specific optimizations and compatibility checks
- Performance analysis with scoring and optimization recommendations
- Real-time monitoring and execution insights
- Natural language interface for PySpark development
- Context-aware assistance with conversation memory
- Intelligent code formatting and explanations
- Smart optimization suggestions based on project patterns
graph TB
subgraph "Developer Environment"
IDE[IDE/VSCode]
DEV[Developer]
PROJ[Project Files]
end
subgraph "AI Layer"
LLM["Large Language Model (Claude/GPT/etc.)"]
CONTEXT[Conversation Context]
REASONING[AI Reasoning Engine]
end
subgraph "MCP Layer"
MCP[MCP Server]
TOOLS[PySpark Tools]
HELPERS[PySpark Helpers]
TEMPLATES[Template Manager]
VALIDATORS[Code Validators]
GENERATORS[Code Generators]
end
subgraph "Microsoft Fabric"
API[Fabric API]
WS[Workspace]
LH[Lakehouse]
NB[Notebooks]
TABLES[Delta Tables]
SPARK[Spark Clusters]
end
subgraph "Operations Flow"
CREATE[Create Notebooks]
VALIDATE[Validate Code]
GENERATE[Generate Code]
ANALYZE[Analyze Performance]
DEPLOY[Deploy to Fabric]
end
%% Developer interactions
DEV --> IDE
IDE --> PROJ
%% LLM interactions
IDE <--> LLM
LLM <--> CONTEXT
LLM --> REASONING
%% MCP interactions
LLM <--> MCP
MCP --> TOOLS
TOOLS --> HELPERS
TOOLS --> TEMPLATES
TOOLS --> VALIDATORS
TOOLS --> GENERATORS
%% Fabric interactions
MCP <--> API
API --> WS
WS --> LH
WS --> NB
LH --> TABLES
NB --> SPARK
%% Operation flows
TOOLS --> CREATE
TOOLS --> VALIDATE
TOOLS --> GENERATE
TOOLS --> ANALYZE
CREATE --> DEPLOY
%% Data flow arrows
REASONING -.->|"Intelligent Decisions"| TOOLS
CONTEXT -.->|"Project Awareness"| VALIDATORS
%% Styling
classDef devEnv fill:#e1f5fe
classDef aiLayer fill:#fff9c4
classDef mcpLayer fill:#f3e5f5
classDef fabricLayer fill:#e8f5e8
classDef operations fill:#fff3e0
class IDE,DEV,PROJ devEnv
class LLM,CONTEXT,REASONING aiLayer
class MCP,TOOLS,HELPERS,TEMPLATES,VALIDATORS,GENERATORS mcpLayer
class API,WS,LH,NB,TABLES,SPARK fabricLayer
class CREATE,VALIDATE,GENERATE,ANALYZE,DEPLOY operations
- Developer requests assistance in IDE
- IDE communicates with LLM (Claude/GPT)
- LLM analyzes using context and reasoning
- LLM calls MCP server tools intelligently
- MCP tools interact with Fabric API
- Results flow back through LLM with intelligent formatting
- Developer receives contextual, intelligent responses (see the tool-registration sketch below)
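Under the hood, the tool layer in this flow is exposed through the MCP Python SDK. The following is a minimal sketch of how a Fabric tool could be registered using the SDK's FastMCP helper; the tool name and body are illustrative assumptions, not the project's actual fabric_mcp.py.
# Minimal sketch -- tool name and implementation are assumptions for illustration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ms-fabric-mcp")

@mcp.tool()
def list_workspaces() -> str:
    """List all available Fabric workspaces."""
    # A real implementation would call the Fabric REST API with Azure credentials
    # and return workspace names/IDs to the calling LLM.
    return "workspace listing goes here"

if __name__ == "__main__":
    mcp.run()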
- Python 3.12+
- Azure credentials for authentication
- uv (from Astral): Installation instructions
- Azure CLI: Installation instructions
- Optional: Node.js for MCP inspector: Installation instructions
- Clone the repository:
git clone https://github.com/your-repo/fabric-mcp.git
cd fabric-mcp
- Set up the virtual environment:
uv sync
- Install dependencies:
pip install -r requirements.txt
- Using STDIO
az login --scope https://api.fabric.microsoft.com/.default
uv run --with mcp mcp dev fabric_mcp.py
This starts the server with the MCP Inspector at http://localhost:6274.
Add to your launch.json:
{
  "mcp": {
    "servers": {
      "ms-fabric-mcp": {
        "type": "stdio",
        "command": "\\.venv\\Scripts\\python.exe",
        "args": ["\\fabric_mcp.py"]
      }
    }
  }
}
- Using HTTP
uv run python .\fabric_mcp.py --port 8081
Add to your launch.json:
{
  "mcp": {
    "servers": {
      "ms-fabric-mcp": {
        "type": "http",
        "url": "http://localhost:8081/mcp/",
        "headers": {
          "Accept": "application/json,text/event-stream"
        }
      }
    }
  }
}
List all available Fabric workspaces.
# Usage in LLM: "List all my Fabric workspaces"
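A direct call takes no required parameters; assuming the tool is registered as list_workspaces (the name is inferred from the description above, not confirmed by the source):
list_workspaces()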
Set the current workspace context for the session.
set_workspace(workspace="Analytics-Workspace")
List all lakehouses in a workspace.
list_lakehouses(workspace="Analytics-Workspace")
Create a new lakehouse.
create_lakehouse(
name="Sales-Data-Lake",
workspace="Analytics-Workspace",
description="Sales data lakehouse"
)
Set current lakehouse context.
set_lakehouse(lakehouse="Sales-Data-Lake")
List all warehouses in a workspace.
list_warehouses(workspace="Analytics-Workspace")
Create a new warehouse.
create_warehouse(
name="Sales-DW",
workspace="Analytics-Workspace",
description="Sales data warehouse"
)
Set current warehouse context.
set_warehouse(warehouse="Sales-DW")
List all tables in a lakehouse.
list_tables(workspace="Analytics-Workspace", lakehouse="Sales-Data-Lake")
Get schema for a specific table.
get_lakehouse_table_schema(
workspace="Analytics-Workspace",
lakehouse="Sales-Data-Lake",
table_name="transactions"
)
Get schemas for all tables in a lakehouse.
get_all_lakehouse_schemas(
workspace="Analytics-Workspace",
lakehouse="Sales-Data-Lake"
)
Set current table context.
set_table(table_name="transactions")
Get SQL endpoint for lakehouse or warehouse.
get_sql_endpoint(
workspace="Analytics-Workspace",
lakehouse="Sales-Data-Lake",
type="lakehouse"
)
Execute SQL queries.
run_query(
workspace="Analytics-Workspace",
lakehouse="Sales-Data-Lake",
query="SELECT COUNT(*) FROM transactions",
type="lakehouse"
)
Load data from URL into tables.
load_data_from_url(
url="https://example.com/data.csv",
destination_table="new_data",
workspace="Analytics-Workspace",
lakehouse="Sales-Data-Lake"
)
List all reports in a workspace.
list_reports(workspace="Analytics-Workspace")
Get specific report details.
get_report(workspace="Analytics-Workspace", report_id="report-id")
List semantic models in workspace.
list_semantic_models(workspace="Analytics-Workspace")
Get specific semantic model.
get_semantic_model(workspace="Analytics-Workspace", model_id="model-id")
List all notebooks in a workspace.
list_notebooks(workspace="Analytics-Workspace")
Retrieve notebook content.
get_notebook_content(
workspace="Analytics-Workspace",
notebook_id="notebook-id"
)
Update specific notebook cells.
update_notebook_cell(
workspace="Analytics-Workspace",
notebook_id="notebook-id",
cell_index=0,
cell_content="print('Hello, Fabric!')",
cell_type="code"
)
Create notebooks from basic templates.
create_pyspark_notebook(
workspace="Analytics-Workspace",
notebook_name="Data-Analysis",
template_type="analytics" # Options: basic, etl, analytics, ml
)
Create Fabric-optimized notebooks.
create_fabric_notebook(
workspace="Analytics-Workspace",
notebook_name="Fabric-Pipeline",
template_type="fabric_integration" # Options: fabric_integration, streaming
)
Generate code for common operations.
generate_pyspark_code(
operation="read_table",
source_table="sales.transactions",
columns="id,amount,date"
)
# Available operations:
# - read_table, write_table, transform, join, aggregate
# - schema_inference, data_quality, performance_optimization
Generate Fabric-specific code.
generate_fabric_code(
operation="read_lakehouse",
lakehouse_name="Sales-Data-Lake",
table_name="transactions"
)
# Available operations:
# - read_lakehouse, write_lakehouse, merge_delta, performance_monitor
Validate PySpark code syntax and best practices.
validate_pyspark_code(code="""
df = spark.table('transactions')
df.show()
""")
Validate Fabric compatibility.
validate_fabric_code(code="""
df = spark.table('lakehouse.transactions')
df.write.format('delta').saveAsTable('summary')
""")
Comprehensive performance analysis.
analyze_notebook_performance(
workspace="Analytics-Workspace",
notebook_id="notebook-id"
)
Clear current session context.
clear_context()
- basic: Fundamental PySpark operations and DataFrame usage
- etl: Complete ETL pipeline with data cleaning and Delta Lake
- analytics: Advanced analytics with aggregations and window functions
- ml: Machine learning pipeline with MLlib and feature engineering
- fabric_integration: Lakehouse connectivity and Fabric-specific utilities
- streaming: Real-time processing with Structured Streaming (see the example call below)
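For example, a streaming notebook can be created from the templates above using the same parameters as the create_fabric_notebook call shown earlier (the notebook name is illustrative):
create_fabric_notebook(
    workspace="Analytics-Workspace",
    notebook_name="Realtime-Ingest",
    template_type="streaming"
)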
# Use managed tables
df = spark.table("lakehouse.my_table")

# Use Delta Lake format
df.write.format("delta").mode("overwrite").saveAsTable("my_table")

# Leverage notebookutils
import notebookutils as nbu
workspace_id = nbu.runtime.context.workspaceId

# Cache frequently used DataFrames
df.cache()

# Use broadcast for small tables
from pyspark.sql.functions import broadcast
result = large_df.join(broadcast(small_df), "key")

# Partition large datasets
df.write.partitionBy("year", "month").saveAsTable("partitioned_table")

# Define explicit schemas
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True)
])

# Handle null values
from pyspark.sql.functions import col
df.filter(col("column").isNotNull())
Human: "Create a PySpark notebook that reads sales data, cleans it, and optimizes performance"
LLM Response:
1. Creates Fabric-optimized notebook with ETL template
2. Generates lakehouse reading code
3. Adds data cleaning transformations
4. Includes performance optimization patterns
5. Validates code for best practices (an illustrative snippet follows below)
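As an illustration of the kind of code such a request produces (hypothetical table and column names, not literal server output):
from pyspark.sql.functions import col, trim

# Read sales data from the lakehouse (hypothetical table name)
sales_df = spark.table("lakehouse.sales_transactions")

# Clean: drop duplicate keys, trim text columns, remove rows with null ids
clean_df = (
    sales_df.dropDuplicates(["id"])
    .withColumn("customer_name", trim(col("customer_name")))
    .filter(col("id").isNotNull())
)

# Cache if the cleaned DataFrame is reused downstream, then write as Delta
clean_df.cache()
clean_df.write.format("delta").mode("overwrite").saveAsTable("sales_clean")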
Human: "My PySpark notebook is slow. Help me optimize it."
LLM Response:
1. Analyzes notebook performance (scoring 0-100)
2. Identifies anti-patterns and bottlenecks
3. Suggests specific optimizations
4. Generates optimized code alternatives
5. Provides before/after comparisons (see the before/after sketch below)
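A hypothetical before/after of the kind of rewrite the analysis suggests (table names are illustrative):
# Before: shuffle-heavy join, recomputed for every action
orders = spark.table("lakehouse.orders")
slow = orders.join(spark.table("lakehouse.dim_products"), "product_id")
slow.count()
slow.show()

# After: broadcast the small dimension table and cache the reused result
from pyspark.sql.functions import broadcast

orders = spark.table("lakehouse.orders")
fast = orders.join(broadcast(spark.table("lakehouse.dim_products")), "product_id").cache()
fast.count()
fast.show()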
- Authentication: Ensure az login was run with the correct scope
- Context: Use clear_context() to reset session state
- Workspace: Verify workspace names and permissions
- Templates: Check available template types in documentation
- Use validation tools for code issues
- Check performance analysis for optimization opportunities
- Leverage LLM natural language interface for guidance
The analysis tools provide:
- Operation counts per notebook cell
- Detection and flagging of performance issues
- Identification of optimization opportunities
- A scoring system (0-100) for code quality
- Fabric compatibility assessment
This project welcomes contributions! Please see our contributing guidelines for details.
This project is licensed under the MIT License. See the LICENSE file for details.
Inspired by: https://github.com/Augustab/microsoft_fabric_mcp/tree/main
Ready to supercharge your Microsoft Fabric development with intelligent PySpark assistance!