使用DuckDB查询Fabric湖仓数据

v20260619

using-duckdb

该技能允许用户使用DuckDB查询OneLake中存储的Delta Lake表和原始文件。它支持两种运行模式：本地命令行或Fabric Notebook环境。适用于数据探索、数据质量验证、Schema发现以及跨Fabric生态系统的即时数据分析。

DuckDB Fabric OneLake Delta Lake 查询数据分析 Python 命令行

获取技能

280 次下载

概览

Using DuckDB with Fabric

Query Delta Lake tables and raw files in OneLake using DuckDB. Works both locally (CLI/Python) and inside Fabric notebooks. Read-only; for writes, use the executing-spark skill.

Two Modes

Mode	Where it runs	Auth	Best for
Local	Developer machine	Azure CLI (`az login`)	Exploration, validation, ad-hoc analysis
In-notebook	Fabric Spark container	`notebookutils.credentials.getToken('storage')`	Combining DuckDB speed with Spark write-back

Local: Prerequisites

DuckDB installed (brew install duckdb on macOS)
Azure CLI authenticated (az login)
Extensions installed: INSTALL delta; INSTALL azure; (one-time)

Local: Querying Delta Tables

WS_ID=$(fab get "Workspace.Workspace" -q "id" | tr -d '"')
LH_ID=$(fab get "Workspace.Workspace/LH.Lakehouse" -q "id" | tr -d '"')

duckdb -c "
LOAD delta; LOAD azure;
CREATE SECRET (TYPE azure, PROVIDER credential_chain, CHAIN 'cli');

SELECT * FROM delta_scan(
  'abfss://${WS_ID}@onelake.dfs.fabric.microsoft.com/${LH_ID}/Tables/schema/table'
) LIMIT 10;
"

The CHAIN 'cli' parameter uses Azure CLI credentials. Without it, DuckDB tries managed identity first (fails on local machines).

Local: Querying Raw Files

BASE="abfss://${WS_ID}@onelake.dfs.fabric.microsoft.com/${LH_ID}/Files"

duckdb -c "
LOAD azure;
CREATE SECRET (TYPE azure, PROVIDER credential_chain, CHAIN 'cli');

SELECT * FROM read_csv('${BASE}/data.csv') LIMIT 10;
SELECT * FROM read_parquet('${BASE}/facts.parquet') LIMIT 10;
SELECT * FROM read_json('${BASE}/events/*.json');
"

Glob patterns (*, **) work for reading multiple files.

In-Notebook: Attaching DuckDB to a Lakehouse

Inside a Fabric notebook, DuckDB can query lakehouse Delta tables directly using a storage token. This approach is faster than Spark SQL for analytical queries on single-node data.

import duckdb
import time

# Get storage token from notebook context
token = notebookutils.credentials.getToken('storage')

# Create DuckDB connection
con = duckdb.connect(f'temp_{time.time_ns()}.duckdb')
con.sql('SET enable_object_cache=true')

# Register OneLake secret
con.sql(f"""
    CREATE OR REPLACE SECRET onelake (
        TYPE AZURE,
        PROVIDER ACCESS_TOKEN,
        ACCESS_TOKEN '{token}'
    )
""")

# Query Delta tables
workspace = "<workspace-id>"
lakehouse = "<lakehouse-name>"
path = f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/{lakehouse}.Lakehouse/Tables"

df = con.sql(f"""
    SELECT * FROM delta_scan('{path}/schema/table_name') LIMIT 100
""").df()
print(df)

Auto-Discovering Tables

Dynamically find all Delta tables in a lakehouse:

tables = con.sql(f"""
    SELECT DISTINCT split_part(file, '_delta_log', 1) as table_path
    FROM glob('{path}/*/*/*_delta_log/*.json')
""").df()['table_path'].tolist()

for t in tables:
    view_name = t.split('/')[-1]
    con.sql(f"CREATE OR REPLACE VIEW {view_name} AS SELECT * FROM delta_scan('{t}')")
    print(f"Created view: {view_name}")

OneLake Path Format

abfss://<workspace-id>@onelake.dfs.fabric.microsoft.com/<item-id>/Tables/<schema>/<table>
abfss://<workspace-id>@onelake.dfs.fabric.microsoft.com/<item-id>/Files/<path>

Item type	ID source
Lakehouse	`fab get "ws/LH.Lakehouse" -q "id"`
Warehouse	`fab get "ws/WH.Warehouse" -q "id"`
SQL Database	`fab get "ws/DB.SQLDatabase" -q "id"`

Cross-item joins work in a single DuckDB query; use different abfss:// paths.

Common Patterns

For data freshness checks, quality validation, schema discovery, cross-table joins, and row count audits, see references/common-patterns.md.

References

references/common-patterns.md -- Data freshness, quality, schema discovery, cross-joins
references/in-notebook-setup.md -- Full notebook setup with auto-discovery and write-back patterns
DuckDB Azure Extension
DuckDB Delta Extension
djouallah/Fabric_Notebooks_Demo -- Original notebook-attachment approach

信息

Category 数据科学

Name using-duckdb

版本 v20260619

大小 4.03KB

Source data-goblin/power-bi-agentic-development

更新时间 2026-06-20