Databricks Quickstart

v20260311
databricks-hello-world
Create a minimal Databricks cluster and notebook, run a sample job, and check its output to confirm authentication, cluster creation, Delta writes, and job execution, laying the groundwork for later projects.
Databricks Hello World

Overview

Create your first Databricks cluster and notebook to verify setup.

Prerequisites

  • Completed databricks-install-auth setup
  • Valid API credentials configured
  • Workspace access with cluster creation permissions

Instructions

Step 1: Create a Cluster

# Create a small development cluster via CLI
databricks clusters create --json '{
  "cluster_name": "hello-world-cluster",
  "spark_version": "14.3.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "autotermination_minutes": 30,
  "num_workers": 0,
  "spark_conf": {
    "spark.databricks.cluster.profile": "singleNode",
    "spark.master": "local[*]"
  },
  "custom_tags": {
    "ResourceClass": "SingleNode"
  }
}'
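A zero-worker cluster only starts when the three single-node markers agree: the `singleNode` profile, `spark.master=local[*]`, and the `ResourceClass=SingleNode` tag. A small sketch (the `validate_single_node_spec` helper is ours, not part of the CLI or SDK) that checks the JSON payload for consistency before submitting:

```python
import json

def validate_single_node_spec(payload: str) -> list[str]:
    """Return a list of consistency problems in a cluster-create JSON payload."""
    spec = json.loads(payload)
    problems = []
    conf = spec.get("spark_conf", {})
    tags = spec.get("custom_tags", {})
    if spec.get("num_workers") == 0:
        # Zero workers is only valid when all three single-node markers are present
        if conf.get("spark.databricks.cluster.profile") != "singleNode":
            problems.append("missing spark.databricks.cluster.profile=singleNode")
        if conf.get("spark.master") != "local[*]":
            problems.append("missing spark.master=local[*]")
        if tags.get("ResourceClass") != "SingleNode":
            problems.append("missing custom tag ResourceClass=SingleNode")
    return problems

payload = '''{
  "cluster_name": "hello-world-cluster",
  "spark_version": "14.3.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "autotermination_minutes": 30,
  "num_workers": 0,
  "spark_conf": {
    "spark.databricks.cluster.profile": "singleNode",
    "spark.master": "local[*]"
  },
  "custom_tags": {"ResourceClass": "SingleNode"}
}'''
print(validate_single_node_spec(payload))  # an empty list means the spec is consistent
```

An empty result means the payload is safe to pass to `databricks clusters create --json`.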

Step 2: Create a Notebook

# hello_world.py - upload as notebook
import base64

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ImportFormat, Language

w = WorkspaceClient()

# Create notebook content
notebook_content = """
# Databricks Hello World

# COMMAND ----------

# Simple DataFrame operations
data = [("Alice", 28), ("Bob", 35), ("Charlie", 42)]
df = spark.createDataFrame(data, ["name", "age"])
display(df)

# COMMAND ----------

# Delta Lake example
df.write.format("delta").mode("overwrite").save("/tmp/hello_world_delta")

# COMMAND ----------

# Read it back
df_read = spark.read.format("delta").load("/tmp/hello_world_delta")
display(df_read)

# COMMAND ----------

print("Hello from Databricks!")
"""

w.workspace.import_(
    path="/Users/your-email/hello_world",
    format=ImportFormat.SOURCE,
    language=Language.PYTHON,
    content=base64.b64encode(notebook_content.encode()).decode(),
    overwrite=True
)
print("Notebook created!")
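Source-format notebooks separate cells with `# COMMAND ----------` markers, and the import call expects the whole source base64-encoded. A workspace-free sketch (the `build_notebook` helper is ours, not part of the SDK) that assembles cells and verifies the payload round-trips cleanly:

```python
import base64

def build_notebook(cells: list[str]) -> str:
    """Join Python cells with Databricks cell separators and base64-encode the result."""
    source = "\n\n# COMMAND ----------\n\n".join(cells)
    return base64.b64encode(source.encode()).decode()

cells = [
    'data = [("Alice", 28), ("Bob", 35)]\ndf = spark.createDataFrame(data, ["name", "age"])\ndisplay(df)',
    'print("Hello from Databricks!")',
]
encoded = build_notebook(cells)

# Round-trip to confirm the payload decodes back to the original source
decoded = base64.b64decode(encoded).decode()
assert decoded.count("# COMMAND ----------") == len(cells) - 1
```

The `encoded` string is exactly what the `content` argument of `w.workspace.import_` expects.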

Step 3: Run the Notebook

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import NotebookTask, SubmitTask

w = WorkspaceClient()

# Create a one-time run
run = w.jobs.submit(
    run_name="hello-world-run",
    tasks=[
        SubmitTask(
            task_key="hello",
            existing_cluster_id="your-cluster-id",
            notebook_task=NotebookTask(
                notebook_path="/Users/your-email/hello_world"
            )
        )
    ]
)

# Wait for completion; result() blocks until the run reaches a terminal state
result = run.result()
print(f"Run completed with state: {result.state.result_state}")
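Under the hood, waiting for a run means polling its life-cycle state until it becomes terminal. A minimal workspace-free sketch of that loop (the `poll_run` helper and injected fetcher are ours; the state names match the Databricks jobs life-cycle states):

```python
import time
from typing import Callable

# Terminal life-cycle states of a Databricks job run
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}

def poll_run(fetch_state: Callable[[], str],
             timeout_s: float = 600.0,
             interval_s: float = 1.0) -> str:
    """Poll fetch_state() until it returns a terminal life-cycle state."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval_s)
    raise TimeoutError("run did not reach a terminal state in time")

# Simulate a run that goes PENDING -> RUNNING -> TERMINATED
states = iter(["PENDING", "RUNNING", "TERMINATED"])
final = poll_run(lambda: next(states), interval_s=0.01)
print(final)  # TERMINATED
```

Injecting the fetcher keeps the loop testable; in practice it would wrap a `w.jobs.get_run(run_id)` call.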

Step 4: Verify with CLI

# List clusters
databricks clusters list

# Get cluster status
databricks clusters get --cluster-id your-cluster-id

# List workspace contents
databricks workspace list /Users/your-email/

# Get run output
databricks runs get-output --run-id your-run-id

Output

  • Development cluster created and running
  • Hello world notebook created in workspace
  • Successful notebook execution
  • Delta table created at /tmp/hello_world_delta

Error Handling

Error                    Cause                    Solution
Cluster quota exceeded   Workspace limits         Terminate unused clusters
Invalid node type        Wrong instance type      Check available node types
Notebook path exists     Duplicate path           Use overwrite=True
Cluster pending          Startup in progress      Wait for RUNNING state
Permission denied        Insufficient privileges  Request workspace admin access
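For the "Cluster pending" case, the usual remedy is retrying with exponential backoff rather than a fixed delay. A small workspace-free sketch (the `wait_until` helper is ours):

```python
import time

def wait_until(check, attempts: int = 5, base_delay_s: float = 0.01) -> bool:
    """Retry check() with exponential backoff; True once it succeeds."""
    for attempt in range(attempts):
        if check():
            return True
        time.sleep(base_delay_s * (2 ** attempt))  # 0.01s, 0.02s, 0.04s, ...
    return False

# Simulate a cluster that reports RUNNING on the third poll
polls = iter(["PENDING", "PENDING", "RUNNING"])
ok = wait_until(lambda: next(polls) == "RUNNING")
print(ok)  # True
```

In practice `check` would wrap a cluster-state lookup, with `base_delay_s` raised to seconds.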

Examples

Interactive Cluster (Cost-Effective Dev)

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Create single-node cluster for development
cluster = w.clusters.create_and_wait(
    cluster_name="dev-cluster",
    spark_version="14.3.x-scala2.12",
    node_type_id="Standard_DS3_v2",
    num_workers=0,
    autotermination_minutes=30,
    spark_conf={
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]"
    },
    custom_tags={"ResourceClass": "SingleNode"}
)
print(f"Cluster created: {cluster.cluster_id}")

SQL Warehouse (Serverless)

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import CreateWarehouseRequestWarehouseType

w = WorkspaceClient()

# Create SQL warehouse for queries
warehouse = w.warehouses.create_and_wait(
    name="hello-warehouse",
    cluster_size="2X-Small",
    auto_stop_mins=15,
    warehouse_type=CreateWarehouseRequestWarehouseType.PRO,
    enable_serverless_compute=True
)
print(f"Warehouse created: {warehouse.id}")

Quick DataFrame Test

# Run in notebook or Databricks Connect
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create sample data
df = spark.range(1000).toDF("id")  # 1,000 rows with ids 0-999
df = df.withColumn("value", df.id * 2)

# Show results
df.show(5)
print(f"Row count: {df.count()}")
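The expected values can be sanity-checked in plain Python before running on a cluster, since `spark.range(1000)` is just ids 0-999 and the derived column doubles each id:

```python
# Pure-Python equivalent of spark.range(1000) with value = id * 2
rows = [{"id": i, "value": i * 2} for i in range(1000)]

print(rows[:3])   # first rows, analogous to df.show(5)
print(len(rows))  # row count, analogous to df.count()
```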

Next Steps

Proceed to databricks-local-dev-loop for local development setup.

Info

Category: Programming & Development
Name: databricks-hello-world
Version: v20260311
Size: 5.38KB
Updated: 2026-03-12