
Palantir Foundry Best-Practice Architecture

v20260423
palantir-reference-architecture
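This guide provides a comprehensive Palantir Foundry reference architecture for building production-grade enterprise data applications. It covers the full data flow from raw data ingestion through cleaning and modeling to the final Ontology, and gives best practices for project layout, external API integration, and multi-layer security. It is suited to planning and optimizing complex data infrastructure.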

Palantir Reference Architecture

Overview

Production-ready architecture for Foundry-integrated applications. Covers the standard data pipeline pattern (ingest > clean > model > serve), Ontology design, external API integration, and multi-repo project layout.

Prerequisites

  • Foundry enrollment with project access
  • Understanding of Ontology concepts (object types, link types, actions)
  • Familiarity with palantir-core-workflow-a (transforms) and palantir-core-workflow-b (Ontology)

Instructions

Step 1: Data Pipeline Architecture

┌──────────────┐     ┌──────────────┐     ┌─────────────┐     ┌───────────┐
│  Raw Layer   │────>│  Clean Layer │────>│ Model Layer │────>│ Ontology  │
│ (ingested)   │     │  (validated) │     │ (enriched)  │     │ (objects) │
└──────────────┘     └──────────────┘     └─────────────┘     └───────────┘
  ↑ Connectors        @transform_df       @transform_df       Object types
  ↑ REST sync          null checks         joins, aggs         Link types
  ↑ File upload        type casting        ML features         Actions
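
The clean layer's null checks and type casting can be sketched in plain Python. This is a minimal illustration, not Foundry code: in a Foundry repository the same logic would typically sit inside a `@transform_df` function operating on Spark DataFrames. The field names (`order_id`, `order_date`, `amount`) and the `clean_order` / `clean_orders` helpers are hypothetical.

```python
from datetime import date, datetime
from typing import Any, Optional

# Hypothetical required columns for a raw_orders row.
REQUIRED_FIELDS = ("order_id", "order_date", "amount")


def clean_order(row: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return a typed copy of the row, or None if it fails validation."""
    # Null check: reject rows with a missing or empty required field.
    if any(row.get(f) in (None, "") for f in REQUIRED_FIELDS):
        return None
    try:
        # Type casting: parse the ISO date string and the numeric amount.
        parsed_date: date = datetime.strptime(row["order_date"], "%Y-%m-%d").date()
        amount = float(row["amount"])
    except (TypeError, ValueError):
        return None
    return {"order_id": str(row["order_id"]), "order_date": parsed_date, "amount": amount}


def clean_orders(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Raw layer -> clean layer: keep only rows that validate."""
    cleaned = (clean_order(r) for r in rows)
    return [r for r in cleaned if r is not None]
```

Dropping invalid rows is only one possible policy; routing them to a quarantine dataset instead would keep them available for inspection.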

Step 2: Project Layout (Foundry)

Foundry Project: "Customer Analytics"
├── Datasets/
│   ├── raw/                    # Ingested from sources
│   │   ├── raw_orders          # REST connector → CRM
│   │   ├── raw_customers       # JDBC connector → DB
│   │   └── raw_products        # File upload (CSV/Parquet)
│   ├── clean/                  # Validated, typed
│   │   ├── clean_orders        # Nulls removed, dates parsed
│   │   ├── clean_customers     # Deduped, normalized
│   │   └── clean_products      # Schema enforced
│   └── model/                  # Enriched, analytics-ready
│       ├── order_enriched      # Joined with customer + product
│       ├── customer_360        # Aggregated customer view
│       └── daily_summary       # Time-series aggregation
├── Code Repositories/
│   ├── pipeline-ingestion/     # Connectors and raw → clean
│   ├── pipeline-analytics/     # Clean → model transforms
│   └── ontology-actions/       # Action implementations
└── Ontology/
    ├── Object Types: Customer, Order, Product
    ├── Link Types: Customer→Orders, Order→Products
    └── Actions: createOrder, updateCustomerSegment
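
One way to keep the raw, clean, model flow honest is to check that no dataset reads from a later layer. The sketch below assumes the layer can be inferred from the folder prefix used in the layout above; the `layer_of` and `find_backward_edges` helpers and the `clean/clean_orders` path form are illustrative, not Foundry APIs.

```python
# Order of the layers from the project layout.
LAYER_ORDER = {"raw": 0, "clean": 1, "model": 2}


def layer_of(dataset_path: str) -> str:
    """Infer the layer from the first path segment, e.g. 'clean/clean_orders'."""
    layer = dataset_path.split("/", 1)[0]
    if layer not in LAYER_ORDER:
        raise ValueError(f"Unknown layer in path: {dataset_path!r}")
    return layer


def find_backward_edges(deps: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (output, input) pairs where an output reads from a later layer."""
    bad = []
    for output, inputs in deps.items():
        for source in inputs:
            if LAYER_ORDER[layer_of(source)] > LAYER_ORDER[layer_of(output)]:
                bad.append((output, source))
    return bad
```

A check like this could run in CI over a hand-maintained dependency map; an empty result means every transform reads only from its own layer or an earlier one.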

Step 3: External API Integration Pattern

# External app consuming Foundry Ontology via Platform SDK
my-external-app/
├── src/
│   ├── foundry/
│   │   ├── client.py           # Singleton FoundryClient
│   │   ├── objects.py          # Object query helpers
│   │   ├── actions.py          # Action wrappers
│   │   └── cache.py            # TTL cache layer
│   ├── api/
│   │   ├── routes.py           # REST endpoints
│   │   └── webhooks.py         # Foundry event handlers
│   └── main.py
├── tests/
│   ├── conftest.py             # Mocked FoundryClient
│   ├── test_objects.py
│   └── test_actions.py
├── .env                        # FOUNDRY_HOSTNAME, credentials
└── requirements.txt
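
The `cache.py` layer can be as small as a dictionary with expiry timestamps. The following is a minimal sketch under stated assumptions: the `TTLCache` class, its `get_or_load` method, and the injectable clock are invented for illustration, and the loader stands in for whatever Foundry call the app actually makes.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return the cached value, calling loader() only on a miss or expiry."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry, e.g. after an action mutates the object."""
        self._entries.pop(key, None)
```

This version is not thread-safe and never evicts unexpired keys, so a multi-threaded or long-running service would likely need a lock and a size bound on top of it.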

Step 4: Ontology Design Patterns

| Pattern | When to Use | Example |
|---|---|---|
| Hub-and-spoke | Central entity with many relationships | Customer → Orders, Tickets, Payments |
| Event sourcing | Audit trail needed | OrderEvent (created, shipped, delivered) |
| Computed properties | Derived values | totalRevenue on Customer (sum of orders) |
| Composite actions | Multi-step mutations | processReturn: update order + create credit + notify |
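
Two of these patterns, event sourcing and computed properties, can be illustrated with plain dataclasses. This sketch only models the idea; in Foundry these would be defined as object types and properties in the Ontology, and every class and field name here is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OrderEvent:
    """Event sourcing: one immutable record per state change."""
    order_id: str
    kind: str  # e.g. "created", "shipped", "delivered"


@dataclass
class Order:
    order_id: str
    amount: float
    events: list[OrderEvent] = field(default_factory=list)

    def record(self, kind: str) -> None:
        # Append-only: history is never rewritten, so it doubles as an audit trail.
        self.events.append(OrderEvent(self.order_id, kind))

    @property
    def status(self) -> str:
        # Current state is derived from the latest event.
        return self.events[-1].kind if self.events else "unknown"


@dataclass
class Customer:
    customer_id: str
    orders: list[Order] = field(default_factory=list)

    @property
    def total_revenue(self) -> float:
        # Computed property: derived from linked orders, not stored.
        return sum(o.amount for o in self.orders)
```

Deriving `status` and `total_revenue` on read keeps them consistent with the underlying records, at the cost of recomputing them each time.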

Step 5: Security Layers

┌──────────────────────────────────────────┐
│ Layer 1: Network (VPN/private link)       │
├──────────────────────────────────────────┤
│ Layer 2: OAuth2 (service user per app)    │
├──────────────────────────────────────────┤
│ Layer 3: Scopes (minimum per app)         │
├──────────────────────────────────────────┤
│ Layer 4: Project roles (Viewer/Editor)    │
├──────────────────────────────────────────┤
│ Layer 5: Marking (data classification)    │
└──────────────────────────────────────────┘
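
Layer 3 (minimum scopes per app) lends itself to an automated check. A minimal sketch, assuming each app declares the scopes it needs: the `audit_scopes` helper is hypothetical, and any scope strings passed to it are placeholders, not a list of real Foundry scopes.

```python
def audit_scopes(required: set[str], granted: set[str]) -> dict[str, set[str]]:
    """Compare the scopes an app needs with the scopes its service user holds."""
    return {
        "missing": required - granted,  # needed by the app but not granted
        "excess": granted - required,   # granted but unused: violates least privilege
    }
```

An app is compliant when both sets come back empty; a non-empty `excess` set is a prompt to narrow the service user's grant.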

Output

  • Standard 3-layer data pipeline (raw > clean > model)
  • Ontology design with typed objects, links, and actions
  • External app architecture with caching and webhooks
  • Security model with 5 defense layers

Error Handling

| Architecture Issue | Symptom | Fix |
|---|---|---|
| Circular dependencies | Builds fail | Restructure pipeline DAG |
| Missing clean layer | Bad data in model | Always validate between raw and model |
| Monolithic transforms | Slow builds | Split into focused transforms |
| No caching | API rate limits | Add TTL cache layer |
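
Circular dependencies can be caught before a build by topologically sorting the dataset graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the `build_order` helper and the shape of the dependency map are assumptions, not part of Foundry.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(deps: dict[str, set[str]]) -> list[str]:
    """Return datasets in build order, or raise ValueError naming a cycle.

    deps maps each dataset to the set of datasets it reads from.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # exc.args[1] is the list of nodes forming the cycle.
        cycle = " -> ".join(exc.args[1])
        raise ValueError(f"Circular dependency: {cycle}") from exc
```

The error message names the datasets on the cycle, which points at the edge to cut when restructuring the DAG.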

Next Steps

For data handling and compliance, see palantir-data-handling.

Info
Category: Data Science
Name: palantir-reference-architecture
Version: v20260423
Size: 6.53KB
Updated: 2026-04-28