技能 数据科学 Foundry数据治理与安全处理

Foundry数据治理与安全处理

v20260423
palantir-data-handling
本技能指导如何在Palantir Foundry环境中处理敏感数据,确保数据合规和安全性。内容涵盖了PII(个人身份信息)脱敏、应用数据标记(Markings)实现分级权限控制,并确保符合GDPR、CCPA等隐私法规。用户可以学习设置数据保留策略、实现列级安全以及构建可审计的删除流程。
获取技能
337 次下载
概览

Palantir Data Handling

Overview

Handle sensitive data in Foundry using markings (data classifications), column-level security, PII redaction in transforms, and GDPR/CCPA deletion workflows.

Prerequisites

  • Foundry enrollment with Markings enabled
  • Understanding of your organization's data classification policy
  • Familiarity with transforms (palantir-core-workflow-a)

Instructions

Step 1: Data Classification with Markings

Foundry Markings control who can access data at the dataset, column, or row level.

Marking Access Use Case
PUBLIC All users Aggregated reports, reference data
INTERNAL Employees only Business metrics, operational data
CONFIDENTIAL Specific groups Customer PII, financial data
RESTRICTED Named individuals Compensation, legal, M&A

Step 2: PII Redaction in Transforms

from transforms.api import transform_df, Input, Output
from pyspark.sql import functions as F

@transform_df(
    Output("/Company/datasets/customers_safe"),
    customers=Input("/Company/datasets/raw_customers"),
)
def redact_pii(customers):
    """Create an analytics-safe view with PII removed."""
    return (
        customers
        .withColumn("email", F.sha2(F.col("email"), 256))           # Hash email
        .withColumn("phone", F.lit("***-***-****"))                   # Mask phone
        .withColumn("ssn", F.lit(None).cast("string"))                # Remove SSN
        .withColumn("name", F.concat(
            F.substring("first_name", 1, 1), F.lit("***")            # First initial only
        ))
        .drop("first_name", "last_name", "address", "date_of_birth")
    )

Step 3: GDPR Right to Erasure

def delete_user_data(client, user_id: str):
    """GDPR Article 17: delete all data for a specific user."""
    datasets_with_pii = [
        "/Company/datasets/raw_customers",
        "/Company/datasets/raw_orders",
        "/Company/datasets/customer_communications",
    ]
    for dataset_path in datasets_with_pii:
        # Trigger a transform that filters out the user
        client.ontologies.Action.apply(
            ontology="my-company",
            action_type="gdprDeleteUser",
            parameters={"userId": user_id, "datasetPath": dataset_path},
        )
    # Log the deletion for compliance
    client.ontologies.Action.apply(
        ontology="my-company",
        action_type="logDeletionRequest",
        parameters={
            "userId": user_id,
            "requestedAt": datetime.utcnow().isoformat(),
            "status": "completed",
        },
    )

Step 4: Column-Level Security in Ontology

# Define object type with restricted properties
# In Ontology Manager:
# - fullName: marking = CONFIDENTIAL
# - email: marking = CONFIDENTIAL  
# - department: marking = INTERNAL
# - employeeId: marking = INTERNAL

# Users without CONFIDENTIAL marking see:
# employeeId, department (but NOT fullName, email)

Step 5: Data Retention Policy

@transform_df(
    Output("/Company/datasets/events_retained"),
    events=Input("/Company/datasets/raw_events"),
)
def apply_retention(events):
    """Keep only last 2 years of data per retention policy."""
    from pyspark.sql import functions as F
    from datetime import datetime, timedelta

    cutoff = (datetime.utcnow() - timedelta(days=730)).strftime("%Y-%m-%d")
    return events.filter(F.col("event_date") >= cutoff)

Output

  • PII-redacted datasets safe for analytics
  • GDPR deletion workflow with audit trail
  • Column-level security via Foundry Markings
  • Automated data retention enforcement

Error Handling

Compliance Risk Detection Mitigation
PII in analytics dataset Column scan Apply redaction transform
Stale data beyond retention Date filter Schedule retention transforms
Missing deletion audit Log review Always log GDPR actions
Over-permissive markings Access audit Review marking assignments quarterly

Resources

Next Steps

For access control, see palantir-enterprise-rbac.

信息
Category 数据科学
Name palantir-data-handling
版本 v20260423
大小 4.81KB
更新时间 2026-04-28
语言