Skill UI

Browse and discover 10318+ curated skills

Search Big Data, found 24 results

Azure Data Lake Storage Python SDK

azure-storage-file-datalake-py

sickn33/antigravity-awesome-skills

This SDK provides comprehensive Python support for Azure Data Lake Storage Gen2. It enables developers to interact with hierarchical file systems, performing essential big data operations such as creating, listing, uploading, downloading, and managing metadata for files and directories. Ideal for building data analytics pipelines and cloud-based data processing workflows.

Rube Big Data Automation

big-data-cloud-automation

ComposioHQ/awesome-claude-skills

Automates Big Data Cloud operations through Composio’s toolkit. It routes discovery, connection checks, and multi-tool execution through Rube MCP commands, so workflows always run against current schemas.

BigQuery Scheduled Query

bigquery-scheduled-query

jeremylongshore/claude-code-plugins-plus-skills

Automates guidance for setting up and optimizing BigQuery scheduled queries within GCP, offering best practices, configuration templates, and validation for production-ready data workflows.

BigQuery Metabase Automation

googlebigquery-automation

ComposioHQ/awesome-claude-skills

Automates Google BigQuery workflows via Rube MCP and Metabase, letting you run native SQL/MBQL queries, fetch metadata, and explore schemas before querying.

Granola Meeting Analytics and Observability

granola-observability

jeremylongshore/claude-code-plugins-plus-skills

This skill provides comprehensive guidelines for monitoring Granola usage, tracking team meeting patterns, and building detailed analytics dashboards. It guides users in defining critical metrics (e.g., adoption rate, efficiency score) and establishing robust data pipelines. Users can stream meeting metadata from Granola to data warehouses like BigQuery, enabling deep, customizable reporting and automated weekly insights via services like Zapier and Slack.

MaintainX Data Sync and ETL Patterns

maintainx-data-handling

jeremylongshore/claude-code-plugins-plus-skills

Provides comprehensive patterns for data synchronization, ETL processes, and data migration specifically for MaintainX. Use this when needing to keep data consistent between MaintainX and external databases, data warehouses (like BigQuery), or when performing bulk data exports to formats like CSV. It covers incremental syncing, full exports, and data reconciliation checks.
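As a rough illustration of the incremental syncing and reconciliation ideas mentioned above, the sketch below uses a timestamp watermark over in-memory rows. It is not taken from the skill itself and does not call any MaintainX API; the field names `id` and `updated_at`, and both function names, are assumptions made for the example.

```python
from datetime import datetime, timezone

def incremental_sync(source_rows, target, watermark):
    """Copy rows changed after `watermark` into `target` (a dict keyed by id).

    Returns the new watermark, i.e. the latest `updated_at` seen so far.
    The row shape ({"id", "updated_at", ...}) is hypothetical.
    """
    new_watermark = watermark
    for row in source_rows:
        if row["updated_at"] > watermark:
            target[row["id"]] = row  # upsert: insert new rows, overwrite changed ones
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark

def reconcile(source_rows, target):
    """Return the sorted ids that exist in the source but are missing from the target."""
    return sorted({row["id"] for row in source_rows} - set(target))

def ts(day):
    # Helper for building timezone-aware example timestamps.
    return datetime(2024, 1, day, tzinfo=timezone.utc)

rows = [
    {"id": 1, "updated_at": ts(1), "status": "open"},
    {"id": 2, "updated_at": ts(5), "status": "done"},
    {"id": 3, "updated_at": ts(9), "status": "open"},
]
warehouse = {1: rows[0]}  # row 1 was copied by an earlier run
watermark = incremental_sync(rows, warehouse, ts(1))
missing = reconcile(rows, warehouse)
```

A full export would be the same loop with the watermark set to the earliest possible timestamp; the reconciliation check is what catches rows a sync run skipped.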

OpenEvidence Migration Deep Dive Guide

openevidence-migration-deep-dive

jeremylongshore/claude-code-plugins-plus-skills

This deep dive guide provides strategies and detailed checklists for migrating to the OpenEvidence platform. It covers several migration methods, including parallel run, strangler fig, and big bang approaches. The guide helps technical teams plan the transition by addressing API mapping, data migration plans, rollback procedures, and performance baselines, with the aim of a secure and smooth transition for healthcare IT systems.

Hybrid Parsing: Regex vs LLM Decision Framework

regex-vs-llm-structured-text

affaan-m/everything-claude-code

A practical decision framework for determining whether to use regular expressions or Large Language Models (LLMs) when parsing structured text (e.g., forms, quizzes, invoices). It recommends starting with deterministic, low-cost regex for 95-98% of structured data, and only invoking expensive LLM calls for identifying and validating ambiguous edge cases, optimizing both cost and accuracy.
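The regex-first, LLM-on-fallback idea described above might look roughly like the following. This is a minimal sketch rather than the skill's own code: the invoice line format, the function names, and the `fake_llm` stand-in are all hypothetical, and a real fallback would call whatever LLM client is in use.

```python
import re
from typing import Callable, Optional

# Hypothetical line format, for illustration only: "Invoice #1234  Total: $56.78"
INVOICE_RE = re.compile(
    r"Invoice\s+#(?P<number>\d+)\s+Total:\s+\$(?P<total>\d+\.\d{2})"
)

def parse_with_regex(line: str) -> Optional[dict]:
    """Cheap deterministic path: return parsed fields, or None if the line is irregular."""
    match = INVOICE_RE.fullmatch(line.strip())
    if match is None:
        return None
    return {"number": int(match["number"]), "total": float(match["total"])}

def parse_line(line: str, llm_fallback: Callable[[str], dict]) -> dict:
    """Try the regex first; only call the (expensive) fallback for lines it cannot handle."""
    parsed = parse_with_regex(line)
    if parsed is not None:
        return {**parsed, "source": "regex"}
    return {**llm_fallback(line), "source": "llm"}

def fake_llm(line: str) -> dict:
    # Stand-in for a real LLM call (an assumption of this sketch); returns empty fields.
    return {"number": None, "total": None}

clean = parse_line("Invoice #1234  Total: $56.78", fake_llm)
messy = parse_line("inv 1234, total roughly 56 dollars", fake_llm)
```

Tagging each result with its `source` makes it easy to measure what share of inputs actually reach the fallback path.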

Enterprise Scala Development Expert

sickn33/antigravity-awesome-skills

A comprehensive skill set for mastering enterprise-grade Scala development. Expertise covers functional programming, distributed systems, and big data processing using technologies like Pekko, Akka, Spark, ZIO, and Cats Effect. Ideal for designing and implementing resilient, scalable microservices based on Domain-Driven Design principles.

Schema Optimization Workflow Orchestrator

schema-optimization-orchestrator

jeremylongshore/claude-code-plugins-plus-skills

An advanced workflow orchestrator designed to perform multi-phase, deep-dive schema optimization on large datasets (e.g., BigQuery exports). It systematically analyzes schema structure, identifies unused or redundant fields, assesses potential impact, and generates a comprehensive, actionable report with recommended optimization strategies and estimated savings.

Spark Performance Engineer

Jeffallan/claude-skills

Guides building, tuning, and validating production Apache Spark jobs, covering DataFrame/RDD choices, partitioning, broadcast joins, skew handling, caching, and cluster configuration for big data pipelines.

Vaex Big Data

K-Dense-AI/claude-scientific-skills

Vaex is a high-performance Python library for lazy, out-of-core DataFrames that can process billions of rows without loading the full dataset into RAM. Use it to analyze massive CSV, HDF5, Arrow, or Parquet tables, compute fast statistics, visualize big data, and build scikit-learn pipelines.
