Comprehensive migration strategies for moving to Databricks from Hadoop, Snowflake, Redshift, Synapse, or legacy data warehouses.
| Source | Pattern | Complexity | Timeline |
|---|---|---|---|
| On-prem Hadoop | Lift-and-shift + modernize | High | 6-12 months |
| Snowflake | Parallel run + cutover | Medium | 3-6 months |
| AWS Redshift | ETL rewrite + data copy | Medium | 3-6 months |
| Legacy DW (Oracle/Teradata) | Full rebuild | High | 12-18 months |
Inventory all source tables with metadata (size, partitions, dependencies, data classification). Generate a prioritized migration plan with wave assignments.
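One way to derive wave assignments from the dependency metadata is a topological grouping, sketched below with Python's standard `graphlib` module. The table names and the `inventory` dictionary are hypothetical; a real inventory would come from the assessment step and would probably also carry size and classification for ordering within a wave.

```python
from graphlib import TopologicalSorter


def assign_waves(dependencies):
    """Group tables into migration waves.

    `dependencies` maps each table to the set of tables it depends on.
    A table is placed in the first wave in which all of its
    dependencies have already been migrated.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted by name for a stable plan
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical inventory: table -> tables it depends on.
inventory = {
    "sales.orders": {"sales.customers", "ref.products"},
    "sales.customers": set(),
    "ref.products": set(),
    "reports.daily_revenue": {"sales.orders"},
}

waves = assign_waves(inventory)
for number, tables in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(tables)}")
```

Tables with no dependencies land in the first wave, and each later wave only contains tables whose upstream tables have already moved, which also guards against the "wrong migration order" failure mode.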
Convert source schemas to Delta Lake-compatible types. Handle type conversions (for example, char -> string and tinyint -> int). Enable auto-optimize on target tables.
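The conversion step might look roughly like the sketch below, which maps source column types and emits a `CREATE TABLE` statement with the auto-optimize table properties. Only the two mappings named above are included; `TYPE_MAP`, `convert_type`, `build_create_table`, and the example table are illustrative names, and any type not in the map is passed through unchanged on the assumption that the target accepts it.

```python
import re

# Mappings taken from the text above; extend for the actual source system.
TYPE_MAP = {"char": "string", "tinyint": "int"}


def convert_type(source_type):
    """Map a source column type to its target type, ignoring any length suffix."""
    base = re.match(r"\s*([A-Za-z_]+)", source_type).group(1).lower()
    # Unmapped types (e.g. decimal(10,2)) are passed through unchanged.
    return TYPE_MAP.get(base, source_type.strip().lower())


def build_create_table(table, columns):
    """Build a Delta CREATE TABLE statement with auto-optimize enabled."""
    cols = ",\n  ".join(f"{name} {convert_type(t)}" for name, t in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n) USING DELTA\n"
        "TBLPROPERTIES (\n"
        "  'delta.autoOptimize.optimizeWrite' = 'true',\n"
        "  'delta.autoOptimize.autoCompact' = 'true'\n"
        ")"
    )


# Hypothetical source schema as (column name, source type) pairs.
ddl = build_create_table(
    "migrated.db.customers",
    [("id", "BIGINT"), ("country_code", "CHAR(2)"), ("tier", "TINYINT")],
)
print(ddl)
```

Generating the DDL as a string keeps the sketch free of a Spark dependency; in a notebook the result could be passed to `spark.sql(...)`.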
Batch large tables by partition. After each table migration, validate that row counts and schemas match between source and target.
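As a rough illustration of batching and post-copy validation, the sketch below splits a partition list into fixed-size batches and compares summary metadata for a migrated table. The function names, the partition values, and the shape of the `expected` and `actual` dictionaries are assumptions; in practice the counts and schemas would be collected from the source and target systems, and `expected` would hold the schema after type conversion.

```python
def batch_partitions(partitions, batch_size):
    """Split a list of partition values into batches of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [partitions[i:i + batch_size] for i in range(0, len(partitions), batch_size)]


def validate_table(expected, actual):
    """Compare row counts and schemas; return a list of problems (empty if OK).

    Both arguments are dicts with 'row_count' (int) and
    'schema' (list of (column name, type) tuples).
    """
    problems = []
    if expected["row_count"] != actual["row_count"]:
        problems.append(
            f"row count mismatch: expected {expected['row_count']}, got {actual['row_count']}"
        )
    if expected["schema"] != actual["schema"]:
        missing = [c for c in expected["schema"] if c not in actual["schema"]]
        extra = [c for c in actual["schema"] if c not in expected["schema"]]
        problems.append(f"schema mismatch: missing {missing}, unexpected {extra}")
    return problems


# Hypothetical daily partitions, copied three at a time.
batches = batch_partitions(
    ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"], 3
)

# Hypothetical metadata gathered after the copy.
expected = {"row_count": 1000, "schema": [("id", "bigint"), ("tier", "int")]}
actual = {"row_count": 998, "schema": [("id", "bigint"), ("tier", "int")]}
problems = validate_table(expected, actual)
```

Returning a list of problems rather than raising on the first one lets a migration run report every mismatch for a table in a single pass.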
Convert spark-submit and Oozie jobs to Databricks jobs. Update paths, remove Hive metastore references, and adapt table references for Unity Catalog.
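Part of the ETL conversion is mechanical rewriting of storage paths and table references. The sketch below shows one possible shape for those two rewrites; the HDFS and cloud storage prefixes are made up, and the `migrated` catalog name is borrowed from the validation query in this document. It covers plain strings only, not references embedded in job code.

```python
def rewrite_path(path, hdfs_prefix, cloud_prefix):
    """Swap an HDFS prefix for a cloud storage prefix; leave other paths alone."""
    if path.startswith(hdfs_prefix):
        return cloud_prefix + path[len(hdfs_prefix):]
    return path


def qualify_table(name, catalog):
    """Turn a Hive-style table reference into a three-level Unity Catalog name."""
    parts = name.split(".")
    if len(parts) == 2:
        # db.table -> catalog.db.table
        return f"{catalog}.{name}"
    if len(parts) == 3:
        # Replace an explicit hive_metastore reference; keep other catalogs as-is.
        if parts[0] == "hive_metastore":
            return ".".join([catalog] + parts[1:])
        return name
    raise ValueError(f"cannot infer schema for table reference: {name!r}")


# Hypothetical prefixes for illustration only.
new_path = rewrite_path(
    "hdfs://namenode:8020/warehouse/sales/orders",
    "hdfs://namenode:8020/warehouse",
    "s3://example-bucket/warehouse",
)
new_table = qualify_table("hive_metastore.db.orders", "migrated")
```

Unqualified one-part names are rejected rather than guessed, since the target schema cannot be inferred from the name alone.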
Execute the 6-step cutover: validate -> disable source -> final sync -> enable Databricks -> update apps -> monitor. Each step has a rollback procedure.
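The cutover sequence with per-step rollback can be sketched as a small runner that undoes completed steps in reverse order when a step fails. The `run_cutover` and `make_step` helpers and the simulated failure are hypothetical stand-ins; real actions would call the source system, the sync job, and the application configuration.

```python
def run_cutover(steps):
    """Run (name, action, rollback) steps in order.

    If an action raises, roll back the already-completed steps in
    reverse order and report which step failed. The failed step itself
    is assumed to leave no partial state behind.
    """
    completed = []
    for name, action, rollback in steps:
        try:
            action()
        except Exception as exc:
            for _, undo in reversed(completed):
                undo()
            return {"status": "rolled_back", "failed_step": name, "error": str(exc)}
        completed.append((name, rollback))
    return {"status": "completed", "failed_step": None, "error": None}


log = []


def make_step(name, fail=False):
    """Build a stub step that records what happened in `log`."""
    def action():
        if fail:
            raise RuntimeError(f"{name} failed")
        log.append(f"run:{name}")

    def rollback():
        log.append(f"rollback:{name}")

    return (name, action, rollback)


step_names = [
    "validate", "disable source", "final sync",
    "enable Databricks", "update apps", "monitor",
]

# Simulate a failure during the final sync to exercise the rollback path.
result = run_cutover([make_step(n, fail=(n == "final sync")) for n in step_names])
print(result["status"], log)
```

Rolling back in reverse order means the source system is re-enabled before validation state is discarded, mirroring how the steps were applied.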
See the detailed implementation for assessment scripts, schema conversion, data migration with batching, ETL conversion, and cutover plan generation.
| Error | Cause | Solution |
|---|---|---|
| Schema incompatibility | Unsupported types | Use type conversion mappings |
| Data loss | Truncation during migration | Validate counts at each step |
| Performance issues | Large tables | Use partitioned migration |
| Dependency conflicts | Wrong migration order | Analyze dependencies first |
SELECT 'source' AS system, COUNT(*) AS row_count FROM hive_metastore.db.table
UNION ALL SELECT 'target' AS system, COUNT(*) AS row_count FROM migrated.db.table;
Provides coverage for Databricks platform migrations.