ETL to ELT Migration Blueprint: Refactor Informatica/SSIS in Delta Live Tables
Summary
Learn a practical blueprint for ETL pipeline development, migrating Informatica and SSIS workflows to Delta Live Tables and Unity Catalog at scale.
Authored By
Technical Director
Modern data teams feel the gap between old ETL tools and the speed the business now expects. ETL pipeline development on Informatica or SSIS once did the job, but tight budgets, AI projects, and stricter regulation are making those platforms feel heavy and slow. This guide walks through how to turn that legacy estate into a Lakehouse advantage with Delta Live Tables, Unity Catalog, and the Medallion architecture on Databricks.
We will look at why ETL is blocking AI, how the Bronze, Silver, and Gold layers of the Lakehouse line up with what you already have, and a stepwise blueprint for refactoring. The aim is simple: keep risk low, avoid a big-bang rewrite, and end up with pipelines that are easier to run, easier to govern, and ready for production-grade AI.
Turn Legacy ETL Into a Lakehouse Advantage
Old Informatica or SSIS workflows can feel like a sunk cost, especially when H2 planning is coming and budgets are under review. But that estate is also a clear map of what your business really cares about: the feeds, rules, joins, and schedules that keep reporting and operations alive. Instead of throwing it away, we can treat it as the design input for a stronger Lakehouse.
Right now the pressure is rising from three sides:
- Cloud spend is under a microscope
- AI projects need clean, fresh data
- Regulators expect stronger control and audit trails
The Lakehouse Medallion architecture, with Delta Live Tables for ETL pipeline development and Unity Catalog for governance, gives a practical way forward. You can refactor area by area, line up with budget cycles, and avoid risky overnight cutovers that could affect trading or month-end.
Why ETL Is Holding Back Your AI Ambitions
Traditional ETL tools were built for fixed schemas and batch loads into data warehouses. That leads to:
- Rigid mappings that are hard to change
- Long, fragile dependency chains
- Schedulers that struggle with cloud and streaming sources
For AI teams, this is painful. Data lands late, scattered across different servers, with custom business logic buried in XML or GUI jobs. It is hard to trace which rules fed a feature table, or to explain a model prediction when you cannot see clear lineage.
Across EMEA enterprises we often see the same mix of problems:
- Tight vendor lock-in and licence renewals approaching
- Data centre exits pushing timelines for migration
- GDPR and other regulations demanding better access controls and audit logs
- Leadership pressure to show generative AI pilots before the year ends
Old ETL stacks can still move data, but they do not support the kind of flexible, code-driven, AI-ready platform that teams now expect from the cloud.
Designing a Medallion Lakehouse to Replace Legacy ETL
The Medallion pattern breaks the Lakehouse into three clear layers:
- Bronze: raw ingested data, as close as possible to source
- Silver: cleaned, conformed, and joined data for common use
- Gold: curated data products, often aligned to departments or KPIs
This lines up cleanly with the staging, integration, and data mart layers you likely have in Informatica or SSIS. The difference is that each layer is a set of Delta tables in the Lakehouse, with schema evolution, time travel, and efficient storage built in.
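As a rough illustration of how the three layers relate, the sketch below models Bronze, Silver, and Gold as plain Python functions over in-memory rows. The field names (`order_id`, `amount`, `region`), the `_source` tag, and the function names are hypothetical; in a real Lakehouse each step would read from and write to Delta tables rather than Python lists.

```python
from collections import defaultdict

# Hypothetical raw feed, kept as close to source as possible (Bronze input).
RAW_ORDERS = [
    {"order_id": "1", "amount": "100.50", "region": " emea "},
    {"order_id": "2", "amount": "n/a", "region": "EMEA"},
    {"order_id": "3", "amount": "20.00", "region": "apac"},
]


def bronze(raw_rows):
    """Land rows unchanged; only tag them with their source."""
    return [dict(row, _source="orders_feed") for row in raw_rows]


def silver(bronze_rows):
    """Clean and conform: cast types, normalise codes, skip unparseable rows."""
    cleaned = []
    for row in bronze_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would record or quarantine this row
        cleaned.append({
            "order_id": int(row["order_id"]),
            "amount": amount,
            "region": row["region"].strip().upper(),
        })
    return cleaned


def gold(silver_rows):
    """Curate a KPI-style data product: total order amount per region."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


revenue_by_region = gold(silver(bronze(RAW_ORDERS)))
```

The point of the sketch is the shape, not the code: each layer depends only on the one before it, which is what makes it possible to map staging, integration, and mart jobs onto it one at a time.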
Delta Live Tables shifts ETL pipeline development into declarative Python or SQL code. Expectations act as data quality rules, so instead of hand-coded validation flows and complex error branches, you define rules like:
- Columns that must not be null
- Allowed value sets
- Range checks and, with a join to the reference table, referential integrity
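To make the idea of declarative rules concrete, here is a minimal sketch in plain Python. It is not the DLT API: in DLT the same rules would be attached to a table definition as expectations (for example with `dlt.expect` or `dlt.expect_or_drop`), each written as a SQL boolean expression. The rule names, field names, and the `evaluate` helper below are hypothetical.

```python
# Each rule is a name plus a predicate that returns True for a valid row.
# Rule names and fields are hypothetical, chosen only for illustration.
ALLOWED_CURRENCIES = {"EUR", "GBP", "USD"}

RULES = {
    "customer_id_not_null": lambda r: r.get("customer_id") is not None,
    "currency_allowed": lambda r: r.get("currency") in ALLOWED_CURRENCIES,
    "amount_in_range": lambda r: (
        r.get("amount") is not None and 0 <= r["amount"] <= 1_000_000
    ),
}


def evaluate(rows, rules):
    """Return the number of rows violating each rule."""
    violations = {name: 0 for name in rules}
    for row in rows:
        for name, predicate in rules.items():
            if not predicate(row):
                violations[name] += 1
    return violations


rows = [
    {"customer_id": 1, "currency": "EUR", "amount": 250.0},
    {"customer_id": None, "currency": "EUR", "amount": 10.0},
    {"customer_id": 3, "currency": "XXX", "amount": -5.0},
]
report = evaluate(rows, RULES)
```

Keeping the rules in one named collection, separate from the transformation logic, is what replaces the validation branches scattered through a typical legacy mapping.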
Unity Catalog then becomes the single place for governance. It gives:
- Fine-grained access control on tables, columns, and views
- Data lineage across pipelines and workspaces
- Audit logs that support compliance checks when reporting season gets busy
For teams spread across countries and clouds, this central control is especially helpful.
A Stepwise Migration Blueprint From Informatica and SSIS
Jumping from legacy ETL to a Lakehouse in one go is risky. A phased blueprint works better and fits the way EMEA businesses usually plan around quarters and half-years.
Start with discovery:
- Inventory existing ETL jobs and workflows
- Group them into domains, for example finance, supply chain, customer
- Score them by business impact, complexity, and risk
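The scoring step can be as simple as a weighted formula over the inventory. The sketch below is one possible approach, not a prescribed method: the job names, the 1 to 5 scales, and the weights are all assumptions you would replace with your own.

```python
from dataclasses import dataclass


@dataclass
class EtlJob:
    name: str
    domain: str
    impact: int      # 1 (low) to 5 (high) business impact
    complexity: int  # 1 (simple) to 5 (complex)
    risk: int        # 1 (safe) to 5 (risky)


def wave_score(job, weights=(2.0, 1.0, 1.0)):
    """Higher score = better first-wave candidate.

    Rewards high impact and penalises complexity and risk.
    """
    w_impact, w_complexity, w_risk = weights
    return w_impact * job.impact - w_complexity * job.complexity - w_risk * job.risk


# Hypothetical inventory grouped by domain.
inventory = [
    EtlJob("gl_daily_load", "finance", impact=5, complexity=2, risk=2),
    EtlJob("stock_snapshot", "supply chain", impact=3, complexity=4, risk=3),
    EtlJob("crm_contacts", "customer", impact=4, complexity=2, risk=1),
]

ranked = sorted(inventory, key=wave_score, reverse=True)
first_wave = [job.name for job in ranked[:2]]
```

A transparent formula like this is easy to challenge in a planning meeting, which is usually more valuable than a precise one.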
From there, pick a first wave that lines up with your roadmap, for example a set of reports that support regulatory filings or a high-visibility analytics product. For each workflow, refactor patterns, not exact tools. Mappings and transformations become DLT pipelines defined as code, with:
- Clear input and output tables per Medallion layer
- Version control in Git
- CI/CD for promotion across dev, test, and prod
Dual-running is your safety net. Legacy ETL and new DLT pipelines run side by side on the same inputs, each writing to its own copy of the outputs. You compare row counts, checksums, and key metrics between the two. At cutover time you:
- Switch consumers to the DLT-backed tables
- Keep legacy pipelines paused but ready
- Have a rollback plan if downstream issues appear in busy trading periods
This keeps risk low while still showing clear progress to stakeholders.
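The comparison of row counts, checksums, and key metrics during dual-running can be sketched as below. This is a minimal, in-memory illustration under stated assumptions: the helper names, the `id` and `amount` fields, and the tolerance are hypothetical, and at real volumes you would compute the same figures with queries against both tables rather than in Python lists.

```python
import hashlib


def table_fingerprint(rows, key_fields):
    """Return (row_count, checksum) for rows, independent of row order."""
    digests = []
    for row in rows:
        canonical = "|".join(f"{field}={row[field]}" for field in key_fields)
        digests.append(hashlib.sha256(canonical.encode("utf-8")).hexdigest())
    # Sort per-row digests so that row order does not affect the checksum.
    combined = hashlib.sha256("".join(sorted(digests)).encode("utf-8")).hexdigest()
    return len(rows), combined


def reconcile(legacy_rows, new_rows, key_fields, metric_field, tolerance=0.01):
    """Compare row counts, checksums, and one summed metric between two outputs."""
    legacy_count, legacy_checksum = table_fingerprint(legacy_rows, key_fields)
    new_count, new_checksum = table_fingerprint(new_rows, key_fields)
    legacy_total = sum(r[metric_field] for r in legacy_rows)
    new_total = sum(r[metric_field] for r in new_rows)
    return {
        "row_count_match": legacy_count == new_count,
        "checksum_match": legacy_checksum == new_checksum,
        "metric_match": abs(legacy_total - new_total) <= tolerance,
    }


legacy = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}]
migrated = [{"id": 2, "amount": 5.5}, {"id": 1, "amount": 10.0}]
result = reconcile(legacy, migrated, key_fields=["id", "amount"], metric_field="amount")
```

Running a check like this on every dual-run cycle, and keeping the results, gives you the evidence to justify the cutover date.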
Modern ETL Pipeline Development with Delta Live Tables
DLT changes how pipelines are built and managed. Instead of long, nested workflows, you declare how tables relate and let the platform handle the ordering, retries, and incremental loads.
Key benefits include:
- Declarative definitions for tables and dependencies
- Native support for streaming and batch in one design
- Incremental processing so only new or changed data is handled
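To show what "declare dependencies and let the platform order them" means in practice, here is a toy model using Python's standard library. It does not reflect how DLT is implemented internally; the table names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each table declares only the tables it reads from (hypothetical names).
DEPENDENCIES = {
    "bronze_orders": set(),
    "bronze_customers": set(),
    "silver_orders": {"bronze_orders", "bronze_customers"},
    "gold_revenue": {"silver_orders"},
}


def execution_order(dependencies):
    """Derive a valid run order from the declared dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
```

Nobody wrote the run order by hand here. That is the practical difference from a nested workflow, where the sequence is hard-wired into the job design and has to be re-drawn whenever a new table is added.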
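Expectations put data quality at the heart of ETL pipeline development. Depending on how a rule is configured, failing records can be kept and counted, dropped, or made to stop the update, and a common pattern routes them to a quarantine table for review while healthy data keeps flowing. This is a big shift from old ETL jobs that often stopped on error or buried issues in log files.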
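The quarantine pattern can be sketched in a few lines of plain Python. This is an illustration of the idea only, assuming hypothetical rules and field names; in a DLT pipeline the same split would typically be expressed as two tables fed from the same source.

```python
def split_valid_and_quarantined(rows, rules):
    """Route each row to `valid` or `quarantine`, recording which rules it failed."""
    valid, quarantine = [], []
    for row in rows:
        failed = [name for name, check in rules.items() if not check(row)]
        if failed:
            # Keep the original row and add the reasons, so reviewers can act on it.
            quarantine.append({**row, "_failed_rules": failed})
        else:
            valid.append(row)
    return valid, quarantine


# Hypothetical rules for an orders feed.
rules = {
    "order_id_present": lambda r: r.get("order_id") is not None,
    "quantity_positive": lambda r: (r.get("quantity") or 0) > 0,
}
rows = [
    {"order_id": 10, "quantity": 2},
    {"order_id": None, "quantity": 1},
    {"order_id": 12, "quantity": 0},
]
valid, quarantine = split_valid_and_quarantined(rows, rules)
```

Storing the failed rule names alongside each quarantined row turns a support ticket into a query, instead of a search through log files.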
To keep things healthy at scale, teams add:
- Automated tests around key transformations
- Data quality dashboards for business owners
- SLAs for pipeline runtimes and freshness
- Monitoring alerts to avoid surprise cost spikes during peak seasons
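A freshness SLA check is one of the simpler items on this list to automate. The sketch below assumes you can obtain a last-updated timestamp per table from your own monitoring; the table names, SLA values, and the fixed `now` timestamp are arbitrary illustration values.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness SLAs: maximum allowed age of the last update per table.
FRESHNESS_SLA = {
    "gold_revenue": timedelta(hours=2),
    "silver_orders": timedelta(minutes=30),
}


def stale_tables(last_updated, sla, now):
    """Return the tables whose last update is older than their SLA allows."""
    return sorted(
        table for table, max_age in sla.items()
        if now - last_updated[table] > max_age
    )


# Fixed timestamp so the example is reproducible; use the current time in practice.
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
last_updated = {
    "gold_revenue": now - timedelta(hours=1),
    "silver_orders": now - timedelta(minutes=45),
}
breaches = stale_tables(last_updated, FRESHNESS_SLA, now)
```

The output of a check like this is what you would feed into an alert or a dashboard for business owners.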
Together with Unity Catalog, DLT pipelines become reusable patterns. Different teams can build on shared, governed data products instead of each crafting their own one-off feeds.
Unlocking Unity Catalog for Secure, Shared AI Data
Unity Catalog gives a shared, business-friendly view of data across workspaces, business units, and regions. That matters when data for one model may live across several countries, each with its own privacy rules.
A good design usually includes:
- Clear top-level catalogs for environments or regions
- Schemas mapped to domains like finance, HR, or operations
- Consistent naming for tables and views so users can find what they need
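Unity Catalog addresses objects with a three-level name, `catalog.schema.table`, so a naming convention can be checked in code before anything is created. The sketch below is one way to do that; the lowercase snake_case rule is a house convention we assume for illustration, not a Unity Catalog requirement, and the names used are hypothetical.

```python
import re

# Assumed house convention: lowercase snake_case, starting with a letter.
_IDENTIFIER = re.compile(r"^[a-z][a-z0-9_]*$")


def qualified_name(catalog, schema, table):
    """Build a `catalog.schema.table` name, enforcing the naming convention."""
    parts = (catalog, schema, table)
    for part in parts:
        if not _IDENTIFIER.match(part):
            raise ValueError(f"invalid identifier: {part!r}")
    return ".".join(parts)


name = qualified_name("prod_emea", "finance", "gold_revenue_by_region")
```

A check like this fits naturally into CI, so that badly named tables are rejected before they reach a shared catalog.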
Row- and column-level security helps protect sensitive fields, like personal data, while still letting analysts and AI teams work at scale. Policies ensure only the right roles see the right slice of data.
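The effect of row- and column-level policies can be modelled in plain Python, as in the sketch below. In Unity Catalog itself this is enforced by the platform through row filters and column masks defined on tables, not by application code; the roles, regions, and `apply_policy` helper here are hypothetical and only show what each role would end up seeing.

```python
# Hypothetical policy: which regions a role may see, and which columns are masked.
POLICIES = {
    "analyst_uk": {"regions": {"UK"}, "masked_columns": {"email"}},
    "dpo": {"regions": {"UK", "DE"}, "masked_columns": set()},
}


def apply_policy(rows, role, policies=POLICIES):
    """Filter rows by region and mask sensitive columns for the given role."""
    policy = policies[role]
    visible = []
    for row in rows:
        if row["region"] not in policy["regions"]:
            continue  # row-level filter: this role may not see the row at all
        visible.append({
            column: ("***" if column in policy["masked_columns"] else value)
            for column, value in row.items()
        })
    return visible


customers = [
    {"id": 1, "region": "UK", "email": "a@example.com"},
    {"id": 2, "region": "DE", "email": "b@example.com"},
]
analyst_view = apply_policy(customers, "analyst_uk")
```

Writing the intended policy down in this form first is a useful way to agree it with compliance teams before it is implemented on the real tables.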
For AI, Unity Catalog links data and models through:
- Feature tables that are discoverable and reusable
- Training and fine-tuning datasets with clear lineage
- Traceability from model outputs back to the exact tables and versions used
This level of control is key for explaining AI decisions and passing audits.
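Tracing a model back to its source tables is, at heart, a graph walk over lineage edges. The sketch below shows that walk on a small, hand-written graph; the asset names and the `trace_sources` helper are hypothetical, and in practice the edges would come from the lineage your platform captures rather than a Python dictionary.

```python
# Hypothetical lineage edges: each asset maps to the assets it was built from.
UPSTREAM = {
    "churn_model_v3": ["features.customer_activity"],
    "features.customer_activity": ["silver.orders", "silver.customers"],
    "silver.orders": ["bronze.orders_raw"],
    "silver.customers": ["bronze.crm_raw"],
}


def trace_sources(asset, upstream):
    """Return every upstream asset reachable from `asset`, in sorted order."""
    seen = set()
    stack = [asset]
    while stack:
        current = stack.pop()
        for parent in upstream.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)


sources = trace_sources("churn_model_v3", UPSTREAM)
```

When an auditor asks which data fed a prediction, the answer is the list this walk returns, together with the table versions in use at training time.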
Turning Your ETL Estate Into a Databricks Lakehouse Roadmap
Legacy Informatica and SSIS workflows do not need to be a blocker. They can be the blueprint for a steady, low-risk move to a Lakehouse built on Delta Live Tables and Unity Catalog. The sequence is clear: inventory your jobs, design your Medallion layers, refactor into DLT with CI/CD, and roll out Unity Catalog as the shared governance layer.
A simple self-check can help decide what to tackle before the next budget cycle:
- Which workflows drive high-impact reports or AI projects?
- Where are current SLAs most at risk?
- Which jobs are hardest to change or support?
- Where do compliance or audit teams struggle to get clear lineage today?
At Cosmos Thrace, as a Databricks Silver Partner working with EMEA enterprises, we focus on turning those answers into a practical migration path. The result is a Lakehouse where data engineering, analytics, and AI teams can build with confidence, even when the pressure is on and the weather outside is as grey as a London winter.
Get Started With Your Project Today
If you are ready to move from fragile data processes to a robust, scalable foundation, we are here to help. At Cosmos Thrace, our expert team can guide you through every stage of ETL pipeline development, from initial assessment to full production deployment. Share your requirements with us via our contact page and we will propose a tailored approach that fits your data landscape and business goals.