ELT Pipeline with Medallion Architecture

Value statement: Production ELT pipeline implementing medallion architecture with dbt transformations, Kimball dimensional modeling, and 99%+ data quality through automated testing.

Overview

Built a production-grade ELT pipeline using medallion architecture (Bronze/Silver/Gold) with dbt transformations orchestrated by Apache Airflow. The pipeline features dimensional models with star schema design, comprehensive dbt testing, and incremental processing that reduces run times from 45 minutes to 8 minutes.

Architecture

ELT Pipeline Architecture

Technology Stack

LayerTechnologies
Transformationdbt Core, Jinja, SQL
Data WarehouseSnowflake
OrchestrationApache Airflow
Testingdbt tests (schema + data quality)
ModelingKimball dimensional modeling

Implementation Details

Medallion Layers

Bronze (Raw): Immutable source data with _loaded_at and _source_system audit columns. Preserves full history for reprocessing.

Silver (Cleansed): Type casting, deduplication via row_number(), and business rule validation. Uses incremental models with merge strategy.

Gold (Analytics): Fact tables (fact_sales, fact_orders) and dimensions (dim_customer, dim_product, dim_date) following Kimball methodology. SCD Type 2 preserves historical changes.

Data Quality Testing

Comprehensive dbt test suite achieving 99%+ accuracy:

Orchestration

Airflow DAG coordinates the pipeline:

Metrics

MetricValue
Silver Models30+ staging models
Gold Models15+ fact/dimension tables
dbt Tests150+ automated tests
Accuracy99%+
Full Refresh45 minutes
Incremental8 minutes
Data Volume50M+ rows

Reliability

Lessons Learned

Incremental strategy: Switched from unique_key to partitioned merge with date filtering, reducing processing time 70%.

Test early: Moved custom business logic tests to Silver layer to catch issues before Gold.

Star schema simplicity: Removed snowflaking from dimensions, improving BI query performance 3x.

Future Improvements