Data Platforms
Customer Parts Movement History Loader & Azure SQL De-Dupe Pipeline
CSV-to-Azure SQL ingestion pipeline that normalizes customer movement history, validates period and quantity fields, prevents duplicate records, and loads append-ready data into a reporting foundation.
Movement-history loader view
Customer History Loader
CSV source files, header mapping, validation, de-dupe, and Azure SQL loading
Read
CSV
Map
Headers
Validate
Month / qty
De-dupe
Natural key
Load
Azure SQL
Data quality controls
Load preview
append-only| Customer | Part | Period | Qty | Status |
|---|---|---|---|---|
| Customer A | 100-4412 | 2026-05 | 14 | New |
| Customer B | 225-0198 | 2026-05 | 8 | Exists |
| Customer C | 300-7710 | 2026-06 | 21 | New |
| Customer D | 410-2250 | 2026-06 | 3 | Review |
Loader flow
Business problem
Customer movement history required reliable loading, duplicate prevention, and cleaner structure before it could support reporting or downstream analytics. Repeated files, inconsistent headers, and period fields created risk for bad or duplicated history.
The process needed a loader that could normalize incoming CSV data, validate records, prevent duplicate inserts, and preserve a clean Azure SQL history table.
System built
Built a Python and Pandas loader with header mapping, month/year validation, quantity coercion, part-number cleanup, natural-key de-dupe logic, Azure SQL table creation, SQLAlchemy loading, and append-only inserts.
The system turns customer movement CSV files into a cleaner, de-duplicated database layer ready for reporting, movement analytics, and historical analysis.
Load signals
Signals reviewed
The loader evaluates file readiness, source structure, validation rules, and existing SQL records before inserting new movement-history data.
Ingestion flow
How it works
Read
Load customer movement-history CSV files and inspect the incoming structure before transformation.
The loader starts by turning source files into a controlled dataframe so the rest of the pipeline can apply validation and mapping rules.
Map
Apply header mapping and alias handling so inconsistent source names align to the expected schema.
This makes the loader more resilient when source files use slightly different column names or export formats.
Validate
Validate month, year, quantity, required fields, and source readiness before records move into SQL.
The validation layer protects the target table from malformed dates, bad quantities, missing fields, and unusable rows.
De-dupe
Build canonical natural keys and compare incoming records against existing SQL history to prevent duplicate inserts.
The de-dupe logic gives the loader memory and makes repeated runs safer by avoiding duplicate movement history.
Load
Create or validate the Azure SQL table and insert only new, clean, append-ready records.
The final layer delivers a clean movement-history table that can support analytics, reporting, and downstream business logic.
Data engineering layers
What the loader coordinates
Header mapping
Normalizes inconsistent CSV column names into a stable schema that the loader can trust.
Validation layer
Checks month, year, quantity, required fields, and conversion readiness before SQL loading.
Canonical de-dupe
Uses natural-key logic to identify records that already exist and prevent duplicate movement history.
Azure SQL loading
Creates or validates the target table and performs append-only inserts through SQLAlchemy.
Impact signals
What the loader improved
Canonical de-duplication for safer repeated loads
SQLAlchemy loading into Azure SQL
Header alias handling for source-file flexibility
Month / year and quantity validation before insert
Scheduled-loader readiness for repeatable movement-history ingestion
Operational value
Movement history turned into a reliable SQL foundation
Cleaner history loading
Moves customer movement history from manual file handling into a repeatable ingestion pipeline.
Duplicate prevention
Natural-key logic helps prevent repeated source files from inflating movement history.
More reliable SQL tables
Validation and type coercion reduce the chance of malformed records landing in the database.
Analytics-ready foundation
Creates a cleaner history layer that can support customer behavior, movement, sales, and inventory analysis.
Why this project matters
Customer movement history converted into trusted database history.
This project shows how raw historical files can become a controlled data platform layer. Header mapping, validation, type cleanup, canonical de-dupe, and append-only SQL loading create a stronger foundation for analytics.
The value is not just loading rows. The value is protecting historical movement data from duplication, malformed records, and inconsistent source-file structure.
Confidentiality note
Visuals and descriptions are sanitized conceptual representations. They do not expose private company data, customer records, credentials, raw exports, internal pricing, operational screenshots, or proprietary source files.