Back to case studies

Data Platforms

OnParts OEM Lookup Ingestion & Azure SQL Incremental Load Pipeline

OEM reference-data ingestion pipeline that validates lookup files, normalizes OEM numbers, parses fitment years, prevents duplicate SQL records, and incrementally loads clean reference data into Azure SQL.

PythonPandasAzure SQLSQLAlchemyReference DataOEM NormalizationIncremental LoadingDe-Dupe Logic

Reference data command view

OnParts OEM Lookup Ingestion

OEM lookup intake, validation, year parsing, normalization, existing-key comparison, and append-only Azure SQL loading

lookup-ready
01

Receive

Local / Blob

02

Validate

Columns / Schema

03

Normalize

OEM / Years

04

Compare

Existing keys

05

Load

Azure SQL

Validation checks

Source → SQL
Required columns
OEM number cleanup
YearStart / YearEnd
PingedCount normalization
Schema validation
Existing-key comparison

Reference load preview

Append-only
FieldCheckOutputStatus
OEM NumberNormalizeCanonical keyReady
Year RangeParseYearStart / YearEndMapped
PingedCountCleanOptional metricValidated
Existing KeyCompareAppend decisionDe-duped

SQL-ready outputs

OEM Lookup RecordsCanonical OEM KeysYearStart / YearEndAppend-Only InsertsDuplicate ReviewReference Table
OEM Reference
Lookup data ingestion
YearStart / YearEnd
Fitment-year parsing
Append-Only
Incremental SQL loading

Business problem

OEM lookup data needed repeatable ingestion while preventing duplicates. Source files could contain inconsistent OEM formatting, year ranges that needed parsing, optional fields that required cleanup, and records that may already exist in SQL.

The workflow needed a controlled reference-data loader that could validate the incoming file, normalize key fields, compare against existing records, and only append clean new rows into the database.

System built

Built local and blob ingestion, required-column validation, OEM-number cleanup, YearStart and YearEnd parsing, optional PingedCount normalization, schema validation, SQLAlchemy loading, and existing-key comparison.

The system turns OEM lookup files into a safer incremental SQL load process that supports downstream reference lookups, matching workflows, reporting, and future application logic.

Reference-data controls

Signals reviewed

The pipeline evaluates source readiness, schema quality, OEM key structure, fitment-year parsing, duplicate risk, and append-only load status before publishing lookup records into SQL.

Local / Blob source readiness
Required column coverage
OEM number normalization
Lookup description fields
YearStart parsing
YearEnd parsing
PingedCount normalization
Schema validation
Existing-key comparison
Duplicate candidate detection
Append-only insert readiness
Reference-data publication state

Incremental loading workflow

How it works

01

Receive

Collect OEM lookup files from local or blob-based intake paths and prepare them for controlled loading.

The pipeline starts by treating lookup files as managed reference-data inputs instead of one-off spreadsheet uploads.

02

Validate

Check required columns, source structure, schema readiness, and field-level quality before transformation.

Validation protects the reference table from incomplete files, missing identifiers, and malformed records.

03

Normalize

Standardize OEM numbers, parse year ranges, normalize optional PingedCount values, and clean lookup fields.

Normalization turns inconsistent reference rows into predictable records that can be compared and loaded safely.

04

Compare

Compare incoming records against existing SQL keys to prevent duplicate reference-data inserts.

The comparison layer gives the loader memory so repeated runs can add only new records instead of reloading the same OEM references.

05

Load

Append clean, new OEM lookup records into Azure SQL for downstream reporting and lookup workflows.

The final layer publishes reference data into a stable SQL structure that can support analytics, matching, and operational lookup use cases.

System layers

What the loader coordinates

Source intake

Accepts local or blob-based lookup files and brings them into a controlled reference-data ingestion workflow.

Validation layer

Checks required columns, schema readiness, year fields, optional counters, and field-level quality before loading.

Normalization layer

Standardizes OEM numbers, parses YearStart and YearEnd values, and cleans fields used by downstream lookup logic.

Incremental load layer

Compares incoming keys against existing SQL records and inserts only new, clean reference-data rows.

Impact signals

What the pipeline improved

Incremental Azure SQL loading for OEM reference data

OEM number de-duplication and existing-key comparison

YearStart and YearEnd parsing for cleaner fitment context

Required-column and schema validation before publication

Reference-data readiness for downstream lookup and matching

Operational value

Reference data made safer for downstream use

Cleaner reference data

Turns messy lookup files into structured OEM records that are easier to search, match, and use downstream.

Safer repeated loads

Existing-key comparison and append-only logic reduce the risk of duplicate inserts across recurring runs.

Better fitment context

YearStart and YearEnd parsing gives the lookup table more useful structure for reporting and matching.

Operational SQL readiness

Publishes clean records into Azure SQL so the data can support analytics, APIs, lookup tools, and future pipelines.

Why this project matters

Reference-data pipelines matter because lookup quality affects every downstream match, report, and application workflow.

This project shows how OEM lookup data can move from file-based reference material into a controlled SQL asset. Validation, normalization, fitment-year parsing, duplicate prevention, and append-only loading make the reference table more reliable over time.

The value is not only loading a file. The value is protecting the quality of the reference layer that other systems depend on.

Confidentiality note

Visuals and descriptions are sanitized conceptual representations. They do not expose private company data, OEM source files, customer records, internal pricing, Azure SQL credentials, operational screenshots, proprietary lookup tables, or source pipeline logic.