Data Platforms

OnParts OEM Lookup Ingestion & Azure SQL Incremental Load Pipeline

OEM reference-data ingestion pipeline that validates lookup files, normalizes OEM numbers, parses fitment years, prevents duplicate SQL records, and incrementally loads clean reference data into Azure SQL.

PythonPandasAzure SQLSQLAlchemyReference DataOEM NormalizationIncremental LoadingDe-Dupe Logic

Reference data command view

OnParts OEM Lookup Ingestion

OEM lookup intake, validation, year parsing, normalization, existing-key comparison, and append-only Azure SQL loading

lookup-ready

Receive

Local / Blob

Validate

Columns / Schema

Normalize

OEM / Years

Compare

Existing keys

Load

Azure SQL

Validation checks

Source → SQL

Required columns

OEM number cleanup

YearStart / YearEnd

PingedCount normalization

Schema validation

Existing-key comparison

Reference load preview

Append-only

Field	Check	Output	Status
OEM Number	Normalize	Canonical key	Ready
Year Range	Parse	YearStart / YearEnd	Mapped
PingedCount	Clean	Optional metric	Validated
Existing Key	Compare	Append decision	De-duped

SQL-ready outputs

OEM Lookup RecordsCanonical OEM KeysYearStart / YearEndAppend-Only InsertsDuplicate ReviewReference Table

OEM Reference

Lookup data ingestion

YearStart / YearEnd

Fitment-year parsing

Append-Only

Incremental SQL loading

Business problem

OEM lookup data needed repeatable ingestion while preventing duplicates. Source files could contain inconsistent OEM formatting, year ranges that needed parsing, optional fields that required cleanup, and records that may already exist in SQL.

The workflow needed a controlled reference-data loader that could validate the incoming file, normalize key fields, compare against existing records, and only append clean new rows into the database.

System built

Built local and blob ingestion, required-column validation, OEM-number cleanup, YearStart and YearEnd parsing, optional PingedCount normalization, schema validation, SQLAlchemy loading, and existing-key comparison.

The system turns OEM lookup files into a safer incremental SQL load process that supports downstream reference lookups, matching workflows, reporting, and future application logic.

Reference-data controls

Signals reviewed

The pipeline evaluates source readiness, schema quality, OEM key structure, fitment-year parsing, duplicate risk, and append-only load status before publishing lookup records into SQL.

Local / Blob source readiness

Required column coverage

OEM number normalization

Lookup description fields

YearStart parsing

YearEnd parsing

PingedCount normalization

Schema validation

Existing-key comparison

Duplicate candidate detection

Append-only insert readiness

Reference-data publication state

Incremental loading workflow

How it works

Receive

Collect OEM lookup files from local or blob-based intake paths and prepare them for controlled loading.

The pipeline starts by treating lookup files as managed reference-data inputs instead of one-off spreadsheet uploads.

Validate

Check required columns, source structure, schema readiness, and field-level quality before transformation.

Validation protects the reference table from incomplete files, missing identifiers, and malformed records.

Normalize

Standardize OEM numbers, parse year ranges, normalize optional PingedCount values, and clean lookup fields.

Normalization turns inconsistent reference rows into predictable records that can be compared and loaded safely.

Compare

Compare incoming records against existing SQL keys to prevent duplicate reference-data inserts.

The comparison layer gives the loader memory so repeated runs can add only new records instead of reloading the same OEM references.

Load

Append clean, new OEM lookup records into Azure SQL for downstream reporting and lookup workflows.

The final layer publishes reference data into a stable SQL structure that can support analytics, matching, and operational lookup use cases.

System layers

What the loader coordinates

Source intake

Accepts local or blob-based lookup files and brings them into a controlled reference-data ingestion workflow.

Validation layer

Checks required columns, schema readiness, year fields, optional counters, and field-level quality before loading.

Normalization layer

Standardizes OEM numbers, parses YearStart and YearEnd values, and cleans fields used by downstream lookup logic.

Incremental load layer

Compares incoming keys against existing SQL records and inserts only new, clean reference-data rows.

Impact signals

What the pipeline improved

Incremental Azure SQL loading for OEM reference data

OEM number de-duplication and existing-key comparison

YearStart and YearEnd parsing for cleaner fitment context

Required-column and schema validation before publication

Reference-data readiness for downstream lookup and matching

Operational value

Reference data made safer for downstream use

Cleaner reference data

Turns messy lookup files into structured OEM records that are easier to search, match, and use downstream.

Safer repeated loads

Existing-key comparison and append-only logic reduce the risk of duplicate inserts across recurring runs.

Better fitment context

YearStart and YearEnd parsing gives the lookup table more useful structure for reporting and matching.

Operational SQL readiness

Publishes clean records into Azure SQL so the data can support analytics, APIs, lookup tools, and future pipelines.

Why this project matters

Reference-data pipelines matter because lookup quality affects every downstream match, report, and application workflow.

This project shows how OEM lookup data can move from file-based reference material into a controlled SQL asset. Validation, normalization, fitment-year parsing, duplicate prevention, and append-only loading make the reference table more reliable over time.

The value is not only loading a file. The value is protecting the quality of the reference layer that other systems depend on.

Confidentiality note

Visuals and descriptions are sanitized conceptual representations. They do not expose private company data, OEM source files, customer records, internal pricing, Azure SQL credentials, operational screenshots, proprietary lookup tables, or source pipeline logic.