SERVICE · 04 / DATA PIPELINES

Connect every source. Land it clean.

Modern ELT, change-data-capture, and real-time streaming wired into one governed lakehouse — so the model on top reads from a foundation that's always current and always trustworthy.

Stack: Fabric Pipelines · Dataflows Gen2 · Eventstream · Azure Data Factory · dbt · Python

[Hero graphic: live pipeline graph. Readouts: 14 connectors live · CDC latency ~12s · schema drift tracked · 2,148 events/sec.
Flows: ERP → Stage → Clean → Join → Gold · CRM → Stage → Dedup → Enrich · SaaS → Stage → Validate → Merge · Files → Parse → Norm → Join · API → Stage → Hash → Upsert.
Connectors: D365 · SAP · SF · SQL · S3 · ADLS · REST · FILE · KAFKA · CDC · WB · XLSX]

WHY NOW · 01

Pipelines are load-bearing.

Every model on top of your data — every dashboard, every forecast, every agent — is only as good as the pipeline feeding it. When integration is brittle, everything downstream feels brittle too.

/01 · BRITTLE

Hand-written, hand-fixed

SSIS packages, scheduled SQL jobs, and a Power Automate flow nobody wants to touch. Every source change is a ticket and an apology.

/02 · STALE

Yesterday's data, today's question

Overnight ETL means the dashboard people open at 9am misses the order that came in at 2am. Streaming is the obvious fix — except no one owns the platform.

/03 · UNGOVERNED

No lineage, no trust

When finance asks "where does this number come from?" the answer involves four hops, three nicknames, and a Friday-afternoon notebook on someone's laptop.

CAPABILITIES · 02

What we build.

Eight focused capabilities. We pick the smallest set that lands clean data into your gold layer — and skip the rest.

/01 — INGESTION

Source connectors & ingestion

ERP, CRM, SaaS, files, APIs, Kafka, and lakehouses — wired into Fabric, ADF, or your tool of choice with retries, schema checks, and lineage.

/02 — ELT

Modern ELT & transformation

Bronze → Silver → Gold patterns in dbt, Power Query, Spark, or T-SQL — version-controlled and tested like the rest of your code.

/03 — CDC

Change data capture

Sub-minute change tracking from operational databases without putting load on production — log-based, snapshot-safe, idempotent.
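
To make "idempotent" concrete, here is a minimal Python sketch of applying change events so that replaying the same batch leaves the target untouched. The `ChangeEvent` shape, the `_lsn` and `_deleted` bookkeeping fields, and the in-memory dictionary standing in for a table are all illustrative assumptions, not the format of any particular CDC tool.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One change read from a database log (hypothetical shape)."""
    key: str                              # primary key of the changed row
    lsn: int                              # log sequence number; higher is later
    op: str                               # "upsert" or "delete"
    row: Optional[Dict[str, Any]] = None  # new column values for an upsert


def apply_changes(target: Dict[str, Dict[str, Any]],
                  events: Iterable[ChangeEvent]) -> Dict[str, Dict[str, Any]]:
    """Apply events so that duplicates and stale redeliveries are ignored.

    Each stored row remembers the LSN that produced it. An event is applied
    only when its LSN is newer than the stored one.
    """
    for ev in events:
        current = target.get(ev.key)
        if current is not None and current["_lsn"] >= ev.lsn:
            continue  # already applied, or superseded by a later change
        if ev.op == "delete":
            # Keep a tombstone so an older upsert arriving late cannot
            # bring the row back.
            target[ev.key] = {"_lsn": ev.lsn, "_deleted": True}
        else:
            target[ev.key] = {**(ev.row or {}), "_lsn": ev.lsn, "_deleted": False}
    return target


events = [
    ChangeEvent("order-1", lsn=10, op="upsert", row={"status": "new"}),
    ChangeEvent("order-1", lsn=12, op="upsert", row={"status": "paid"}),
    ChangeEvent("order-2", lsn=11, op="upsert", row={"status": "new"}),
    ChangeEvent("order-2", lsn=13, op="delete"),
]

table: Dict[str, Dict[str, Any]] = {}
apply_changes(table, events)
first_pass = {key: dict(row) for key, row in table.items()}
apply_changes(table, events)  # replay the same batch after a retry
```

After the replay, `table` is identical to `first_pass`. In a real lakehouse the same idea is usually expressed as a keyed merge into the target table rather than a dictionary update.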

/04 — STREAM

Real-time streaming

Eventstream, Event Hubs, and Kafka pipelines for the use cases that genuinely need sub-second freshness — no streaming-for-streaming's-sake.

/05 — ORCHESTRATION

Pipeline orchestration

Dependencies, retries, alerting, and SLA tracking via Fabric Pipelines, Azure Data Factory, or Airflow when the stack calls for it.
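
As a rough illustration of the dependency and retry handling an orchestrator takes care of, the Python sketch below runs tasks in dependency order and retries transient failures with exponential backoff. It uses only the standard library; the task names and the helpers `run_with_retries` and `run_pipeline` are hypothetical, and alerting and SLA tracking are left out.

```python
import time
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_with_retries(task: Callable[[], None], attempts: int = 3,
                     base_delay: float = 0.0) -> int:
    """Run a task, retrying on any exception with exponential backoff.

    Returns the number of attempts used and re-raises the last failure.
    """
    for attempt in range(1, attempts + 1):
        try:
            task()
            return attempt
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


def run_pipeline(tasks: Dict[str, Callable[[], None]],
                 depends_on: Dict[str, Set[str]]) -> List[str]:
    """Run every task after its upstream tasks; return the order used."""
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        run_with_retries(tasks[name])
    return order


log: List[str] = []
calls = {"stage_erp": 0}


def stage_erp() -> None:
    # Fails once to simulate a transient source timeout, then succeeds.
    calls["stage_erp"] += 1
    if calls["stage_erp"] == 1:
        raise ConnectionError("transient source timeout")
    log.append("stage_erp")


tasks = {
    "stage_erp": stage_erp,
    "clean_erp": lambda: log.append("clean_erp"),
    "gold_orders": lambda: log.append("gold_orders"),
}
# Each key lists the tasks that must finish before it starts.
depends_on = {"clean_erp": {"stage_erp"}, "gold_orders": {"clean_erp"}}
order = run_pipeline(tasks, depends_on)
```

Here `stage_erp` succeeds on its second attempt and the downstream tasks run only afterwards. Fabric Pipelines, Azure Data Factory, and Airflow each provide their own retry and dependency settings, so in practice this logic is configured rather than hand-written.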

/06 — QUALITY

Data quality & testing

Great Expectations, dbt tests, and contract checks at the source boundary. Bad rows quarantined, not silently swallowed.
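
The quarantine idea can be sketched in a few lines of plain Python. This is not the Great Expectations or dbt API; the rules in `ORDER_CHECKS`, the `_failed_checks` field, and the sample rows are assumptions made up for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Check = Tuple[str, Callable[[Row], bool]]

# Hypothetical contract for an orders feed: a rule name plus a predicate.
ORDER_CHECKS: List[Check] = [
    ("order_id present", lambda r: bool(r.get("order_id"))),
    ("amount is a non-negative number",
     lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
    ("currency is a 3-letter code",
     lambda r: isinstance(r.get("currency"), str) and len(r["currency"]) == 3),
]


def split_rows(rows: List[Row],
               checks: List[Check]) -> Tuple[List[Row], List[Row]]:
    """Return (accepted, quarantined).

    Quarantined rows keep their original values and record which checks
    they failed, so nothing is dropped silently.
    """
    accepted: List[Row] = []
    quarantined: List[Row] = []
    for row in rows:
        failures = [name for name, passes in checks if not passes(row)]
        if failures:
            quarantined.append({**row, "_failed_checks": failures})
        else:
            accepted.append(row)
    return accepted, quarantined


rows = [
    {"order_id": "A1", "amount": 120.5, "currency": "EUR"},
    {"order_id": "", "amount": 10, "currency": "USD"},
    {"order_id": "A3", "amount": -4, "currency": "usd$"},
]
accepted, quarantined = split_rows(rows, ORDER_CHECKS)
```

Only the first row is accepted. The other two land in `quarantined` with the names of the checks they failed, which is what makes the bad rows reviewable later.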

/07 — LINEAGE

Catalog, lineage & governance

Microsoft Purview or Unity Catalog wired into the build process so lineage is generated, not maintained.

/08 — MIGRATION

Legacy ETL migration

Retire SSIS, Informatica, and bespoke .NET jobs onto a maintainable modern stack, without a big-bang weekend.

ARCHITECTURE · 03

Sources to decisions, in five hops.

The reference flow we build on top of. Substitute any one block for your stack — the contract between layers stays the same.

/01 SOURCES
ERP · D365 (ORDERS · GL)
CRM · Salesforce (ACCOUNTS · OPP)
SaaS APIs (REST · GRAPHQL)
Files · FTP · S3 (CSV · PARQUET)
Streams · Kafka (EVENTS · IoT)

/02 INGEST · FABRIC · ADF
Pipelines (ORCHESTRATION)
Dataflows Gen2 (POWER QUERY)
Eventstream (REAL-TIME)
CDC (LOG-BASED + CONNECTORS)

/03 LAKEHOUSE · ONELAKE
BRONZE · Raw (DELTA · IMMUTABLE)
SILVER · Cleaned · joined (DEDUP · CONFORMED)
GOLD · Modeled · governed (STAR · MEASURES)

/04 TRANSFORM · DBT · SPARK · SQL
dbt models (SQL · TESTED)
Spark notebooks (PYTHON · SCALA)
Power Query (M · SELF-SERVE)
QUALITY GATES · TESTS · ALERTS · SLAs

/05 SERVE
Power BI (DIRECT LAKE)
Excel · Acterys (PLAN · WRITE-BACK)
APIs · Apps (EMBEDDED)
Agents · ML (LLM · COPILOT)
Decisions (BOARD · OPS)

FLOW · LEFT TO RIGHT · HIGHLIGHTED · ARKIMETRIX DEFAULT

PATTERNS · 04

Three flavours. Pick the right one, once.

Most data integration mistakes come from picking the wrong pattern. Here's how we think about the choice — and when each one earns its keep.

/01 · Batch ELT

Batch ELT

Hourly or daily extracts to bronze, governed transforms to gold. Simple, cheap, debuggable. The right default for the 80% of analytics that don't need sub-minute latency.

USE WHEN · BUDGETS · CLOSE · MGMT REPORTING
/02 · CDC

Change data capture

Log-based replication from operational databases — minutes-fresh data without taxing the source. The middle ground that solves most "we need it faster" requests.

USE WHEN · OPS DASHBOARDS · CUSTOMER-360
/03 · Streaming

Real-time streaming

Event-driven pipelines on Eventstream, Event Hubs, or Kafka. Sub-second freshness for the workloads that actually need it: fraud, IoT, real-time inventory.

USE WHEN · IoT · FRAUD · REAL-TIME OPS
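
The choice between the three patterns above can be reduced to one question: how stale may the data be when someone reads it? The Python sketch below encodes that as a rule of thumb. The thresholds (one second, one hour) and the example sources are illustrative assumptions; a real decision also weighs cost, source capabilities, and who will operate the platform.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Source:
    name: str
    max_staleness_seconds: float  # how old data may be when it is read


def choose_pattern(source: Source) -> str:
    """Pick the simplest pattern that still meets the freshness need.

    Thresholds are illustrative assumptions, not fixed rules.
    """
    if source.max_staleness_seconds < 1:
        return "streaming"  # sub-second freshness
    if source.max_staleness_seconds < 3600:
        return "cdc"        # fresher than an hourly batch can deliver
    return "batch"          # hourly or daily extracts are enough


sources: List[Source] = [
    Source("month-end close", 24 * 3600),
    Source("ops dashboard", 300),
    Source("fraud scoring", 0.5),
]
choices: Dict[str, str] = {s.name: choose_pattern(s) for s in sources}
```

With these inputs the close process lands on batch, the ops dashboard on CDC, and fraud scoring on streaming, which matches the "use when" hints on each card.
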
PROCESS · 05

How we engage.

Four phases with one sign-off each. Pipelines in production by the end of week six.

/01 · DAYS 1–4

Source audit

Inventory of sources, current jobs, schemas, and the questions they're meant to answer.

/02 · WEEKS 1–2

Architecture

Reference design — pick the pattern (batch / CDC / streaming) per source. One readable diagram.

/03 · WEEKS 2–6

Build & ship

Three priority pipelines into production with tests, lineage, and alerting from day one.

/04 · ONGOING

Migrate & retire

Sunset legacy ETL incrementally. Hand off the runbook.

OUTCOMES · 06

What you can expect to change.

Outcomes worth aiming for. Specific numbers depend on your tenant; we baseline them in discovery.

One governed gold layer

Every downstream model — reports, forecasts, agents — pulls from the same trusted dataset.

FOUNDATION
Fresh data, on time

The right freshness for each use case — daily where that's enough, sub-minute where it isn't.

LATENCY
Lineage you can read

Source-to-report lineage in Purview. "Where does this number come from?" is a one-click answer.

TRUST
Quiet on-call

Tested transforms, idempotent loads, sane alerting. Pipelines that survive a long weekend.

RELIABILITY
A maintainable stack

Source-controlled, peer-reviewed code. Onboarding a new analyst takes days, not months.

CONTINUITY
STARTER ENGAGEMENT

Pipeline Foundations.

A four-to-six-week engagement that lands three production pipelines into a governed lakehouse — with tests, lineage, and a runbook your team can extend.

Source audit: Inventory · schemas · current jobs
Architecture: Pattern per source · reference design
3 pipelines: Bronze → Silver → Gold in production
Quality gates: Tests · contracts · alerting
Lineage: Purview wired · auto-generated
Enablement: 2 sessions · runbook · pattern docs

Engagements scoped per tenant. Fixed-price options available for foundations work; T&M for build-out and migration.

Scope this for us → Other services
FAQ · 07

Common questions.

The questions we get most often during scoping. If yours isn't here, write to info@arkimetrix.com.

Do we have to use Microsoft Fabric?

No. Fabric is our default because most of our clients already live in the Microsoft estate, but we've built happy stacks on Snowflake + dbt, Databricks, and BigQuery too. The pattern transfers.

What happens to our existing SSIS / Informatica jobs?

We catalogue them, group them by source system, and migrate in slices rather than all at once. Each new pipeline replaces a clearly defined set of legacy jobs, which then get retired. No big-bang weekends.

Do we really need streaming?

Probably not. Most "we need real-time" requests are actually "we need fresher than overnight" — and CDC solves that for a fraction of the operational cost. We'll pick the pattern per source during architecture.

How do you handle schema drift from upstream sources?

Schema contracts at the source boundary, automated drift detection in CI, and quarantine for non-conforming rows. Sources change — pipelines shouldn't break silently.
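
A drift check of this kind can be as small as a comparison between the agreed contract and the schema actually observed. The Python sketch below is one way to do it; the column names, type labels, and the policy in `is_breaking` (additions warn, removals and type changes fail) are assumptions chosen for illustration.

```python
from typing import Dict, List

Schema = Dict[str, str]  # column name -> type name


def detect_drift(contract: Schema, observed: Schema) -> Dict[str, List[str]]:
    """Compare the observed source schema with the agreed contract."""
    shared = set(contract) & set(observed)
    return {
        "added": sorted(set(observed) - set(contract)),
        "removed": sorted(set(contract) - set(observed)),
        "retyped": sorted(c for c in shared if contract[c] != observed[c]),
    }


def is_breaking(drift: Dict[str, List[str]]) -> bool:
    """Assumed policy: new columns only warn; removals and retypes fail."""
    return bool(drift["removed"] or drift["retyped"])


contract: Schema = {
    "order_id": "string",
    "amount": "decimal",
    "created_at": "timestamp",
}
observed: Schema = {
    "order_id": "string",
    "amount": "string",       # upstream changed the type
    "created_at": "timestamp",
    "channel": "string",      # upstream added a column
}

drift = detect_drift(contract, observed)
breaking = is_breaking(drift)
```

Here `drift` reports `channel` as added and `amount` as retyped, and `breaking` is `True`. Run in CI, a check like this fails the build before the changed source reaches the silver layer.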

Can you work alongside our internal data team?

Yes — most engagements are co-builds. We bring patterns and a senior pair-programmer; your engineers own the code by the end.

NEXT STEP

Need pipelines that don't break on Monday?

A 30-minute scoping call. We'll look at your top three sources, the worst job you have, and the freshness your stakeholders actually need. We'll tell you whether we're a fit — and what we'd do first.

info@arkimetrix.com → Schedule a 30-min intro
Practice lead: Data Integration & Pipelines
Typical engagement: 4–6 weeks · foundations
Coverage: Fabric · Synapse · Databricks · custom
What you keep: Pipelines · DAGs · runbooks