The Challenge

The company was a card payment processor operating entirely on AWS — S3, Redshift, Lambda, the full stack. Their finance team had no visibility into what individual payment transactions actually cost. Every month-end, an analyst would spend 4–5 days manually pulling fee invoices from 10+ acquiring partners, normalising the formats in Excel, joining them against transaction exports, and calculating cost-per-transaction by card scheme, acquirer, and region.

The process was fragile. A single formatting change from one acquirer's fee file broke the entire model. There was no audit trail, no version control, and no confidence in the numbers until the analyst had personally verified every join. Leadership had no intra-month visibility at all — by the time the cost figures landed, they were already a month stale.

The ask was straightforward: make this automatic, make it accurate, and make it available in near real-time. The constraint was equally clear: the company was not moving off AWS. Any solution had to work alongside the existing infrastructure, not replace it.

Why Microsoft Fabric

The obvious path would have been to build inside the existing AWS stack — Glue for ingestion, Redshift or Athena for transformation, QuickSight for reporting. But the company already had a Microsoft 365 and Power BI licence footprint, and the finance team was comfortable in Excel and Power BI. Introducing Fabric meant giving them a unified platform they could actually own.

The critical capability was Fabric Shortcuts to AWS S3. Rather than migrating data out of S3 — which would have required infrastructure changes, data pipeline re-plumbing, and a protracted sign-off process — Shortcuts allowed Fabric to treat the existing S3 buckets as first-class lakehouse storage. Transaction data stayed in AWS. Fabric read it directly.

"We didn't move the data. We moved the intelligence layer on top of it — which is exactly what the business needed without the migration risk."

Architecture & Implementation

Medallion Lakehouse on AWS S3

The architecture followed a standard bronze/silver/gold medallion pattern, with each layer serving a distinct purpose: bronze held the acquirer fee files and transaction exports exactly as they landed in S3; silver held the normalised, schema-validated records with a single shared column set across all acquirers; gold held the cost-per-transaction tables that the reports queried.
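The layer responsibilities can be sketched in plain Python, using dictionaries in place of Spark DataFrames. Column names, schemes, and values here are illustrative, not the production schema:

```python
# Bronze: raw acquirer fee rows, exactly as parsed from the landed files.
bronze_fees = [
    {"TxnRef": "T1", "Fee (GBP)": "0.12", "Scheme": "visa"},
    {"TxnRef": "T2", "Fee (GBP)": "0.30", "Scheme": "mastercard"},
]

# Silver: normalised column names, typed values, consistent casing.
def to_silver(row):
    return {
        "transaction_id": row["TxnRef"],
        "fee_gbp": float(row["Fee (GBP)"]),
        "scheme": row["Scheme"].upper(),
    }

silver_fees = [to_silver(r) for r in bronze_fees]

# Gold: fees aggregated per card scheme, ready for reporting.
gold = {}
for r in silver_fees:
    agg = gold.setdefault(r["scheme"], {"total_fee_gbp": 0.0, "txn_count": 0})
    agg["total_fee_gbp"] += r["fee_gbp"]
    agg["txn_count"] += 1

cost_per_txn = {s: a["total_fee_gbp"] / a["txn_count"] for s, a in gold.items()}
```

In the real platform each arrow between layers is a PySpark notebook writing Delta tables; the shape of the flow is the same.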

CI/CD via Azure DevOps and Fabric Git Integration

Every notebook, pipeline, and semantic model definition was version-controlled through Fabric's native Git integration, connected to Azure DevOps. This was non-negotiable — the old Excel model had no history, no rollback, and no way to understand why a number changed. The new platform needed to be fully auditable from day one.

Pull requests were required for any change to the gold layer. Deployments from development to production ran through an automated Azure DevOps pipeline, with schema validation checks before each promotion. The finance team could see exactly when any calculation changed and why.
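The schema validation step before promotion can be sketched as a simple gate. This is a hedged illustration, not the actual Azure DevOps task: the table name, column set, and failure behaviour are assumptions, and the real pipeline would load its expected schemas from version control.

```python
# Hypothetical expected schema for a gold-layer table.
EXPECTED_SCHEMAS = {
    "gold_cost_per_transaction": {
        "transaction_id": "string",
        "acquirer": "string",
        "scheme": "string",
        "fee_gbp": "double",
    },
}

def validate_schema(table_name, actual_columns):
    """Fail the promotion if a table's columns drift from the
    version-controlled definition."""
    expected = EXPECTED_SCHEMAS[table_name]
    missing = set(expected) - set(actual_columns)
    extra = set(actual_columns) - set(expected)
    wrong_type = {
        c: (expected[c], actual_columns[c])
        for c in expected.keys() & actual_columns.keys()
        if expected[c] != actual_columns[c]
    }
    if missing or extra or wrong_type:
        raise ValueError(
            f"Schema drift in {table_name}: missing={sorted(missing)} "
            f"extra={sorted(extra)} type_mismatch={wrong_type}"
        )
    return True
```

A passing check returns `True` and the promotion proceeds; any drift raises with a message naming the offending columns, which is what surfaces in the pipeline log.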

Power BI Semantic Model

The semantic model was built on Direct Lake mode — no import, no scheduled refresh, no extract. Reports queried the gold Delta tables in Fabric directly, meaning the finance team always saw data that was at most a few hours old rather than a day behind a refresh schedule. Cost-per-transaction sliced by acquirer, card scheme, currency, region, and time period — all available interactively without waiting for a monthly Excel run.

Technologies Used
Microsoft Fabric · Fabric Shortcuts (AWS S3) · Medallion Architecture · PySpark · Delta Lake · Power BI · Direct Lake · Azure DevOps · Fabric Git Integration · DAX

The Result

The platform went live three months after kickoff. On the first month-end after go-live, the finance team's cost close completed the same morning the period ended — compared to the 4–5 day manual process it replaced. The Excel cost model was retired entirely at handover.

Beyond the time saving, the quality improvement was significant. The previous process had no way to detect when an acquirer changed their fee file format mid-month — it only became visible when the analyst tried to run the model at month-end. The new pipeline surfaces parsing failures within hours of ingestion, giving the team time to respond before close.
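The early-warning behaviour can be sketched as a quarantine pass over each ingested file. The failure threshold and row format below are assumptions for illustration, not the platform's actual values:

```python
def ingest_fee_file(rows, parse_row, failure_threshold=0.05):
    """Parse every row, quarantine the ones that fail, and flag the
    file for review if the failure rate suggests a format change."""
    parsed, quarantined = [], []
    for i, row in enumerate(rows):
        try:
            parsed.append(parse_row(row))
        except (KeyError, ValueError) as exc:
            quarantined.append({"line": i, "row": row, "error": str(exc)})
    failure_rate = len(quarantined) / max(len(rows), 1)
    # A flagged file is surfaced within hours of ingestion,
    # rather than discovered at month-end.
    needs_review = failure_rate > failure_threshold
    return parsed, quarantined, needs_review
```

Good rows still flow through to silver; only the quarantined rows wait on a human, so a partial format change no longer blocks the whole close.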

The finance team received full documentation and a handover session covering the pipeline architecture, how to add new acquirer formats, and how to interpret the audit logs. They were fully self-sufficient from day one of go-live.

Key Decisions

Shortcuts over migration. Using Fabric Shortcuts to read S3 directly meant zero changes to the existing AWS infrastructure. The company avoided a migration project, an infrastructure review, and the associated risk of disrupting live payment processing data flows. The trade-off — a dependency on the Shortcuts connection remaining stable — was acceptable given the maturity of the feature and the project scope it eliminated.

PySpark over Dataflows Gen2. With 10+ different flat-file formats from acquiring partners — each with its own column naming, encoding quirks, and schema drift patterns — PySpark notebooks gave the parsing layer the flexibility to handle edge cases that a low-code tool would have struggled with. Each acquirer has its own versioned notebook, making it straightforward to update when formats change.
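One way to picture the per-acquirer parsing layer is a registry of column mappings, one entry per partner. The acquirer names and column names below are made up for illustration; in the real platform each entry is its own versioned PySpark notebook rather than a dictionary:

```python
# Hypothetical per-acquirer column mappings onto the shared silver schema.
ACQUIRER_COLUMN_MAPS = {
    "acquirer_a": {"Txn Ref": "transaction_id", "Fee Amt": "fee_gbp"},
    "acquirer_b": {"REFERENCE": "transaction_id", "CHARGE_GBP": "fee_gbp"},
}

def normalise_row(acquirer, raw_row):
    """Rename one acquirer's columns to the shared silver-layer schema.
    Unknown extra columns are dropped; a missing mapped column raises
    KeyError, which the ingestion layer treats as schema drift."""
    mapping = ACQUIRER_COLUMN_MAPS[acquirer]
    return {target: raw_row[source] for source, target in mapping.items()}
```

When a partner changes their file format, only that partner's mapping (or notebook) is touched, and the change lands through the same pull-request process as everything else.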

Direct Lake over Import mode. The finance team specifically wanted intra-day visibility. Import mode with a daily refresh would have delivered yesterday's data at best. Direct Lake delivered near-real-time without the infrastructure overhead of a live connection or the query performance cost of DirectQuery.

Working on a similar problem?

I take on a small number of engagements per year. If you need cost intelligence, a Fabric implementation, or a migration from a legacy BI stack — let's talk.