Intermediate Architecture

Budgets, Cost Caps, and the Illusion of 'Spending Limits'

There is a question that comes up at every cloud architecture review when finance is in the room: can we set a hard cap on cloud spend so we never exceed budget? The answer everyone wants is yes. The honest answer is no — not in the way they mean.

Both Azure and OCI use phrases like “spending limits” and “cost controls” in documentation and product interfaces. The platforms are largely transparent in their technical docs — the confusion is more interpretive than deceptive. But the terminology creates an expectation of hard billing caps that commercial enterprise contracts do not provide. The useful thing a cloud architect can communicate to the CFO: neither cloud provides a universal billing-level hard cap in any commercial enterprise contract, but each offers different enforcement mechanisms with meaningfully different properties, and knowing which ones work is the entire game.

This article is about what cost mechanisms genuinely exist on each cloud, what they actually do, and how to assemble them into something that comes close to a hard cap without pretending you have one.

What “spending limit” really means on each cloud

Azure Spending Limit is a real feature with a specific scope: it applies only to credit-based subscriptions, meaning Azure Free Trial, Visual Studio (MSDN) benefit subscriptions, and similar credit-pool offers. When the spending limit is on and the credit runs out, the subscription stops accepting deployments and its resources are disabled.

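What it does not apply to: Pay-As-You-Go, Enterprise Agreement (EA), and Microsoft Customer Agreement (MCA) subscriptions, which is to say every commercial contract type. On a Free Trial, removing the spending limit is a one-way action: the subscription converts to uncapped Pay-As-You-Go and the limit cannot be re-enabled. Some credit-based offers, such as Visual Studio subscriptions, also allow the limit to be removed for the current billing period only, after which it is reinstated.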

So in any organisation paying for Azure with a real contract: the spending limit feature is unavailable.

OCI’s billing cap story has the same gap at the billing layer. There is no “spending limit” feature for tenancies. Free tier accounts have implicit resource caps (Always Free quotas hold; trial credits expire); commercial Pay-As-You-Go and Universal Credit tenancies have no billing-level equivalent.

Where OCI diverges significantly from Azure is in resource enforcement. OCI’s compartment quota system — entirely separate from budgets — is a first-class platform enforcement control designed specifically for this purpose. This distinction matters and is covered in detail below.

Both clouds have Budgets, and budgets are alerts. They generate notifications when spend crosses configured thresholds. They do not stop deployments. They do not pause resources. They send emails and webhooks.

This is the gap. Marketing pages talk about “controlling spend.” The mechanism is “telling you when spend got too high.” Those are different.

Azure Budgets — what they actually do

Azure Cost Management Budgets work at multiple scopes:

  • Management group
  • Subscription
  • Resource group
  • Billing scope (EA / MCA / billing account)

Budget periods: monthly, quarterly, or annual. Auto-renewing or one-shot. Two threshold types:

Actual cost — alert fires when accumulated spend reaches the configured percentage of budget.

Forecasted cost — alert fires when the predicted end-of-period spend exceeds the threshold, based on current spend velocity. This is more useful for early warning. The forecast model needs a few months of history to stabilise.
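The difference between the two threshold types is easiest to see with numbers. The sketch below evaluates both for a monthly budget; the forecast is a naive linear run-rate extrapolation, which is an assumption made for illustration and not the model Azure uses. The function name and default thresholds are hypothetical.

```python
import calendar
from datetime import date


def budget_alerts(spend_to_date: float, budget: float, today: date,
                  actual_pct: float = 100.0, forecast_pct: float = 80.0) -> dict:
    """Evaluate an actual-cost and a forecasted-cost threshold for a monthly budget.

    The forecast is a plain run-rate projection (illustrative assumption only).
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    # Average daily spend so far, projected across the whole month.
    forecast = spend_to_date / today.day * days_in_month
    return {
        "forecast": round(forecast, 2),
        "actual_alert": spend_to_date >= budget * actual_pct / 100,
        "forecast_alert": forecast >= budget * forecast_pct / 100,
    }


result = budget_alerts(spend_to_date=4000, budget=10000, today=date(2026, 5, 10))
print(result)
```

Ten days into May with 4,000 spent, the actual-cost threshold is nowhere near firing, while the run-rate projection of 12,400 already exceeds 80% of the budget. That gap is the early warning the forecasted threshold buys.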

Budgets are evaluated every 24 hours. Cost data lags by 8–24 hours. Combined, that means alerts can lag spend by up to 48 hours in the worst case. For “we’re trending over budget” this is fine; for “stop spending right now” it is not.
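To put a number on that lag for a finance audience, multiply the burn rate by the worst-case delay. A minimal sketch, assuming the 24-hour upper bounds quoted above for both the data lag and the evaluation interval; the function name is hypothetical.

```python
def worst_case_overshoot(daily_burn: float, data_lag_hours: float = 24,
                         evaluation_interval_hours: float = 24) -> float:
    """Spend that can accrue between crossing a threshold and the alert firing."""
    lag_hours = data_lag_hours + evaluation_interval_hours
    return daily_burn * lag_hours / 24


overshoot = worst_case_overshoot(daily_burn=500)
runaway = worst_case_overshoot(daily_burn=5000)
print(overshoot, runaway)
```

A steady 500 per day exposes roughly 1,000 of unalerted spend; a runaway workload at 5,000 per day exposes ten times that before anyone is told.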

# Azure: simple budget creation via CLI
# (--end-date is required by this command; the date below is illustrative)
az consumption budget create \
  --budget-name "Monthly-Subscription-Budget" \
  --amount 10000 \
  --category cost \
  --time-grain monthly \
  --start-date 2026-05-01 \
  --end-date 2027-04-30 \
  --subscription <subscription-id>

Budgets can trigger Action Groups, which are Azure’s notification fan-out: email, SMS, push notification, voice call, webhook, Logic App, Azure Function, Automation Runbook. The webhook, function, and runbook options are where you wire up actual remediation: the budget alert fires, the action group calls a function, the function takes some action.

That action is the hard part. What can the function do?

  • Send a sterner email to the workload owner.
  • Tag resources for review.
  • Stop / deallocate non-production VMs.
  • Disable a service principal so deployments stop.
  • Set quotas to zero on the subscription (limited; some quotas cannot go below platform minimums).
  • Suspend the subscription (technically possible but not a designed FinOps control — it takes every resource offline, requires elevated permissions, and is operationally dangerous outside emergency scenarios or isolated test environments).

None of this is built in. You write it. The function lives in your platform code; the runbook is yours to maintain. The “automation” Microsoft references in cost docs is the capability, not the implementation.
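The decision logic inside such a function can be small. The sketch below shows only the escalation policy as a pure function; the thresholds, action names, and the Resource type are all hypothetical, and the actual calls to Azure APIs are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    environment: str   # e.g. "prod", "dev", "sandbox"
    kind: str          # e.g. "vm", "database"


def plan_remediation(percent_of_budget: float, resources: list[Resource]) -> list[str]:
    """Return an ordered list of actions for a budget alert (illustrative policy)."""
    actions: list[str] = []
    if percent_of_budget >= 80:
        actions.append("notify:workload-owner")
    if percent_of_budget >= 100:
        # Only touch non-production VMs; production stays a human decision.
        for res in resources:
            if res.kind == "vm" and res.environment != "prod":
                actions.append(f"deallocate:{res.name}")
    if percent_of_budget >= 120:
        actions.append("block-deployments:pipeline-identity")
    return actions


inventory = [
    Resource("vm-build-01", "dev", "vm"),
    Resource("vm-api-01", "prod", "vm"),
    Resource("db-test-01", "dev", "database"),
]
plan = plan_remediation(105, inventory)
print(plan)
```

Keeping the policy separate from the API calls makes it testable without a subscription, which matters for code that is allowed to switch things off.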

OCI Budgets — what they actually do

OCI Budgets are scoped to compartments or cost-tracking tags. Alert rules fire on actual or forecasted spend (rule type ACTUAL or FORECAST), evaluate periodically, and notify via OCI Notifications (which can fan out to email, OCI Streaming, OCI Functions, PagerDuty via webhook, Slack via webhook).

resource "oci_budget_budget" "payments_prod_monthly" {
  compartment_id     = var.tenancy_ocid   # always at tenancy level
  amount             = 10000
  reset_period       = "MONTHLY"
  description        = "Monthly budget for payments-prod compartment"
  target_type        = "COMPARTMENT"
  targets            = [oci_identity_compartment.payments_prod.id]
}

resource "oci_budget_alert_rule" "payments_prod_80pct" {
  budget_id      = oci_budget_budget.payments_prod_monthly.id
  threshold      = 80
  threshold_type = "PERCENTAGE"
  type           = "ACTUAL"
  recipients     = "platform-finance@example.com"
  message        = "Payments-prod is at 80% of monthly budget."
}

Same general shape as Azure budgets: alerting only, no auto-stop. The auto-action pattern requires hooking the notification to an OCI Function that calls the OCI APIs to do something — same shape as Azure’s runbook approach.

The structural differentiator for OCI: compartment quotas. This is not a workaround or a proxy; it is a first-class enforcement control, designed for exactly this purpose. A compartment governed by the quota statement set compute-core quota standard-e4-core-count to 100 in compartment payments-prod cannot deploy more than 100 Standard E4 cores. The platform rejects the deployment at the API layer, synchronously, with no alert processing lag and no automation to write. This is qualitatively different from Azure’s budget-triggered automation: OCI quotas are declarative, platform-enforced resource ceilings. Azure budgets plus automation are reactive spend notifications with an action attached.

One important distinction: compartment quotas operate at the resource layer, not the billing layer. They do not cap a euro amount directly — they cap the resources that produce spend. In steady-state environments with predictable workloads, this is functionally equivalent to a hard spend ceiling, and it is the only such mechanism either cloud offers natively at enterprise scale.
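Because quotas cap resources and not money, someone has to translate a spend ceiling into a resource ceiling. A minimal sketch of that arithmetic follows. The price per core-hour is a placeholder and not a published rate, the calculation ignores memory, storage, and network charges, and the function names are hypothetical; the rendered statement follows the set form of OCI’s quota policy syntax.

```python
import math


def core_ceiling(monthly_budget: float, price_per_core_hour: float,
                 hours_per_month: int = 744) -> int:
    """Largest core count whose always-on cost stays within the budget."""
    return math.floor(monthly_budget / (price_per_core_hour * hours_per_month))


def quota_statement(quota_name: str, limit: int, compartment: str,
                    family: str = "compute-core") -> str:
    """Render an OCI quota policy statement of the 'set' form."""
    return f"set {family} quota {quota_name} to {limit} in compartment {compartment}"


# 0.05 per core-hour is a placeholder price for illustration only.
cores = core_ceiling(monthly_budget=10000, price_per_core_hour=0.05)
statement = quota_statement("standard-e4-core-count", cores, "payments-prod")
print(cores)
print(statement)
```

Using 744 hours (a 31-day month) keeps the ceiling conservative: the quota holds even if every permitted core runs for the entire longest month.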

The honest mechanism for “hard cap”

Putting it together:

| Goal | Azure mechanism | OCI mechanism |
| --- | --- | --- |
| Soft alert at 50% | Budget with 50% threshold, email | Budget alert rule, threshold 50% |
| Soft alert at 100% | Budget with 100% threshold, email + Action Group → function | Budget alert rule, threshold 100% → Function |
| Forecast alert | Budget with forecast threshold | Budget alert rule with type FORECAST |
| Hard cap by resource count | Subscription quotas (limited control) | Compartment quotas (declarative, full control) |
| Hard cap by total spend | Custom: Function that revokes RBAC or suspends subscription | Custom: Function that zeros compartment quotas |
| Truly hard kill switch (billing) | None built in | None built in |

At the billing layer, neither cloud offers a true kill switch — and this is the honest answer to the CFO’s question. At the resource layer, OCI quotas pre-emptively enforce ceilings without any automation required; Azure has no direct equivalent. The closest each cloud can reach for emergency shutdown:

  • Azure: budget alert at 100% triggers a function that disables deployments — service principal revocation, selective quota reduction, or subscription suspension. None of these are designed FinOps controls; they are automation you build, test, and own.
  • OCI: set compartment quotas to the resource ceiling you want enforced before you approach the budget limit. For emergency shutdown, a function can zero all quotas at alert time — and because quotas are the designed enforcement mechanism, this is clean, synchronous, and reversible via a single Terraform apply.

The asymmetry matters: OCI quota management operates synchronously at the control-plane level. Azure budget automation is a reactive pattern that runs asynchronously, after cost data ingestion and budget evaluation lags that can add up to 48 hours. For spend-sensitive environments, this is an architectural difference, not a cosmetic one.

In both cases, the emergency automation is yours to build. The platforms ship the alerting; the hard enforcement action is yours. The one exception is OCI’s proactive quota setting, which needs no automation at all: declarative quotas let you set the ceiling before the alarm fires.
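For the OCI emergency path, the function’s job reduces to producing a handful of quota statements and applying them. The sketch below only renders the statements, using the zero form of the quota policy syntax; the family names passed in are examples to check against the quota documentation for the services you actually run, and the function name is hypothetical.

```python
def emergency_zero_statements(compartment: str, families: list[str]) -> list[str]:
    """Render one 'zero' quota statement per service family for a compartment."""
    return [f"zero {family} quotas in compartment {compartment}"
            for family in families]


# Example families only; confirm the names for the services in use.
statements = emergency_zero_statements(
    "sandbox-dev", ["compute-core", "block-storage", "database"]
)
for line in statements:
    print(line)
```

Keeping the steady-state quota statements in version control means that restoring normal limits after the emergency is a matter of re-applying them.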

Reality Check

This conversation comes up at almost every cloud architecture review where finance is in the room. The request is always some version of “set a hard cap so we cannot exceed €X per month.” The honest answer is always the same: that mechanism does not exist on a real commercial contract on either cloud, and anyone who tells you otherwise has not tried to implement it. What you can build is a soft cap with automated remediation that approximates a hard cap with two-day worst-case lag. Finance teams who hear that clearly and accept it once always end up better-prepared than the ones who keep looking for the button that does not exist.

Cost anomaly detection — the more useful tool

Both clouds have anomaly detection that is more useful day-to-day than budgets. Anomaly detection notices unusual spending patterns: a service that normally costs €50/day suddenly costing €500/day, a region where spend appears that has not had spend before, a SKU that is suddenly active.

Azure Cost Management has built-in anomaly detection on the Cost Analysis blade. Daily anomalies are surfaced; you can configure alerts. The detection is automatic and doesn’t require setup beyond the alert wiring.

OCI Cost Analysis has cost trends visualisation and exposes daily cost reports, usage reports, and API-based access to historical spend data. The built-in anomaly detection UI is less mature than Azure’s. The practical path is to export cost data daily to OCI Object Storage, ingest into OCI Logging Analytics or a third-party FinOps tool, and run anomaly detection there. The underlying data quality is sufficient; it is the out-of-the-box experience that requires more assembly.

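For active cost management, anomaly detection catches things budgets miss. A service that jumps from €50/day to €500/day can stay under a monthly budget threshold until late in the month, while anomaly detection flags it within a day or so. The reverse also holds: a workload that has been running at €500/day for months is simply the baseline as far as anomaly detection is concerned, and it only stands out if you supply an earlier trend to compare against, which is harder. The two mechanisms catch different problems.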
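If you end up building detection yourself on exported daily costs, a trailing-window z-score is a reasonable starting point. This is a minimal sketch of one simple technique, not what either cloud runs internally; the window size, threshold, and function name are assumptions.

```python
from statistics import mean, stdev


def find_anomalies(daily_costs: list[float], window: int = 7,
                   z_threshold: float = 3.0) -> list[int]:
    """Return indexes of days whose cost deviates sharply from the trailing window."""
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Guard against a perfectly flat history, where sigma would be zero.
        spread = max(sigma, 0.01 * mu, 1e-9)
        if abs(daily_costs[i] - mu) / spread > z_threshold:
            anomalies.append(i)
    return anomalies


costs = [50, 52, 49, 51, 50, 48, 53, 51, 500, 505, 498]
flagged = find_anomalies(costs)
print(flagged)
```

Only the first day of the jump is flagged. Once the high values enter the trailing window they inflate the baseline, so the following days no longer look unusual, which is the sense in which a sustained level of spend becomes invisible to this kind of detector.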

Discounts, commitments, and the cost-of-saving paradox

Both clouds offer significant discounts in exchange for commitments. The names differ but the shapes are similar:

| Mechanism | Azure | OCI |
| --- | --- | --- |
| 1- or 3-year compute commitment | Reservations (Reserved VM Instances) | Universal Credits with annual commitment |
| Per-account commitment, broader scope | Savings Plans for Compute | (less of a direct equivalent) |
| Spot / preemptible | Spot VMs | Preemptible Instances |
| Pay-as-you-go | Standard | Pay-as-you-go |

The trap with commitments is overcommitting. A 3-year reservation for a VM family you stop using in 18 months is a sunk cost. The discount is meaningful (often 30–60% off pay-as-you-go) but is only realised if utilisation stays high. Workloads change. Architectures change. Reserve carefully, in chunks, and only for stable baseline capacity.

The Microsoft pattern that holds up: reserve 60–80% of baseline capacity, run pay-as-you-go for variable, supplement with spot for non-critical workloads. The OCI equivalent: commit Universal Credits to the level of expected baseline spend, with 20–40% headroom in pay-as-you-go credit.

The mistake organisations make with commitments: treating them as straightforward savings without accounting for the lock-in. A 3-year reservation is a 3-year forecast you have committed to in money. Wrong forecast, expensive lesson.
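The cost of a wrong forecast is simple arithmetic. The sketch below compares a fixed-term commitment against pay-as-you-go for the months the capacity is actually used. It assumes the commitment is paid in full regardless of use and ignores exchange or return options; the figures and the function name are illustrative.

```python
def commitment_outcome(on_demand_monthly: float, discount: float,
                       term_months: int, months_used: int) -> dict:
    """Compare a fixed-term commitment with pay-as-you-go for the months used."""
    committed_total = on_demand_monthly * (1 - discount) * term_months
    payg_total = on_demand_monthly * months_used
    return {
        "committed_total": round(committed_total, 2),
        "payg_total": round(payg_total, 2),
        "saving": round(payg_total - committed_total, 2),
        # Months of use needed before the commitment beats pay-as-you-go.
        "break_even_months": round(term_months * (1 - discount), 1),
    }


outcome = commitment_outcome(1000, discount=0.40, term_months=36, months_used=18)
print(outcome)
```

With a 40% discount on a 36-month term, the commitment only pays off after 21.6 months of use. Stopping at month 18 costs 3,600 more than pay-as-you-go would have.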

One structural difference worth understanding: Azure Reservations are SKU-specific commitments — you commit to a particular instance family in a specific region (e.g. Standard_D4s_v5, West Europe), and if your workload moves to a different family, the reservation’s value is stranded. Exchange and Return policies exist but add operational overhead. OCI Universal Credits operate as a flexible consumption pool — you commit to a spend level, and the credits apply across eligible OCI services and regions without SKU lock-in. OCI’s model is more forgiving for organisations whose architecture is still evolving. Azure Savings Plans for Compute partially close this gap — they apply across instance families and regions — but Reservations remain more common for workloads with stable, predictable compute shapes. The financial instrument your procurement team signs matters when forecasting commitment utilisation.

Architectural Pro Tip

Treat budgets, anomaly detection, and commitments as three separate tools that do different jobs. Budgets are for “are we on track this month.” Anomaly detection is for “did something just spike.” Commitments are for “we are confident this baseline is real for the next 1–3 years.” Conflating them — using budgets to manage anomalies, or commitments to control budget — produces poor outcomes from each. Three tools, three jobs, three reviews.

What to actually wire up on day one

A pragmatic minimum viable cost program:

  1. Budgets at the right scopes. One per subscription/compartment that has meaningful spend. Forecasted alert at 80% of monthly budget; actual alert at 100%.
  2. Anomaly detection on. Azure: enabled by default in Cost Management. OCI: build via daily exports into Logging Analytics or a third-party tool.
  3. Tag-based cost attribution. Make sure cost allocation tags (cost-center, workload, environment, owner) are applied at creation. Cost reports without tags are useless for chargeback.
  4. Daily cost summaries by team. A Slack message or email per team showing yesterday’s spend and the trend. Visibility to the team that owns the spend is what changes behaviour.
  5. Quarterly commitment review. Look at baseline utilisation; adjust reservations / Universal Credits commitments. Do not let them age silently.
  6. Sandbox-grade hard caps for spend-prone environments. For OCI: set compartment quotas as the primary enforcement tool — declarative, platform-enforced, no automation required. For Azure: no single control is sufficient; combine subscription-level isolation (one subscription per workload class), Azure Policy to restrict allowed resource types and SKUs, and budget-triggered automation for breach response. The combination approximates what OCI quotas provide natively.

The list above takes a couple of weeks to implement and saves more money than it costs in maintenance. Above it is FinOps practice — capacity planning, rightsizing reviews, savings plan optimisation — which is its own discipline and worth a separate article.
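Tag enforcement is the item on this list most worth automating early. A minimal sketch of a check that could run against a resource inventory is shown below, using the four tag keys named above; the inventory structure, resource names, and function name are hypothetical.

```python
REQUIRED_TAGS = ("cost-center", "workload", "environment", "owner")


def missing_tags(tags: dict[str, str], required=REQUIRED_TAGS) -> list[str]:
    """Return required tag keys that are absent or blank."""
    return [key for key in required if not tags.get(key, "").strip()]


resources = {
    "vm-api-01": {"cost-center": "CC-1042", "workload": "payments",
                  "environment": "prod", "owner": "team-payments"},
    "vm-scratch-07": {"workload": "experiments", "owner": " "},
}
report = {name: missing_tags(tags) for name, tags in resources.items()}
untagged = {name: gaps for name, gaps in report.items() if gaps}
print(untagged)
```

Treating a blank value the same as a missing key catches the common case where a tag is applied by a template but never filled in.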

Multicloud factor

A unified FinOps view across Azure and OCI is harder than it sounds. Both clouds export billing data; the schemas differ; the cost categories differ; the tag application behaviour differs.

The pattern that works:

  • Native tools first. Azure Cost Management for Azure, OCI Cost Analysis for OCI. Each cloud’s tooling knows its own data best.
  • Daily exports to a unified store. Azure: cost export to a storage account. OCI: cost report export to Object Storage. Both into a single data lake or warehouse.
  • Unified reporting on top. The FOCUS specification (FinOps Open Cost and Usage Specification) is becoming the lingua franca for cross-cloud billing data. Both Azure and OCI now publish cost data in FOCUS-compatible formats.
  • Tag schema consistent across both. The same cost-center, workload, environment keys with the same values, validated by the platform team. This is what makes cross-cloud chargeback possible.

The trap: trying to do all FinOps reporting in a third-party tool that abstracts both clouds. The abstractions are getting better but they leak. For executive reporting, the abstraction is fine. For investigation (“why is OCI spend up 40% this month”), you go back to OCI Cost Analysis.

One underappreciated reality of multicloud cost unification: the same “service” on two clouds does not produce the same billing unit. Azure meters compute to the minute by subscription and region, decomposing a VM into separate line items for compute, OS disk, network, and managed disk. OCI meters by OCPU-hour at the tenancy-compartment level, with a different resource decomposition. FOCUS normalisation brings the schemas into alignment, but the underlying billing granularity difference means like-for-like comparisons require deliberate data curation, not just schema mapping. For high-accuracy cross-cloud chargeback, budget for the data engineering work — schema normalisation is a weekend project; semantic equivalence is a quarter of engineering.
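The schema half of that work can be sketched in a few lines. The example below maps rows from each cloud into a small common record and totals them per cost centre. The input keys are illustrative stand-ins, not the real export column names, and the output uses a small subset of FOCUS-style column names (ProviderName, ServiceName, BilledCost, Tags).

```python
from collections import defaultdict


def from_azure(row: dict) -> dict:
    # Input keys are illustrative stand-ins for an Azure cost export row.
    return {"ProviderName": "Microsoft", "ServiceName": row["meter_category"],
            "BilledCost": float(row["cost_billing_currency"]), "Tags": row["tags"]}


def from_oci(row: dict) -> dict:
    # Input keys are illustrative stand-ins for an OCI cost report row.
    return {"ProviderName": "Oracle", "ServiceName": row["product_service"],
            "BilledCost": float(row["my_cost"]), "Tags": row["tags"]}


def chargeback(records: list[dict], tag_key: str = "cost-center") -> dict[str, float]:
    """Total billed cost per tag value; untagged rows are kept visible."""
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        totals[rec["Tags"].get(tag_key, "untagged")] += rec["BilledCost"]
    return {key: round(value, 2) for key, value in sorted(totals.items())}


azure_rows = [
    {"meter_category": "Virtual Machines", "cost_billing_currency": "120.50",
     "tags": {"cost-center": "CC-1042"}},
    {"meter_category": "Storage", "cost_billing_currency": "30.25", "tags": {}},
]
oci_rows = [
    {"product_service": "COMPUTE", "my_cost": "80.00",
     "tags": {"cost-center": "CC-1042"}},
    {"product_service": "DATABASE", "my_cost": "15.75",
     "tags": {"cost-center": "CC-2001"}},
]
records = [from_azure(r) for r in azure_rows] + [from_oci(r) for r in oci_rows]
totals = chargeback(records)
print(totals)
```

This is the weekend-project part. Nothing here decides whether an Azure "Virtual Machines" line and an OCI "COMPUTE" line describe comparable things, and that question is where the quarter of engineering goes.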

Closing checklist

  • Communicate the spend reality clearly: there is no universal billing-level hard cap in enterprise commercial contracts. Budgets alert. OCI quotas enforce resource ceilings natively. Azure enforcement requires automation you build.
  • Set budgets at every meaningful scope. Forecasted alert at 80%, actual alert at 100%.
  • Wire alerts to action groups (Azure) or notifications (OCI) that trigger functions for automated remediation. On Azure this is the primary enforcement path; on OCI it is a complement to quota enforcement.
  • For OCI, set compartment quotas as the proactive enforcement mechanism — not just “the closest thing to a hard cap” but the only natively designed deterministic resource ceiling either cloud offers at enterprise scale. They operate at the resource layer, not the billing layer, which makes them synchronous and reliable.
  • Turn on anomaly detection. Catches things budgets miss.
  • Tag for cost attribution from day one. Backfilling is expensive and incomplete.
  • Daily cost summaries to the teams that own the spend. Visibility changes behaviour.
  • Review commitments quarterly. They lock in 1–3 year forecasts; the world moves faster than that.
  • For multicloud, native tools per cloud + unified data lake + FOCUS-compatible exports + consistent tag schema. No magic abstraction layer.
