Wednesday, April 22, 2026

Chargeback and Showback in Azure: Building a Cost Allocation Model

As organisations mature their FinOps practice on Azure, cost visibility alone is no longer sufficient. Finance teams need to allocate cloud costs to the correct business units, and engineering teams need to understand the financial impact of their architectural decisions. This is where chargeback and showback models become essential.

This post covers the difference between the two models and provides a practical approach to implementing cost allocation in Azure.

1. Showback vs Chargeback

Both models serve the same purpose of attributing cloud costs to the teams or business units that generate them, but they differ in consequence:

  • Showback: teams are shown their costs for awareness and accountability, but the costs are not transferred to their budget. This is the appropriate starting point for most organisations.
  • Chargeback: costs are formally allocated and transferred to the consuming team's budget. This requires financial systems integration and strong tagging discipline before it is viable.

I recommend starting with showback for at least one full quarter before introducing chargeback. Showback surfaces tagging gaps and data quality issues that would cause chargeback disputes if unaddressed.

2. Prerequisites: Tagging Strategy

A cost allocation model is only as accurate as the tagging on the resources being measured. Before building any reports, confirm the following tag coverage:

Navigate to Azure Policy > Compliance and filter for any tag-related policy assignments. Look for non-compliant resources and remediate them before proceeding.

The minimum required tags for a cost allocation model are:

Tag           Purpose
cost-center   Finance reference code for the owning business unit
team          Engineering team responsible for the resource
environment   Separates operational (prod) from non-operational (staging, dev) costs
workload      The product or service the resource supports
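Tag coverage can also be audited in bulk with Azure Resource Graph (via Resource Graph Explorer or az graph query). Following is a query sketch that surfaces resources missing either of the first two tags; adjust the tag names to match your own convention:

```kusto
resources
| where isnull(tags['cost-center']) or isnull(tags['team'])
| project name, type, resourceGroup, subscriptionId, tags
| order by type asc
```

Note that resource groups and subscriptions live in a separate Resource Graph table (resourcecontainers), so this query covers resource-level tagging only.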

3. Using Azure Cost Allocation Rules

Azure Cost Management supports Cost Allocation Rules, which allow shared costs (such as a shared networking subscription, a centralised Log Analytics workspace, or a shared API Management instance) to be split and attributed to consuming subscriptions or resource groups.

  1. Navigate to Cost Management + Billing > Cost Management > Cost allocation (preview)
  2. Select + Add
  3. Under Source, select the subscription or resource group containing the shared cost
  4. Under Targets, define the allocation split, either by fixed percentage or proportional to each target's existing spend
  5. Select Save

Following is an example allocation scenario:

Shared Resource             Total Monthly Cost   Allocated To                       Split
Hub VNet + Firewall         $1,200               Production (70%), Non-Prod (30%)   Fixed
Centralised Log Analytics   $800                 By each team's ingestion volume    Proportional
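The arithmetic behind both split types is worth making explicit. Following is a C# sketch of the two allocations in the table above; the team names and ingestion volumes are illustrative, not taken from any real workspace:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Fixed split: Hub VNet + Firewall, $1,200/month at 70/30.
decimal hubTotal = 1200m;
decimal prodShare = hubTotal * 0.70m;    // $840 to Production
decimal nonProdShare = hubTotal * 0.30m; // $360 to Non-Prod

// Proportional split: Log Analytics, $800/month by ingestion volume.
// Hypothetical ingestion figures (GB) for three teams.
var ingestionGb = new Dictionary<string, decimal>
{
    ["payments"] = 120m,
    ["identity"] = 60m,
    ["platform"] = 20m,
};
decimal laTotal = 800m;
decimal totalGb = ingestionGb.Values.Sum();

// Each team's share of the shared cost is its share of the driver metric.
var laAllocations = ingestionGb.ToDictionary(
    kv => kv.Key,
    kv => Math.Round(laTotal * kv.Value / totalGb, 2));

Console.WriteLine($"Production: {prodShare}, Non-Prod: {nonProdShare}");
foreach (var (team, amount) in laAllocations)
    Console.WriteLine($"{team}: {amount}");
```

The proportional split simply normalises each team's driver metric (here, ingestion GB) against the total before applying it to the shared cost.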

4. Exporting Chargeback Data for Finance Systems

Once allocation rules are configured, the resulting cost data can be exported for integration with finance systems.

Navigate to Cost Management > Exports with the subscription or management group as scope. Create a monthly scheduled export that includes the allocated cost data. The exported CSV includes the allocation split fields, enabling downstream processing to attribute costs to the correct cost centre.

For organisations using Power BI, the Azure Cost Management connector in Power BI Desktop connects directly to the Cost Management API and picks up cost allocation rules without a separate export pipeline, subject to the usual latency of cost data.

5. Communicating Results to Stakeholders

The final step is delivering the showback or chargeback report to the relevant teams. Following is a practical distribution approach:

  • Engineering teams: monthly cost summary by resource group, shared via a Power BI report or a Teams message generated by a Logic App triggered on export delivery
  • Finance: monthly CSV export delivered to a shared storage account, consumed by the existing financial reporting process
  • Leadership: a quarterly Workbook summary at management group scope showing total cloud spend by business unit and trend over time

Summary

A robust cost allocation model on Azure requires clean tagging, cost allocation rules for shared resources, and a consistent export and distribution process. Starting with showback builds the data quality and organisational habits needed to make chargeback viable, sharply reducing the risk of disputes when costs are formally transferred to business unit budgets.

Thursday, March 26, 2026

Azure Functions Durable Orchestrations: Chaining, Fan-Out, and Human Approval Patterns

The Durable Functions extension for Azure Functions enables stateful, long-running workflows without managing any state infrastructure. Three patterns cover the majority of production use cases: sequential chaining (execute steps in order, passing outputs forward), fan-out/fan-in (run multiple steps in parallel and aggregate results), and external event waits (pause a workflow until a human or external system sends a signal).

This post covers how to implement each pattern and how to monitor running orchestrations in production.

1. Setting Up the Durable Functions Extension

Durable Functions requires the Microsoft.Azure.WebJobs.Extensions.DurableTask NuGet package. Add it to your Functions project:

dotnet add package Microsoft.Azure.WebJobs.Extensions.DurableTask

Three function types work together:

Type           Role
Orchestrator   Defines the workflow logic; must be deterministic and replay-safe
Activity       Executes a single unit of work; can call external APIs, write to databases, send messages
Client         Starts, queries, or terminates orchestration instances; typically HTTP or timer-triggered

Orchestrator functions have a strict constraint: they must be deterministic and replay-safe. This means no direct I/O calls, no DateTime.UtcNow, no random values, and no awaiting tasks that the durable context did not create (CallActivityAsync, CreateTimer, WaitForExternalEvent, and similar APIs are safe because the framework tracks them across replays). All side effects must be delegated to activity functions. Violating this constraint produces non-deterministic replay errors that are difficult to diagnose.
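To make the constraint concrete, following is a sketch of the replay-safe substitutes the durable context exposes (CurrentUtcDateTime and NewGuid are members of IDurableOrchestrationContext; the FetchData activity is hypothetical):

```csharp
// Deterministic substitutes for common non-deterministic operations:
var now = context.CurrentUtcDateTime;   // not DateTime.UtcNow
var id = context.NewGuid();             // not Guid.NewGuid()

// I/O never runs inline in the orchestrator; it is delegated to an activity:
var data = await context.CallActivityAsync<string>("FetchData", input);
```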

2. Function Chaining: Sequential Step Execution

Chaining executes a series of activity functions in a defined order, where each step receives the output of the previous one. This is the right pattern for multi-step processes where each step depends on the result of the last.

Following is a C# orchestrator that chains three order processing steps:

[FunctionName("ProcessOrderOrchestrator")]
public static async Task<string> RunOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    var orderId = context.GetInput<string>();

    var isValid = await context.CallActivityAsync<bool>("ValidateOrder", orderId);
    if (!isValid)
        return "Validation failed — order rejected.";

    var reservationId = await context.CallActivityAsync<string>("ReserveInventory", orderId);
    var confirmationId = await context.CallActivityAsync<string>("ConfirmPayment", reservationId);

    return confirmationId;
}

If any activity throws an unhandled exception, the orchestration moves to a failed state. Add retry logic with CallActivityWithRetryAsync for steps that may encounter transient failures:

var retryOptions = new RetryOptions(
    firstRetryInterval: TimeSpan.FromSeconds(5),
    maxNumberOfAttempts: 3
);

var reservationId = await context.CallActivityWithRetryAsync<string>(
    "ReserveInventory", retryOptions, orderId);

3. Fan-Out/Fan-In: Parallel Processing with Aggregation

Fan-out/fan-in runs multiple activity function calls in parallel and waits for all of them to complete before aggregating the results. This pattern is appropriate when you have a collection of independent work items that do not depend on each other's output.

Following is an orchestrator that processes a batch of items in parallel:

[FunctionName("BatchProcessOrchestrator")]
public static async Task<List<string>> RunOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    var items = context.GetInput<List<string>>();

    var tasks = items
        .Select(item => context.CallActivityAsync<string>("ProcessItem", item))
        .ToList();

    var results = await Task.WhenAll(tasks);
    return results.ToList();
}

For large batches, process in chunks to avoid overwhelming downstream dependencies:

const int chunkSize = 20;
var allResults = new List<string>();

foreach (var chunk in items.Chunk(chunkSize))
{
    var tasks = chunk
        .Select(item => context.CallActivityAsync<string>("ProcessItem", item))
        .ToList();

    var chunkResults = await Task.WhenAll(tasks);
    allResults.AddRange(chunkResults);
}

Fan-out parallelism is bounded by the activity concurrency limit of the Function App host, configurable via maxConcurrentActivityFunctions in host.json. The default scales with the host's processor count, so verify the effective limit before relying on it to throttle downstream calls.
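The limit lives in the durableTask section of host.json. Following is a configuration sketch (the value shown is illustrative, not a recommendation):

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "maxConcurrentActivityFunctions": 20
    }
  }
}
```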

4. Human Approval with WaitForExternalEvent

The external event pattern pauses an orchestration at a defined point until an external signal is received. The most common use case is a human approval gate: the orchestration sends a notification, then waits for a response before continuing. A configurable timeout handles the case where no response arrives.

Following is an orchestrator that implements an approval step with a 24-hour timeout:

[FunctionName("ApprovalOrchestrator")]
public static async Task<string> RunOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    var request = context.GetInput<ApprovalRequest>();

    // Send the approval notification (activity function sends the email)
    await context.CallActivityAsync("SendApprovalNotification", new
    {
        ApproverEmail = request.ApproverEmail,
        InstanceId = context.InstanceId,
        Details = request.Summary
    });

    using var cts = new CancellationTokenSource();
    var approvalTask = context.WaitForExternalEvent<bool>("ApprovalDecision");
    var timeoutTask = context.CreateTimer(
        context.CurrentUtcDateTime.AddHours(24), cts.Token);

    var winner = await Task.WhenAny(approvalTask, timeoutTask);

    if (winner == approvalTask)
    {
        cts.Cancel();
        return approvalTask.Result ? "Approved" : "Rejected";
    }

    return "Timed out — escalated to manager.";
}

The notification email contains a link to an HTTP-triggered function that raises the event:

[FunctionName("RecordApprovalDecision")]
public static async Task<IActionResult> RecordDecision(
    [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req,
    [DurableClient] IDurableOrchestrationClient client)
{
    string instanceId = req.Query["instanceId"];
    bool approved = bool.Parse(req.Query["approved"]);

    await client.RaiseEventAsync(instanceId, "ApprovalDecision", approved);
    return new OkObjectResult("Decision recorded.");
}

5. Monitoring Orchestrations

Durable Functions stores orchestration state in Azure Storage (tables, queues, and blobs) by default. The Durable Functions HTTP management API provides endpoints to query any running or completed instance:

GET /runtime/webhooks/durabletask/instances/{instanceId}
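The status endpoint returns a JSON payload describing the instance. Following is an abridged example response (values illustrative; the field names are from the standard instance status payload):

```json
{
  "instanceId": "abc123",
  "runtimeStatus": "Running",
  "input": "order-42",
  "output": null,
  "createdTime": "2026-03-26T10:00:00Z",
  "lastUpdatedTime": "2026-03-26T10:05:00Z"
}
```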

In the portal:

  1. Navigate to Function App > Functions > [orchestrator function name]
  2. Select Monitor
  3. The invocation list shows each instance with its current status: Running, Completed, Failed, or Terminated

For production observability, enable Application Insights on the Function App. Durable Functions emits structured traces for each activity call, timer, and external event. Following is a KQL query that lists failed orchestration instances in the past 24 hours:

traces
| where timestamp > ago(24h)
| where customDimensions["Category"] == "DurableTask.AzureStorage"
| where message contains "failed"
| project
    timestamp,
    instanceId = tostring(customDimensions["InstanceId"]),
    message
| order by timestamp desc

Summary

Durable Functions handles the state management, retry mechanics, and replay infrastructure that make long-running workflows reliable at scale. Chaining covers ordered multi-step processes; fan-out/fan-in handles parallel workloads with aggregation; and the external event pattern enables human-in-the-loop workflows with built-in timeouts. Application Insights integration provides the instance-level visibility needed to diagnose failures in production without querying raw storage tables. Together, these three patterns handle the majority of workflow requirements without introducing an external workflow engine.