Buzen's Algorithm

Queueing theory studies systems where jobs arrive, wait, get served, and depart. A queueing network connects multiple queues - jobs flow from one to the next, like packets through routers or requests through microservices. Analyzing these networks means computing steady-state probabilities: in the long run, what fraction of time does the system spend in each configuration?

Fifty jobs circulating among ten servers can occupy over 12 billion distinct configurations. Yet Buzen’s 1973 algorithm computes the exact probability of every state with only a few hundred operations - a reduction from $\sim 10^{10}$ to $500$. This post derives the algorithm from first principles and implements it.

Queueing Basics

A single queue has jobs arriving at rate $\lambda$ (lambda) and a server processing them at rate $\mu$ (mu). The ratio $\rho = \lambda/\mu$ is the utilization - the fraction of time the server is busy. For stability, $\rho < 1$; otherwise jobs accumulate faster than they’re served.

The simplest model is the M/M/1 queue: Poisson arrivals (memoryless, hence “M”), exponential service times (also “M”), and one server. In steady state, the probability of having exactly $k$ jobs in the system is:

$$P(k) = (1-\rho)\rho^k$$

A geometric distribution - most likely to be empty, exponentially less likely to have more jobs.
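
As a quick sanity check, here is a minimal sketch (the function name and the choice $\rho = 0.8$ are just for illustration) that tabulates the first few probabilities:

def mm1_distribution(rho: float, k_max: int) -> list[float]:
    """P(k) = (1 - rho) * rho^k for an M/M/1 queue (requires rho < 1)."""
    return [(1 - rho) * rho**k for k in range(k_max + 1)]

# rho = 0.8 gives P(0) = 0.2, P(1) = 0.16, P(2) = 0.128, ...
# (up to floating-point rounding)
print(mm1_distribution(0.8, 4))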

Queueing Networks

Real systems chain queues together. A job might visit a CPU queue, then a disk queue, then return to the CPU. The routing matrix $R = [r_{ij}]$ specifies transition probabilities: $r_{ij}$ is the probability a job leaving queue $i$ goes to queue $j$.

In open networks, jobs arrive from outside and eventually depart. Jackson (1963) proved that under Poisson arrivals and exponential service, the steady-state distribution has product form - the joint probability factors into independent terms for each queue.

In closed networks, a fixed population of $N$ jobs circulates forever. No arrivals, no departures - just jobs moving between queues. Gordon and Newell (1967) extended Jackson’s result: closed networks also have product-form solutions, but with a normalization constant that’s expensive to compute.

Notation

| Symbol | Meaning |
|---|---|
| $N$ | Total number of jobs in the closed network |
| $M$ | Number of queues (service centers) |
| $\mu_i$ | Service rate at queue $i$ |
| $e_i$ | Visit ratio - relative frequency of visits to queue $i$ |
| $X_i = e_i/\mu_i$ | Relative load at queue $i$ (service demand per visit) |
| $k_i$ | Number of jobs at queue $i$ in a given state |
| $G(N)$ | Normalization constant for $N$ jobs |

The visit ratios $e_i$ come from solving the traffic equations $e_j = \sum_i e_i r_{ij}$. In a closed network these equations determine the $e_i$ only up to a multiplicative constant, so they represent how often each queue is visited relative to some reference queue.
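
For concreteness, here is one way to get visit ratios in code - a sketch using damped fixed-point iteration on a made-up 3-queue routing matrix (the matrix and the helper name are hypothetical, not from any particular system):

def visit_ratios(R: list[list[float]], iters: int = 200) -> list[float]:
    """Solve e_j = sum_i e_i * R[i][j] by damped fixed-point iteration,
    pinning queue 0 as the reference (e_0 = 1)."""
    M = len(R)
    e = [1.0] * M
    for _ in range(iters):
        new = [sum(e[i] * R[i][j] for i in range(M)) for j in range(M)]
        # damping avoids oscillation when the routing is periodic
        e = [(a + b) / 2 for a, b in zip(e, new)]
        e = [x / e[0] for x in e]  # renormalize to the reference queue
    return e

# Hypothetical routing: queue 0 (CPU) sends jobs to queue 1 or 2 (disks),
# and both disks route back to the CPU.
R = [[0.0, 0.7, 0.3],
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]]
print(visit_ratios(R))  # -> approximately [1.0, 0.7, 0.3]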

The Normalization Problem

For a closed network with $N$ jobs and $M$ queues, the steady-state probability of configuration $(k_1, \ldots, k_M)$ where $\sum_i k_i = N$ is:

$$\pi(k_1, \ldots, k_M) = \frac{1}{G(N)} \prod_{i=1}^M X_i^{k_i}$$

The product $\prod_i X_i^{k_i}$ captures the relative likelihood of each configuration - states that concentrate jobs at queues with large $X_i$ contribute more to the un-normalized sum. The normalization constant $G(N)$ ensures probabilities sum to 1:

$$G(N) = \sum_{\mathbf{k} \in S(N,M)} \prod_{i=1}^M X_i^{k_i}$$

where $S(N,M)$ is the state space - all ways to distribute $N$ jobs among $M$ queues:

$$S(N,M) = \{(k_1, \ldots, k_M) : k_i \geq 0,\ \textstyle\sum_{i=1}^M k_i = N\}$$

For $N=50$ and $M=10$, the state space contains $\binom{59}{9} \approx 1.26 \times 10^{10}$ configurations. Computing $G(N)$ by summing over all of them is infeasible.

The Naive Approach

Direct enumeration iterates over every state in $S(N,M)$, computes $\prod_i X_i^{k_i}$, and accumulates the sum. The state space size is the number of ways to place $N$ indistinguishable jobs into $M$ queues - a stars-and-bars combinatorial problem:

$$|S(N,M)| = \binom{N + M - 1}{M - 1}$$

| Network | States | Time (operations) |
|---|---|---|
| $N=10$, $M=5$ | 1,001 | $\sim 10^3$ |
| $N=50$, $M=5$ | 316,251 | $\sim 3 \times 10^5$ |
| $N=20$, $M=10$ | $\sim 10^7$ | $\sim 10^7$ |
| $N=50$, $M=10$ | $\sim 1.26 \times 10^{10}$ | $\sim 10^{10}$ |

A network of 10 queues and 50 jobs requires iterating over tens of billions of states. Even modern hardware chokes on this.

The structure of the sum suggests redundancy. Each term $\prod_i X_i^{k_i}$ shares factors with many others. The sum is essentially an $M$-fold convolution - a hint that dynamic programming might help.
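
For contrast, here is what direct enumeration looks like - a sketch that generates every state via stars and bars (the helper names are my own) and sums the product terms; usable only at toy sizes:

from itertools import combinations
from math import prod

def compositions(n: int, m: int):
    """Yield all (k_1, ..., k_m) with k_i >= 0 and sum k_i = n,
    by placing m - 1 'bars' among n + m - 1 positions."""
    for bars in combinations(range(n + m - 1), m - 1):
        prev, parts = -1, []
        for b in bars:
            parts.append(b - prev - 1)
            prev = b
        parts.append(n + m - 2 - prev)
        yield parts

def G_naive(X: list[float], N: int) -> float:
    """Normalization constant by brute-force state enumeration."""
    return sum(prod(x**k for x, k in zip(X, ks))
               for ks in compositions(N, len(X)))

print(G_naive([2.0, 3.0], 3))  # 65.0 - matches the walkthrough later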

Recursive Decomposition

Buzen recognized that $G(N)$ can be computed recursively by adding one queue at a time.

Define $g(n, m)$ as the normalization constant for $n$ jobs distributed among the first $m$ queues:

$$g(n, m) = \sum_{\mathbf{k} :\, \sum_{i=1}^m k_i = n} \ \prod_{i=1}^m X_i^{k_i}$$

By definition, $g(N, M) = G(N)$.

Now partition the states based on queue $m$’s occupancy. Either queue $m$ has zero jobs, or it has at least one:

  • Zero at queue $m$: Queue $m$ contributes $X_m^0 = 1$. The remaining $n$ jobs are distributed among queues $1$ through $m-1$. This contributes $g(n, m-1)$.

  • At least one at queue $m$: Factor out $X_m$ for one job at queue $m$. The remaining $n-1$ jobs (including any additional ones at queue $m$) distribute among all $m$ queues. This contributes $X_m \cdot g(n-1, m)$.

The recurrence:

$$g(n, m) = g(n, m-1) + X_m \cdot g(n-1, m)$$

Base cases:

  • $g(0, m) = 1$ for all $m$ (one way to place zero jobs - all queues empty)
  • $g(n, 0) = 0$ for $n > 0$ (no queues means no way to place jobs)

This recurrence computes $g(N, M)$ using only $O(NM)$ operations instead of enumerating all $\binom{N+M-1}{M-1}$ states - a count that grows as $O(N^{M-1})$ for fixed $M$.
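
The recurrence and base cases transcribe almost literally into a memoized recursion - a sketch for exposition (the iterative version in the Implementation section below is the one to actually use):

from functools import lru_cache

def G_recursive(X: list[float], N: int) -> float:
    @lru_cache(maxsize=None)
    def g(n: int, m: int) -> float:
        if n == 0:
            return 1.0  # base case: zero jobs, one empty configuration
        if m == 0:
            return 0.0  # base case: jobs left but no queues
        return g(n, m - 1) + X[m - 1] * g(n - 1, m)
    return g(N, len(X))

print(G_recursive([2.0, 3.0], 3))  # 65.0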

Why Dynamic Programming is Optimal

The recurrence has optimal substructure: $g(n,m)$ depends only on $g(n-1,m)$ and $g(n,m-1)$. Without memoization, a naive recursive evaluation would recompute the same subproblems exponentially many times.

An alternative formulation suggests itself. The sum $G(N)$ is the $N$-th coefficient of the generating function:

$$\prod_{i=1}^M \frac{1}{1-X_i z} = \sum_{n=0}^\infty G(n) z^n$$

Using FFT-based polynomial multiplication, we could compute this product in $O(MN \log N)$ time. But Buzen’s recurrence achieves $O(MN)$ by exploiting the special structure - each factor is a geometric series, so the convolution simplifies to a single pass.
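
To make the generating-function view concrete, a sketch that multiplies the truncated series directly - each coefficient convolution here costs $O(N^2)$, versus Buzen’s $O(N)$ single pass per factor:

def G_by_series(X: list[float], N: int) -> list[float]:
    """Coefficients of prod_i 1/(1 - X_i z), truncated at degree N."""
    coeffs = [1.0] + [0.0] * N                 # the constant series 1
    for x in X:
        geom = [x**n for n in range(N + 1)]    # 1/(1 - x z), truncated
        coeffs = [sum(coeffs[j] * geom[n - j] for j in range(n + 1))
                  for n in range(N + 1)]
    return coeffs

print(G_by_series([2.0, 3.0], 3))  # [1.0, 5.0, 19.0, 65.0]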

The $O(MN)$ bound is tight for this recurrence: we must fill an $(N+1) \times M$ table of subproblems, each entry requiring $O(1)$ work.

Implementation

The algorithm maintains a single array $g[0 \ldots N]$ representing the current “column” $g(\cdot, m)$. Each iteration incorporates one more queue:

def buzen(X: list[float], N: int) -> list[float]:
    """Compute normalization constants G(0), G(1), ..., G(N)."""
    g = [0.0] * (N + 1)
    g[0] = 1.0  # g(0, m) = 1 for every m

    for x in X:  # fold in one queue at a time
        for n in range(1, N + 1):
            # g(n, m) = g(n, m-1) + X_m * g(n-1, m), updated in place:
            # g[n] still holds g(n, m-1); g[n-1] already holds g(n-1, m)
            g[n] += x * g[n - 1]

    return g

That’s it. After processing all $M$ queues, g[n] contains $G(n)$ for each $n$ from $0$ to $N$.

Walkthrough

For $M=2$ queues with $X_1=2$, $X_2=3$, computing $G(3)$:

Initialize: $g = [1, 0, 0, 0]$

Process queue 1 ($X_1 = 2$):

  • $g[1] = 0 + 2 \cdot 1 = 2$
  • $g[2] = 0 + 2 \cdot 2 = 4$
  • $g[3] = 0 + 2 \cdot 4 = 8$

After: $g = [1, 2, 4, 8]$, which matches $[2^0, 2^1, 2^2, 2^3]$.

Process queue 2 ($X_2 = 3$):

  • $g[1] = 2 + 3 \cdot 1 = 5$
  • $g[2] = 4 + 3 \cdot 5 = 19$
  • $g[3] = 8 + 3 \cdot 19 = 65$

Final: $g = [1, 5, 19, 65]$

Verify directly:

$$G(3) = \sum_{k_1 + k_2 = 3} 2^{k_1} 3^{k_2} = 2^0 3^3 + 2^1 3^2 + 2^2 3^1 + 2^3 3^0 = 27 + 18 + 12 + 8 = 65$$
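
The implementation above reproduces this directly:

print(buzen([2.0, 3.0], 3))  # [1.0, 5.0, 19.0, 65.0]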

Computing Performance Metrics

The real payoff comes from using the $G(n)$ values to extract performance metrics without revisiting the state space.

Marginal Queue Length Distribution

The probability that queue $i$ has at least $k$ jobs:

$$P(n_i \geq k) = X_i^k \cdot \frac{G(N-k)}{G(N)}$$

Intuition: fix $k$ jobs at queue $i$ (contributing $X_i^k$), distribute the remaining $N-k$ arbitrarily (summing to $G(N-k)$), normalize by $G(N)$.

def prob_at_least(g: list[float], X: list[float], i: int, k: int) -> float:
    """P(queue i has >= k jobs)"""
    N = len(g) - 1
    if k > N:
        return 0.0
    return (X[i] ** k) * g[N - k] / g[N]

Expected Queue Length

The expected number of jobs at queue $i$:

$$E[n_i] = \sum_{k=1}^{N} P(n_i \geq k) = \sum_{k=1}^{N} X_i^k \cdot \frac{G(N-k)}{G(N)}$$

def expected_queue_length(g: list[float], X: list[float], i: int) -> float:
    """E[jobs at queue i]"""
    N = len(g) - 1
    total = 0.0
    x_power = X[i]
    for k in range(1, N + 1):
        total += x_power * g[N - k]
        x_power *= X[i]
    return total / g[N]

Throughput

The throughput (completion rate) at queue $i$ follows directly from the product-form solution:

$$\lambda_i = e_i \cdot \frac{G(N-1)}{G(N)}$$

where $e_i$ is the visit ratio. The ratio $G(N-1)/G(N)$ acts as a system-wide throughput factor.

def throughput(g: list[float], e: list[float]) -> list[float]:
    """Throughput at each queue"""
    N = len(g) - 1
    ratio = g[N - 1] / g[N]
    return [e_i * ratio for e_i in e]

All these metrics come essentially free once we have the $G$ array - no state enumeration required.
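
Putting it all together, a sketch of a complete analysis for a small hypothetical network (the rates and visit ratios below are invented for illustration):

# Hypothetical 3-queue network with N = 5 circulating jobs.
mu = [2.0, 1.0, 1.5]                    # assumed service rates
e = [1.0, 0.7, 0.3]                     # visit ratios (queue 0 = reference)
X = [ei / mi for ei, mi in zip(e, mu)]  # relative loads

N = 5
g = buzen(X, N)

print("G(0..N):    ", g)
print("throughput: ", throughput(g, e))
print("E[n_i]:     ", [expected_queue_length(g, X, i) for i in range(3)])
print("P(n_1 >= 2):", prob_at_least(g, X, 1, 2))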

Numerical Considerations

For large $N$ or extreme $X_i$ values, $G(N)$ can overflow or underflow floating-point. Two approaches:

Scaling: Periodically divide all entries by $g[N]$ during computation. This keeps values bounded but requires care when computing ratios.
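
A minimal sketch of the scaling idea (this version divides by the largest current entry rather than $g[N]$, and remembers the accumulated factor in log form):

import math

def buzen_scaled(X: list[float], N: int) -> tuple[list[float], float]:
    """Scaled normalization constants: true G(n) = g[n] * exp(log_scale).
    Ratios like G(N-1)/G(N) can use g directly - the scale cancels."""
    g = [0.0] * (N + 1)
    g[0] = 1.0
    log_scale = 0.0
    for x in X:
        for n in range(1, N + 1):
            g[n] += x * g[n - 1]
        s = max(g)                 # rescale once per queue
        g = [v / s for v in g]
        log_scale += math.log(s)
    return g, log_scale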

Log-domain: Work with $\log G(n)$ instead. The recurrence becomes:

$$\log g(n,m) = \log\!\left(e^{\log g(n,m-1)} + e^{\log X_m + \log g(n-1,m)}\right)$$

Use the log-sum-exp trick for numerical stability. This adds overhead but handles extreme cases.
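
And a sketch of the log-domain variant, using only the standard library (math.log1p gives a stable log-sum-exp):

import math

def logaddexp(a: float, b: float) -> float:
    """log(e^a + e^b) computed without overflow."""
    if a == float("-inf"):
        return b
    if b == float("-inf"):
        return a
    hi, lo = (a, b) if a >= b else (b, a)
    return hi + math.log1p(math.exp(lo - hi))

def buzen_log(X: list[float], N: int) -> list[float]:
    """log G(0), ..., log G(N): the same recurrence, run in log space."""
    log_g = [float("-inf")] * (N + 1)
    log_g[0] = 0.0  # log G(0) = log 1
    for x in X:
        log_x = math.log(x)
        for n in range(1, N + 1):
            log_g[n] = logaddexp(log_g[n], log_x + log_g[n - 1])
    return log_g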

For most practical networks (moderate $N$, reasonable utilizations), standard double precision suffices.

Complexity Comparison

| Approach | Time | Space |
|---|---|---|
| Naive enumeration | $\lvert S(N,M) \rvert = O(N^{M-1})$ | $O(1)$ |
| Buzen’s algorithm | $O(NM)$ | $O(N)$ |
| FFT convolution | $O(NM \log N)$ | $O(N)$ |

For $N=50$, $M=10$: naive requires $\sim 10^{10}$ operations, Buzen requires $500$.

The algorithm also produces all intermediate values $G(1), \ldots, G(N-1)$ as a byproduct - exactly what we need for performance metrics.

Historical Note

Buzen developed this algorithm during his PhD at Harvard (1971) while studying computer system performance. At the time, Gordon-Newell networks were known to have elegant theoretical properties, but practitioners couldn’t compute anything with them - the state spaces were too large.

Buzen’s convolution algorithm made closed queueing network analysis practical. It became a foundation of computer performance modeling throughout the 1970s and 1980s, enabling capacity planning for mainframes and early distributed systems.

Buzen went on to co-found BGS Systems, one of the first companies focused on computer performance analysis software - built on these algorithms.

Conclusion

Buzen’s algorithm reduces a sum over billions of states to a few hundred multiply-adds. The trick is recognizing that the normalization constant $G(N)$ decomposes into sequential convolutions - add one queue at a time, and the combinatorial explosion collapses into a simple recurrence.

The algorithm exemplifies a broader pattern: when a massive sum has product structure, dynamic programming often finds a shortcut. Here, the product-form solution of Gordon-Newell networks provides exactly that structure. What looks like an intractable enumeration problem becomes $O(NM)$ arithmetic.

For practitioners in the 1970s, this was transformative. Closed queueing networks had elegant theory but were computationally useless until Buzen’s algorithm made them practical. The same ideas - convolution, dynamic programming, exploiting algebraic structure - appear throughout performance modeling, from Mean Value Analysis to modern cloud capacity planning.

References

  • Buzen, J. P. (1973). Computational algorithms for closed queueing networks with exponential servers. Communications of the ACM, 16(9), 527-531.
  • Gordon, W. J., & Newell, G. F. (1967). Closed queuing systems with exponential servers. Operations Research, 15(2), 254-265.
  • Jackson, J. R. (1963). Jobshop-like queueing systems. Management Science, 10(1), 131-142.
  • Reiser, M., & Lavenberg, S. (1980). Mean value analysis of closed multichain queueing networks. Journal of the ACM, 27(2), 313-322.