Smaller Classes with Fewer Teachers

Teacher Misallocation in South African Primary Schools

Stellenbosch University & Tinbergen Institute

2026-02-02

Introduction

The Class Size Puzzle

Student-Teacher Ratio = 32.5

Experienced Class Size = 43.3


Teachers are employed but not fully deployed.

Decomposing the Gap: STR ≠ Class Size

\[\text{STR} = \frac{\text{SCR}}{\text{TCR}}\]

| Metric | Definition | National Average |
| --- | --- | --- |
| STR | Student-Teacher Ratio | 32.5 |
| SCR | Student-Class Ratio | 39.3 |
| TCR | Teacher-Class Ratio | 1.22 |


TCR = 1.22 → Teachers timetabled for only 82% of periods

Same STR of 30 can mean classes of 30 (TCR=1) or classes of 60 (TCR=2).
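A minimal sketch of the identity above, rearranged as SCR = STR × TCR, with utilisation read as 1/TCR. The function names are illustrative and not taken from the underlying analysis.

```python
def student_class_ratio(student_teacher_ratio: float, tcr: float) -> float:
    """Rearranged identity: SCR = STR * TCR."""
    return student_teacher_ratio * tcr


def utilisation(tcr: float) -> float:
    """Share of periods a teacher is timetabled for, read as 1 / TCR."""
    return 1.0 / tcr


print(student_class_ratio(30, 1.0))  # 30.0
print(student_class_ratio(30, 2.0))  # 60.0
print(round(utilisation(1.22), 2))   # 0.82
```

The same STR therefore says little about crowding unless TCR is known as well.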

Scheduled vs Unscheduled Absence

Key distinction: Not timetabled ≠ Absent when timetabled

World Bank SDI conflates these:

\[\text{Class Absence} = 1 - \frac{\text{Teachers in Classrooms}}{\text{Total Employed}}\]

  • 44% of SSA teachers “absent” from class (Bold et al. 2017)
  • Of these, 48% are at school but not timetabled
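A back-of-envelope reading of the two figures above, taking the 48% as a share of the 44% (this product is an illustration, not a number reported in the source):

\[0.44 \times 0.48 \approx 0.21\]

On that reading, roughly one in five employed teachers is out of class because of timetabling rather than unscheduled absence.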


Policy implication: Administrative timetabling reforms ≠ behavioural accountability interventions.

Literature: Four Gaps

1. Misallocation

Focuses on between-school variation (Walter 2020).

My finding: Within-school slack > spatial gains.

2. Teacher Utilization

TCR “remarkably understudied” (Bennell 2022).

Gap: No rigorous large-scale primary analysis (Tanzania TCR = 2.5).

3. Input Elasticity

Why hiring fails to improve learning (Hanushek 1986).

My contribution: Hiring → absorbed into slack.

4. Management

Focuses on talent management (Bloom et al. 2015).

Gap: Overlooks extensive margin of deployment.

Data

  • 5.4 million learners in 9,512 public primary schools
  • 2023 cross-section; 2018–2023 panel
  • Final sample: ≈78% of all public primary learners

Novel feature: Reconstructed class-level timetables from learner-class assignments in administrative data.

Direct measurement of teacher deployment—not survey self-reports.

Attrition details: see Appendix, Sample Attrition.

Measurement Framework

Optimization Framework: Allocative Efficiency

Scope: Optimality is defined here strictly as resource distribution, holding pedagogical models constant.


  1. The Primal Objective (Policy Preference Maximisation): Minimise Experienced Class Size given a fixed staff establishment.

  2. The Dual Objective (Cost Rationalisation): Minimise the Staff Establishment required to meet a specific class size target.

The Measurement Innovation

Core idea: Use constrained optimisation to quantify efficiency gaps.

For each school, I solve:

\[\min_{\{k_g\}} \text{ECS} = \min_{\{k_g\}} \frac{\sum_g n_g^2 / k_g}{\sum_g n_g}\]

subject to progressively relaxed constraints on \(\sum_g k_g\).

The gap between observed and optimal = measurable inefficiency.

By comparing optima under different constraints, I decompose where inefficiency lies:

  • Technical (timetabling) vs Allocative (capital vs labour) vs Spatial

The Allocation Algorithm

Objective: Minimise \(\sum_g n_g^2/k_g\) (equivalent to minimising ECS)

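Method: Greedy algorithm with a max-priority queue. Provably optimal: the objective is separable and convex in each \(k_g\), so the marginal gain from every further class in a grade is smaller than the last.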

Algorithm:

  1. Assign one class per non-empty grade
  2. Calculate marginal gain of adding one class to each grade: \(\Delta = \frac{n^2}{k(k+1)}\)
  3. Add class to grade with highest gain
  4. Repeat until all classes allocated
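The four steps above can be sketched as follows. This is a minimal illustration under the slide's own simplification that a grade's learners split evenly across its classes (fractional class sizes allowed). The function names and the three-grade example school are hypothetical.

```python
import heapq


def allocate_classes(enrolment: dict[str, int], total_classes: int) -> dict[str, int]:
    """Greedily split total_classes across grades to minimise sum(n_g**2 / k_g)."""
    grades = [g for g, n in enrolment.items() if n > 0]
    if not grades:
        return {}
    if total_classes < len(grades):
        raise ValueError("need at least one class per non-empty grade")
    # Step 1: one class per non-empty grade.
    classes = {g: 1 for g in grades}
    # Step 2: marginal gain of one more class is n^2 / (k (k + 1)).
    # heapq is a min-heap, so gains are stored negated.
    heap = [(-(enrolment[g] ** 2) / 2, g) for g in grades]
    heapq.heapify(heap)
    # Steps 3-4: repeatedly give the next class to the grade with the highest gain.
    for _ in range(total_classes - len(grades)):
        _, g = heapq.heappop(heap)
        classes[g] += 1
        k = classes[g]
        heapq.heappush(heap, (-(enrolment[g] ** 2) / (k * (k + 1)), g))
    return classes


def experienced_class_size(enrolment: dict[str, int], classes: dict[str, int]) -> float:
    """ECS when each grade's learners are spread evenly over its classes."""
    total = sum(enrolment[g] for g in classes)
    return sum(enrolment[g] ** 2 / classes[g] for g in classes) / total


school = {"1": 120, "2": 80, "3": 40}   # hypothetical enrolment by grade
plan = allocate_classes(school, 6)
print(plan)                                  # {'1': 3, '2': 2, '3': 1}
print(experienced_class_size(school, plan))  # 40.0
```

In this example the six classes end up with 40 learners each, so ECS equals the plain average class size.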

Shadow Price:

After allocation, the shadow price = marginal gain of the next class.

This tells us the value of an additional teacher:

  • High shadow price → binding constraint
  • Low shadow price → diminishing returns

Shadow prices reveal which schools would benefit most from expansion.
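A minimal sketch of the shadow price for a given allocation: the largest marginal gain \(n^2 / (k(k+1))\) across grades. Dividing by total enrolment to express it in ECS units (learners per class) is an assumption made here for readability. The two example schools are hypothetical.

```python
def shadow_price(enrolment: dict[str, int], classes: dict[str, int]) -> float:
    """Reduction in ECS from the single best additional class."""
    total = sum(enrolment[g] for g in classes)
    best_gain = max(enrolment[g] ** 2 / (k * (k + 1)) for g, k in classes.items())
    return best_gain / total


enrolment = {"1": 120, "2": 80, "3": 40}      # hypothetical enrolment by grade
tight = {"1": 3, "2": 2, "3": 1}              # 6 classes of 40 learners
roomy = {"1": 6, "2": 4, "3": 2}              # 12 classes of 20 learners

print(shadow_price(enrolment, tight))            # 5.0
print(round(shadow_price(enrolment, roomy), 2))  # 1.43
```

The crowded school gains about 5 learners per class from one more teacher, the well-staffed one about 1.4, which is the diminishing-returns pattern described above.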

Optimisation as Measurement

I solve for minimum ECS under nested constraints to quantify efficiency gaps.

| Scenario | Constraint | What it measures | Example |
| --- | --- | --- | --- |
| S0 | Status quo | Baseline | S0 |
| S1–S2 | Balance classes | Technical efficiency | S1, S2 |
| S3 | Bind by classrooms | Capital slack | S3 |
| S4 | Bind by teachers | Labour slack | S4 |
| S5–S6 | Pool across space | Spatial misallocation | S5, S6 |


“Optimal” = minimum class size for given teachers (or equivalently, minimum teachers for given class size).
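The dual reading can be sketched with the same greedy rule used in The Allocation Algorithm: keep adding the most valuable class until ECS meets the target. Because the greedy allocation is optimal for every class count, the first count that meets the target is the smallest one. The sketch assumes one teacher per class (TCR = 1) and even splits within grades. The function name and example school are hypothetical.

```python
import heapq


def min_classes_for_target(enrolment: dict[str, int], target_ecs: float) -> dict[str, int]:
    """Smallest class allocation whose ECS is at or below target_ecs."""
    if target_ecs <= 0:
        raise ValueError("target_ecs must be positive")
    grades = [g for g, n in enrolment.items() if n > 0]
    if not grades:
        return {}
    total = sum(enrolment[g] for g in grades)
    classes = {g: 1 for g in grades}
    # Objective sum(n_g^2 / k_g) with one class per grade.
    objective = float(sum(enrolment[g] ** 2 for g in grades))
    heap = [(-(enrolment[g] ** 2) / 2, g) for g in grades]
    heapq.heapify(heap)
    while objective / total > target_ecs:
        neg_gain, g = heapq.heappop(heap)
        objective += neg_gain          # gains are stored negated
        classes[g] += 1
        k = classes[g]
        heapq.heappush(heap, (-(enrolment[g] ** 2) / (k * (k + 1)), g))
    return classes


school = {"1": 120, "2": 80, "3": 40}   # hypothetical enrolment by grade
plan = min_classes_for_target(school, target_ecs=45)
print(plan, sum(plan.values()))         # {'1': 3, '2': 2, '3': 1} 6
```

Five classes would leave ECS at 50 in this example, so six is the minimum establishment for a target of 45.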

Gaps in Efficiency Could Close Gaps in Equity

Systematic Patterns in Class Size and Inefficiency

Main Result: Where the Inefficiency Lies

Key Findings: Descriptive

  • S4 (full teacher utilisation): −18.8% ECS. Core inefficiency.
  • S5 (district pooling): Most spatial gains captured locally.
  • S6 (province pooling): Negligible additional benefit (+0.4 pp).

Causal Identification

The Identification Question

Question: Does exogenously providing a school with additional teachers lead to smaller classes or increased slack?

Three sources of bias in OLS:

  1. Reverse causality: Inefficient schools receive compensatory staffing
  2. Selection: Management quality affects both staffing and deployment
  3. Omitted variables: Neighbourhood effects, principal quality, union strength


Solution: Instrument STR with bureaucratic teacher allocation rules (Post Provisioning Norm).

Demographic shock elasticities: see Appendix, Demographic Shock Elasticities.

Instrument: Post Provisioning Norm (PPN)

PPN formula: f(enrolment, grade-mix, quintile, provincial norms)

| Grade | Max Class | Weight |
| --- | --- | --- |
| R | 35 | 0 |
| 1–4 | 35 | 1.190 |
| 5–6 | 40 | 1.042 |
| 7 | 37 | 1.126 |

Identification: Within-school grade composition changes trigger mechanical allocation adjustments.

  • Schools don’t control cohort composition
  • School FE + enrolment controls absorb time-invariant quality
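A minimal sketch of how a grade-mix change moves a school's average PPN weight with total enrolment held fixed. Treating the instrument as an enrolment-weighted mean of the grade weights in the table is an assumption made for illustration. The enrolment figures are hypothetical.

```python
# Grade weights from the table above (Grades 1-4, 5-6 and 7).
PPN_WEIGHT = {
    "1": 1.190, "2": 1.190, "3": 1.190, "4": 1.190,
    "5": 1.042, "6": 1.042,
    "7": 1.126,
}


def mean_ppn_weight(enrolment: dict[str, int]) -> float:
    """Enrolment-weighted mean of grade weights (assumed instrument form)."""
    total = sum(enrolment.values())
    return sum(PPN_WEIGHT[g] * n for g, n in enrolment.items()) / total


# Hypothetical school: 100 learners per grade, then a small Grade 4 cohort
# and a large Grade 5 cohort the following year (same total enrolment).
year_1 = {g: 100 for g in PPN_WEIGHT}
year_2 = dict(year_1, **{"4": 80, "5": 120})

delta = mean_ppn_weight(year_2) - mean_ppn_weight(year_1)
print(round(mean_ppn_weight(year_1), 4))  # 1.1386
print(round(delta, 4))                    # -0.0042
```

Total enrolment is unchanged at 700, yet the average weight falls because learners moved from a higher-weight to a lower-weight grade band.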

Specification

\[ \begin{aligned} \text{First stage:} \quad & \Delta \text{STR}_{\text{actual}} = \alpha + \beta \cdot \Delta \overline{\text{PPN-Weight}}_{\text{leaver}} + \delta_t + \varepsilon \\[0.5em] \text{Second stage:} \quad & \Delta \text{Inefficiency} = \alpha + \delta \cdot \Delta \widehat{\text{STR}} + \delta_t + \varepsilon \end{aligned} \]

First-differences specification. School-level clustering. N = 28,703 school-years.

Identifying assumptions:

  1. Relevance: PPN strongly predicts actual STR (F > 60)
  2. Exclusion: PPN affects inefficiency only through STR
  3. Exogeneity: Grade-mix changes ⊥ time-varying management quality
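A minimal sketch of the just-identified IV logic on simulated data, using only the standard library. Everything here is synthetic: the coefficient of −0.5, the first-stage strength and the confounder are illustrative choices, not estimates from the paper. Year effects and clustered standard errors are omitted.

```python
import random


def cov(a: list[float], b: list[float]) -> float:
    """Sample covariance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def ols_estimate(x: list[float], y: list[float]) -> float:
    return cov(x, y) / cov(x, x)


def iv_estimate(z: list[float], x: list[float], y: list[float]) -> float:
    """With one instrument, 2SLS reduces to cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


rng = random.Random(0)
true_delta = -0.5                       # synthetic, not the paper's estimate
d_ppn, d_str, d_ineff = [], [], []
for _ in range(20_000):
    z = rng.gauss(0, 1)                 # change in PPN weight (instrument)
    u = rng.gauss(0, 1)                 # unobserved change in management quality
    x = 0.8 * z + u + rng.gauss(0, 1)   # change in actual STR
    y = true_delta * x + 2.0 * u + rng.gauss(0, 1)  # change in inefficiency
    d_ppn.append(z)
    d_str.append(x)
    d_ineff.append(y)

delta_ols = ols_estimate(d_str, d_ineff)
delta_iv = iv_estimate(d_ppn, d_str, d_ineff)
print(f"OLS: {delta_ols:.2f}  IV: {delta_iv:.2f}  truth: {true_delta}")
```

Because the confounder moves both STR and inefficiency, OLS lands far from the synthetic truth, while the instrumented estimate recovers it approximately.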

IV Results

| | First-Differences |
| --- | --- |
| 2SLS Coefficient (δ) | −4.5* |
| Standard Error | 0.111 |
| First-stage F-statistic | 60.4 |
| N (school-years) | 28,703 |


Interpretation: Exogenously gaining teachers → increased slack, not smaller classes.

Schools absorb additional teachers into lighter workloads. Why? See Appendix, Why Would Schools Choose Inefficiency?

Robustness

Instrument validity: ✓

  • Balance on 15 pre-determined covariates
  • No pre-trends (future shocks don’t predict past outcomes)
  • F > 30 across all specifications

Specification robustness: ✓

  • 7 fixed effects combinations
  • 4 clustering schemes
  • 5 functional forms

Heterogeneity: Effect stronger in small schools, low-STR contexts

Full robustness table: see Appendix, IV Robustness: Full Table.

Implications

But Isn’t That Slack Needed for Preparation?

The professional day extends beyond instruction

| Component | Description |
| --- | --- |
| Formal school day | 7 hours at school (PAM requirement) |
| Instructional periods | ~6 periods × 45 min = ~4.5 hours |
| Breaks | Recess + lunch = ~1 hour (rest, not prep) |
| Within-day non-contact | ~1.5 hours (admin, assembly, free time) |
| Informal school day | ~400 hours/year = ~2 hours/day explicitly for preparation |


Key point: The “informal school day” is mandated time outside formal hours for planning and marking.

Even at TCR = 1.0, teachers have ~3.5 hours daily for preparation (1.5h in-day + 2h informal).

Current TCR = 1.22 gives additional free periods during instruction—beyond what’s needed.

Why would schools choose this? See Appendix, Why Would Schools Choose Inefficiency?

Fiscal Leakage

  • R22.3bn annually: Cost of S4 teacher under-utilisation (18.2% of time)
  • R29.6bn annually: Including spatial misallocation
  • Exceeds combined spending on school nutrition + learning materials

Assumptions: see Appendix, Fiscal Assumptions.

Policy Implications

1. Enforce contact-time norms

  • −18.8% ECS or free ≈18% educator capacity (≈R22bn)

2. Activate idle classrooms

  • −10.2% ECS from capital under-utilisation

3. District-level pooling

  • Shift hiring from school to district level
  • Captures spatial gains (−24.2% cumulative) without provincial disruption


Key insight: The problem is deployment incentives, not fiscal constraints.

Summary

1. Measurement innovation: Constrained optimisation to quantify efficiency gaps under nested constraints—technical, allocative, spatial.

2. Descriptive finding: Within-school labour slack (−18.8%) > spatial misallocation (−5.8 pp).

3. Causal evidence: Exogenously adding teachers increases slack rather than reducing class sizes (δ = −4.5).

4. Policy implication: R22bn annually—deployment reform, not fiscal expansion.

References

Bennell, Paul. 2022. “Teaching Too Little to Too Many: Teaching Loads and Class Size in Secondary Schools in Sub-Saharan Africa.” International Journal of Educational Development 94 (October): 102651. https://doi.org/10.1016/j.ijedudev.2022.102651.
Bloom, Nicholas, Renata Lemos, Raffaella Sadun, and John Van Reenen. 2015. “Does Management Matter in Schools?” The Economic Journal 125 (584): 647–74. https://doi.org/10.1111/ecoj.12267.
Bold, Tessa, Deon Filmer, Gayle Martin, Ezequiel Molina, Christophe Rockmore, Brian Stacy, Jakob Svensson, and Waly Wane. 2017. What Do Teachers Know and Do? Does It Matter? Evidence from Primary Schools in Africa. World Bank, Washington, DC. https://doi.org/10.1596/1813-9450-7956.
Hanushek, Eric A. 1986. “The Economics of Schooling: Production and Efficiency in Public Schools.” Journal of Economic Literature 24 (3): 1141–77. https://www.jstor.org/stable/2725865.
Walter, Torsten Figueiredo. 2020. “Misallocation in the Public Sector? Cross-Country Evidence from Two Million Primary Schools.” Unpublished.

Appendix

Sample Attrition

  • Final coverage: ≈5.4 million learners (78%), 9,512 schools
  • Largest exclusion: Multigrade schools (≈12%)
  • External validity concern: Multigrade schools may face distinct challenges

Fiscal Assumptions

  • Baseline: Mean educator salary R42,000/month; 12 months; S4 inefficiency 18.2%
  • Sensitivity: ±10% salary → R20.0–24.5bn range
  • Coverage: Primary only (R22.3bn); secondary extrapolation ≈R62bn total
  • As % of education budget: 7.6–10.1%

Comparison: R22.3bn > National School Nutrition (≈R8bn) + LTSM (≈R12bn)
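A minimal sketch of the sensitivity band, assuming the leakage estimate scales linearly with the mean salary (educator headcount and the 18.2% slack share held fixed). The function name is illustrative.

```python
def leakage_range(baseline_bn: float, salary_band: float) -> tuple[float, float]:
    """Low and high leakage when the mean salary moves by +/- salary_band."""
    return baseline_bn * (1 - salary_band), baseline_bn * (1 + salary_band)


low, high = leakage_range(22.3, 0.10)
print(round(low, 1), round(high, 1))   # 20.1 24.5

# Comparison programmes, using the approximate figures quoted above (R bn).
nutrition_bn, ltsm_bn = 8.0, 12.0
print(22.3 > nutrition_bn + ltsm_bn)   # True
```

The band matches the R20.0–24.5bn range quoted above up to rounding of the R22.3bn baseline.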

IV Robustness: Full Table

| # | Category | Test | Description |
| --- | --- | --- | --- |
| 1 | Instrument Validity | Balance on observables | Instrument orthogonal to 15 pre-determined covariates |
| 2 | Instrument Validity | Falsification tests | Future PPN shocks don't predict past outcomes |
| 3 | Instrument Validity | Weak instrument robust CI | Anderson-Rubin and CLR confidence sets |
| 4 | Instrument Validity | Reduced form on placebos | No effect on time-invariant school characteristics |
| 5 | Instrument Validity | Exclusion restriction probes | Control for PPN sub-components |
| 6 | Instrument Validity | Overidentification tests | Grade-specific instruments yield consistent estimates |
| 7 | Instrument Validity | First-stage heterogeneity | Strong F-statistics across districts and time periods |
| 8 | Specification Robustness | Alternative fixed effects | None, Year, District, School, Province×Year, District×Year |
| 9 | Specification Robustness | Alternative clustering | School, District, Two-way (School+Year), Two-way (District+Year) |
| 10 | Specification Robustness | Subsample stability | Balanced panel, exclude outliers, by quintile, leave-one-province-out |
| 11 | Specification Robustness | Alternative functional forms | Level-level, log-log, level-log, log-level, quadratic |
| 12 | Specification Robustness | Asymmetry tests | STR increases vs decreases (gains vs losses) |
| 13 | Specification Robustness | Specification ladder | Progressive inclusion of controls and FE |
| 14 | Specification Robustness | Control function approach | Alternative estimator using residuals as control |
| 15 | Heterogeneity Analysis | Effect modification | By school size, initial STR, baseline class size, quintile, urban/rural |
| 16 | Mediation Analysis | Class count as mediator | Tests whether STR affects inefficiency through class formation |
| 17 | Attrition & Sample Selection | Sample attrition tests | Validates representativeness of final sample (78% coverage) |

Demographic Shock Elasticities

Question: Are high TCRs forced by constraints or discretionary?

Test logic: If constraints bind → shocks raise inefficiency. If slack is discretionary → shocks reveal latent capacity.

  • β₁ (Grade-Mix) = −0.381*** (10 pp shift → 3.81 pp inefficiency reduction)
  • β₂ (Phase-Mix) = −0.256*** (teachers reallocated across phase boundaries)
  • All coefficients negative → adaptive capacity exists

Alternative explanations: see Alternative Explanations Ruled Out.

Alternative Explanations Ruled Out

17 mechanisms investigated. Five structural rigidities predict β > 0 if binding:

| Mechanism | Prediction | Finding |
| --- | --- | --- |
| Grade-specific match capital | β₁ > 0 | Rejected: β₁ = −0.381 |
| Qualification constraints | β₂ > 0 | Rejected: β₂ = −0.256 |
| Period indivisibilities | β > 0 | Rejected: both < 0 |
| Cross-phase synchronisation | β₂ > 0 | Rejected: β₂ = −0.256 |
| Enrolment volatility buffering | β₃ > 0 | Rejected: β₃ = −0.007 |

Consistent with: Policy equilibria where slack is discretionary.

Caveat: Standby rosters create endogeneity—estimates may be lower bounds.

Equity Implications

  • Baseline gap: Q1–Q2 classes 32% larger than Q5
  • Paradox: Wealthier schools show largest efficiency gains (−20.2%)—hold more slack
  • Implication: Efficiency reforms reduce absolute crowding but preserve relative inequality

Racial Inequality

  • Baseline gap: Black learners 29% larger classes than white (44.3 vs 34.4)
  • Post-optimisation: Gap persists (33.4 vs 25.8)
  • White-learner schools hold more idle capacity

Why Would Schools Choose Inefficiency?

Core mechanism: Teachers value free periods above smaller classes.

  • Rigid wage schedules: Non-pecuniary compensation (lighter loads) becomes primary margin
  • Union bargaining: Workload norms weakly enforced
  • Measurement vacuum: What isn’t measured can’t be managed
  • Rational precaution: Standby rosters for unscheduled absence create endogenous slack

Political economy: TCR represents negotiated equilibrium—teachers capture rents through reduced contact time rather than higher wages.

International Comparisons

| Country | TCR | Utilisation |
| --- | --- | --- |
| Tanzania | 2.5 | 40% |
| Ethiopia | ≈1.4 | ≈71% |
| South Africa | 1.22 | 82% |

  • Vietnam: Explicitly targets TCR > 1 for extended contact hours
  • SSA systems without timetabling software may face both technical and allocative inefficiency

Implication: South Africa’s problem is utilisation (S4), not timetabling (S1–S2).

Algebra Reference

Core Identity: STR = SCR / TCR

  • STR = Learners / Teachers
  • SCR = Learners / Classes
  • TCR = Teachers / Classes

Experienced Class Size (ECS): \[\text{ECS} = \frac{\sum_i \text{CS}_i^2}{\sum_i \text{CS}_i} = \overline{\text{CS}} + \frac{\sigma^2_{\text{CS}}}{\overline{\text{CS}}}\]

Example: Classes of {50, 10} average 30, but ECS = 43.3 (more learners experience the larger class).
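A minimal sketch checking the ECS formula and its mean-plus-variance form on the example above. The variance is the population variance across classes. The function name is illustrative.

```python
from statistics import mean, pvariance


def experienced_class_size(class_sizes: list[int]) -> float:
    """Average class size as experienced by learners: sum(CS^2) / sum(CS)."""
    return sum(s * s for s in class_sizes) / sum(class_sizes)


sizes = [50, 10]
ecs = experienced_class_size(sizes)
ecs_alt = mean(sizes) + pvariance(sizes) / mean(sizes)

print(mean(sizes))        # 30
print(round(ecs, 1))      # 43.3
print(round(ecs_alt, 1))  # 43.3
```

Fifty of the sixty learners sit in the class of 50, which is why the learner-weighted figure is well above the simple average of 30.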

Provincial Scenarios

  • Limpopo & Eastern Cape: High capital slack (idle classrooms)
  • Limpopo & Mpumalanga: High labour slack (idle teachers)
  • Provinces need tailored interventions

Representative School: Status Quo (S0)

Baseline timetable at 82nd percentile reducibility. School: 1,220 learners, 27 classes, 7 grades, ECS 47.8.

Within-grade dispersion modest. Between-grade imbalance small. Key: 27 classes but more teachers available.

Within-Grade Equalisation (S1)

Between-Grade Allocation (S2)

Classroom-Constrained Allocation (S3)

Educator-Optimal Allocation (S4)

District-Level Pooling (S5)

Province-Level Pooling (S6)

Variance Decomposition: What Explains Crowding?

  • S4 reducibility: 33.5% of explained variance (R² = 52.3%)
  • Traditional factors small: Province 6.3%, Quintile 6.4%, Race 5.5%
  • Latent capacity > STR in explaining crowding

Secondary School Extension

  • Primary TCR ≈ 1.22 (generalist model). Secondary TCR likely higher:
    • Subject specialisation mandated (≈15+ subjects)
    • Period indivisibilities bind harder
    • Teacher qualifications phase-specific
  • Preliminary estimates: If secondary TCR ≈ 1.5, fiscal leakage ≈R40bn additional (total ≈R62bn)
  • Data challenge: LURITS lacks subject identifiers for secondary

Policy: Subject rationalisation; pool teachers at district level for rare subjects.