Smaller Classes with Fewer Teachers

Teacher Misallocation in South African Primary Schools

Peter Courtney

Stellenbosch University & Tinbergen Institute

2026-02-02

Introduction

The Class Size Puzzle

Student-Teacher Ratio = 32.5

Experienced Class Size = 43.3

Teachers are employed but not fully deployed.

Decomposing the Gap: STR ≠ Class Size

\[\text{STR} = \frac{\text{SCR}}{\text{TCR}}\]

Metric	Definition	National Average
STR	Student-Teacher Ratio	32.5
SCR	Student-Class Ratio	39.3
TCR	Teacher-Class Ratio	1.22

TCR = 1.22 → Teachers timetabled for only 82% of periods (1 / 1.22 ≈ 0.82)

Same STR of 30 can mean classes of 30 (TCR=1) or classes of 60 (TCR=2).
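A minimal sketch of the identity for two hypothetical schools with the same STR. The learner, teacher and class counts are invented for illustration. The identity holds exactly within a school; national averages of the three ratios need not satisfy it exactly.

```python
def ratios(learners: int, teachers: int, classes: int) -> dict:
    """Return STR, SCR and TCR for one school."""
    return {
        "STR": learners / teachers,
        "SCR": learners / classes,
        "TCR": teachers / classes,
    }


# Two hypothetical schools, both with an STR of 30.
fully_deployed = ratios(learners=600, teachers=20, classes=20)
half_deployed = ratios(learners=600, teachers=20, classes=10)

for r in (fully_deployed, half_deployed):
    # STR = SCR / TCR holds exactly within a school.
    assert abs(r["STR"] - r["SCR"] / r["TCR"]) < 1e-9
    print(r)

# Share of periods an average teacher is timetabled for at TCR = 1.22.
timetabled_share = 1 / 1.22
print(round(timetabled_share, 2))
```

The second school has the same payroll and enrolment as the first, but its learners sit in classes twice as large.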

Scheduled vs Unscheduled Absence

Key distinction: Not timetabled ≠ Absent when timetabled

The World Bank's Service Delivery Indicators (SDI) conflate these:

\[\text{Class Absence} = 1 - \frac{\text{Teachers in Classrooms}}{\text{Total Employed}}\]

44% of teachers in Sub-Saharan Africa (SSA) are “absent” from class (Bold et al. 2017)
Of these, 48% are at school but not timetabled
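As a rough worked figure, assuming the 48% applies to the whole 44% group, the share of all teachers who are at school but not timetabled is:

\[0.44 \times 0.48 \approx 0.21\]

On that reading, about one teacher in five is scheduled out of class rather than missing from it.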

Policy implication: Administrative timetabling reforms ≠ behavioural accountability interventions.

Literature: Four Gaps

1. Misallocation

Focuses on between-school variation (Walter 2020).

My finding: Within-school slack > spatial gains.

2. Teacher Utilisation

TCR “remarkably understudied” (Bennell 2022).

Gap: No rigorous large-scale primary analysis (Tanzania TCR = 2.5).

3. Input Elasticity

Why hiring fails to improve learning (Hanushek 1986).

My contribution: Hiring → absorbed into slack.

4. Management

Focuses on talent management (Bloom et al. 2015).

Gap: Overlooks extensive margin of deployment.

Data

5.4 million learners in 9,512 public primary schools
2023 cross-section; 2018–2023 panel
Final sample: ≈78% of all public primary learners

Novel feature: Reconstructed class-level timetables from learner-class assignments in administrative data.

Direct measurement of teacher deployment—not survey self-reports.

Attrition details: see Appendix (Sample Attrition).

Measurement Framework

Optimization Framework: Allocative Efficiency

Scope: Optimality is defined here strictly as resource distribution, holding pedagogical models constant.

The Primal Objective (Policy Preference Maximisation): Minimise Experienced Class Size given a fixed staff establishment.

The Dual Objective (Cost Rationalisation): Minimise the staff establishment required to meet a specific class size target.

The Measurement Innovation

Core idea: Use constrained optimisation to quantify efficiency gaps.

For each school, I solve:

\[\min_{\{k_g\}} \text{ECS} = \min_{\{k_g\}} \frac{\sum_g n_g^2 / k_g}{\sum_g n_g}\]

subject to progressively relaxed constraints on \(\sum_g k_g\).

The gap between observed and optimal = measurable inefficiency.

By comparing optima under different constraints, I decompose where inefficiency lies:

Technical (timetabling) vs Allocative (capital vs labour) vs Spatial

The Allocation Algorithm

Objective: Minimise \(\sum_g n_g^2/k_g\) (equivalent to minimising ECS)

Method: Greedy algorithm with a max-priority queue. It is provably optimal because the objective is separable across grades and convex in each \(k_g\), so marginal gains diminish.

Algorithm:

Assign one class per non-empty grade
Calculate marginal gain of adding one class to each grade: \(\Delta = \frac{n^2}{k(k+1)}\)
Add class to grade with highest gain
Repeat until all classes allocated
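The four steps above can be sketched with Python's `heapq`. This is an illustrative implementation under the slide's assumptions (learners split evenly within a grade), not the paper's code. The function names and the three-grade school are hypothetical.

```python
import heapq


def allocate_classes(enrolment: dict[str, int], total_classes: int) -> dict[str, int]:
    """Greedy allocation of classes to grades, minimising sum(n_g**2 / k_g)."""
    grades = {g: n for g, n in enrolment.items() if n > 0}
    if not grades:
        return {}
    if total_classes < len(grades):
        raise ValueError("need at least one class per non-empty grade")

    # Step 1: one class per non-empty grade.
    classes = {g: 1 for g in grades}

    # Step 2: marginal gain of a further class is n^2 / (k (k + 1)).
    # heapq is a min-heap, so gains are stored negated.
    heap = [(-(n * n) / (1 * 2), g) for g, n in grades.items()]
    heapq.heapify(heap)

    # Steps 3-4: hand each remaining class to the grade with the highest gain.
    for _ in range(total_classes - len(grades)):
        _, g = heapq.heappop(heap)
        classes[g] += 1
        k, n = classes[g], grades[g]
        heapq.heappush(heap, (-(n * n) / (k * (k + 1)), g))
    return classes


def experienced_class_size(enrolment: dict[str, int], classes: dict[str, int]) -> float:
    """ECS when each grade's learners are split evenly across its classes."""
    total = sum(enrolment[g] for g in classes)
    return sum(enrolment[g] ** 2 / k for g, k in classes.items()) / total


enrolment = {"Gr1": 120, "Gr2": 90, "Gr3": 60}  # hypothetical school
allocation = allocate_classes(enrolment, total_classes=9)
school_ecs = experienced_class_size(enrolment, allocation)
print(allocation, school_ecs)
```

With nine classes the sketch gives Gr1 four classes, Gr2 three and Gr3 two, so every class has 30 learners and ECS is 30.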

Shadow Price:

After allocation, the shadow price = marginal gain of the next class.

This tells us the value of an additional teacher:

High shadow price → binding constraint
Low shadow price → diminishing returns

Shadow prices reveal which schools would benefit most from expansion.
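A small sketch of the shadow price for one hypothetical school under a tight and a loose class budget. Dividing by total enrolment to express the price in ECS units is an assumption made here for readability. The names and numbers are illustrative.

```python
def shadow_price(enrolment: dict[str, int], classes: dict[str, int]) -> float:
    """Reduction in sum(n_g**2 / k_g) from one more class, placed where it helps most."""
    return max(enrolment[g] ** 2 / (k * (k + 1)) for g, k in classes.items())


# Hypothetical school with 270 learners in three grades.
enrolment = {"Gr1": 120, "Gr2": 90, "Gr3": 60}
tight = {"Gr1": 2, "Gr2": 1, "Gr3": 1}  # 4 classes in total
loose = {"Gr1": 4, "Gr2": 3, "Gr3": 2}  # 9 classes in total

total = sum(enrolment.values())
# Dividing by total enrolment converts the gain into learners of ECS.
tight_gain = shadow_price(enrolment, tight) / total
loose_gain = shadow_price(enrolment, loose) / total
print(round(tight_gain, 1), round(loose_gain, 1))
```

In this toy case the next class would cut ECS by 15 learners under the tight budget but by under 3 under the loose one, which is the diminishing-returns pattern described above.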

Optimisation as Measurement

I solve for minimum ECS under nested constraints to quantify efficiency gaps.

Scenario	Constraint	What it measures	Appendix example
S0	Status quo	Baseline	S0
S1–S2	Balance classes	Technical efficiency	S1, S2
S3	Bind by classrooms	Capital slack	S3
S4	Bind by teachers	Labour slack	S4
S5–S6	Pool across space	Spatial misallocation	S5, S6

“Optimal” = minimum class size for given teachers (or equivalently, minimum teachers for given class size).
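To illustrate how the gap between two scenarios is measured, here is a minimal sketch comparing the status quo (S0) with within-grade equalisation (S1) for a hypothetical two-grade school. The class sizes and function names are invented for illustration.

```python
def ecs(class_sizes):
    """Experienced class size: sum of squared sizes over total learners."""
    return sum(s * s for s in class_sizes) / sum(class_sizes)


def equalise_within_grade(by_grade):
    """Keep each grade's learners and class count, but split them evenly."""
    return {g: [sum(sizes) / len(sizes)] * len(sizes) for g, sizes in by_grade.items()}


def flatten(by_grade):
    """All class sizes in the school as one list."""
    return [s for sizes in by_grade.values() for s in sizes]


# Hypothetical observed class sizes by grade (S0).
observed = {"Gr1": [50, 30], "Gr2": [45, 35]}

ecs_s0 = ecs(flatten(observed))
ecs_s1 = ecs(flatten(equalise_within_grade(observed)))
gap_pct = 100 * (ecs_s1 - ecs_s0) / ecs_s0

print(ecs_s0, ecs_s1, gap_pct)
```

Here balancing classes within each grade lowers ECS from about 41.6 to 40, a reduction of roughly 3.8%, with no change in teachers or classrooms. Later scenarios relax further constraints in the same way.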

Gaps in Efficiency Could Close Gaps in Equity

Systematic Patterns in Class Size and Inefficiency

[Figure: two panels, by quintile and by race]

Main Result: Where the Inefficiency Lies

Key Findings: Descriptive

S4 (full teacher utilisation): −18.8% ECS. Core inefficiency.
S5 (district pooling): Most spatial gains captured locally.
S6 (province pooling): Negligible additional benefit (+0.4 pp).

Causal Identification

The Identification Question

Question: Does exogenously providing a school with additional teachers lead to smaller classes or increased slack?

Three sources of bias in OLS:

Reverse causality: Inefficient schools receive compensatory staffing
Selection: Management quality affects both staffing and deployment
Omitted variables: Neighbourhood effects, principal quality, union strength

Solution: Instrument STR with bureaucratic teacher allocation rules (Post Provisioning Norm).

Demographic shock elasticities: see Appendix (Demographic Shock Elasticities).

Instrument: Post Provisioning Norm (PPN)

PPN formula: f(enrolment, grade-mix, quintile, provincial norms)

Grade	Max Class	Weight
R	35	0
1–4	35	1.190
5–6	40	1.042
7	37	1.126

Identification: Within-school grade composition changes trigger mechanical allocation adjustments.

Schools don’t control cohort composition
School FE + enrolment controls absorb time-invariant quality
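A sketch of how a grade-mix shift moves a school's average PPN weight, using the weights in the table above. The exact construction of the instrument is not spelled out on this slide, so the enrolment-weighted mean used here is an assumption, and the two-grade school is hypothetical.

```python
# Grade weights from the table above (Grade R carries a weight of 0).
PPN_WEIGHT = {
    "R": 0.0,
    "1": 1.190, "2": 1.190, "3": 1.190, "4": 1.190,
    "5": 1.042, "6": 1.042,
    "7": 1.126,
}


def mean_ppn_weight(enrolment: dict[str, int]) -> float:
    """Enrolment-weighted mean PPN weight (an assumed summary of grade mix)."""
    total = sum(enrolment.values())
    return sum(PPN_WEIGHT[g] * n for g, n in enrolment.items()) / total


# Hypothetical school: total enrolment fixed at 200, cohort mix shifts.
year1 = {"1": 100, "5": 100}
year2 = {"1": 120, "5": 80}

change = mean_ppn_weight(year2) - mean_ppn_weight(year1)
print(mean_ppn_weight(year1), mean_ppn_weight(year2), change)
```

Total enrolment is unchanged, yet the average weight rises because more learners sit in a higher-weighted grade. This is the kind of mechanical variation the identification argument relies on.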

Specification

\[ \begin{aligned} \text{First stage:} \quad & \Delta \text{STR}_{\text{actual}} = \alpha_1 + \beta \cdot \Delta \overline{\text{PPN-Weight}}_{\text{leaver}} + \gamma_t + u \\[0.5em] \text{Second stage:} \quad & \Delta \text{Inefficiency} = \alpha_2 + \delta \cdot \Delta \widehat{\text{STR}} + \lambda_t + \varepsilon \end{aligned} \]

Here \(\gamma_t\) and \(\lambda_t\) are year effects and \(\delta\) is the coefficient of interest. First-differences specification. School-level clustering. N = 28,703 school-years.
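The two-stage logic can be sketched on simulated data. Everything here is assumed for illustration: the data-generating process, the coefficient values, and the omission of year effects and clustered standard errors. It shows only why the instrumented estimate recovers the effect when OLS does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
true_delta = -0.5  # assumed effect of dSTR on dInefficiency in this simulation

z = rng.normal(size=n)  # instrument: change in mean PPN weight
u = rng.normal(size=n)  # unobserved management shock (confounder)
d_str = -0.8 * z + u + rng.normal(size=n)
d_ineff = true_delta * d_str + 2.0 * u + rng.normal(size=n)


def ols(y, X):
    """Least-squares coefficients of y on X (X includes a constant)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)
Z = np.column_stack([const, z])

# First stage: regress dSTR on the instrument and keep the fitted values.
d_str_hat = Z @ ols(d_str, Z)

# Second stage: regress the outcome on the fitted values.
delta_2sls = ols(d_ineff, np.column_stack([const, d_str_hat]))[1]

# Naive OLS for comparison: biased because u drives both variables.
delta_ols = ols(d_ineff, np.column_stack([const, d_str]))[1]

print(f"OLS:  {delta_ols:.2f}")
print(f"2SLS: {delta_2sls:.2f}")
```

Only the point estimates are meaningful here. Standard errors from a manual second stage are not valid, so a dedicated IV routine would be needed for inference.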

Identifying assumptions:

Relevance: PPN strongly predicts actual STR (F > 60)
Exclusion: PPN affects inefficiency only through STR
Exogeneity: Grade-mix changes ⊥ time-varying management quality

IV Results

	First-Differences
2SLS Coefficient (δ)	−4.5*
Standard Error	0.111
First-stage F-statistic	60.4
N (school-years)	28,703

Interpretation: Exogenously gaining teachers → increased slack, not smaller classes.

Schools absorb additional teachers into lighter workloads. For why, see Appendix (Why Would Schools Choose Inefficiency?).

Robustness

Instrument validity: ✓

Balance on 15 pre-determined covariates
No pre-trends (future shocks don’t predict past outcomes)
F > 30 across all specifications

Specification robustness: ✓

7 fixed effects combinations
4 clustering schemes
5 functional forms

Heterogeneity: Effect stronger in small schools, low-STR contexts

Full robustness table: see Appendix (IV Robustness: Full Table).

Implications

But Isn’t That Slack Needed for Preparation?

The professional day extends beyond instruction

Component	Description
Formal school day	7 hours at school (Personnel Administrative Measures, PAM, requirement)
Instructional periods	~6 periods × 45 min = ~4.5 hours
Breaks	Recess + lunch = ~1 hour (rest, not prep)
Within-day non-contact	~1.5 hours (admin, assembly, free time)
Informal school day	~400 hours/year = ~2 hours/day explicitly for preparation

Key point: The “informal school day” is mandated time outside formal hours for planning and marking.

Even at TCR = 1.0, teachers have ~3.5 hours daily for preparation (1.5h in-day + 2h informal).

Current TCR = 1.22 gives additional free periods during instruction—beyond what’s needed.

Why would schools choose this? See Appendix (Why Would Schools Choose Inefficiency?).

Fiscal Leakage

R22.3bn annually: Cost of S4 teacher under-utilisation (18.2% of time)
R29.6bn annually: Including spatial misallocation
Exceeds combined spending on school nutrition + learning materials

Assumptions: see Appendix (Fiscal Assumptions).

Policy Implications

1. Enforce contact-time norms

Cut ECS by 18.8%, or free ≈18% of educator capacity (≈R22bn)

2. Activate idle classrooms

−10.2% ECS from capital under-utilisation

3. District-level pooling

Shift hiring from school to district level
Captures spatial gains (−24.2% cumulative) without provincial disruption

Key insight: The problem is deployment incentives, not fiscal constraints.

Summary

1. Measurement innovation: Constrained optimisation to quantify efficiency gaps under nested constraints—technical, allocative, spatial.

2. Descriptive finding: Within-school labour slack (−18.8%) > spatial misallocation (−5.8 pp).

3. Causal evidence: Exogenously adding teachers increases slack rather than reducing class sizes (δ = −4.5).

4. Policy implication: R22bn annually—deployment reform, not fiscal expansion.

References

Bennell, Paul. 2022. “Teaching Too Little to Too Many: Teaching Loads and Class Size in Secondary Schools in Sub-Saharan Africa.” International Journal of Educational Development 94 (October): 102651. https://doi.org/10.1016/j.ijedudev.2022.102651.

Bloom, Nicholas, Renata Lemos, Raffaella Sadun, and John Van Reenen. 2015. “Does Management Matter in Schools?” The Economic Journal 125 (584): 647–74. https://doi.org/10.1111/ecoj.12267.

Bold, Tessa, Deon Filmer, Gayle Martin, Ezequiel Molina, Christophe Rockmore, Brian Stacy, Jakob Svensson, and Waly Wane. 2017. “What Do Teachers Know and Do? Does It Matter? Evidence from Primary Schools in Africa.” Policy Research Working Paper 7956. Washington, DC: World Bank. https://doi.org/10.1596/1813-9450-7956.

Hanushek, Eric A. 1986. “The Economics of Schooling: Production and Efficiency in Public Schools.” Journal of Economic Literature 24 (3): 1141–77. https://www.jstor.org/stable/2725865.

Walter, Torsten Figueiredo. 2020. “Misallocation in the Public Sector? Cross-Country Evidence from Two Million Primary Schools.” Unpublished.

Appendix

Sample Attrition

Final coverage: ≈5.4 million learners (78%), 9,512 schools
Largest exclusion: Multigrade schools (≈12%)
External validity concern: Multigrade schools may face distinct challenges

Fiscal Assumptions

Baseline: Mean educator salary R42,000/month; 12 months; S4 inefficiency 18.2%
Sensitivity: ±10% salary → R20.0–24.5bn range
Coverage: Primary only (R22.3bn); secondary extrapolation ≈R62bn total
As % of education budget: 7.6–10.1%

Comparison: R22.3bn > National School Nutrition Programme (≈R8bn) + learning and teaching support material, LTSM (≈R12bn)

IV Robustness: Full Table

#	Category	Test	Description
1	Instrument Validity	Balance on observables	Instrument orthogonal to 15 pre-determined covariates
2	Instrument Validity	Falsification tests	Future PPN shocks don't predict past outcomes
3	Instrument Validity	Weak instrument robust CI	Anderson-Rubin and CLR confidence sets
4	Instrument Validity	Reduced form on placebos	No effect on time-invariant school characteristics
5	Instrument Validity	Exclusion restriction probes	Control for PPN sub-components
6	Instrument Validity	Overidentification tests	Grade-specific instruments yield consistent estimates
7	Instrument Validity	First-stage heterogeneity	Strong F-statistics across districts and time periods
8	Specification Robustness	Alternative fixed effects	None, Year, District, School, Province×Year, District×Year
9	Specification Robustness	Alternative clustering	School, District, Two-way (School+Year), Two-way (District+Year)
10	Specification Robustness	Subsample stability	Balanced panel, exclude outliers, by quintile, leave-one-province-out
11	Specification Robustness	Alternative functional forms	Level-level, log-log, level-log, log-level, quadratic
12	Specification Robustness	Asymmetry tests	STR increases vs decreases (gains vs losses)
13	Specification Robustness	Specification ladder	Progressive inclusion of controls and FE
14	Specification Robustness	Control function approach	Alternative estimator using residuals as control
15	Heterogeneity Analysis	Effect modification	By school size, initial STR, baseline class size, quintile, urban/rural
16	Mediation Analysis	Class count as mediator	Tests whether STR affects inefficiency through class formation
17	Attrition & Sample Selection	Sample attrition tests	Validates representativeness of final sample (78% coverage)

Demographic Shock Elasticities

Question: Are high TCRs forced by constraints or discretionary?

Test logic: If constraints bind → shocks raise inefficiency. If slack is discretionary → shocks reveal latent capacity.

β₁ (Grade-Mix) = −0.381*** (10 pp shift → 3.81 pp inefficiency reduction)
β₂ (Phase-Mix) = −0.256*** (teachers reallocated across phase boundaries)
All coefficients negative → adaptive capacity exists

Alternative Explanations Ruled Out

17 mechanisms investigated. Five structural rigidities predict β > 0 if binding:

Mechanism	Prediction	Finding
Grade-specific match capital	β₁ > 0	Rejected: β₁ = −0.381
Qualification constraints	β₂ > 0	Rejected: β₂ = −0.256
Period indivisibilities	β > 0	Rejected: both < 0
Cross-phase synchronisation	β₂ > 0	Rejected: β₂ = −0.256
Enrolment volatility buffering	β₃ > 0	Rejected: β₃ = −0.007

Consistent with: Policy equilibria where slack is discretionary.

Caveat: Standby rosters create endogeneity—estimates may be lower bounds.

Equity Implications

Baseline gap: Q1–Q2 classes 32% larger than Q5
Paradox: Wealthier schools show largest efficiency gains (−20.2%)—hold more slack
Implication: Efficiency reforms reduce absolute crowding but preserve relative inequality

Racial Inequality

Baseline gap: Black learners are in classes 29% larger than white learners (44.3 vs 34.4)
Post-optimisation: Gap persists (33.4 vs 25.8)
White-learner schools hold more idle capacity

Why Would Schools Choose Inefficiency?

Core mechanism: Teachers value free periods above smaller classes.

Rigid wage schedules: Non-pecuniary compensation (lighter loads) becomes primary margin
Union bargaining: Workload norms weakly enforced
Measurement vacuum: What isn’t measured can’t be managed
Rational precaution: Standby rosters for unscheduled absence create endogenous slack

Political economy: TCR represents negotiated equilibrium—teachers capture rents through reduced contact time rather than higher wages.

International Comparisons

Country	TCR	Utilisation
Tanzania	2.5	40%
Ethiopia	≈1.4	≈71%
South Africa	1.22	82%

Vietnam: Explicitly targets TCR > 1 for extended contact hours
SSA systems without timetabling software may face both technical and allocative inefficiency

Implication: South Africa’s problem is utilisation (S4), not timetabling (S1–S2).

Algebra Reference

Core Identity: STR = SCR / TCR

STR = Learners / Teachers
SCR = Learners / Classes
TCR = Teachers / Classes

Experienced Class Size (ECS): \[\text{ECS} = \frac{\sum_i \text{CS}_i^2}{\sum_i \text{CS}_i} = \overline{\text{CS}} + \frac{\sigma^2_{\text{CS}}}{\overline{\text{CS}}}\]

Example: Classes of {50, 10} average 30, but ECS = 43.3 (more learners experience the larger class).
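A quick check of both forms of the ECS formula on the example above, using the population variance. The function names are illustrative.

```python
from statistics import mean, pvariance


def ecs_direct(sizes):
    """ECS as sum of squared class sizes over total learners."""
    return sum(s * s for s in sizes) / sum(sizes)


def ecs_from_moments(sizes):
    """ECS as mean class size plus population variance over the mean."""
    m = mean(sizes)
    return m + pvariance(sizes) / m


sizes = [50, 10]
print(round(ecs_direct(sizes), 1))        # 43.3
print(round(ecs_from_moments(sizes), 1))  # 43.3
```

Both routes give 43.3: the mean of 30 plus a dispersion term of 400 / 30 ≈ 13.3.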

Provincial Scenarios

Limpopo & Eastern Cape: High capital slack (idle classrooms)
Limpopo & Mpumalanga: High labour slack (idle teachers)
Provinces need tailored interventions

Representative School: Status Quo (S0)

Baseline timetable for a school at the 82nd percentile of reducibility. School: 1,220 learners, 27 classes, 7 grades, ECS 47.8.

Within-grade dispersion modest. Between-grade imbalance small. Key: 27 classes but more teachers available.

Within-Grade Equalisation (S1)

Between-Grade Allocation (S2)

Classroom-Constrained Allocation (S3)

Educator-Optimal Allocation (S4)

District-Level Pooling (S5)

Province-Level Pooling (S6)

Variance Decomposition: What Explains Crowding?

S4 reducibility: 33.5% of explained variance (R² = 52.3%)
Traditional factors small: Province 6.3%, Quintile 6.4%, Race 5.5%
Latent capacity > STR in explaining crowding

Secondary School Extension

Primary TCR ≈ 1.22 (generalist model). Secondary TCR likely higher:
Subject specialisation mandated (≈15 or more subjects)
Period indivisibilities bind harder
Teacher qualifications phase-specific

Preliminary estimates: If secondary TCR ≈ 1.5, fiscal leakage ≈R40bn additional (total ≈R62bn)
Data challenge: LURITS (Learner Unit Record Information and Tracking System) lacks subject identifiers for secondary

Policy: Subject rationalisation; pool teachers at district level for rare subjects.