Teacher Misallocation in South African Primary Schools
Stellenbosch University & Tinbergen Institute
2026-02-02
Student-Teacher Ratio = 32.5
Experienced Class Size = 43.3
Teachers are employed but not fully deployed.
\[\text{STR} = \frac{\text{SCR}}{\text{TCR}}\]
| Metric | Definition | National Average |
|---|---|---|
| STR | Student-Teacher Ratio | 32.5 |
| SCR | Student-Class Ratio | 39.3 |
| TCR | Teacher-Class Ratio | 1.22 |
TCR = 1.22 → Teachers timetabled for only 82% of periods (1/1.22 ≈ 0.82)
The same STR of 30 can mean classes of 30 (TCR = 1) or classes of 60 (TCR = 2), because SCR = STR × TCR.
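A minimal sketch of the identity in code, assuming utilisation is simply 1/TCR as in the 82% figure; the function names are illustrative and not from the source.

```python
def class_size_from_str(str_ratio: float, tcr: float) -> float:
    """Average class size implied by the identity STR = SCR / TCR."""
    return str_ratio * tcr


def utilisation(tcr: float) -> float:
    """Share of periods a teacher is timetabled, taken here as 1 / TCR."""
    return 1.0 / tcr


if __name__ == "__main__":
    # Same STR of 30 under three different teacher-class ratios.
    for tcr in (1.0, 1.22, 2.0):
        print(f"TCR={tcr:.2f}: class size = {class_size_from_str(30, tcr):.1f}, "
              f"utilisation = {utilisation(tcr):.0%}")
```

Holding STR fixed, every increase in TCR shows up one-for-one as larger classes.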
Key distinction: Not timetabled ≠ Absent when timetabled
World Bank SDI conflates these:
\[\text{Class Absence} = 1 - \frac{\text{Teachers in Classrooms}}{\text{Total Employed}}\]
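One way to make the conflation explicit, as a sketch rather than the SDI's own definition: let \(a\) be the absence rate among teachers who are timetabled (a symbol introduced here for illustration), and assume the timetabled share of employed teachers is \(1/\text{TCR}\). Then

\[\text{Class Absence} = 1 - \underbrace{\frac{1}{\text{TCR}}}_{\text{timetabled}} \times \underbrace{(1 - a)}_{\text{present when timetabled}}\]

With TCR = 1.22 and \(a = 0\), measured class absence would be about 18% even though no timetabled teacher misses a lesson.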
Policy implication: Administrative timetabling reforms ≠ behavioural accountability interventions.
1. Misallocation
Focuses on between-school variation (Walter 2020).
My finding: Within-school slack > spatial gains.
2. Teacher Utilisation
TCR “remarkably understudied” (Bennell 2022).
Gap: No rigorous large-scale primary analysis (Tanzania TCR = 2.5).
3. Input Elasticity
Why hiring fails to improve learning (Hanushek 1986).
My contribution: Hiring → absorbed into slack.
4. Management
Focuses on talent management (Bloom et al. 2015).
Gap: Overlooks extensive margin of deployment.
Novel feature: Reconstructed class-level timetables from learner-class assignments in administrative data.
Direct measurement of teacher deployment—not survey self-reports.
Scope: Optimality is defined here strictly as resource distribution, holding pedagogical models constant.
Core idea: Use constrained optimisation to quantify efficiency gaps.
For each school, I solve:
\[\min_{\{k_g\}} \text{ECS} = \min_{\{k_g\}} \frac{\sum_g n_g^2 / k_g}{\sum_g n_g}\]
subject to progressively relaxed constraints on \(\sum_g k_g\).
The gap between observed and optimal = measurable inefficiency.
By comparing optima under different constraints, I decompose where inefficiency lies:
Objective: Minimise \(\sum_g n_g^2/k_g\) (equivalent to minimising ECS)
Method: Greedy algorithm with max-priority queue—provably optimal (convex objective)
Algorithm:
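A minimal sketch of a greedy allocation of this kind, under the assumption that every grade starts with one class and each further class goes to the grade with the largest marginal reduction in \(\sum_g n_g^2/k_g\). The function names and inputs are illustrative, not the author's code.

```python
import heapq


def greedy_allocate(enrolments, total_classes):
    """Split total_classes across grades to minimise sum(n_g**2 / k_g).

    Assumes every grade has learners, gets at least one class, and that
    classes within a grade are equal-sized (fractional sizes allowed).
    """
    if any(n <= 0 for n in enrolments):
        raise ValueError("every grade must have positive enrolment")
    if total_classes < len(enrolments):
        raise ValueError("need at least one class per grade")
    k = [1] * len(enrolments)

    def gain(n, kg):
        # Reduction in n^2 / k from moving kg -> kg + 1 classes.
        return n * n / kg - n * n / (kg + 1)

    # heapq is a min-heap, so gains are negated to pop the largest first.
    heap = [(-gain(n, 1), g) for g, n in enumerate(enrolments)]
    heapq.heapify(heap)
    for _ in range(total_classes - len(enrolments)):
        _, g = heapq.heappop(heap)
        k[g] += 1
        heapq.heappush(heap, (-gain(enrolments[g], k[g]), g))
    return k


def ecs_equal_split(enrolments, k):
    """Experienced class size with equal-sized classes within each grade."""
    return sum(n * n / kg for n, kg in zip(enrolments, k)) / sum(enrolments)
```

For grades of 120, 80 and 40 learners and six classes, this sketch returns the split `[3, 2, 1]`, which puts every class at 40 learners.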
Shadow Price:
After allocation, the shadow price = marginal gain of the next class.
This tells us the value of an additional teacher:
Shadow prices reveal which schools would benefit most from expansion.
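The marginal gain can be written out from the objective, assuming equal-sized classes within a grade; the symbols \(\Delta_g\) and \(\lambda\) are introduced here for illustration and are not the author's notation. Adding one class to grade \(g\) lowers \(\sum_g n_g^2/k_g\) by

\[\Delta_g = \frac{n_g^2}{k_g} - \frac{n_g^2}{k_g + 1} = \frac{n_g^2}{k_g (k_g + 1)}\]

so a shadow price expressed in ECS units would be \(\lambda = \max_g \Delta_g \big/ \sum_h n_h\). For a grade of 120 learners in 3 classes, \(\Delta_g = 14400 / 12 = 1200\).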
I solve for minimum ECS under nested constraints to quantify efficiency gaps.
| Scenario | Constraint | What it measures |
|---|---|---|
| S0 | Status quo | Baseline |
| S1–S2 | Balance classes | Technical efficiency |
| S3 | Bind by classrooms | Capital slack |
| S4 | Bind by teachers | Labour slack |
| S5–S6 | Pool across space | Spatial misallocation |
“Optimal” = minimum class size for given teachers (or equivalently, minimum teachers for given class size).
Question: Does exogenously providing a school with additional teachers lead to smaller classes or increased slack?
Three sources of bias in OLS:
Solution: Instrument STR with bureaucratic teacher allocation rules (Post Provisioning Norm).
PPN formula: f(enrolment, grade-mix, quintile, provincial norms)
| Grade | Max Class | Weight |
|---|---|---|
| R | 35 | 0 |
| 1–4 | 35 | 1.190 |
| 5–6 | 40 | 1.042 |
| 7 | 37 | 1.126 |
Identification: Within-school grade composition changes trigger mechanical allocation adjustments.
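A small sketch of how a shift in grade mix moves a school's enrolment-weighted PPN weight, using the weights in the table above. It assumes a simple enrolment-weighted average; the full PPN formula and the exact construction of the instrument are not given in the source, and the grade keys and function name are illustrative.

```python
# Weights copied from the table above; grade keys are illustrative labels.
PPN_WEIGHT = {"R": 0.0, "1": 1.190, "2": 1.190, "3": 1.190, "4": 1.190,
              "5": 1.042, "6": 1.042, "7": 1.126}


def mean_ppn_weight(enrolment_by_grade):
    """Enrolment-weighted average PPN weight for one school-year."""
    total = sum(enrolment_by_grade.values())
    weighted = sum(PPN_WEIGHT[g] * n for g, n in enrolment_by_grade.items())
    return weighted / total


if __name__ == "__main__":
    before = {"1": 100, "7": 100}
    after = {"1": 120, "7": 80}   # same enrolment, more weight on Grade 1
    print(mean_ppn_weight(after) - mean_ppn_weight(before))
```

Total enrolment is unchanged in this example, yet the average weight rises because learners move from a lower-weighted to a higher-weighted grade.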
\[ \begin{aligned} \text{First stage:} \quad & \Delta \text{STR}_{\text{actual}} = \alpha + \beta \cdot \Delta \overline{\text{PPN-Weight}}_{\text{leaver}} + \delta_t + \varepsilon \\[0.5em] \text{Second stage:} \quad & \Delta \text{Inefficiency} = \alpha + \delta \cdot \Delta \widehat{\text{STR}} + \delta_t + \varepsilon \end{aligned} \]
First-differences specification. School-level clustering. N = 28,703 school-years.
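To illustrate why instrumenting matters, here is a stripped-down sketch on synthetic data, not the paper's data or code. It uses a single instrument, no year effects and no clustering, so the 2SLS slope reduces to Cov(z, y) / Cov(z, x). All variable names and parameter values are assumptions chosen for the illustration.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def iv_estimate(z, x, y):
    """Just-identified 2SLS slope: Cov(z, y) / Cov(z, x)."""
    return cov(z, y) / cov(z, x)


def ols_estimate(x, y):
    """OLS slope of y on x: Cov(x, y) / Var(x)."""
    return cov(x, y) / cov(x, x)


def simulate(n=20000, delta=-0.5, seed=0):
    """Synthetic differenced data with an unobserved confounder u."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]   # instrument (allocation shock)
    u = [rng.gauss(0, 1) for _ in range(n)]   # confounder, unobserved
    x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [delta * xi + 2 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return z, x, y


if __name__ == "__main__":
    z, x, y = simulate()
    print("OLS:", ols_estimate(x, y), "IV:", iv_estimate(z, x, y))
```

In this simulation the confounder pushes the OLS slope above the true value, while the IV slope stays close to it.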
Identifying assumptions:
| First-Differences | |
|---|---|
| 2SLS Coefficient (δ) | −4.5* |
| Standard Error | 0.111 |
| First-stage F-statistic | 60.4 |
| N (school-years) | 28,703 |
Interpretation: Exogenously gaining teachers → increased slack, not smaller classes.
Schools absorb additional teachers into lighter workloads.
Instrument validity: ✓
Specification robustness: ✓
Heterogeneity: Effect stronger in small schools and in low-STR contexts
The professional day extends beyond instruction
| Component | Description |
|---|---|
| Formal school day | 7 hours at school (PAM requirement) |
| Instructional periods | ~6 periods × 45 min = ~4.5 hours |
| Breaks | Recess + lunch = ~1 hour (rest, not prep) |
| Within-day non-contact | ~1.5 hours (admin, assembly, free time) |
| Informal school day | ~400 hours/year = ~2 hours/day explicitly for preparation |
Key point: The “informal school day” is mandated time outside formal hours for planning and marking.
Even at TCR = 1.0, teachers have ~3.5 hours daily for preparation (1.5h in-day + 2h informal).
The current TCR of 1.22 adds further free periods during instructional hours, beyond what preparation requires.
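A rough sketch of the arithmetic, assuming a teacher is timetabled for 1/TCR of the six 45-minute periods in the table above; the constant and function names are illustrative.

```python
PERIODS_PER_DAY = 6      # from the table above
PERIOD_MINUTES = 45


def extra_free_minutes(tcr):
    """Daily instructional minutes not taught, assuming a teacher is
    timetabled for 1 / TCR of all periods."""
    instructional = PERIODS_PER_DAY * PERIOD_MINUTES
    return instructional * (1 - 1 / tcr)


if __name__ == "__main__":
    print(round(extra_free_minutes(1.22)))   # minutes per day at TCR = 1.22
```

Under these assumptions, TCR = 1.22 corresponds to roughly 49 untaught minutes per day on top of the ~3.5 hours already available at TCR = 1.0.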
1. Enforce contact-time norms
2. Activate idle classrooms
3. District-level pooling
Key insight: The problem is deployment incentives, not fiscal constraints.
1. Measurement innovation: Constrained optimisation to quantify efficiency gaps under nested constraints—technical, allocative, spatial.
2. Descriptive finding: Within-school labour slack (−18.8%) > spatial misallocation (−5.8 pp).
3. Causal evidence: Exogenously adding teachers increases slack rather than reducing class sizes (δ = −4.5).
4. Policy implication: R22bn annually—deployment reform, not fiscal expansion.
Comparison: R22.3bn > National School Nutrition (≈R8bn) + LTSM (≈R12bn)
| # | Category | Test | Description |
|---|---|---|---|
| 1 | Instrument Validity | Balance on observables | Instrument orthogonal to 15 pre-determined covariates |
| 2 | Instrument Validity | Falsification tests | Future PPN shocks don't predict past outcomes |
| 3 | Instrument Validity | Weak instrument robust CI | Anderson-Rubin and CLR confidence sets |
| 4 | Instrument Validity | Reduced form on placebos | No effect on time-invariant school characteristics |
| 5 | Instrument Validity | Exclusion restriction probes | Control for PPN sub-components |
| 6 | Instrument Validity | Overidentification tests | Grade-specific instruments yield consistent estimates |
| 7 | Instrument Validity | First-stage heterogeneity | Strong F-statistics across districts and time periods |
| 8 | Specification Robustness | Alternative fixed effects | None, Year, District, School, Province×Year, District×Year |
| 9 | Specification Robustness | Alternative clustering | School, District, Two-way (School+Year), Two-way (District+Year) |
| 10 | Specification Robustness | Subsample stability | Balanced panel, exclude outliers, by quintile, leave-one-province-out |
| 11 | Specification Robustness | Alternative functional forms | Level-level, log-log, level-log, log-level, quadratic |
| 12 | Specification Robustness | Asymmetry tests | STR increases vs decreases (gains vs losses) |
| 13 | Specification Robustness | Specification ladder | Progressive inclusion of controls and FE |
| 14 | Specification Robustness | Control function approach | Alternative estimator using residuals as control |
| 15 | Heterogeneity Analysis | Effect modification | By school size, initial STR, baseline class size, quintile, urban/rural |
| 16 | Mediation Analysis | Class count as mediator | Tests whether STR affects inefficiency through class formation |
| 17 | Attrition & Sample Selection | Sample attrition tests | Validates representativeness of final sample (78% coverage) |
Question: Are high TCRs forced by constraints or discretionary?
Test logic: If constraints bind → shocks raise inefficiency. If slack is discretionary → shocks reveal latent capacity.
17 mechanisms investigated. Five structural rigidities predict β > 0 if binding:
| Mechanism | Prediction | Finding |
|---|---|---|
| Grade-specific match capital | β₁ > 0 | Rejected: β₁ = −0.381 |
| Qualification constraints | β₂ > 0 | Rejected: β₂ = −0.256 |
| Period indivisibilities | β > 0 | Rejected: both < 0 |
| Cross-phase synchronisation | β₂ > 0 | Rejected: β₂ = −0.256 |
| Enrolment volatility buffering | β₃ > 0 | Rejected: β₃ = −0.007 |
Consistent with: Policy equilibria where slack is discretionary.
Caveat: Standby rosters create endogeneity—estimates may be lower bounds.
Core mechanism: Teachers value free periods above smaller classes.
Political economy: TCR represents negotiated equilibrium—teachers capture rents through reduced contact time rather than higher wages.
| Country | TCR | Utilisation |
|---|---|---|
| Tanzania | 2.5 | 40% |
| Ethiopia | ≈1.4 | ≈71% |
| South Africa | 1.22 | 82% |
Implication: South Africa’s problem is utilisation (S4), not timetabling (S1–S2).
Core Identity: STR = SCR / TCR
Experienced Class Size (ECS): \[\text{ECS} = \frac{\sum_i \text{CS}_i^2}{\sum_i \text{CS}_i} = \overline{\text{CS}} + \frac{\sigma^2_{\text{CS}}}{\overline{\text{CS}}}\]
Example: Classes of {50, 10} average 30, but ECS = 43.3 (more learners experience the larger class).
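The two forms of the ECS formula can be checked numerically. This is a minimal sketch using the population variance, which is what the identity requires; the function names are illustrative.

```python
from statistics import mean, pvariance


def ecs(class_sizes):
    """Learner-weighted average class size: sum(CS^2) / sum(CS)."""
    return sum(c * c for c in class_sizes) / sum(class_sizes)


def ecs_from_moments(class_sizes):
    """Same quantity via mean + population variance / mean."""
    m = mean(class_sizes)
    return m + pvariance(class_sizes) / m


if __name__ == "__main__":
    sizes = [50, 10]
    print(round(ecs(sizes), 1), round(ecs_from_moments(sizes), 1))
```

For classes of 50 and 10 both functions give about 43.3, and ECS equals the simple average only when all classes are the same size.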
Baseline timetable at 82nd percentile reducibility. School: 1,220 learners, 27 classes, 7 grades, ECS 47.8.
Within-grade dispersion modest. Between-grade imbalance small. Key: 27 classes but more teachers available.
Policy: Subject rationalisation; pool teachers at district level for rare subjects.