Earlier this month I was in Bangkok for a regional workshop on infrastructure planning for CAPI operations — computer-assisted personal interviewing, the now-standard approach to running large household and agricultural surveys on tablets instead of paper. The room was full of statisticians who knew their sampling frames inside out but had inherited the IT side of their surveys by accident. The recurring theme was familiar: the methodology gets months of attention, and the infrastructure that has to carry it gets sorted out in the last two weeks before fieldwork. That order is backwards, and it is usually where national surveys quietly fail.
The first decision is server capacity, and it is almost always underestimated. A national survey with a few thousand enumerators syncing daily generates far more load than the questionnaires alone would suggest: on top of the interview data there are GPS traces, audio audits, photos, and the constant back-and-forth of case assignment. Size for peak sync, not average. The peak is the first week of fieldwork, when every team is online at once and every bug surfaces at the same time.
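To make "peak, not average" concrete, here is a rough back-of-envelope sketch. Every number in it is an illustrative assumption, not a measurement from any real survey: the payload per sync, the length of the sync window, and the share of devices that connect in the busiest hour are all placeholders you would replace with figures from your own pilot.

```python
from dataclasses import dataclass


@dataclass
class SyncLoad:
    enumerators: int          # devices that sync on a given day
    mb_per_sync: float        # assumed payload per device per sync, in megabytes
    window_hours: float       # hours over which syncs are spread in a normal day
    busiest_hour_share: float # assumed fraction of devices syncing in the busiest hour


def average_mbps(load: SyncLoad) -> float:
    """Bandwidth needed if syncs were spread evenly over the whole window."""
    total_megabits = load.enumerators * load.mb_per_sync * 8
    return total_megabits / (load.window_hours * 3600)


def peak_mbps(load: SyncLoad) -> float:
    """Bandwidth needed to absorb the busiest hour on its own."""
    busiest_megabits = (
        load.enumerators * load.busiest_hour_share * load.mb_per_sync * 8
    )
    return busiest_megabits / 3600


# Hypothetical first-week scenario: 3,000 tablets, 40 MB each,
# a 12-hour working day, 60% of devices syncing in the same evening hour.
example = SyncLoad(
    enumerators=3000,
    mb_per_sync=40,
    window_hours=12,
    busiest_hour_share=0.6,
)

print(f"average: {average_mbps(example):.0f} Mbps")  # about 22 Mbps
print(f"peak:    {peak_mbps(example):.0f} Mbps")     # about 160 Mbps
```

Under these assumed numbers the busiest hour needs roughly seven times the bandwidth the daily average implies, which is the gap that gets missed when capacity is sized from totals.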
The second is cloud versus on-premise, and the honest answer is that it depends on constraints you do not fully control. Cloud gives you elasticity and someone else's uptime guarantees, but national statistical data often comes with legal requirements to keep it inside the country's borders, and not every country has a compliant regional data centre. On-premise gives you sovereignty and predictable cost, but you own every failure at 2am. In practice many programmes land on a hybrid: a hosted sync server with encrypted backups held nationally.
Third is security, which cannot be an afterthought when you are holding identifiable data on hundreds of thousands of households. The baseline is encryption in transit and at rest, role-based access, an audit trail, and a tested restore procedure: not just a backup, but a restore you have actually run. A backup you have never restored is a hope, not a plan.
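One way to turn "a restore you have actually run" into a routine check is to record checksums of the live data when the backup is taken, restore the backup into a scratch directory, and compare. The sketch below assumes the data is a directory tree of files; the function names are hypothetical, and a real deployment would also need to restore and query the database itself, which this does not cover.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MB chunks so large audio or photo files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(original: dict[str, str], restored_root: Path) -> list[str]:
    """Return a list of problems; an empty list means the restore matches."""
    restored = manifest(restored_root)
    problems = []
    for name, checksum in original.items():
        if name not in restored:
            problems.append(f"missing: {name}")
        elif restored[name] != checksum:
            problems.append(f"corrupted: {name}")
    return problems
```

The intended use is to call `manifest()` on the live data at backup time, store the result alongside the backup, and run `verify_restore()` against a freshly restored copy on a schedule, treating any non-empty result as a failed backup.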
None of this is glamorous, and none of it shows up in the final report. But the survey people remember is the one where the data came in clean and on time, and that outcome is decided long before the first interview — in the infrastructure choices nobody sees.