Mũrĩithi Mũriũki

Automating Survey Data Workflows using Python

Data analysts spend too much time cleaning and formatting survey data.

Long variable labels are not truncated, an issue I often run into when using Stata.
Manual labeling and report formatting are avoided.

I built a Python-based tool that reads an XLSForm, retrieves collected data, labels it automatically, and produces clean, formatted analytical outputs ready for reporting.

XLSForm (survey + choices)

↓

Database/Server (ODK/Kobo/SurveyCTO)

↓

Python Tool

→ Reads XLSForm structure

→ Fetches collected data

→ Applies variable & value labels

→ Generates analysis & formatted output
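
The labeling steps in this workflow can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code. It assumes the XLSForm survey and choices sheets have already been loaded as lists of dicts, and that the form uses a single `label` column. The function names `build_label_maps` and `label_record` and the sample questions are hypothetical. The column names `type`, `name`, `label`, and `list_name` follow the XLSForm standard.

```python
def build_label_maps(survey_rows, choices_rows):
    """Return (variable_labels, value_labels) derived from XLSForm rows."""
    # Group choice labels by list: {list_name: {choice_name: label}}
    choices_by_list = {}
    for row in choices_rows:
        choices_by_list.setdefault(row["list_name"], {})[str(row["name"])] = row["label"]

    variable_labels = {}
    value_labels = {}
    for row in survey_rows:
        name = row.get("name")
        if not name:
            continue  # skip rows that carry no variable name
        # Fall back to the variable name when the label cell is empty
        variable_labels[name] = row.get("label") or name
        parts = str(row.get("type", "")).split()
        # "select_one <list_name>" links a question to its choice list
        if len(parts) == 2 and parts[0] == "select_one":
            value_labels[name] = choices_by_list.get(parts[1], {})
    return variable_labels, value_labels


def label_record(record, variable_labels, value_labels):
    """Replace variable names and coded values in one submission with labels."""
    labeled = {}
    for name, value in record.items():
        code_map = value_labels.get(name, {})
        # Unknown codes and free-text answers are passed through unchanged
        labeled[variable_labels.get(name, name)] = code_map.get(str(value), value)
    return labeled


# Hypothetical form definition and one submission
survey = [
    {"type": "text", "name": "resp_name", "label": "Respondent name"},
    {"type": "select_one yes_no", "name": "has_water",
     "label": "Does the household have access to safe drinking water?"},
]
choices = [
    {"list_name": "yes_no", "name": "1", "label": "Yes"},
    {"list_name": "yes_no", "name": "0", "label": "No"},
]
variable_labels, value_labels = build_label_maps(survey, choices)
labeled = label_record({"resp_name": "A", "has_water": 1},
                       variable_labels, value_labels)
```

Keeping the variable labels and value labels in two separate dictionaries means the same maps can be reused for every submission, whatever source the data was fetched from.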

What the tool does:

Reads XLSForm metadata (variable & value labels)
Connects to databases (local or via API)
Cleans and validates data automatically
Applies human-readable labels from XLSForm
Generates descriptive analysis and summary tables
Exports results in Excel or formatted HTML reports
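
The summary and export steps in this list might look something like the sketch below. It is an assumption-laden illustration using only the Python standard library: the helper names `frequency_table` and `to_html_table`, the sample responses, and the table layout are all hypothetical and not taken from the tool itself.

```python
from collections import Counter
from html import escape


def frequency_table(values):
    """Return [(value, count, percent)] sorted by descending count."""
    # Treat None and empty strings as missing and leave them out
    counts = Counter(v for v in values if v not in (None, ""))
    total = sum(counts.values())
    return [(value, n, round(100 * n / total, 1))
            for value, n in counts.most_common()]


def to_html_table(title, rows):
    """Render a frequency table as an HTML fragment with the full title."""
    body = "".join(
        f"<tr><td>{escape(str(value))}</td><td>{n}</td><td>{pct}</td></tr>"
        for value, n, pct in rows
    )
    return (
        f"<table><caption>{escape(title)}</caption>"
        "<tr><th>Response</th><th>Count</th><th>Percent</th></tr>"
        f"{body}</table>"
    )


# Hypothetical labeled responses to one question, including one missing value
responses = ["Yes", "No", "Yes", "", "Yes"]
table = frequency_table(responses)
html_report = to_html_table("Access to safe drinking water", table)
```

Escaping the title and the response values keeps labels that contain characters such as `<` or `&` from breaking the generated HTML.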

Unlike Stata, this script does not truncate long variable labels in its output, so they no longer need to be fixed by hand.
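
One way to keep long labels intact is to wrap them across lines instead of cutting them off. The sketch below is only an illustration of that idea and is not necessarily how the tool handles it. The helper `wrap_label`, the 40-character width, and the sample question are hypothetical. The 80-character figure is Stata's limit on variable labels and is used here only for comparison.

```python
import textwrap

# Stata's maximum variable label length, shown only for comparison
STATA_VARIABLE_LABEL_LIMIT = 80


def wrap_label(label, width=40):
    """Split a long label into lines of at most `width` characters, keeping every word."""
    return textwrap.wrap(label, width=width)


# Hypothetical long question label from an XLSForm
label = ("In the past 12 months, how many times did any member of your household "
         "visit a health facility for antenatal care services?")

truncated = label[:STATA_VARIABLE_LABEL_LIMIT]  # what a hard length limit would keep
wrapped = wrap_label(label)                     # full text, split into short lines
```

Joining the wrapped lines with spaces gives back the original label, whereas the truncated version loses the end of the question.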

This workflow has reduced the time I spend labeling and formatting M&E survey data. Next, I plan to extend it into an interactive Power BI dashboard for monitoring key indicators.