ANDREW LAHSER
Running Meditation

Building a Personal Data Lake

genai, running

Your health data is scattered across a dozen apps. Oura knows your sleep. Strava knows your runs. COROS knows your cadence and ground contact time. Apple Health tries to be the hub but mostly just collects dust.

I wanted all of it in one place. Not a dashboard — a queryable data store I control.

The stack is deliberately simple: Python scripts pull from each API and land the raw JSON in a bronze layer, then transform it into Parquet files in a silver layer. DuckDB provides SQL access without a server. The whole thing runs on a machine with 512 MB of memory.
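Here is a minimal sketch of that bronze-to-silver flow, not the actual scripts. The directory layout, the function names `land_bronze` and `build_silver`, and the assumption that each API response is a JSON array of flat records are all hypothetical; real responses are usually nested and would need flattening first.

```python
import json
from datetime import date
from pathlib import Path


def land_bronze(root: Path, source: str, day: date, payload: list[dict]) -> Path:
    """Write one raw API response, untouched, to bronze/<source>/<day>.json."""
    path = root / "bronze" / source / f"{day.isoformat()}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload))
    return path


def build_silver(root: Path, source: str) -> Path:
    """Rewrite every bronze file for one source as a single Parquet file."""
    # Third-party dependency, imported here so land_bronze works without it.
    import duckdb

    out = root / "silver" / f"{source}.parquet"
    out.parent.mkdir(parents=True, exist_ok=True)
    pattern = (root / "bronze" / source / "*.json").as_posix()

    con = duckdb.connect()  # in-memory database, no server process
    # Paths are interpolated directly, so this assumes they contain no quotes.
    con.execute(
        f"COPY (SELECT * FROM read_json_auto('{pattern}')) "
        f"TO '{out.as_posix()}' (FORMAT PARQUET)"
    )
    con.close()
    return out
```

Keeping the bronze files as untouched JSON means the silver layer can be rebuilt from scratch whenever the transform changes, without calling the APIs again.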

Why Parquet? Columnar storage suits time-series health data: a query that needs two or three columns reads only those columns. It compresses well, queries fast, and you can read it with almost anything: Python, R, DuckDB, even Excel.

The real insight came when I started joining datasets. Overlaying sleep quality on training load. Correlating HRV trends with mileage ramps. Seeing how a bad night of sleep shows up two days later in your running power.
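The two-day lag is the part worth making concrete. Here is a small sketch of a lagged join in plain Python, with a hypothetical function name and made-up numbers; in DuckDB the same idea would be a join on the run date minus a two-day interval.

```python
from datetime import date, timedelta


def lagged_join(
    sleep: dict[date, float], runs: dict[date, float], lag_days: int = 2
) -> list[tuple[date, float, float]]:
    """Pair each run with the sleep score from lag_days earlier.

    Runs with no sleep record on the lagged day are skipped (an inner join).
    """
    lag = timedelta(days=lag_days)
    return [
        (run_day, sleep[run_day - lag], power)
        for run_day, power in sorted(runs.items())
        if run_day - lag in sleep
    ]


# Made-up values, purely to show the shape of the result.
sleep_score = {date(2024, 5, 1): 62.0, date(2024, 5, 2): 88.0}
run_power = {
    date(2024, 5, 3): 241.0,
    date(2024, 5, 4): 263.0,
    date(2024, 5, 9): 255.0,  # no sleep record two days earlier, so dropped
}

pairs = lagged_join(sleep_score, run_power)
```

Each tuple in `pairs` is (run date, sleep score two days before, running power), which is the table you would then plot or correlate.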

Your body generates incredible data. The least you can do is keep it somewhere you can actually use it.