How the Peloton data model works.

The pipeline from raw CSV to finished dashboard, explained step by step with real data.

1. Read the CSV

The organize.py script opens peloton_workouts.csv. It has 4,490 rows. Each row is one workout Peloton recorded -- a ride, a strength class, a warm up. Here are 8 real rows:

Timestamp  | Title                            | Type       | Discipline | Length | Watts | Instructor
2018-03-16 | 5 min Training Program Warm Up   | Warm Up    | Cycling    | 5      | 152   | Matt Wilpers
2018-03-16 | 45 min Power Zone Endurance Ride | Power Zone | Cycling    | 45     | 209   | Matt Wilpers
2018-03-21 | 10 min Warm Up Ride              | Warm Up    | Cycling    | 10     | 169   | Ally Love
2018-03-21 | 45 min Metrics Ride              | Music      | Cycling    | 45     | 217   | Jenn Sherman
2018-03-25 | 60 min Power Zone Endurance Ride | Power Zone | Cycling    | 60     | 223   | Matt Wilpers
2018-03-27 | 45 min Tabata Ride               | Intervals  | Cycling    | 45     | 202   | Robin Arzon
2018-03-28 | 5 min Cool Down Ride             | Cool Down  | Cycling    | 5      | 177   | Christine D'Ercole
2018-04-06 | 45 min Power Zone Ride           | Power Zone | Cycling    | 45     | 196   | Matt Wilpers

The CSV has 17 columns in total. The 7 shown here are the ones that matter for what comes next; their headers are abbreviated above (full names: Workout Timestamp, Title, Type, Fitness Discipline, Length (minutes), Avg. Watts, Instructor Name). Three of the sample rows have Type "Power Zone". These are Power Zone rides -- they'll appear in both analysis tracks.
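As a rough illustration of this step, here is a minimal sketch of reading the file and keeping only the seven columns. The function name read_workouts is hypothetical, "Extra Column" is a placeholder for the columns not listed here, and the date-only timestamps mirror the table above rather than the real export, whose timestamp format may differ.

```python
import csv
import io

# The seven columns the walkthrough relies on, by their full CSV header names.
KEEP = [
    "Workout Timestamp", "Title", "Type", "Fitness Discipline",
    "Length (minutes)", "Avg. Watts", "Instructor Name",
]

def read_workouts(handle):
    """Return one dict per CSV row, keeping only the KEEP columns."""
    return [{name: row.get(name, "") for name in KEEP}
            for row in csv.DictReader(handle)]

# Stand-in for open("peloton_workouts.csv", newline="", encoding="utf-8").
# "Extra Column" is a placeholder for the columns this walkthrough skips.
sample = io.StringIO(
    "Workout Timestamp,Title,Type,Fitness Discipline,"
    "Length (minutes),Avg. Watts,Instructor Name,Extra Column\n"
    "2018-03-25,60 min Power Zone Endurance Ride,Power Zone,Cycling,60,223,Matt Wilpers,x\n"
    "2018-03-28,5 min Cool Down Ride,Cool Down,Cycling,5,177,Christine D'Ercole,x\n"
)
rows = read_workouts(sample)
print(len(rows), rows[0]["Title"])
```

Every value comes back as a string, so later steps would need to convert Length and Avg. Watts to numbers before doing any math.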

2. Classify every row

The script loops through every row and asks three questions. The answers determine where each row's data ends up.

Is it ancillary?

Checks the Type column. If it's "Warm Up" or "Cool Down", the row is ancillary.

Type: "Cool Down" -> Ancillary -- excluded from workout counts
Type: "Warm Up" -> Ancillary -- but time still counts toward hours
Type: "Intervals" -> Not ancillary -- this is a core workout
1,526 rows are ancillary. The remaining 2,964 are core workouts.
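The ancillary check can be sketched in a few lines. This is an illustration, not the script's actual code: the names ANCILLARY_TYPES, is_ancillary, and split_core_and_ancillary are hypothetical, and it assumes the Type values match exactly after trimming whitespace.

```python
ANCILLARY_TYPES = {"Warm Up", "Cool Down"}

def is_ancillary(row):
    """True when the row's Type marks it as a warm up or cool down."""
    return row["Type"].strip() in ANCILLARY_TYPES

def split_core_and_ancillary(rows):
    """Partition rows into (core, ancillary) lists, preserving order."""
    core, ancillary = [], []
    for row in rows:
        (ancillary if is_ancillary(row) else core).append(row)
    return core, ancillary

rows = [{"Type": "Cool Down"}, {"Type": "Warm Up"}, {"Type": "Intervals"}]
core, ancillary = split_core_and_ancillary(rows)
print(len(core), len(ancillary))
```

Run over the full file, a split like this is what would yield the 2,964 core and 1,526 ancillary rows quoted above.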

What discipline bucket?

Checks the Fitness Discipline column and maps it to one of three buckets.

Fitness Discipline: "Cycling" -> Cycling
Fitness Discipline: "Strength" -> Strength
Fitness Discipline: "Stretching" -> Other
Running, Yoga, Meditation, etc. all become "Other."
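One plausible way to write this mapping is to name the two buckets that keep their own label and send everything else to "Other". The names NAMED_BUCKETS and discipline_bucket are hypothetical.

```python
# Disciplines that keep their own bucket; anything else becomes "Other".
NAMED_BUCKETS = {"Cycling", "Strength"}

def discipline_bucket(discipline):
    """Map a Fitness Discipline value to Cycling, Strength, or Other."""
    name = discipline.strip()
    return name if name in NAMED_BUCKETS else "Other"

for value in ("Cycling", "Strength", "Stretching", "Yoga"):
    print(value, "->", discipline_bucket(value))
```

Defaulting to "Other" means a discipline Peloton adds in the future lands in a bucket automatically instead of breaking the script.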

Is it a performance ride?

Checks the Title for Power Zone or FTP keywords. Also requires nonzero Avg. Watts.

"60 min Power Zone Endurance Ride" -> PZ Endurance, 60 min bucket
"45 min Power Zone Ride" -> PZ Standard, 45 min bucket
"45 min Tabata Ride" -> No -- not a Power Zone format
Only 1,474 rows qualify as performance rides.
After classification, each row feeds into one or both of the next two steps. A Power Zone ride goes into both volume (step 3) and performance (step 4). A Tabata ride goes into volume only. A warm up contributes time to volume but doesn't count as a workout.
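The performance check could look something like the sketch below. The exact keywords are assumptions: the text only says the script looks for Power Zone or FTP keywords, so the "ftp test" and "power zone max" matches here are guesses. The function name classify_performance is hypothetical, and the duration bucket is assumed to come from the Length (minutes) column.

```python
def _to_float(value):
    """Parse a CSV cell as a number; blank or malformed cells become 0.0."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return 0.0

def classify_performance(row):
    """Return (ride_type, duration_bucket), or None if not a performance ride."""
    if _to_float(row.get("Avg. Watts")) <= 0:
        return None  # no power data, so nothing to compare
    title = row["Title"].lower()
    # Check the most specific keywords first; plain "power zone" is the fallback.
    if "ftp test" in title:
        ride_type = "FTP Test"
    elif "power zone endurance" in title:
        ride_type = "PZ Endurance"
    elif "power zone max" in title:
        ride_type = "PZ Max"
    elif "power zone" in title:
        ride_type = "PZ Standard"
    else:
        return None
    minutes = int(_to_float(row.get("Length (minutes)")))
    return ride_type, f"{minutes} min"

ride = {"Title": "60 min Power Zone Endurance Ride",
        "Avg. Watts": "223", "Length (minutes)": "60"}
print(classify_performance(ride))
```

The order of the checks matters: testing for plain "power zone" first would label every Endurance and Max ride as PZ Standard.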

3. Build the volume outputs

The script groups all rows by month and calculates totals. These numbers cover everything -- cycling, strength, other. This is the "how much did you train?" track.

Monthly volume

For each month: total workouts (core only), total hours (all rows including ancillary), miles, calories, and hours split by discipline.

Mar 2018: 15 workouts, 11.6 hrs, 250 mi
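To make the counting rule concrete, here is a minimal sketch of the monthly grouping. It covers only workouts and hours (miles, calories, and the per-discipline split would follow the same pattern). It assumes the timestamp starts with a YYYY-MM-DD date and that hours are derived from the class length; the real script may use a different duration field. The name monthly_volume is hypothetical.

```python
from collections import defaultdict

ANCILLARY_TYPES = {"Warm Up", "Cool Down"}

def monthly_volume(rows):
    """Per month: count core workouts, but sum minutes over every row."""
    months = defaultdict(lambda: {"workouts": 0, "minutes": 0.0})
    for row in rows:
        month = row["Workout Timestamp"][:7]  # "2018-03-16" -> "2018-03"
        months[month]["minutes"] += float(row["Length (minutes)"])
        if row["Type"] not in ANCILLARY_TYPES:
            months[month]["workouts"] += 1
    return [
        {"month": m, "workouts": v["workouts"], "hours": round(v["minutes"] / 60, 1)}
        for m, v in sorted(months.items())
    ]

rows = [
    {"Workout Timestamp": "2018-03-16", "Type": "Warm Up", "Length (minutes)": "5"},
    {"Workout Timestamp": "2018-03-16", "Type": "Power Zone", "Length (minutes)": "45"},
    {"Workout Timestamp": "2018-04-06", "Type": "Power Zone", "Length (minutes)": "60"},
]
monthly = monthly_volume(rows)
print(monthly)
```

In the sample, the 5-minute warm up adds to March's hours but not to its workout count.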

Heatmap

One entry per day with an effort tier (1-5) based on estimated calories that day. Covers all disciplines, not just cycling.

2018-03-28: 2 workouts, 980 cal, tier 3

Headline stats

Totals and streaks across the full history: total workouts, total hours, active days, consecutive active weeks.

2,964 core workouts · 2,630 hours · 384 consecutive weeks
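The week streak is the least obvious of these numbers, so here is one way it might be computed. This sketch assumes weeks run Monday to Sunday and that the headline figure is the longest streak rather than the current one; the text does not say which. The name longest_week_streak is hypothetical.

```python
from datetime import date, timedelta

def longest_week_streak(active_days):
    """Longest run of back-to-back weeks containing at least one active day."""
    # Collapse each day to the Monday of its week, then de-duplicate.
    weeks = sorted({d - timedelta(days=d.weekday()) for d in active_days})
    best = run = 0
    prev = None
    for week in weeks:
        consecutive = prev is not None and week - prev == timedelta(days=7)
        run = run + 1 if consecutive else 1
        best = max(best, run)
        prev = week
    return best

days = [date(2018, 3, 16), date(2018, 3, 21), date(2018, 3, 25),
        date(2018, 3, 27), date(2018, 4, 6), date(2018, 4, 20)]
streak = longest_week_streak(days)
print(streak)
```

Per the counting rule in this step, the input would be the dates of core workouts only, since warm ups and cool downs don't create active days.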

Instructors

Workout count per instructor across all disciplines.

Matt Wilpers: 3,206 · Ben Alldis: 338
What counts here? Warm ups and cool downs add to hours and effort but don't count as workouts or active days. A 5-min warm up shouldn't inflate your workout count.

4. Build the performance outputs

The script takes only the 1,474 performance rides and builds a separate set of outputs. This is the "am I getting stronger?" track. The key design choice: it only compares like with like.

4a. Group by ride type + duration

Each performance ride gets placed into a cohort based on its ride type and how long it was. A 60-min PZ Endurance ride is a completely different effort than a 45-min PZ Max, so they're never mixed together.

Ride Type    | Duration Buckets                | What's being compared
PZ Endurance | 60 min, 75 min, 90 min, 120 min | Long steady-state rides
PZ Standard  | 45 min, 60 min                  | Mixed-zone interval rides
PZ Max       | 20 min, 45 min                  | High-intensity short efforts
FTP Test     | 20 min                          | All-out assessment efforts
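A cohort is just a dictionary key made of both values. The sketch below assumes each ride has already been classified into a ride_type and duration; those field names and the function name build_cohorts are hypothetical.

```python
from collections import defaultdict

def build_cohorts(rides):
    """Group rides by (ride_type, duration) so unlike efforts never mix."""
    cohorts = defaultdict(list)
    for ride in rides:
        key = (ride["ride_type"], ride["duration"])
        cohorts[key].append({"date": ride["date"], "watts": ride["watts"]})
    for points in cohorts.values():
        points.sort(key=lambda p: p["date"])  # ISO dates sort chronologically
    return dict(cohorts)

rides = [
    {"ride_type": "PZ Endurance", "duration": "60 min", "date": "2020-12-02", "watts": 256},
    {"ride_type": "PZ Endurance", "duration": "60 min", "date": "2018-03-25", "watts": 223},
    {"ride_type": "PZ Standard", "duration": "45 min", "date": "2018-04-06", "watts": 196},
]
cohorts = build_cohorts(rides)
print(sorted(cohorts))
```

Because the key includes the duration, a 60-min and a 90-min PZ Endurance ride end up in separate lists even though the ride type is the same.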
4b. For each cohort, calculate three views of the data

Each ride type + duration combination gets its own set of trend data. Here's what the script produces for each one, using "PZ Endurance / 60 min" as the example:

Scatter

Every ride as a single data point: date + average watts. This is the raw data the dashboard plots as dots.

2018-03-25: 223W
2020-12-02: 256W
2026-05-05: 248W
140 rides in this cohort

Rolling median

A 90-day rolling median of watts. Smooths out day-to-day variation to show the real trend. Only calculated when there are 3+ rides in the trailing 90-day window.

2020-12-22: 257.5W median (14 rides in window)
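Here is a minimal sketch of a trailing 90-day median with the 3-ride minimum. It makes two assumptions the text does not spell out: the median is evaluated once per ride date, and a ride exactly 90 days old falls outside the window. The name rolling_median and its parameters are hypothetical.

```python
from datetime import date, timedelta
from statistics import median

def rolling_median(points, window_days=90, min_rides=3):
    """points: (date, watts) pairs sorted by date. One output per ride date."""
    out = []
    for i, (day, _) in enumerate(points):
        start = day - timedelta(days=window_days)
        # Trailing window: this ride plus earlier rides newer than `start`.
        window = [w for d, w in points[: i + 1] if d > start]
        if len(window) >= min_rides:
            out.append({"date": day.isoformat(),
                        "median": median(window),
                        "rides": len(window)})
    return out

points = [(date(2020, 1, 1), 200), (date(2020, 1, 15), 210),
          (date(2020, 2, 1), 220), (date(2020, 6, 1), 250)]
trend = rolling_median(points)
print(trend)
```

The June ride produces no trend point because it is alone in its window, which is how the 3-ride rule keeps a single unusual ride from drawing its own trend line.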

Annual summary

Year-level stats: average, 75th percentile, 90th percentile, and max watts. Good for comparing year over year.

2021: avg 269W, p75 281W, max 292W
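The yearly stats can be sketched as follows. Percentiles can be defined in several ways; this version uses linear interpolation between the two nearest values, which is an assumption about the script rather than something the text confirms. The names percentile and annual_summary are hypothetical.

```python
from collections import defaultdict
from datetime import date

def percentile(sorted_vals, pct):
    """Linear-interpolation percentile over an already-sorted list."""
    if len(sorted_vals) == 1:
        return float(sorted_vals[0])
    rank = (len(sorted_vals) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(sorted_vals) - 1)
    spread = sorted_vals[upper] - sorted_vals[lower]
    return sorted_vals[lower] + spread * (rank - lower)

def annual_summary(points):
    """points: (date, watts) pairs. Returns one stats dict per calendar year."""
    by_year = defaultdict(list)
    for day, watts in points:
        by_year[day.year].append(watts)
    summary = []
    for year in sorted(by_year):
        vals = sorted(by_year[year])
        summary.append({
            "year": year,
            "avg": round(sum(vals) / len(vals), 1),
            "p75": round(percentile(vals, 75), 1),
            "p90": round(percentile(vals, 90), 1),
            "max": max(vals),
        })
    return summary

points = [(date(2021, 1, 5), 200), (date(2021, 3, 5), 220), (date(2021, 5, 5), 240),
          (date(2021, 7, 5), 260), (date(2021, 9, 5), 280), (date(2022, 1, 5), 300)]
summary = annual_summary(points)
print(summary)
```

With only a handful of rides in a year, the upper percentiles sit very close to the max, so these stats are most informative for the larger cohorts.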
4c. Pull out FTP tests separately

FTP tests are 20-minute all-out efforts that serve as fitness benchmarks. The script pulls them into their own list so the dashboard can show them as discrete events (13 tests across 8 years).

2018-05-12: 254W
2020-12-05: 363W
2026-05-10: 339W
Why only Power Zone rides? To compare watts fairly, you can only compare like with like. Mixing different ride formats would make the trend lines meaningless.

5. Write the JSON

organize.py writes everything from steps 3 and 4 to a single file: dashboard_data.json. The volume outputs go in a "volume" section and the performance outputs in a "performance" section; headline stats and instructor counts sit at the top level.
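In Python, the write step amounts to assembling one dictionary and dumping it. The sketch below is illustrative only: the function name write_dashboard_data and the "instructors" and "core_workouts" key names are guesses, and it writes to a temporary folder so it can be run safely.

```python
import json
import tempfile
from pathlib import Path

def write_dashboard_data(path, headline, instructors, volume, performance):
    """Assemble the four sections into one payload and write it as JSON."""
    payload = {
        "headline": headline,
        "instructors": instructors,  # key name is a guess
        "volume": volume,
        "performance": performance,
    }
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return payload

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "dashboard_data.json"
    write_dashboard_data(
        out,
        headline={"core_workouts": 2964},  # hypothetical field name
        instructors={"Ben Alldis": 338},
        volume={"monthly": [], "heatmap": []},
        performance={"series": {}, "ftp_tests": [{"date": "2020-12-05", "watts": 363}]},
    )
    loaded = json.loads(out.read_text(encoding="utf-8"))

print(sorted(loaded))
```

Reading the file back, as the last lines do, is a cheap check that the output is valid JSON before it goes anywhere near the dashboard.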

Volume section
"volume": {
  "monthly": [
    { "month": "2018-03", "workouts": 15, "hours": 11.6,
      "hours_by_disc": { "Cycling": 11.6, "Strength": 0, "Other": 0 } },
    ...
  ],
  "heatmap": [
    { "date": "2018-03-25", "count": 2, "tier": 4 },
    ...
  ]
}

Performance section
"performance": {
  "series": {
    "PZ Endurance": {
      "60 min": {
        "scatter": [ { "date": "2018-03-25", "watts": 223 }, ... ],
        "rolling_median": [ { "date": "2020-12-22", "median": 257.5 }, ... ]
      }
    }
  },
  "ftp_tests": [ { "date": "2020-12-05", "watts": 363 }, ... ]
}

6. The dashboard displays the data

Once the dashboard design was finalized, we had AI incorporate the JSON data from dashboard_data.json directly into the dashboard HTML. The data lives inside the file itself as a JavaScript variable:

const DATA = { "headline": { ... }, "volume": { ... }, "performance": { ... } };

This means the dashboard is completely self-contained. You can send the HTML file to anyone and they can open it in a browser -- no server, no database, no external files needed.
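For readers curious what the embedding step involves, here is a minimal sketch. It assumes the dashboard HTML contains a placeholder comment where the data should go; the marker /*DASHBOARD_DATA*/ and the function name inject_data are hypothetical, and the actual injection in this project was done by AI rather than by this code.

```python
import json

PLACEHOLDER = "/*DASHBOARD_DATA*/"  # hypothetical marker inside the <script> tag

def inject_data(template_html, data):
    """Replace the placeholder with a `const DATA = {...};` statement."""
    if PLACEHOLDER not in template_html:
        raise ValueError("placeholder not found in template")
    # Escape "</" so a string like "</script>" in the data cannot end the
    # script block early; "<\/" is valid in both JSON and JavaScript.
    blob = json.dumps(data).replace("</", "<\\/")
    return template_html.replace(PLACEHOLDER, f"const DATA = {blob};", 1)

template = "<html><body><script>/*DASHBOARD_DATA*/</script></body></html>"
html = inject_data(template, {"headline": {"workouts": 2964}})
print(html)
```

Since JSON is also valid JavaScript object syntax, the dumped text can be pasted into the script as-is, which is what lets the file work without a server.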

Volume charts read from "volume"

Monthly training hours (stacked by discipline), workout consistency heatmap, headline stats at the top.

Power charts read from "performance"

Scatter + rolling median trend lines, filterable by ride type and duration. FTP test history as discrete dots.

This is the core insight: the dashboard itself is simple. All the hard work -- classifying rides, deciding what counts, computing medians -- happens in the organize step. If a number on the dashboard looks wrong, you can ask AI to trace it back through the JSON to the script. The separation makes the whole thing auditable, even if you never look at the code yourself.

7. Updating with new data

When you have new data (say, a fresh export from Peloton), the update process is:

1. New export

Drop the new data file into your raw_data folder.

2. Re-run and inject

Ask AI to re-run the organize script and inject the new data into the dashboard.

3. Review

Open the updated dashboard and make sure everything looks right.

The dashboard design doesn't change -- only the data inside it gets refreshed. Because the organize script already has all your classification rules and calculations, the update is a few prompts, not a rebuild.