Peloton Data Model Explainer

1

Read the CSV

The organize.py script opens peloton_workouts.csv. It has 4,490 rows. Each row is one workout Peloton recorded -- a ride, a strength class, a warm up. Here are 8 real rows:

Timestamp	Title	Type	Discipline	Length	Watts	Instructor
2018-03-16	5 min Training Program Warm Up	Warm Up	Cycling	5	152	Matt Wilpers
2018-03-16	45 min Power Zone Endurance Ride	Power Zone	Cycling	45	209	Matt Wilpers
2018-03-21	10 min Warm Up Ride	Warm Up	Cycling	10	169	Ally Love
2018-03-21	45 min Metrics Ride	Music	Cycling	45	217	Jenn Sherman
2018-03-25	60 min Power Zone Endurance Ride	Power Zone	Cycling	60	223	Matt Wilpers
2018-03-27	45 min Tabata Ride	Intervals	Cycling	45	202	Robin Arzon
2018-03-28	5 min Cool Down Ride	Cool Down	Cycling	5	177	Christine D'Ercole
2018-04-06	45 min Power Zone Ride	Power Zone	Cycling	45	196	Matt Wilpers

The CSV has 17 columns in total. The 7 shown here are the ones that matter for what comes next; their full names in the file are Workout Timestamp, Title, Type, Fitness Discipline, Length (minutes), Avg. Watts, and Instructor Name. The three rows with Type "Power Zone" are Power Zone rides -- they'll appear in both analysis tracks.
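As a rough illustration of this step, here is a minimal sketch of reading the file and keeping only those seven columns. It assumes a comma-separated export with the full header names listed above; the function name `load_rows` and the short key names are hypothetical and not taken from organize.py.

```python
import csv
import io

# Full CSV header -> short key used in this sketch (assumed mapping).
COLUMNS = {
    "Workout Timestamp": "timestamp",
    "Title": "title",
    "Type": "type",
    "Fitness Discipline": "discipline",
    "Length (minutes)": "length",
    "Avg. Watts": "watts",
    "Instructor Name": "instructor",
}

def load_rows(handle):
    """Read workout rows, keeping only the seven columns used downstream."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {short: (raw.get(full) or "").strip() for full, short in COLUMNS.items()}
        # Blank numeric cells are treated as 0 (an assumption for this sketch).
        row["length"] = int(float(row["length"] or 0))
        row["watts"] = int(float(row["watts"] or 0))
        rows.append(row)
    return rows

# Two of the sample rows, written out as comma-separated text.
sample = io.StringIO(
    "Workout Timestamp,Title,Type,Fitness Discipline,Length (minutes),Avg. Watts,Instructor Name\n"
    "2018-03-25,60 min Power Zone Endurance Ride,Power Zone,Cycling,60,223,Matt Wilpers\n"
    "2018-03-28,5 min Cool Down Ride,Cool Down,Cycling,5,177,Christine D'Ercole\n"
)
rows = load_rows(sample)
```

With the real export, the same function could be called as `load_rows(open("peloton_workouts.csv", newline=""))`.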

2

Classify every row

The script loops through every row and asks three questions. The answers determine where each row's data ends up.

Is it ancillary?

Checks the Type column. If it's "Warm Up" or "Cool Down", the row is ancillary.

Type: "Cool Down"

Ancillary -- excluded from workout counts

Type: "Warm Up"

Ancillary -- but time still counts toward hours

Type: "Intervals"

Not ancillary -- this is a core workout

1,526 rows are ancillary. The remaining 2,964 are core workouts.
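The ancillary check described above can be sketched in a couple of lines. The names `ANCILLARY_TYPES` and `is_ancillary` are hypothetical; only the two Type values come from the description.

```python
# Type values that mark a row as ancillary, per the rule above.
ANCILLARY_TYPES = {"Warm Up", "Cool Down"}

def is_ancillary(workout_type):
    """True for warm ups and cool downs, False for core workouts."""
    return workout_type in ANCILLARY_TYPES
```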

What discipline bucket?

Checks the Fitness Discipline column and maps it to one of three buckets.

Fitness Discipline: "Cycling"

Cycling

Fitness Discipline: "Strength"

Strength

Fitness Discipline: "Stretching"

Other

Running, Yoga, Meditation, etc. all become "Other."
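A minimal sketch of the bucket mapping, assuming the column holds exactly the strings "Cycling" and "Strength" for the first two buckets. The function name `discipline_bucket` is hypothetical.

```python
def discipline_bucket(discipline):
    """Map a Fitness Discipline value to Cycling, Strength, or Other."""
    if discipline in ("Cycling", "Strength"):
        return discipline
    # Stretching, Running, Yoga, Meditation, etc. all land here.
    return "Other"
```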

Is it a performance ride?

Checks the Title for Power Zone or FTP keywords. Also requires nonzero Avg. Watts.

"60 min Power Zone Endurance Ride"

PZ Endurance, 60 min bucket

"45 min Power Zone Ride"

PZ Standard, 45 min bucket

"45 min Tabata Ride"

No -- not a Power Zone format

Only 1,474 rows qualify as performance rides.
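One way the performance check could work is sketched below. The exact keyword matching is an assumption: this version treats any title containing "FTP" as an FTP Test, splits Power Zone titles on "Endurance" and "Max", and takes the duration bucket from the Length column. The function name `classify_performance` is hypothetical.

```python
def classify_performance(title, length_min, avg_watts):
    """Return (ride_type, duration_bucket), or None if not a performance ride."""
    if avg_watts <= 0:
        return None  # performance rides need nonzero Avg. Watts
    lowered = title.lower()
    if "ftp" in lowered:
        ride_type = "FTP Test"
    elif "power zone" in lowered:
        if "endurance" in lowered:
            ride_type = "PZ Endurance"
        elif "max" in lowered:
            ride_type = "PZ Max"
        else:
            ride_type = "PZ Standard"
    else:
        return None  # e.g. a Tabata or Metrics ride
    return ride_type, f"{length_min} min"
```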

After classification, each row feeds into one or both of the next two steps. A Power Zone ride goes into both volume (step 3) and performance (step 4). A Tabata ride goes into volume only. A warm up contributes time to volume but doesn't count as a workout.

3

Build the volume outputs

The script groups all rows by month and calculates totals. These numbers cover everything -- cycling, strength, other. This is the "how much did you train?" track.

Monthly volume

For each month: total workouts (core only), total hours (all rows including ancillary), miles, calories, and hours split by discipline.

Mar 2018: 15 workouts, 11.6 hrs, 250 mi
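The counting rule (workouts from core rows only, hours from every row) can be sketched as follows. This is a simplified version that assumes hours come from the Length (minutes) column and leaves out miles, calories, and the per-discipline split. The function name `monthly_volume` and the row keys are hypothetical.

```python
from collections import defaultdict

ANCILLARY_TYPES = {"Warm Up", "Cool Down"}

def monthly_volume(rows):
    """rows: dicts with 'timestamp' (YYYY-MM-DD...), 'type', and 'length' in minutes."""
    months = defaultdict(lambda: {"workouts": 0, "minutes": 0})
    for row in rows:
        month = row["timestamp"][:7]                  # "2018-03-16" -> "2018-03"
        months[month]["minutes"] += row["length"]     # every row adds time
        if row["type"] not in ANCILLARY_TYPES:
            months[month]["workouts"] += 1            # only core rows add a workout
    return [
        {"month": m, "workouts": v["workouts"], "hours": round(v["minutes"] / 60, 1)}
        for m, v in sorted(months.items())
    ]
```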

Heatmap

One entry per day with an effort tier (1-5) based on estimated calories that day. Covers all disciplines, not just cycling.

2018-03-28: 2 workouts, 980 cal, tier 3
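A sketch of how a daily effort tier could be derived. The calorie cut-offs below are placeholder values chosen for illustration only; the thresholds organize.py actually uses are not stated here. The names `TIER_CUTOFFS`, `effort_tier`, and `build_heatmap` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

# Placeholder cut-offs (assumption): the real thresholds are not given in this explainer.
TIER_CUTOFFS = [250, 500, 1000, 1500]

def effort_tier(calories):
    """Return a tier from 1 to 5: one more than the number of cut-offs reached."""
    return bisect_right(TIER_CUTOFFS, calories) + 1

def build_heatmap(rows):
    """rows: dicts with 'date', 'calories', and 'ancillary' keys."""
    days = defaultdict(lambda: {"count": 0, "calories": 0})
    for row in rows:
        day = days[row["date"]]
        day["calories"] += row["calories"]   # ancillary rows still add effort
        if not row["ancillary"]:
            day["count"] += 1                # but only core rows count as workouts
    return [
        {"date": d, "count": v["count"], "tier": effort_tier(v["calories"])}
        for d, v in sorted(days.items())
    ]
```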

Headline stats

Totals and streaks across the full history: total workouts, total hours, active days, consecutive active weeks.

2,964 core workouts · 2,630 hours · 384 consecutive weeks
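The streak figure is the least obvious of these, so here is a minimal sketch of one way to compute it. It assumes weeks start on Monday and that the number reported is the longest run of back-to-back weeks with at least one active day; the function name `longest_week_streak` is hypothetical.

```python
from datetime import date, timedelta

def longest_week_streak(active_days):
    """active_days: iterable of datetime.date values with at least one core workout."""
    # Collapse each day to the Monday of its week, then de-duplicate and sort.
    weeks = sorted({d - timedelta(days=d.weekday()) for d in active_days})
    best = run = 0
    prev = None
    for week in weeks:
        consecutive = prev is not None and week - prev == timedelta(days=7)
        run = run + 1 if consecutive else 1
        best = max(best, run)
        prev = week
    return best
```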

Instructors

Workout count per instructor across all disciplines.

Matt Wilpers: 3,206 · Ben Alldis: 338
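Counting per instructor is a one-liner with `collections.Counter`. In this sketch the caller decides which rows to pass in; the function name `instructor_counts` and the `instructor` key are hypothetical.

```python
from collections import Counter

def instructor_counts(rows):
    """Return (instructor, count) pairs, most frequent first, skipping blank names."""
    counts = Counter(row["instructor"] for row in rows if row["instructor"])
    return counts.most_common()
```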

What counts here? Warm ups and cool downs add to hours and effort but don't count as workouts or active days. A 5-min warm up shouldn't inflate your workout count.

4

Build the performance outputs

The script takes only the 1,474 performance rides and builds a separate set of outputs. This is the "am I getting stronger?" track. The key design choice: it only compares like with like.

4a. Group by ride type + duration

Each performance ride gets placed into a cohort based on its ride type and how long it was. A 60-min PZ Endurance ride is a completely different effort than a 45-min PZ Max, so they're never mixed together.

Ride Type	Duration Buckets	What's being compared
PZ Endurance	60 min, 75 min, 90 min, 120 min	Long steady-state rides
PZ Standard	45 min, 60 min	Mixed-zone interval rides
PZ Max	20 min, 45 min	High-intensity short efforts
FTP Test	20 min	All-out assessment efforts
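The cohort grouping in the table can be sketched as a nested dictionary keyed by ride type and then by duration. The function name `group_into_cohorts` and the input keys are hypothetical.

```python
from collections import defaultdict

def group_into_cohorts(rides):
    """rides: dicts with 'ride_type', 'duration', 'date' (ISO string), and 'watts'."""
    series = defaultdict(lambda: defaultdict(list))
    for ride in rides:
        series[ride["ride_type"]][ride["duration"]].append(
            {"date": ride["date"], "watts": ride["watts"]}
        )
    # Sort each cohort by date so later trend calculations see rides in order.
    for durations in series.values():
        for points in durations.values():
            points.sort(key=lambda p: p["date"])
    return {ride_type: dict(durations) for ride_type, durations in series.items()}
```

Because each (ride type, duration) pair gets its own list, a 60-min PZ Endurance ride can never end up in the same trend line as a 45-min PZ Max ride.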

4b. For each cohort, calculate three views of the data

Each ride type + duration combination gets its own set of trend data. Here's what the script produces for each one, using "PZ Endurance / 60 min" as the example:

Scatter

Every ride as a single data point: date + average watts. This is the raw data the dashboard plots as dots.

2018-03-25: 223W

2020-12-02: 256W

2026-05-05: 248W

140 rides in this cohort

Rolling median

A 90-day rolling median of watts. Smooths out day-to-day variation to show the real trend. Only calculated when there are 3+ rides in the trailing 90-day window.

2020-12-22: 257.5W median (14 rides in window)
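A minimal sketch of the rolling median, under two assumptions that the description doesn't pin down: one output point is emitted per ride date, and the window covers the 90 days up to and including that ride. The function name `rolling_median` is hypothetical.

```python
from datetime import date, timedelta
from statistics import median

def rolling_median(points, window_days=90, min_rides=3):
    """points: list of (date, watts) tuples for one cohort, sorted by date."""
    out = []
    for i, (day, _) in enumerate(points):
        start = day - timedelta(days=window_days)
        # Rides on or before this one that fall inside the trailing window.
        window = [w for d, w in points[: i + 1] if d > start]
        if len(window) >= min_rides:
            out.append({"date": day.isoformat(), "median": median(window)})
    return out
```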

Annual summary

Year-level stats: average, 75th percentile, 90th percentile, and max watts. Good for comparing year over year.

2021: avg 269W, p75 281W, max 292W
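The annual summary could be computed along these lines. Percentiles have several common definitions; this sketch uses linear interpolation between sorted values, which may differ slightly from whatever organize.py does. The names `percentile` and `annual_summary` are hypothetical.

```python
from collections import defaultdict

def percentile(values, pct):
    """Linear-interpolation percentile (one common definition among several)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def annual_summary(points):
    """points: list of (iso_date_string, watts) tuples for one cohort."""
    by_year = defaultdict(list)
    for day, watts in points:
        by_year[day[:4]].append(watts)
    return {
        year: {
            "avg": round(sum(w) / len(w), 1),
            "p75": round(percentile(w, 75), 1),
            "p90": round(percentile(w, 90), 1),
            "max": max(w),
        }
        for year, w in sorted(by_year.items())
    }
```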

4c. Pull out FTP tests separately

FTP tests are 20-minute all-out efforts that serve as fitness benchmarks. The script pulls them into their own list so the dashboard can show them as discrete events (13 tests across 8 years).

2018-05-12: 254W

2020-12-05: 363W

2026-05-10: 339W

Why only Power Zone rides? Average watts are only comparable between rides of the same format and duration. Mixing different ride formats would make the trend lines meaningless.

5

Write the JSON

organize.py writes everything from steps 3 and 4 to a single file: dashboard_data.json. The volume outputs go in a "volume" section and the performance outputs in a "performance" section, with headline stats and instructor counts at the top level.
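The write step might look like the sketch below. The "headline", "volume", and "performance" keys match the names used in this explainer; the "instructors" key name and both function names are assumptions.

```python
import json

def build_payload(headline, instructors, volume, performance):
    """Assemble the top-level structure written to dashboard_data.json."""
    return {
        "headline": headline,
        "instructors": instructors,   # key name is an assumption
        "volume": volume,
        "performance": performance,
    }

def write_dashboard_data(payload, path="dashboard_data.json"):
    """Serialize the payload as indented JSON so it stays human-readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)
```

Keeping the assembly separate from the file write makes it easy to inspect the payload before anything touches disk.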

Volume section

"volume": {
  "monthly": [
    {
      "month": "2018-03",
      "workouts": 15,
      "hours": 11.6,
      "hours_by_disc": { "Cycling": 11.6, "Strength": 0, "Other": 0 }
    },
    ...
  ],
  "heatmap": [
    { "date": "2018-03-25", "count": 2, "tier": 4 },
    ...
  ]
}

Performance section

"performance": {
  "series": {
    "PZ Endurance": {
      "60 min": {
        "scatter": [
          { "date": "2018-03-25", "watts": 223 },
          ...
        ],
        "rolling_median": [
          { "date": "2020-12-22", "median": 257.5 },
          ...
        ]
      }
    }
  },
  "ftp_tests": [
    { "date": "2020-12-05", "watts": 363 },
    ...
  ]
}

6

The dashboard displays the data

Once the dashboard design was finalized, we had AI incorporate the JSON data from dashboard_data.json directly into the dashboard HTML. The data lives inside the file itself as a JavaScript variable:

const DATA = {
  "headline": { ... },
  "volume": { ... },
  "performance": { ... }
};

This means the dashboard is completely self-contained. You can send the HTML file to anyone and they can open it in a browser -- no server, no database, no external files needed.
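How the data gets into the HTML isn't spelled out here, so the following is only one possible approach: keep a template with a placeholder where the data belongs and substitute the serialized JSON. The placeholder string `__DASHBOARD_DATA__` and the function name `inject_data` are hypothetical.

```python
import json

PLACEHOLDER = "__DASHBOARD_DATA__"  # hypothetical marker inside the HTML template

def inject_data(template_html, data):
    """Return the template with the placeholder replaced by the JSON payload."""
    payload = json.dumps(data)
    # Keep a literal "</script>" inside the data from closing the script tag early.
    payload = payload.replace("</", "<\\/")
    return template_html.replace(PLACEHOLDER, payload)

template = "<script>const DATA = __DASHBOARD_DATA__;</script>"
html = inject_data(template, {"headline": {"workouts": 2964}, "volume": {}, "performance": {}})
```

Since JSON is valid JavaScript object syntax, the browser can read the injected text directly as the value of `DATA` without any parsing step.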

Volume charts read from "volume"

Monthly training hours (stacked by discipline), workout consistency heatmap, headline stats at the top.

Power charts read from "performance"

Scatter + rolling median trend lines, filterable by ride type and duration. FTP test history as discrete dots.

This is the core insight: the dashboard itself is simple. All the hard work -- classifying rides, deciding what counts, computing medians -- happens in the organize step. If a number on the dashboard looks wrong, you can ask AI to trace it back through the JSON to the script. The separation makes the whole thing auditable, even if you never look at the code yourself.

7

Updating with new data

When you have new data (say, a fresh export from Peloton), the update process is:

1. New export

Drop the new data file into your raw_data folder.

2. Re-run and inject

Ask AI to re-run the organize script and inject the new data into the dashboard.

3. Review

Open the updated dashboard and make sure everything looks right.

The dashboard design doesn't change -- only the data inside it gets refreshed. Because the organize script already has all your classification rules and calculations, the update is a few prompts, not a rebuild.
