The pipeline from raw CSV to finished dashboard, explained step by step with real data.
The organize.py script opens peloton_workouts.csv. It has 4,490 rows. Each row is one workout Peloton recorded -- a ride, a strength class, a warm-up. Here are 8 real rows:
| Timestamp | Title | Type | Discipline | Length | Watts | Instructor |
|---|---|---|---|---|---|---|
| 2018-03-16 | 5 min Training Program Warm Up | Warm Up | Cycling | 5 | 152 | Matt Wilpers |
| 2018-03-16 | 45 min Power Zone Endurance Ride | Power Zone | Cycling | 45 | 209 | Matt Wilpers |
| 2018-03-21 | 10 min Warm Up Ride | Warm Up | Cycling | 10 | 169 | Ally Love |
| 2018-03-21 | 45 min Metrics Ride | Music | Cycling | 45 | 217 | Jenn Sherman |
| 2018-03-25 | 60 min Power Zone Endurance Ride | Power Zone | Cycling | 60 | 223 | Matt Wilpers |
| 2018-03-27 | 45 min Tabata Ride | Intervals | Cycling | 45 | 202 | Robin Arzon |
| 2018-03-28 | 5 min Cool Down Ride | Cool Down | Cycling | 5 | 177 | Christine D'Ercole |
| 2018-04-06 | 45 min Power Zone Ride | Power Zone | Cycling | 45 | 196 | Matt Wilpers |
The CSV has 17 columns in total; the seven shown above (full names: Workout Timestamp, Title, Type, Fitness Discipline, Length (minutes), Avg. Watts, Instructor Name) are the ones that matter for what comes next. The Power Zone rides in this sample will appear in both analysis tracks.
The script loops through every row and asks three questions. The answers determine where each row's data ends up.
1. **Is it ancillary?** Check the Type column: if it's "Warm Up" or "Cool Down", the row is ancillary.
2. **Which discipline bucket?** Check the Fitness Discipline column and map it to one of three buckets: cycling, strength, or other.
3. **Is it a performance ride?** Check the Title for Power Zone or FTP keywords, and require nonzero Avg. Watts.
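In Python (the language of organize.py), the three checks might look like this sketch. The function name `classify`, the keyword list, and the bucket map are illustrative assumptions, not the script's actual rules:

```python
# Hypothetical sketch of the three classification checks.
# The keyword list and bucket map are assumptions, not the script's actual rules.
ANCILLARY_TYPES = {"Warm Up", "Cool Down"}
DISCIPLINE_BUCKETS = {"Cycling": "cycling", "Strength": "strength"}  # everything else -> "other"
PERFORMANCE_KEYWORDS = ("Power Zone", "FTP")

def classify(row: dict) -> dict:
    """Answer the three questions for one CSV row (column names as in the CSV)."""
    is_ancillary = row["Type"] in ANCILLARY_TYPES
    bucket = DISCIPLINE_BUCKETS.get(row["Fitness Discipline"], "other")
    watts = float(row.get("Avg. Watts") or 0)
    is_performance = any(kw in row["Title"] for kw in PERFORMANCE_KEYWORDS) and watts > 0
    return {"ancillary": is_ancillary, "bucket": bucket, "performance": is_performance}
```

Given the sample rows above, a warm-up ride would come back ancillary and non-performance, while a Power Zone ride with recorded watts would come back as a performance ride.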
The script groups all rows by month and calculates totals. These numbers cover everything -- cycling, strength, other. This is the "how much did you train?" track.
- **Monthly totals:** For each month: total workouts (core rows only), total hours (all rows, ancillary included), miles, calories, and hours split by discipline.
- **Daily effort heatmap:** One entry per day with an effort tier (1-5) based on that day's estimated calories. Covers all disciplines, not just cycling.
- **Headline stats:** Totals and streaks across the full history: total workouts, total hours, active days, and consecutive active weeks.
- **Instructor counts:** Workout count per instructor, across all disciplines.
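The monthly grouping can be sketched in a few lines, assuming classified rows as plain dicts; the field names `bucket` and `ancillary` are hypothetical, and miles/calories are omitted for brevity:

```python
from collections import defaultdict

def monthly_totals(rows):
    """Group classified rows by YYYY-MM and tally volume metrics.
    Sketch only: `bucket` and `ancillary` are hypothetical field names."""
    months = defaultdict(lambda: {"workouts": 0, "hours": 0.0,
                                  "by_discipline": defaultdict(float)})
    for row in rows:
        month = row["Timestamp"][:7]            # "2018-03-16" -> "2018-03"
        hours = float(row["Length"]) / 60.0
        m = months[month]
        m["hours"] += hours                     # all rows, ancillary included
        m["by_discipline"][row["bucket"]] += hours
        if not row["ancillary"]:
            m["workouts"] += 1                  # core workouts only
    return months
```

Note the asymmetry from the list above: ancillary rows count toward hours (you did spend the time) but not toward the workout count.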
The script takes only the 1,474 performance rides and builds a separate set of outputs. This is the "am I getting stronger?" track. The key design choice: it only compares like with like.
Each performance ride gets placed into a cohort based on its ride type and how long it was. A 60-min PZ Endurance ride is a completely different effort than a 45-min PZ Max, so they're never mixed together.
| Ride Type | Duration Buckets | What's being compared |
|---|---|---|
| PZ Endurance | 60 min, 75 min, 90 min, 120 min | Long steady-state rides |
| PZ Standard | 45 min, 60 min | Mixed-zone interval rides |
| PZ Max | 20 min, 45 min | High-intensity short efforts |
| FTP Test | 20 min | All-out assessment efforts |
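The cohort assignment can be sketched directly from the table; the function name `cohort_key` and the key format are assumptions, and how the script derives the ride type from the title is not shown:

```python
# Duration buckets per ride type, copied from the table above.
COHORTS = {
    "PZ Endurance": {60, 75, 90, 120},
    "PZ Standard": {45, 60},
    "PZ Max": {20, 45},
    "FTP Test": {20},
}

def cohort_key(ride_type: str, minutes: int):
    """Return a hypothetical "type / duration" cohort key, or None if the
    ride doesn't fit any bucket for its type."""
    if minutes in COHORTS.get(ride_type, ()):
        return f"{ride_type} / {minutes} min"
    return None
```

Rides that fall outside every bucket get no cohort, which keeps odd one-off durations from polluting a trend line.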
Each ride type + duration combination gets its own set of trend data. Here's what the script produces for each one, using "PZ Endurance / 60 min" as the example:
- **Scatter points:** Every ride as a single data point -- date plus average watts. This is the raw data the dashboard plots as dots.
- **Rolling median:** A 90-day rolling median of watts, smoothing out day-to-day variation to show the underlying trend. Only calculated when there are 3+ rides in the trailing 90-day window.
- **Yearly stats:** Average, 75th percentile, 90th percentile, and max watts per year. Good for comparing year over year.
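The 3-rides-in-90-days rule might look like this sketch (a hypothetical `rolling_median` helper, quadratic for clarity; the real script's approach may differ):

```python
from datetime import date, timedelta
from statistics import median

def rolling_median(points, window_days=90, min_rides=3):
    """For each ride, the median watts of all rides in the trailing window,
    or None when the window holds fewer than min_rides rides.
    points: list of (date, watts) sorted by date. O(n^2) sketch."""
    out = []
    for d, _ in points:
        window = [w for (d2, w) in points
                  if d - timedelta(days=window_days) < d2 <= d]
        out.append((d, median(window) if len(window) >= min_rides else None))
    return out
```

Early rides get `None` until enough history accumulates, so the dashboard's trend line simply starts later than the first dot.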
FTP tests are 20-minute all-out efforts that serve as fitness benchmarks. The script pulls them into their own list so the dashboard can show them as discrete events (13 tests across 8 years).
organize.py writes everything from steps 3 and 4 to a single file: dashboard_data.json. The volume outputs go in a "volume" section and the performance outputs in a "performance" section, with headline stats and instructor counts at the top level.
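The resulting layout might look like this sketch. The "volume" and "performance" sections come from the text; the nested key names here are illustrative assumptions:

```python
import json

# Hypothetical layout of dashboard_data.json; nested keys are assumptions.
dashboard_data = {
    "headline": {"total_workouts": 0, "total_hours": 0.0, "active_days": 0},
    "instructors": {},                       # name -> workout count
    "volume": {"monthly": {}, "daily_effort": {}},
    "performance": {"cohorts": {}, "ftp_tests": []},
}

with open("dashboard_data.json", "w") as f:
    json.dump(dashboard_data, f, indent=2)
```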
Once the dashboard design was finalized, we had AI incorporate the JSON from dashboard_data.json directly into the dashboard HTML, where it lives inside the file itself as a JavaScript variable.
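A sketch of what that injection step could look like, assuming a placeholder comment in an HTML template and a `DASHBOARD_DATA` variable name (both hypothetical):

```python
import json

def inject(template_html: str, data: dict) -> str:
    """Replace a placeholder in the dashboard template with the JSON payload,
    so the HTML file carries its own data. Placeholder name is an assumption."""
    payload = "const DASHBOARD_DATA = " + json.dumps(data) + ";"
    return template_html.replace("/*__DATA__*/", payload)

html = inject("<script>/*__DATA__*/</script>",
              {"volume": {}, "performance": {}})
```

Because `json.dumps` output is valid JavaScript object-literal syntax, the browser parses the embedded data with no fetch and no parsing step of its own.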
This means the dashboard is completely self-contained. You can send the HTML file to anyone and they can open it in a browser -- no server, no database, no external files needed.
"volume"Monthly training hours (stacked by discipline), workout consistency heatmap, headline stats at the top.
"performance"Scatter + rolling median trend lines, filterable by ride type and duration. FTP test history as discrete dots.
When you have new data (say, a fresh export from Peloton), the update process is:
1. Drop the new data file into your raw_data folder.
2. Ask AI to re-run the organize script and inject the new data into the dashboard.
3. Open the updated dashboard and make sure everything looks right.
The dashboard design doesn't change -- only the data inside it gets refreshed. Because the organize script already has all your classification rules and calculations, the update is a few prompts, not a rebuild.