The Development of the BikeScore

Dr. Philip Friere Skiba

PhysFarm Training Systems LLC

The author wishes to express his gratitude to all of the amateur, elite, and professional athletes who have made their training and racing data available to him. Without these data, the development of new technology would be a much more difficult (if not impossible) enterprise.

BikeScore is a trademark of PhysFarm Training Systems LLC. Any commercial use of the algorithm requires a license from PhysFarm Training Systems, LLC. Non-commercial / academic use is permitted free of charge provided that such use is properly referenced.

TSS and NP were first developed by Dr. Andrew Coggan, and are trademarks claimed by PeaksWare LLC. The BikeScore system does not calculate NP or TSS, nor do any of the other products of PhysFarm Training Systems.

This work is Copyright 2008 by Dr. Philip Friere Skiba and PhysFarm Training Systems, LLC.

There exists a dose-response relationship between training stimulus and adaptation of the athlete (Banister et al. 1975, Busso 2003). Training load can be expressed simply as:

Training load = Intensity · Duration (Eq. 1)

It is clear that different types of stimuli will effect different physiologic responses. It is less clear how to compare and quantify differing stimuli and their ability to affect the same response. A number of systems have been proposed, the most widely used of which is TRIMPS, which was devised by Dr. Eric Banister in the 1970s. Simply put, Banister sought to relate an easily measured parameter (heart rate) to lactate production through the use of a population study. This made a great deal of sense, perhaps even more so today, as it is now widely accepted that the work rate at lactate threshold (defined as a rise of serum lactate of 1 mmol/L over exercise baseline) is the primary determinant of endurance exercise performance (Coyle 1988, 1999).

TRIMPS = Duration · Average HR during exercise · HR-dependent, intensity-based weighting factor (Eq. 2)

The benefit of Banister’s system is that it takes into consideration the observation that higher workloads are more metabolically taxing (exponentially so, via the weighting factor) than lower workloads of equivalent duration (Banister 1996). However, it is still dependent upon the measurement of heart rate, which varies with factors such as hydration, rest, illness, or cardiac drift. Furthermore, though HR is dependent upon workload, it may take minutes to stabilize when that workload changes. Because of these complicating factors, it would be preferable to measure work rate directly.

In 2003, Dr. Andrew Coggan refined Banister’s concept by developing a system that also incorporated lactate response to workload. This system related the *change* in lactate concentration with the *change* in an *objective* measure of exercise intensity: power output, which can be directly measured by on-bike power meters.

Coggan devised a mathematical algorithm similar to that of Banister, called the Training Stress Score (TSS).

TSS = Exercise duration · Average power · Power-dependent intensity weighting factor (Eq. 3)

The power-dependent intensity weighting factor was derived directly from a plot of blood lactate concentration as a percentage of concentration at threshold against % of threshold power. His work indicated a near 4th-power relationship between the two.

The elegance of Coggan’s system is that while it successfully relates lactate concentration to power output, it is not dependent upon invasive tests. In 1988, Coyle et al. illustrated that the highest power output or pace an athlete can maintain over the course of an hour-long exercise task is highly correlated with LT. Thus, to determine threshold intensity, the athlete need only perform such a test and use the resulting average power in the calculations. This 1-hour power has since been dubbed “functional threshold power” (Coggan 2003, 2006).

One potential hurdle in the analysis of power meter data is the often stochastic nature of the information. This is in stark contrast to HR data, which varies with a relatively predictable half-life. For instance, upon cessation of an effort, HR falls rather slowly over a period of 30 seconds to a minute. In contrast, when a cyclist stops pedaling, the power output immediately falls to zero. However, the physiologic response to the stress applied falls with a time course similar to that of HR in the above example. This must be accounted for in any effort to calculate the physiologic strain imposed by the stress of a given exercise task. To solve this problem, Coggan used a 30-second moving average to smooth the power data, a sensible choice given the many physiologic processes that have ~30-second half-lives (e.g. HR, plasma epinephrine concentration, ventilation). However, these processes decay exponentially, rather than linearly. This is an area of potential / theoretical improvement which we examined.
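The contrast between linear and exponential decay can be illustrated with a short sketch. The Python below is an illustration only, not the author's or Dr. Coggan's implementation. The function names, the 1 Hz sampling, the synthetic effort, and the decision to treat the "25s" of the exponentially weighted average used later in the protocol as a time constant (rather than a half-life or window span) are all assumptions.

```python
import math

def moving_average(power, window=30):
    """Trailing simple moving average over the last `window` samples (1 Hz assumed)."""
    out = []
    running = 0.0
    for i, p in enumerate(power):
        running += p
        if i >= window:
            running -= power[i - window]  # drop the sample that left the window
        out.append(running / min(i + 1, window))
    return out

def ewma(power, tau=25.0, dt=1.0):
    """Exponentially weighted moving average.

    Assumption: tau is a time constant in seconds, and the filter is
    seeded with the first sample.
    """
    alpha = 1.0 - math.exp(-dt / tau)
    smoothed = power[0]
    out = []
    for p in power:
        smoothed += alpha * (p - smoothed)
        out.append(smoothed)
    return out

# Synthetic effort: 120 s at 300 W, then the rider stops pedaling for 60 s.
power = [300.0] * 120 + [0.0] * 60
ma = moving_average(power)
ew = ewma(power)

# Smoothed values 15 s and 30 s after the rider stops.
for seconds_after_stop in (15, 30):
    i = 120 + seconds_after_stop - 1
    print(seconds_after_stop, round(ma[i], 1), round(ew[i], 1))
```

In this synthetic case the moving average falls in a straight line and reaches exactly zero 30 s after the stop, while the exponentially weighted average still reads about 90 W at that point and continues to decay gradually.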

Another potential issue lies in the definition of threshold power. Athletes are often unwilling or unable to undertake an hour-long test / time trial in order to obtain a satisfactory measurement. Additionally, the term “threshold” can be problematic, as athletes often have their own idea of what it means. Dr. Coggan has previously suggested the use of Monod and Scherrer’s Critical Power model in the absence of 40k TT / true 1 hour maximal power data. We examined the use of this paradigm in place of 40k TT data.

Finally, training metrics such as these have been demonstrated to be useful as the input functions for systems-based performance modeling and prediction equations. We examined differences in performance modeling / prediction based on the use of each metric.

Using a protocol modified from that first described by Coggan (2003, 2006), we calculated our alternative stress metrics (xPower and BikeScore) as follows.

1. Calculate Critical Power per the method of Monod (1960), using 3 minute and 20 minute exercise tests.

2. Analyze the data from a workout, computing a 25s exponentially weighted moving average for power.

3. Raise the values from step 2 to the 4th power.

4. Average the values from step 3.

5. Take the 4th root of the value from step 4. This is the xPower.

6. Divide xPower by Critical Power from step 1 to get the Relative Intensity (RI).

7. Multiply the xPower by the duration of the workout in seconds to obtain a “normalized work” value in joules.

8. Multiply the value obtained in step 7 by the RI to get a raw BikeScore.

9. Divide the value from step 8 by the amount of work performed during an hour at Critical Power.

10. Multiply the number from step 9 by 100 to obtain the final BikeScore.

This calculation appears laborious at first glance; however, inexpensive software has been developed which automates the process (
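The protocol above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the licensed BikeScore implementation. The function names are hypothetical, the data are assumed to be evenly sampled (1 Hz by default), "25s" is treated as the time constant of the exponential filter, and the filter is seeded with the first sample. The text does not specify any of these details.

```python
import math

def x_power(power, dt=1.0, tau=25.0):
    """Steps 2-5: 25 s EWMA, 4th power, mean, 4th root.

    Assumptions: evenly spaced samples dt seconds apart; tau is a time
    constant; the filter starts at the first sample.
    """
    alpha = 1.0 - math.exp(-dt / tau)
    smoothed = power[0]
    total = 0.0
    for p in power:
        smoothed += alpha * (p - smoothed)   # step 2: exponential smoothing
        total += smoothed ** 4               # step 3: raise to the 4th power
    return (total / len(power)) ** 0.25      # steps 4-5: mean, then 4th root

def bike_score(power, critical_power, dt=1.0):
    """Steps 6-10, given Critical Power (watts) from step 1."""
    xp = x_power(power, dt)
    duration_s = len(power) * dt
    relative_intensity = xp / critical_power      # step 6
    normalized_work = xp * duration_s             # step 7 (joules)
    raw_score = normalized_work * relative_intensity  # step 8
    hour_at_cp = critical_power * 3600.0          # work in one hour at CP
    return 100.0 * raw_score / hour_at_cp         # steps 9-10

# One hour ridden steadily at Critical Power should score about 100.
steady_hour = [250.0] * 3600
print(round(bike_score(steady_hour, critical_power=250.0), 1))
```

Written out, steps 6 through 10 reduce algebraically to 100 · RI² · (duration in seconds / 3600), so one hour at Critical Power scores 100 and a harder or longer ride scores proportionally more.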

Power Output, 30s MA, and 25s EWMA for Power Output

Figure 1: Instantaneous power output (yellow), 25s exponentially weighted moving average for power output (red), and 30s moving average for power output (blue) for a 20k TT. There is little apparent difference between smoothing methods.

When applied to a portion of an interval workout, similar results are observed (Figure 2, AP=184W, xPower=200W, and NP=204W). However, it is easier to observe the differences between smoothing methods. In this case, it would seem that while the 30s MA more closely tracks the stress (i.e. the raw power output), the 25s EWMA might be more representative of the *physiologic response or strain imposed by the athlete’s effort*.

Power Output, 30s MA, and 25s EWMA for Power Output

Figure 2: Instantaneous power output (yellow), 25s exponentially weighted moving average for power output (red), and 30s moving average for power output (blue) for a portion of an interval workout. Note the difference between smoothing methods. Note the difference in decay between the red and blue lines after each interval.

The aforementioned threshold power has proven to be problematic as, at least in our experience, athletes often neglect to test their threshold power with sufficient regularity. This seems to be largely due to the fact that an hour-long test at maximal effort is exceedingly difficult and may require two or more days to completely recover from. Alternatives have been proposed, including 95% of the power maintained for an all-out 20 minute test (Allen and Coggan, 2006). However, such arbitrary treatments of the data convey a false sense of precision and may lead to inappropriate training decisions.

A viable alternative comes in the form of the Critical Power algorithm, first proposed in the 1960s by Monod and Scherrer, which looks at a series of tests lasting between a few minutes and 20 to 30 minutes. A plot is made of the work (in joules) performed in each test against the test duration, and a best-fit line through the points is calculated. The slope of this line (in J/s, or watts) is dubbed the “Critical Power”: theoretically, a power the athlete can maintain indefinitely without fading.

This model almost certainly overestimates in two senses: it overstates the purely anaerobic/nonrenewable component of the effort (equal to the y-intercept of the line), and it implies that an exercise task at Critical Power could be maintained indefinitely. That said, it turns out that the Critical Power is quite close to 1 hour maximal / “threshold” power.
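When only two tests are available, such as the 3 minute and 20 minute tests used in the protocol, the best-fit line is simply the line through the two work-versus-time points. The sketch below shows that arithmetic. The function name and the example power values are hypothetical and chosen only for illustration.

```python
def critical_power_two_point(short_power, short_s, long_power, long_s):
    """Fit the work-vs-time line through two maximal tests.

    Returns (critical_power_watts, anaerobic_work_joules), i.e. the
    slope and the y-intercept of the line.
    """
    short_work = short_power * short_s   # joules in the short test
    long_work = long_power * long_s      # joules in the long test
    cp = (long_work - short_work) / (long_s - short_s)  # slope, J/s = W
    anaerobic_work = short_work - cp * short_s           # y-intercept, J
    return cp, anaerobic_work

# Hypothetical athlete: 400 W for 3 minutes, 300 W for 20 minutes.
cp, w_prime = critical_power_two_point(400.0, 180.0, 300.0, 1200.0)
print(round(cp, 1), round(w_prime))
```

For these made-up numbers the slope comes out a little below the 20 minute average power, which is the expected direction: part of the 20 minute effort is attributed to the nonrenewable component given by the intercept.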

We have found two important factors in the development of BikeScore. First, a good result can be had by simply using the results of a short (3 minute) and a long (20 minute) test, provided that both tests are undertaken at maximal effort. This method was found to have an excellent correlation to measured 40k power in a small group of elite / professional triathletes (r² > 0.95, n = 5, data not shown), and a group of amateur triathletes (r² > 0.92, n = 10, data not shown), with Critical Power seeming to overestimate 40k TT power in the latter more than the former. This leads us to believe that the difference between measured 1 hour maximal power and the calculated Critical Power is, as a practical matter, negligible. Second, we have found that the Critical Power paradigm provides a useful framework to evaluate athletes without being overly imposing / detrimental to their training schedule.

We calculated an impulse-response model using commercially available software (RaceDay Performance Predictor, PhysFarm Training Systems LLC, Clark NJ), comparing Coggan’s TSS metric with BikeScore (Figure 3a and 3b).

Figure 3a: Modeled (green) vs. predicted performance (black dots) using TSS as the input function. A high degree of correlation may be observed (r² = 0.7551).

Figure 3b: Modeled (green) vs. predicted performance (black dots) using BikeScore as the input function. Though a high degree of correlation may be observed, there is no significant difference as compared to TSS.

As is evident from the graphical representations above, there is no significant difference in model fit using the different metrics, and both models delivered the same parameters (k1 = 1, k2 = 5, T1 = 16, T2 = 3). This indicates either would likely deliver equal performance in modeling applications, and in an examination of a number of additional cases, we have yet to find a situation where one metric appears to be a substantial improvement over the other. (This is not necessarily surprising given the quality of results possible using a metric as crude as TRIMPS, which *also* yields similar model performance.)
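For readers unfamiliar with impulse-response modeling, the sketch below shows how a daily training score could feed such a model. It uses the two-component (fitness minus fatigue) form commonly attributed to Banister et al. (1975). Whether the commercial software uses exactly this form is not stated here, so the model structure is an assumption. Reading T1 and T2 as time constants in days is also an assumption, as are the baseline `p0` and the function name.

```python
import math

def modeled_performance(daily_scores, p0=0.0, k1=1.0, k2=5.0, tau1=16.0, tau2=3.0):
    """Two-component impulse-response model (assumed Banister form).

    daily_scores: one training score per day (e.g. BikeScore).
    Performance on a given day depends only on training from earlier days.
    """
    performance = []
    for day in range(len(daily_scores)):
        fitness = 0.0
        fatigue = 0.0
        for past_day, score in enumerate(daily_scores[:day]):
            elapsed = day - past_day
            fitness += score * math.exp(-elapsed / tau1)  # slow-decaying gain
            fatigue += score * math.exp(-elapsed / tau2)  # fast-decaying cost
        performance.append(p0 + k1 * fitness - k2 * fatigue)
    return performance

# A single hard session (score 100) followed by two weeks of rest.
curve = modeled_performance([100.0] + [0.0] * 14)
print([round(v, 1) for v in curve[:8]])
```

With the parameter values quoted above, this form predicts a dip in the days after a hard session followed by a rebound above baseline once the fatigue term has decayed, which is the behavior such models are meant to capture regardless of which training metric supplies the input.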

Although it remains to be seen whether the above changes represent a substantial (indeed any) improvement over Dr. Coggan’s rather robust solution, we suspect that more formal studies will demonstrate no significant difference between these two solutions. In point of fact, the author was one of the first people to validate Coggan’s metric for cycling and a novel variation that permitted similar calculations for running (Skiba, 2006 & 2007a), and to advance a similar protocol for the calculation of the stress of swimming and cross country skiing (Skiba 2007b & 2007c). However, BikeScore was developed and released as a possible incremental improvement (which would also be freely licensed for noncommercial / academic / open-source use), and for the moment seems to be at least as useful as the work which originally inspired it. We hope this will encourage further work in the field and the investigation of methods to improve these sorts of tools.

Banister EW, Calvert TW, Savage MV. A systems model of training for athletic performance. Aust. J. Sports Med 1975; 7:57-61.

Banister EW. Modeling elite athletic performance. In: MacDougall JD, Wenger HA, Green HJ, eds. Physiological Testing of the High-Performance Athlete. Champaign, IL: Human Kinetics; 1996: 403-424.

Busso T. Variable dose-response relationship between exercise training and performance. Med Sci Sports Exerc. 2003; 35(7):1188-1195.

Coyle EF, Coggan AR, Hopper MK. Determinants of endurance in well-trained cyclists. J Appl Physiol. 1988; 64(6):2622-30.

Coyle EF. Physiological determinants of endurance performance. Journal of Science and Medicine in Sport. 1999; 2(3):181-189.

Coggan, Andrew R. Making sense out of apparent chaos: Analyzing on the bike power data. In: The Science of Cycling: Transforming research into practical applications for athletes and coaches. Highlighted symposium, American College of Sports Medicine 53rd Annual Meeting. May 31, 2006.

Allen, H and Coggan AR. Training and racing with a power meter. Boulder, CO: VeloPress; 2005.

Skiba, Philip Friere. Quantification of Training Stress in Distance Runners. Arch Phys Med Rehabil 87:29, 2006.

Skiba, Philip Friere. Evaluation of a Novel Training Metric in Trained Cyclists. Med Sci Sports Exerc 39:5, Supplement, 2007.

Skiba, Philip Friere. Evaluation of a Novel Training Metric in a Trained Triathlete. Clin J Sports Med (In Press). AOASM Conference Lecture, April 2007.

Skiba, Philip Friere. Calculation of Optimal Taper Characteristics in an Amateur Triathlete. In review. 2008.