
Comparison Overview

While at Betterfit we always strive to make things better, nine hundred times better can be hard to visualize. This is what it really looks like.

14.6x more accurate

22.3x more frequent

899x more regions

and improving at a rate of 44% a month

Betterfit flow vs.

Control: Government of Alberta Health Modelling, Public Releases

May 4 2020 to April 4 2021. 

Note: Betterfit flow became operational in October 2020. For the benchmarking study, Betterfit flow was run with access only to data up to the date on which it was making each prediction, so that it had no unfair advantage.
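The cut-off described in the note can be sketched as a simple date filter. This is a minimal illustration only: the function name `data_available_on` and the sample counts are hypothetical and are not taken from Betterfit flow.

```python
from datetime import date


def data_available_on(observations, prediction_date):
    """Keep only observations dated on or before the prediction date."""
    return [(day, value) for day, value in observations if day <= prediction_date]


# Hypothetical daily counts, invented for illustration only.
observations = [
    (date(2020, 5, 4), 100),
    (date(2020, 5, 5), 110),
    (date(2020, 5, 6), 125),
]

# A prediction made on May 5 may not see the May 6 observation.
visible = data_available_on(observations, date(2020, 5, 5))
```

Applying the same filter before every historical prediction keeps later data out of the benchmark.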

Existing Methods

  • Infrequent: weeks or months per model
  • Non-transparent
  • Lower accuracy, especially initially
  • Few variables, which must be gathered
  • Lots of assumptions
  • Hard to iterate
  • Low specificity: large generalizations are made to decrease errors

flow

  • Real-time modelling: updated at least daily
  • Transparent automatic predictions, daily
  • Higher accuracy, especially initially
  • Many variables, input automatically, so few assumptions
  • Iterate in seconds, live
  • Specific predictions made every day
free

How are we doing lately?

Prediction | March 28 Forecast of April 11 | Actual | Percent Error | Betterfit Improvement
Calgary Active Cases, Control | 4173 | 6802 | 38.6% |
Calgary Active Cases, Betterfit flow Mark 2 automatic unadjusted prediction | 7825 | 6802 | 15% | 2.57x more accurate
Edmonton Active Cases, Control | 1925 | 3688 | 47.8% |
Edmonton Active Cases, Betterfit flow Mark 2 automatic unadjusted prediction | 2975 | 3688 | 19.3% | 2.48x more accurate
Alberta Active Cases, Control | 8599 | 14720 | 41.6% |
Alberta Active Cases, Control | 9861 | 14720 | 33% |
Alberta Active Cases, Betterfit flow Mark 2 automatic unadjusted prediction | 14279 | 14720 | 3% | 11x more accurate than the most accurate of the 2 Control predictions
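The arithmetic behind these figures can be checked with a short sketch. The page does not state its formula, so the definitions below (absolute difference divided by the actual value, and the ratio of the two percent errors) are assumptions; the results agree with the published Calgary figures to within rounding.

```python
def percent_error(forecast, actual):
    """Absolute forecast error as a percentage of the actual value (assumed formula)."""
    return abs(forecast - actual) / actual * 100


def improvement(control_error, flow_error):
    """How many times smaller the flow error is than the control error (assumed formula)."""
    return control_error / flow_error


# Calgary active cases, forecast on March 28 for April 11.
calgary_control = percent_error(4173, 6802)  # close to the published 38.6%
calgary_flow = percent_error(7825, 6802)     # close to the published 15%
calgary_ratio = improvement(calgary_control, calgary_flow)

print(f"{calgary_control:.2f}% vs {calgary_flow:.2f}%: {calgary_ratio:.2f}x")
```

The same two functions apply to the Edmonton and Alberta figures.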

 

Control: Government of Alberta Health Modelling, Public Releases

With just 3% error, Betterfit flow accurately predicted the third wave.

Better when it counts.

Detailed Comparison

Metric | Control | Betterfit flow Mark 2 (April 4, 2021) | Betterfit Improvement
Average Error (1) | 1545.6% | 106.2% | 14.6x more accurate
Number of Serious Errors (predictions off by 10x or more) | 8.6% | 2% | 4.3x fewer major errors
Accuracy on Outliers (largest error) | 53954% | 24400% | 2.2x more accurate on outliers
Number of Separate Regions | 3.6 (avg) | 3238 | 899x more specific
Number of Separate Predictions, Cumulative | 116 | 1,104,158 | 9,518x more predictions
Number of ‘Models’ (including high level updates) | 15 | 335 | 22.3x more frequent
Number of In Depth Models | 2 | 335 | 167.5x more frequent

(1) The average error on both is brought up significantly by outliers.
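To illustrate how a single outlier can raise an average error, here is a small sketch. The percent errors below are hypothetical and are not drawn from either model.

```python
from statistics import mean

# Hypothetical percent errors: five ordinary predictions and one extreme outlier.
typical_errors = [5.0, 8.0, 12.0, 15.0, 20.0]
outlier = 5000.0

mean_without_outlier = mean(typical_errors)
mean_with_outlier = mean(typical_errors + [outlier])

print(f"without outlier: {mean_without_outlier:.1f}%")
print(f"with outlier:    {mean_with_outlier:.1f}%")
```

One extreme value out of six is enough to move the average from the low tens into the hundreds.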

Control: Government of Alberta Health Modelling, Public Releases

Period May 4 2020 to April 4 2021.

Is this a fair comparison?

We think so. Our values are transparency and honesty, which is why our predictions are open and available. We chose Alberta as a benchmark because it has been relatively transparent with its modelling, although even it released only 15 numerical updates in the first year of the pandemic, 13 of which were basic estimates that showed no details or methodology. Ontario did not have even this level of modelling to compare against.
If anything, this comparison is biased against Betterfit flow. Provincial estimates are made by health officers and take into account all the information they see fit; they are finished estimates. Our error rate is based on our automatic ‘baseline’ estimate, which does not take into account variant information or given restrictions, both of which users can adjust in the interface. The benchmarking above compares Betterfit’s ‘unfinished’ estimates with existing finished estimates.
Three additional factors make the control modelling appear more accurate than it is.
1) Over half of the control estimates are made above the regional level. The less specific a prediction is, the more likely it is to be correct, and this holds for numeric predictions too. If you predict the average of a roll of two dice, one die will often land below your prediction and the other above it, yet the two errors cancel out when the rolls are averaged.
2) In addition, the detailed models produced by the control group give multiple predictions for a single actual value. This has the same effect: each prediction can be inaccurate, yet the accuracy is averaged across them, producing a seemingly higher accuracy.
In contrast to both of these approaches, Betterfit flow produces one prediction per actual value. We built our system to be falsifiable, so that we can recognize our errors and continually improve, which also helps in machine learning. Further, this one prediction is made at the most specific level for which we have adequate data, which is more specific than the existing methods or public predictions of every provincial or federal organization we have compared against. Where these control groups predict at the level of countries or provinces, we predict down to counties. In other words, it is harder for us to be accurate.
3) Further, we make these predictions every day for every region, whereas the control makes broad predictions at infrequent intervals, which modellers could choose so that they release predictions when they are most confident. In other words, it is harder for us to be accurate, and we cannot omit the times we are wrong.
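The dice analogy in point 1) can be checked by enumerating every outcome. This is only a sketch: it assumes fair six-sided dice and a prediction of 3.5, the expected value of one die.

```python
from itertools import product

faces = range(1, 7)
guess = 3.5  # expected value of one fair die, used as the prediction in both cases

# Mean absolute error when predicting a single die.
single_die_error = sum(abs(face - guess) for face in faces) / 6

# Mean absolute error when predicting the average of two dice.
pairs = list(product(faces, repeat=2))
pair_average_error = sum(abs((a + b) / 2 - guess) for a, b in pairs) / len(pairs)

print(f"single die: {single_die_error:.3f}")
print(f"average of two dice: {pair_average_error:.3f}")
```

The same guess has a smaller typical error against the averaged quantity than against a single die, which is the sense in which less specific predictions are easier to get right.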
We cannot quantify the effect that this lack of regional specificity has on the comparison with the control group, but we can more fairly account for multiple predictions being made for one actual value. Instead of averaging their errors, we can add the errors together and use that sum in the cumulative average, effectively counting them as ‘one’ prediction.
When we do this, our comparable accuracy improves by another 110 percentage points, from 14.6x to 15.7x more accurate than the Alberta Government control. The fact that we still outperform existing methods even with these disadvantages in the benchmarking study speaks to the power of this technology.
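The adjustment can be sketched as two ways of averaging the same errors. The grouping and the numbers below are hypothetical and only show the mechanics; they are not the benchmark data.

```python
def mean_error_per_prediction(groups):
    """Average over every individual prediction, however many share an actual value."""
    errors = [error for group in groups for error in group]
    return sum(errors) / len(errors)


def mean_error_per_actual(groups):
    """Sum the errors within each group first, so each actual value counts once."""
    return sum(sum(group) for group in groups) / len(groups)


# Hypothetical percent errors: three predictions for one actual value, one for another.
groups = [[10.0, 30.0, 50.0], [20.0]]

per_prediction = mean_error_per_prediction(groups)
per_actual = mean_error_per_actual(groups)

print(per_prediction, per_actual)
```

Counting each actual value once raises the measured error of a model that issues several predictions for the same value, while a model with one prediction per actual value is unaffected.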