Sorting Averages

What is the best way to sort items by average rating if many of the items have only a handful of ratings?

  • Product Star Ratings: Which product is better based on star ratings: the one with two 5-star reviews, or the one with 800 5-star, 300 4-star, and 82 1-star reviews?
  • Player Win/Loss Rating: Who’s the better player given their win-loss record, Bob (1-0) or Jack (82-29)?
  • Thumbs Up/Down Voting Systems

Here’s a Google Sheet I made to play with a few different scoring systems.

Additive Smoothing

With additive smoothing, you assume everything starts with a default value. For example, all posts start with 1 upvote and 2 downvotes, or all players start with 20 matches. How you choose that default is where it gets interesting.

Laplace Smoothing

See my Cocktail App - Recommendation section for a real example of using Laplace Smoothing to rate cocktails based on ingredients and your taste preferences.

Laplace smoothing assumes that everything you’re ranking starts with some number of upvotes and downvotes, which greatly simplifies the math required to get a good scoring system going.

$$rating=\frac{(up + default)}{((up + default) + (down + default))}$$

You can just plug in a constant for default.

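So for something with no votes, and a default of 7:

$$rating=\frac{(0 + 7)}{((0 + 7) + (0 + 7))}$$

$$rating=\frac{7}{(7 + 7)} = \frac{7}{14} = 0.5$$

With no votes the defaults cancel out, so every unrated item starts at 0.5 no matter which default you pick. The size of the default only controls how many real votes it takes to move an item away from 0.5.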
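Here’s a minimal Python sketch of the formula above. The function name `smoothed_rating` and the default of 7 are just illustrative choices, not anything standard.

```python
def smoothed_rating(up: int, down: int, default: float = 7.0) -> float:
    """Additive smoothing: add the same default count to up and down votes."""
    if default <= 0:
        # With a non-positive default, an item with no votes would divide by zero.
        raise ValueError("default must be positive")
    return (up + default) / ((up + default) + (down + default))


# An item with no votes lands exactly in the middle.
print(smoothed_rating(0, 0))              # 0.5

# Bob (1-0) vs. Jack (82-29) from the intro.
print(round(smoothed_rating(1, 0), 4))    # 0.5333
print(round(smoothed_rating(82, 29), 4))  # 0.712
```

Bob’s raw win rate is 100%, but with only one match he stays close to 0.5, so Jack ranks higher.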



Lidstone Smoothing

Lidstone smoothing generalizes Laplace smoothing: the default value is not hardcoded (strictly speaking, Laplace smoothing adds 1), but is a parameter you can tweak.

Example: using different defaults for up and down votes

This is similar to the smoothing used above, but with different default values for up and down votes. Here upAvg and downAvg are the default counts added to the up and down votes.

$$rating=\frac{(up + upAvg)}{((up + upAvg) + (down + downAvg))}$$

As an example, take something with 13 upvotes and 2 downvotes. We’ll assume everything by default has 4 upvotes and 7 downvotes.

$$rating=\frac{(up + 4)}{((up + 4) + (down + 7))}$$

$$rating=\frac{(13 + 4)}{((13 + 4) + (2 + 7))}$$

$$rating=\frac{17}{26} \approx 0.6538$$
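The same calculation as a small Python sketch. The name `lidstone_rating` and its parameter names are made up for illustration; the defaults of 4 and 7 are the ones from the example above.

```python
def lidstone_rating(
    up: int,
    down: int,
    up_default: float = 4.0,
    down_default: float = 7.0,
) -> float:
    """Additive smoothing with separate default counts for up and down votes."""
    if up_default + down_default <= 0:
        # Without positive defaults, an item with no votes would divide by zero.
        raise ValueError("defaults must sum to a positive number")
    return (up + up_default) / ((up + up_default) + (down + down_default))


# 13 upvotes and 2 downvotes, as in the worked example.
print(round(lidstone_rating(13, 2), 4))  # 0.6538

# An item with no votes now starts at 4 / 11 instead of 0.5.
print(round(lidstone_rating(0, 0), 4))   # 0.3636
```

Unequal defaults shift the starting score away from 0.5, which is handy if you know most items end up below (or above) the midpoint.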

Wilson Confidence

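This is probably the first thing people come across when they try to learn about sorting averages. It was popularized by Evan Miller’s post How Not to Sort By Average Rating, but the math is more complicated. With $\hat{p}$ the observed fraction of positive ratings, $n$ the total number of ratings, and $z$ the standard normal quantile for the chosen confidence level (about 1.96 for 95%), the lower bound of the Wilson score interval is: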

$$\frac{{\hat{p} + \frac{{z^2}}{2n} - z \sqrt{\frac{{\hat{p}(1 - \hat{p})}}{n} + \frac{{z^2}}{4n^2}}}}{{1 + \frac{{z^2}}{n}}}$$
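The formula looks intimidating but is only a few lines of code. Here’s a rough Python sketch of the lower bound; the function name `wilson_lower_bound`, the default `z = 1.96` (roughly 95% confidence), and returning 0 for items with no votes are all my assumptions.

```python
import math


def wilson_lower_bound(up: int, down: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the fraction of upvotes."""
    n = up + down
    if n == 0:
        # Assumption: rank items with no votes at the very bottom.
        return 0.0
    p_hat = up / n
    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


# Bob (1-0) vs. Jack (82-29) from the intro.
bob = wilson_lower_bound(1, 0)
jack = wilson_lower_bound(82, 29)
print(round(bob, 4))  # 0.2065
print(jack > bob)     # True
```

Unlike additive smoothing, there is no default vote count to choose here. The only knob is `z`: a larger value demands more evidence before an item can rank highly.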

