Wednesday, July 18, 2012

Why We Don't Rate Games

Chances are, if you’ve read a review or two from us, you’ve noticed that we don’t score games. We have no “x monocles out of y” system to apply to our reviewed content. A few people have asked me why that is, given how heavily larger review sites lean on whatever number scale they use. The answer is a long one, so I thought it best to cover it in this week’s opinion piece.

7.5, 8, 78%. Those are the scores given to Quantum Conundrum by Gamespot, IGN, and Metacritic respectively. Since all reviewed the same game, it would be ideal for the scores to vary solely based on what each reviewer appreciated or disliked about the game. However, there are too many external factors for that to be the case. Let’s first take a look at Gamespot and IGN. At first glance, it looks like Gamespot might have a finer score granularity than IGN, allowing for an arguably more precise grade. That launches a debate about what an appropriate grading system is. 0-10? Should it be in increments of 0.5? Maybe 0.1? Or percentages? Perhaps Metacritic is the most precise, since its granularity is down to single percentage points? There’s simply no good way to determine how precise you need to be in a game review. And once you decide on a system, what distinguishes an 8.0 game from an 8.1 game? Or an 8.5? How much is half a point worth?

This leads to a very specific problem I’ve had with trying to come up with a point system for reviews: how to grade a good game against other good games. If Uncharted 3 was slightly less good than Uncharted 2, but better than Uncharted 1, and those games got a 9.5 and an 8.5 respectively, that means Uncharted 3 should get a 9.0. But other games in the 9.0 range include Assassin’s Creed Revelations, and I thought Uncharted 3 was more fun. So does it get a 9.1 now? The scores are subjective, and making distinctions like that gives a false sense of accuracy when all it actually does is make the score appear more precise. The difference is subtle, but important. Accuracy is the correctness of a value, or its closeness to the true value. Precision is the number’s reproducibility. For example, if I played NCAA 13 twelve times and my scores averaged 6.081 with very little variation, my score would be precise. But because I dislike football games, that number is lower than it should be: bias makes it precise but inaccurate. Here's a great picture that helped me understand the difference from this blog:
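To put the same distinction in numbers, here is a minimal Python sketch. Everything in it is made up for illustration: the twelve scores are hypothetical (they average roughly 6.08), and `assumed_true_score` is an assumption, since nobody actually knows what a game "should" get.

```python
from statistics import mean, pstdev

# Hypothetical data: twelve scores from one reviewer for the same game.
scores = [6.1, 6.1, 6.0, 6.1, 6.1, 6.1, 6.0, 6.1, 6.1, 6.1, 6.1, 6.1]

# Assumption for illustration only: the score an unbiased reviewer would give.
assumed_true_score = 7.5

# Precision: how tightly the scores cluster (small spread means precise).
spread = pstdev(scores)

# Accuracy: how far the average sits from the assumed true value.
bias = mean(scores) - assumed_true_score

print(f"average = {mean(scores):.3f}, spread = {spread:.3f}, bias = {bias:.3f}")
```

The spread comes out tiny while the bias stays large, which is what "precise but inaccurate" means here.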

Upon investigating further, you’ll find that both Gamespot and IGN actually have similar score granularity, so the differences must lie with the subjectivity of reviewing games, correct? Not quite. Now we get into the topic of score inflation. On a scale of 0-10, the average score should be 5, meaning neither good nor bad. However, that is not the case with either Gamespot or IGN. This article puts Gamespot’s average score at 7, whereas IGN’s is 8. So, assuming identical granularity, Quantum Conundrum actually received a better score from Gamespot than from IGN relative to each site’s average: half a point above Gamespot’s, and exactly at IGN’s. This problem affects nearly everyone who has a score system at some point or another. Be it due to playing only good games, score inflation because of publisher relations, or simply confusion as to what score a game deserves when compared to other games in its genre, a site’s average score tends not to sit at the midpoint of its scale.
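The comparison against each site’s average is simple arithmetic, and a short Python sketch makes it explicit. The scores and averages are the ones quoted in this post (the averages are not verified here), and the helper name `points_above_average` is made up for the example.

```python
# Scores and site averages as quoted in this post; both sites are assumed
# to use the same 0-10 scale with the same granularity.
reviews = {
    "Gamespot": {"score": 7.5, "site_average": 7.0},
    "IGN": {"score": 8.0, "site_average": 8.0},
}


def points_above_average(score, site_average):
    """Hypothetical helper: how far a score sits above the site's usual score."""
    return score - site_average


relative = {site: points_above_average(**r) for site, r in reviews.items()}

for site, delta in relative.items():
    print(f"{site}: {delta:+.1f} points relative to the site average")
```

Read this way, the lower raw number from Gamespot is the more favorable of the two reviews.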

According to the review sites, Quantum Conundrum is an average to good game, and is scored as such. IGN then gives NCAA Football 13 the same score, and Gamespot gives it a 6.5. That can arguably be due to one reviewer enjoying sports games more than the other. And that’s where the fatal flaw in scoring becomes evident. Let’s say we’ve normalized our score, so we’re on a percentage basis, we have an appropriate granularity, and we have an average rating right in the middle, effectively eliminating bias and making our score the most accurate on the planet. Even then, scoring Quantum Conundrum with the same system as NCAA Football 13 makes absolutely no sense. The games have absolutely nothing to do with one another, and grading them as if they did is unfair to both titles.

NCAA Football 13 should be compared to other NCAA titles, and other football titles. I don’t even think scoring it against FIFA makes sense, since the two games are so different. Even if you were to give each game a specific genre, you could never have enough genres to properly differentiate titles from one another and give an accurate score within the genre. With a scoring system, it’s simply impossible to rate games and objectively review them.

So what do we do at AristoGamer? At the end of every review, we have a verdict: a short paragraph summing up the positives and negatives of the game to make it quickly digestible if you were just looking to see if the game was good and don’t really care why. In addition, we include similar games and genres to the game being reviewed. This addition gives an anchor point for our review so that you can decide if this game is worth your time based on your experience with other titles. “If you like third-person shooters like Dead Space, online multiplayer games like Uncharted, or strategy games like Starcraft, you’ll love Starhawk” provides far more useful information to the reader than 8/10 and still maintains a sense of brevity.

It also adds a deeper level of insight into the game than a score system by providing context. When you say “x is like y,” you invite personal experience with x into the consideration of y, which brings with it tons of information exclusive to each reader. It also conjures a more useful mental picture of the new game by providing a link to something you’ve already done. Comparing new titles to similar games also makes the review truer to every individual’s experience: if the review made the game sound good, but you see that it’s like three titles you hate, you’ll be much more wary about rushing in to purchase.

Our system is imperfect: it is inherently subjective, as all reviews are, and I’m sure it has more problems I haven’t thought about yet. Still, I strongly believe that this style of reviewing and likening games to one another is a vast improvement over non-grounded, often dishonest numbers. If I asked a friend what he thought about a movie, I’d much rather hear “It was good, kind of like Shawshank Redemption, but with sharks” than “I’d give it a 4/5.”

What do you think about the review systems, not only on our site but on the other sites you frequent?
