They're (Almost) All Dirty: The State of Cheating in Android Benchmarks

Refreshing. That is what AnandTech's coverage of the mobile segment is. In fact, that is what most of their web coverage brings to the (e-)table. Judging by the replies we find in our "Meaningless award" thread, a lot of people are sick and tired of media presenting products in the most positive light possible while ignoring some of the most obvious faults and issues. AnandTech, however, calls out even the biggest players in the mobile segment. Oh, so refreshing!

The story of the day is a continuation of the Samsung cheating scandal from a couple of weeks ago. In a short news article, AT points out that Samsung is not alone. In fact, many companies are involved in cheating on benchmarks. As Preetam from Nextpowerup writes: "Well, we aren't mad at you Samsung. In fact, companies like HTC, ASUS and LG were also caught cheating in benchmarks. The point is that we consumers would rather have your developers spend their time enhancing the functionality of the phone's software, bring new features, tweak existing ones and work towards providing an even better experience. This is instead of spending all those resources optimizing the SoC in the device to perform better in certain benchmarks to be able to score an edge over the competition in reviews. It's partly the fault of phone reviewers and their heavy emphasis on the mobile benchmarks which has forced OEMs to get to tuning, to be able to out-do phones from other manufacturers."

In the meantime, Samsung is trying to control the situation. After all, AnandTech is not the only one covering this issue; Ars Technica is on it as well. In a statement to CNET UK, Samsung defends the Note 3. It says: "The Galaxy Note 3 maximises its CPU/GPU frequencies when running features that demand substantial performance. This was not an attempt to exaggerate particular benchmarking results. We remain committed to providing our customers with the best possible user experience."

I'd say this hilarious meme from Nextpowerup (below) sums it up quite nicely.

The concluding lines of AnandTech's coverage also make all this relevant to the industry we're mostly tied to. The author writes:

"The hilarious part of all of this is we're still talking about small gains in performance. The impact on our CPU tests is 0 - 5%, and somewhere south of 10% on our GPU benchmarks as far as we can tell. I can't stress enough that it would be far less painful for the OEMs to just stop this nonsense and instead demand better performance/power efficiency from their silicon vendors. Whether the OEMs choose to change or not however, we’ve seen how this story ends. We're very much in the mid-1990s PC era in terms of mobile benchmarks. What follows next are application based tests and suites. Then comes the fun part of course. Intel, Qualcomm and Samsung are all involved in their own benchmarking efforts, many of which will come to light over the coming years. The problem will then quickly shift from gaming simple micro benchmarks to which 'real world' tests are unfairly optimized which architectures. This should all sound very familiar. To borrow from Brian's Galaxy Gear review (and BSG): 'all this has happened before, and all of it will happen again.'"

Interesting note: Apple is apparently not cheating the benchmarks.


Belgium Massman says:

This might be a good time to have a critical look at what's going on on our side of the spectrum. Tessellation is going to be a topic of discussion again once the new AMD cards hit the market.

Finland FM_Jarnis says:

Massman said: This might be a good time to have a critical look at what's going on on our side of the spectrum. Tessellation is going to be a topic of discussion again once the new AMD cards hit the market.


You know FM's stance: disabling or modifying tessellation modifies the workload and is against FM policy.

If AMD released a driver that did this by default, the driver would obviously never be approved (and none of the AMD drivers were approved for a time, until the tessellation settings check was implemented so we could reject just the results where the setting was modified).

Belgium Massman says:

I had a very interesting chat about this Tessellation problem with a fellow overclocker the other day. We were wondering why tessellation is so explicitly forbidden, but Nvidia's LOD is not even checked. Both reduce image quality, both should be banned.

K404 says:

You answered this ages ago: LOD is not enforceable. Tessellation is yes/no with one big boost in performance. Anyway, this is partly off-topic. Smartphone makers should have learned from the mistakes of PC component makers. Smart people learn from the mistakes of others.

Finland FM_Jarnis says:

Massman said: I had a very interesting chat about this Tessellation problem with a fellow overclocker the other day. We were wondering why tessellation is so explicitly forbidden, but Nvidia's LOD is not even checked. Both reduce image quality, both should be banned.


Last I checked, the LOD thing is not an open setting on the driver, but a registry hack. Has this changed?

There is an infinite swamp of undocumented or poorly documented registry things if we go that way. The key bit was that Tessellation is an openly available user-definable setting on the driver. A "normal user" could modify that without considering it to be odd or wrong, get a score and think it was real.

LOD modding would require explicit intent to cheat, with specific knowledge. You cannot "accidentally" LOD-tweak yourself a bogus score.

Belgium Massman says:

You are right, it has not changed. It still is the same registry hack. Since it has not changed, why not check for it? Of course, in theory there is an unlimited amount of undocumented registry adjustments possible. But in reality it is just the single one. I don't see why it would be so outrageous to scan for that. If AMD takes Tessellation control out of the Catalyst driver, and just publishes the exact registry key where the user can adjust it, would that make it legal again? The underlying problem is that the benchmark results are not checked for image quality. If you'd compare the rendered image to a reference image, and check if the image quality is within the legal range, there would be no debate on Tessellation versus LOD. Both would be allowed as long as the rendered image quality is within the legal deviation of the reference image quality.
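To make the reference-image idea concrete, here is a minimal Python sketch of such a check. Everything in it is an assumption for illustration: the function names `psnr` and `is_valid_run`, the use of PSNR as the quality metric, the flat list of 8-bit grayscale pixel values, and the 40 dB threshold. None of this describes how any existing benchmark validates results.

```python
import math


def psnr(reference, rendered, max_value=255):
    """Peak signal-to-noise ratio (dB) between two equally sized frames.

    Frames are flat lists of pixel intensities (hypothetical format).
    Higher values mean the rendered frame is closer to the reference.
    """
    if not reference or len(reference) != len(rendered):
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


def is_valid_run(reference, rendered, min_psnr_db=40.0):
    """Accept a run only if image quality stays within the allowed deviation.

    min_psnr_db is an arbitrary illustrative threshold, not a real policy.
    """
    return psnr(reference, rendered) >= min_psnr_db
```

With a check along these lines, it would not matter whether the deviation came from a tessellation override or an LOD tweak: a run passes or fails purely on how far the rendered frame drifts from the reference.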

Norway knopflerbruce says:

Bye bye LSD style artifact runs :D Christian Ney will kill you both when he reads this.

Belgium Massman says:

ArsTechnica: Note 3’s benchmarking “adjustments” inflate scores by up to 20%

The inclusion of GFXBench is surprising given that it shows no unusual idling behavior in System Monitor. Between the inclusion of that and the suspicious "frame rate adjustment" string, it's clear that Samsung is doing something to the GPU as well, though those clock speeds are more difficult to access than the CPU speeds (a method used by AnandTech on the international S 4 no longer works on the Note 3). The "DVFS" in "DVFSHelper" stands for "Dynamic [voltage and] frequency scaling," also known as CPU throttling, which has many legitimate uses to manage both heat and power draw. This file contains a few special settings for the camera, Gallery, and some other packed-in apps, but nothing like what is in the above section. Benchmarking apps are the only type of app that is systematically called out and boosted. To see how some other benchmarks are affected, we made "stealth" versions of those, too—the exact same app, just with a different package name. These results back up the Geekbench findings: we're seeing artificial benchmark increases across the board of about 20 percent; Linpack showed a boosted variance of about 50 percent.

Samsung is covering their tracks instead of resolving the issue. Not unexpected, though.
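
The "stealth" comparison described in the Ars excerpt boils down to simple arithmetic: run the same benchmark under its normal package name and under a renamed one, then see how much higher the recognized version scores. A rough Python sketch, where the function name `boost_percent` and the scores in the usage lines are made up purely for illustration and are not measured results:

```python
def boost_percent(stock_score, stealth_score):
    """Percent by which the recognized app outscores its renamed copy.

    stock_score:   score of the benchmark under its original package name
    stealth_score: score of the identical app under a different package name
    """
    if stealth_score <= 0:
        raise ValueError("stealth_score must be positive")
    return (stock_score - stealth_score) / stealth_score * 100.0


# Hypothetical scores, chosen only to show the calculation.
stock, stealth = 1200, 1000
print(f"Boost: {boost_percent(stock, stealth):.1f}%")
```

A result near zero would suggest the device treats both copies alike; a consistently positive gap across several benchmarks points at package-name detection.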
