3DMark Time Spy-Gate: In Summary

What do you get if you take a pair of new GPU architectures, add a new API and a new benchmark? Answer: a whole load of debate and plenty of discordant noise. It all started when some clued-up people on Overclock.net began debating the pros and cons of Futuremark’s decision not to implement any vendor-specific architectural optimizations in its latest benchmark, Time Spy, which was rolled out earlier this week.

The AMD fanboy estate fiercely points out that AMD has invested a great deal in making sure its architecture is optimized for the arrival of DX12 and, more specifically, ‘asynchronous compute’. Asynchronous compute in DX12 lets an application submit graphics and compute work on separate queues, so that a GPU able to do so can overlap them rather than running one after the other. This means that a 3D video game application can a) know which GPU is being used and b) make workload and queuing decisions to optimize the experience.
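As a rough illustration of why overlapping work matters, here is a toy model in Python. It is not DX12 code and not how any driver or benchmark actually schedules work; the function name `frame_time_ms`, the `overlap` parameter and the millisecond figures are all made up for the example.

```python
def frame_time_ms(graphics_ms, compute_ms, overlap=0.0):
    """Toy estimate of the time to finish one frame's graphics and compute work.

    overlap=0.0 models fully serial execution (one queue after the other).
    overlap=1.0 models an idealized GPU that hides as much of the shorter
    workload as possible behind the longer one.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0.0 and 1.0")
    # At most the shorter of the two workloads can be hidden behind the other.
    hidden_ms = overlap * min(graphics_ms, compute_ms)
    return graphics_ms + compute_ms - hidden_ms


# Hypothetical workload: 12 ms of graphics work and 4 ms of compute work.
serial = frame_time_ms(12.0, 4.0)                  # 16.0
partial = frame_time_ms(12.0, 4.0, overlap=0.5)    # 14.0
ideal = frame_time_ms(12.0, 4.0, overlap=1.0)      # 12.0
```

In this simplified picture the amount of work is identical in all three cases; only how much of it runs concurrently changes, and that is where the argument over architectures comes in.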

One example often used to illustrate how hard AMD has worked, and succeeded, at optimizing its Polaris GPUs for a specific title is the massively improved experience in Doom when using the Vulkan API (an alternative to DX12 which also supports asynchronous compute). According to German tech reviewers ComputerBase, a true implementation of asynchronous compute would give AMD a significant performance boost, whereas Nvidia would see significantly less improvement.

So why did Futuremark decide not to implement vendor-specific optimizations for asynchronous compute? After all, it is a key feature of DX12, and Time Spy is billed as the first DX12 benchmark. A statement released by the company says:

“Asynchronous compute is one of the most interesting new features in DirectX 12…. The implementation is the same regardless of the underlying hardware. In the benchmark at large, there are no vendor specific optimizations in order to ensure that all hardware performs the same amount of work. This makes benchmark results from all vendors comparable across multiple generations of hardware. Whether work placed in the COMPUTE queue is executed in parallel or in serial is ultimately the decision of the underlying driver.”

So Futuremark is clearly trying to dispel any accusations of bias, arguing instead that using vendor-specific optimizations would in fact be unfair. Users commenting on a Reddit thread on the subject tend to disagree, however:

“…they just confirmed it's not a proper DX12 benchmark due to it not utilizing the benefits of DX12 low level optimization, all in the sake of "fairness" they used a single path... the path that fits Pascal architecture capabilites.”

Reading the forum thread on OC.net and other comments around the web, it’s clear that emotions between the green and red camps can run high. My view is that the customer should ultimately have a choice. If I want to assess how well a GPU vendor is doing in terms of low-level optimizations to ‘get closer to the metal’ of a GPU, why shouldn’t a benchmark app give me that opportunity? Likewise, if I am of the opinion that a single, common path or compute implementation is fairer, I should have that option too.

Yup. I vote for an on/off switch. Please add your thoughts in the forum thread below.