Ice Storm Unlimited Sixth New 3DMark Preset - Is Futuremark Diluting The Message?

A picture posted on Facebook, and a quick Google search. That is what it took for me to be confused. Well, not really confused, of course, but seeing yet another version of 3DMark appear certainly made me raise my eyebrows. In the latest 3DMark for Android update, version 1.1.0.1179, Futuremark has added the Ice Storm Unlimited benchmark. Next to the regular Ice Storm and Ice Storm Extreme, this is the third variant that you can test your hardware with. Unlike the previous two versions, the Unlimited benchmark is not available on iOS or Windows platforms.

But what is the unlimited version exactly? According to Futuremark:

We've added a new test to 3DMark for making chip-to-chip comparisons across mobile devices and operating systems. 3DMark Ice Storm Unlimited is included in the iOS edition and can be added to the Android edition by updating to the latest version, available now from Google Play.

3DMark Ice Storm Unlimited provides an accurate way to measure component performance without vertical sync, display resolution scaling and other operating system factors affecting the result. In Unlimited mode the rendering engine uses a fixed time step between frames and renders exactly the same frames in every run on every device. The frames are rendered in 720p resolution "offscreen" while the display is updated with small frame thumbnails every 100 frames to show progress.

How to choose the right benchmark for your device?

Forced vertical sync on iOS and Android devices limits apps to displaying a maximum of 60 frames per second. Your score will be shown as "Maxed out" if your device hits the vertical sync limit during a test. This indicates that the test is too lightweight for your device. 3DMark will recommend the best test for your device to avoid vertical sync limits. Use 3DMark Ice Storm to make device-to-device comparisons of mainstream smartphones and tablets. Ice Storm includes two Graphics tests designed to stress the graphical (GPU) performance of your device and a Physics test to stress its processing (CPU) performance. Devices that are too fast for Ice Storm will be prompted to run Ice Storm Extreme instead.

3DMark Ice Storm Extreme raises the Graphics tests' rendering resolution to 1080p and uses higher quality textures and post-processing effects to create a more demanding load for the latest mobile devices. Devices that are too fast for Ice Storm Extreme will be prompted to run Ice Storm Unlimited.

Use 3DMark Ice Storm Unlimited to make chip-to-chip comparisons of different chipsets, CPUs and GPUs without vertical sync, display resolution scaling and other operating system factors affecting the result.
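The fixed time step described in the quote can be sketched roughly as follows. This is only an illustration of the idea, not Futuremark's code: `FIXED_DT`, `run_fixed_step`, `average_fps` and the two callbacks are hypothetical names, and the step length is an assumption since the actual value is not stated.

```python
FIXED_DT = 1.0 / 30.0     # assumed simulation step in seconds; the real value is not stated
THUMBNAIL_INTERVAL = 100  # the quoted text says thumbnails appear every 100 frames


def run_fixed_step(total_frames, render_offscreen, show_thumbnail):
    """Render total_frames frames at fixed simulation times.

    render_offscreen(sim_time) and show_thumbnail(frame) are hypothetical
    callbacks standing in for the engine. Returns the list of simulation
    times that were rendered, which depends only on total_frames.
    """
    sim_times = []
    for i in range(total_frames):
        sim_time = i * FIXED_DT          # derived from the frame index, not the wall clock
        frame = render_offscreen(sim_time)
        sim_times.append(sim_time)
        if (i + 1) % THUMBNAIL_INTERVAL == 0:
            show_thumbnail(frame)        # progress display only
    return sim_times


def average_fps(total_frames, wall_seconds):
    """Every device renders the same frames, so only the elapsed wall time differs."""
    return total_frames / wall_seconds
```

Because the simulation time comes from the frame index rather than the clock, a fast and a slow device draw exactly the same frames; the faster one simply finishes sooner and so reports a higher average.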

The 3DMark we are not allowed to call 3DMark 2013 was released on February 4, 2013, with the Android version following on April 2, 2013. The original version featured four tests; now we have six. Add three presets for 3DMark11, four for 3DMark Vantage, and a couple of older benchmarks. That is a lot.

Perhaps I am slightly too negative when I say Futuremark is diluting their own message. What is your 3DMark score? Wait, what preset are you running? Ice Storm, what version? Extreme, that's better than Unlimited right? No. Ehr. Well, I have 5000 anyway.

What do you think?

Perhaps an interesting (and relevant?) TED Talk, "Barry Schwartz on The Paradox of Choice"



Belgium Massman says:

I will assume it works well for Futuremark though. I just wonder if six presets in one product is not too much. Having a choice is good, having too much choice is bad.

Belgium Massman says:

Fyi

I'll just remark here that Ice Storm Unlimited is available on iOS and will be coming to Windows too.

Finland FM_Jarnis says:

I should point out that Unlimited test is in iOS and it will also be in Windows RT version.

We are also going to update the desktop version.

We are covering a massive range of devices and capabilities with 3DMark. You can and should make the choice based on the hardware you are benchmarking.

Belgium Massman says:

FM_Jarnis said: We are covering a massive range of devices and capabilities with 3DMark. You can and should make the choice based on the hardware you are benchmarking.


Hehe, that is kind of the point of that Paradox of Choice TED Talk :p.

You are supposed to be the people with knowledge, we have inferior knowledge. There are three Ice Storm tests, and I have difficulty understanding which one I should use to test and compare my device with, and why.

From that thread over at FM forums, I see the following scores with an iPhone 5:
Ice Storm Extreme: 3355
Ice Storm: Maxed Out
Ice Storm Unlimited: 10592
Is "maxed out" a new type of score (never seen it on earlier Android 3DMark versions)? Will that also pop up on a desktop version? How come the score of ISX > ISU here?

A lot of questions :(.

Christian Ney says:

@Massman,

When the score says "Maxed Out!" it means 60+ FPS has been hit (Devices are frame limited by the vsync) or the score is higher than 10'500.

Ice Storm Unlimited is the same as Ice Storm, but it does a kind of offscreen rendering to "bypass" the vsync limit.

ISX score isn't higher than ISU.

Indonesia Lucky_n00b says:

So, basically this 'Ice Storm Unlimited' is the version of 'normal' Ice Storm, but without the Vsync limit?

Belgium Massman says:

I think Unlimited is the "offscreen mode" in GFXBench

K404 says:

Will HWB add this preset too?

Finland FM_Jarnis says:

Massman said: I think Unlimited is the "offscreen mode" in GFXBench


That's pretty much what it is, and yes, it uses normal Ice Storm workload content.

Maxed Out = Device got too close to 60fps during the test and vsync cap influenced the score. It is indeed roughly around the point where the score goes above 10500 points but there is no fixed "score" - it depends on the peak fps.

The device UI tells you to use another test when it thinks you will be maxed out on the test you are about to run. This applies only to devices that are mapped in the database, i.e. ones we have data for.

Edit: also desktop will not show Maxed Out because it is not vsync limited.
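A rough sketch of the "Maxed Out" rule and the test recommendation as described here. The 60 fps cap is from the thread; `CLOSE_MARGIN_FPS`, the function names and the shape of the per-device data are assumptions made for illustration, not the app's actual logic.

```python
VSYNC_CAP_FPS = 60.0    # forced vsync cap on mobile, per the thread
CLOSE_MARGIN_FPS = 1.0  # hypothetical: how close to the cap counts as "too close"


def is_maxed_out(frame_fps_samples, vsync_limited=True):
    """Flag a run whose peak fps got too close to the vsync cap."""
    if not vsync_limited:  # desktop runs are not vsync limited
        return False
    return max(frame_fps_samples) >= VSYNC_CAP_FPS - CLOSE_MARGIN_FPS


def recommend_test(known_peak_fps):
    """Pick a test for a device that is in the database.

    known_peak_fps is a hypothetical mapping such as
    {"Ice Storm": 60.0, "Ice Storm Extreme": 41.0}.
    """
    for name in ("Ice Storm", "Ice Storm Extreme"):
        if not is_maxed_out([known_peak_fps[name]]):
            return name
    return "Ice Storm Unlimited"
```

Note that the flag depends on the peak fps rather than on the final score, which matches the remark that there is no fixed score threshold.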


How come ISU > ISX?


ISU = IS in offscreen mode without render to screen. ISX is a heavier workload.

We actually considered using Extreme for the offscreen mode but unfortunately some low end phones can crash in Extreme and so Unlimited would have crashed as well. For the purposes of determining device performance both tests work once vsync is taken out of the equation.

Belgium Massman says:

Does that mean we can also expect an Ice Storm Extreme Unlimited in the future?

Finland FM_Jarnis says:

Massman said: Does that mean we can also expect an Ice Storm Extreme Unlimited in the future?


Currently no plans. We hope to move to a new test also on mobile side next as Ice Storm is getting too lightweight.

Belgium Massman says:

A seventh test?

Finland FM_Jarnis says:

Massman said: A seventh test?


No. ;)

United States Bobnova says:

No it's been months since the last pay2play benchmark, time for another one!

Finland FM_Jarnis says:

What are you talking about? We've stated already that our goal is to have one thing: "3DMark". Covering everything from mobile phones to Quad-GPU monsters, DX9, 10 and 11 level hardware, with any new tests slotting into the existing framework.

Additional test(s) to mobile versions are planned to be free (at least in the foreseeable future). And it's not going to be a brand new test. We already have Cloud Gate, which is eminently doable on the fastest mobile devices we have now... and should stress them pretty well. It's just not doable on OpenGL ES 2.0, so it'll take a bit.

Couple of years and we might see Fire Strike on the mobile devices as well, tho most likely there we may need to have a "lite" version first as Fire Strike is SO much more demanding than Cloud Gate. Who knows.

We could remove tests no longer matching modern hardware (one might argue Ice Storm default run is already like that, tho there are still new devices coming to the market, mostly from China, for which it is relevant) but knowing what (some of) you guys said last time when we retired more than ten year old 3DMark 2001 SE, we're reluctant to do such a thing. At the same time we don't want to have support hell with numerous different versions of the same installer. So yes, there are going to be many tests, for many different hardware. We'll try to explain better which test targets which kind of hardware (see all those "Too long, didn't read" texts in desktop 3DMark UI? That's what they are for) and allow you to make an informed choice which test fits your use case.

But you guys still will run them all on everything under the sun, with and without supercooled liquids... and then we get to hear how "Ice Storm gives silly numbers of Quad Crossfire" etc. when the test is designed for ultrabooks, tablets and mobile phones.... and at the other end, makers of barely-DX11-capable low power hardware complain how Fire Strike has unacceptably high run-to-run jitter - when average FPS is dancing between 1.1 and 1.4fps on the system (and how fire strike extreme crashes on the same device) :)

Benchmark development is hard. Benchmark development across multiple platforms is insanely hard.

United States BeepBeep2 says:

FM_Jarnis said:
But you guys still will run them all on everything under the sun, with and without supercooled liquids... and then we get to hear how "Ice Storm gives silly numbers of Quad Crossfire" etc. when the test is designed for ultrabooks, tablets and mobile phones.... and at the other end, makers of barely-DX11-capable low power hardware complain how Fire Strike has unacceptably high run-to-run jitter - when average FPS is dancing between 1.1 and 1.4fps on the system (and how fire strike extreme crashes on the same device) :)

Hmm, that's a bit of an insult to the benching community...
If you lower points calculation digits nobody cares about silly numbers!

Low end device scores 600? Great.
High end device(s) gets 220 FPS and scores 27,000? That's okay as long as the benchmark doesn't look like 3DMark03 with LOD mod.

FM_Jarnis said:
Benchmark development is hard. Benchmark development across multiple platforms is insanely hard.

Then don't make it so hard on yourselves!

Finland FM_Jarnis says:

I did not intend to insult anyone. My apologies.

Point was, the performance difference between the possible hardware configurations is so massive that same workload / test cannot possibly be applied to everything and expect that the results are accurate and stable (easy to reproduce).

Let me try to explain;

Benchmark running at below 20fps looks ugly to the "average" user ("why is my device doing this poorly?")

Benchmark running at below 5fps loses measurement accuracy - tiny changes in fps become big changes in score.

On mobile devices anything close to 60fps (or over it) leads to vsync cap issues. And right now on Android/iOS/RT the fastest devices available are over 20 times faster (actually, more like 30-40x) than the slowest that can run Ice Storm. So if hypothetically the fastest device available today would peak at 40fps, that means the slowest ones would run that test peaking at 2fps (looking terrible and having score jitter issues). Also peaking at 2fps would mean that at times it would run at below 0.5fps... can you say slideshow?
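The arithmetic in this argument can be written out as a small sketch. The function names are made up for illustration, and `relative_change` assumes the score is proportional to fps, which is a simplification.

```python
def slowest_peak_fps(fastest_peak_fps, speed_ratio):
    """Peak fps of the slowest device if the fastest one is speed_ratio times quicker."""
    return fastest_peak_fps / speed_ratio


def relative_change(fps, delta_fps):
    """Fractional change in an fps-proportional score caused by a delta_fps wobble."""
    return delta_fps / fps


# Fastest device peaking at 40 fps with a 20x spread: slowest peaks at 2 fps.
print(slowest_peak_fps(40.0, 20.0))
# With the 40x spread mentioned as more realistic, the slowest peaks at 1 fps.
print(slowest_peak_fps(40.0, 40.0))
```

Under that assumption, the same 0.1 fps wobble is a 5% swing at 2 fps but only 0.25% at 40 fps, which is the "tiny changes in fps become big changes in score" problem.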

On laptop/desktop space the performance range is even more massive due to SLI and Crossfire. You effectively have even larger (40-50x?) performance difference between the fastest and the slowest hardware and then on top of that you can pile up to four copies of the fastest card and stretch that to mind-boggling levels.

As an example, the "high end runs at 220fps+" happens at Cloud Gate - 200-300fps is what you see at stock clocks on high end multi-GPU systems. There are plenty of low end tablets and netbooks on the Windows side that simply cannot run Cloud Gate and the slowest ones that can run it get single digit framerates. Most mobile phones available today cannot run it (too complicated) even if we had the OpenGL version ready today.

Also from the benchmark accuracy standpoint, it is not okay if high end device runs at 200fps+ because even without vsync complicating issues on the desktop, as framerates rise, CPU becomes bigger and bigger factor in rendering - a workable 3D *graphics* benchmark cannot be CPU limited. The CPU-limited issue is why Futuremark considers older benchmarks like 2001SE, 03, 05 and even 06 to be somewhat pointless for today's hardware. Yes, they can be used for competitive overclocking benchmarking - as mostly a CPU benchmark - but that's a bit redundant when the benchmark already has a test designed for just that (CPU/Physics test) and the graphics tests should scale with the video card but no longer really do it.

In Cloud Gate you already see that the CPU is a factor in some cases - very slow CPU plus very fast GPU setup can already be limited by the processor.
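One way to picture the CPU limit is a toy model in which each frame takes as long as the slower of the CPU work and the GPU work. This is an assumed simplification for illustration only (real pipelines overlap and stall in more complicated ways), and the function names are hypothetical.

```python
def fps(cpu_ms, gpu_ms):
    """Frames per second when the slower of CPU and GPU work bounds each frame."""
    return 1000.0 / max(cpu_ms, gpu_ms)


def gpu_scaling(cpu_ms, gpu_ms, gpu_speedup):
    """Ratio of fps after and before making the GPU gpu_speedup times faster."""
    return fps(cpu_ms, gpu_ms / gpu_speedup) / fps(cpu_ms, gpu_ms)
```

In this model, a heavy test (20 ms of GPU work against 2 ms of CPU work) scales fully with a GPU that is twice as fast, while a light test (2 ms of each) does not scale at all, because the CPU time now sets the frame rate.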

In fact, with 3DMark Vantage we really tried to push a set of really heavy GPU-stressing tests that wouldn't suffer from being CPU limited for a long time. A lot of people hated it because at launch it ran often at 5-10fps on perfectly fine gaming machines (and I guess some of the art wasn't the greatest - Jane Nash and all that...).

Ice Storm on the desktop can scale to mind-boggling FPS numbers - 1000+ is easily doable - and code there was carefully optimized so it wouldn't hit CPU bottlenecks. Yet it still isn't a very good benchmark for the high end desktop cards; Even a tiny CPU load spike (some background app doing something) can cause massive FPS bounces, leading to scores that are not staying consistent between runs. I'm sure you have all seen that for Ice Storm records, you probably need to run it a bunch of times and hope for a lucky run. That's not good for a benchmark :( - luckily it doesn't do that on the target hardware (well, except due to thermal throttling-induced jitter on phones but that's a whole different story)

Also when framerates get very high, the actual bottleneck in the GPU can vary between architectures, leading to scores that look, at the first sight, illogical.

Yet if it were any heavier, it would run at sub-10fps on some very slow mini-notebooks (the designed target hardware) and look bad & have measurement precision issues.

Remember, for us (and for the hardware industry partners) these benchmarks are intended to be accurate measurement tools and how various hardware performs in 3DMark can be a Very Big Deal. At the same time the tests should look cool for the "average" users who like to use them to show off what their new rig can do and with the latest 3DMark, it should do this on anything from cheap few-year old mobile phones, all the way up to quad-Titan monsters, with mind-boggling differences in performance and supported features.

Also when developing the tests, we need to peer at the crystal ball and try to guess how devices will perform year or two in the future when the benchmark finally ships. Ice Storm already suffered a bit from a faulty crystal ball - when the workload was designed, we were pretty sure we had a good idea how mobile device performance would improve over the next year or two. Nope. Under-guessed by 2-3x and had to "retrofit" Ice Storm Extreme in a hurry and even that wasn't really enough, hence Unlimited mode. Not optimal, but at least the benchmark still works as it should, objectively ranking the available mobile hardware.

K404 says:

Shit devices get shit scores. It is not for Futuremark to release software that runs in such a way to make people feel better about their purchase :) It's not Futuremarks problem :) "Be who you are"

Belgium Massman says:

I understand your point of view, Jarnis. I guess it is always difficult to try to make as many people happy as possible. I have been thinking about this a lot the last couple of days, and I can't really pin down why I am concerned about the diluted message.

I guess with Vantage and 11, there were only 4 and 3 presets respectively, which is a lot less confusing. I have never really understood (or looked into) which preset should be used for what type of hardware. Quite frankly, I am not even sure if it matters that much for the consumer. I suppose it might matter for your partners. For internal testing, it doesn't really matter what type of workload you throw at it, I think.

Question: why is the art of the presets in 3DMark different, but for 3DMark11 and Vantage the same? I would guess it is useful for marketing purposes? When I see advertising of a laptop that says "Vantage E6500", I honestly have no clue what that means. Then again, Cloud Gate 10,000 I don't really understand either.

I guess maybe the main critique would be that the default Ice Storm did not last long enough, and that you need another spin-off to work around the vsync issue. It sounds to me like you're torn between choosing to evaluate the performance of a device, or the hardware inside that device.

Question 2: why is there still no support for Linux?

United Kingdom borandi says:

Then the answer is simple. Have a pre-rendered video and do the rendering off screen. Lots of mobile tests are being rendered off-screen in order to have a consistent test. "While the benchmark is running, for your enjoyment, please watch this pre-rendered cat video/benchmark scene" Besides, I liked the artwork in Jane Nash and Vantage, especially compared to 3D11.

United States Bobnova says:

FM_Jarnis said: <snip>


So in short, Futuremark set itself an insanely difficult goal with "3DMark", and we shouldn't expect to be listened to at all because it's a hard goal that futuremark chose for itself?

That's the message I'm getting, at least.

Please, keep in mind where you are: An extreme benchmarking forum. This is not evga.com, this is not randomgamingforum.com, this is HWBot.org.
<sparta.jpg>

Do you think voltmods are easy?
Do you think learning the bazillion tweaks is easy?
Insulating in such a way that you don't kill the HW one way or another?
Memorizing the proper screenshot method?
Making sense of byzantine memory timings?
Have you actually tried doing what we do? (No disrespect if you haven't, but IMO you should just for the perspective)

It's not. It's hard. So please, take the "we set ourselves a hard goal so give us some slack" bit and stuff it back into a can and bury it somewhere. It's all hard here.

Now when I do want to compare devices, I won't be comparing a phone to a PC, nor a tablet to a PC, and probably not even a tablet to a phone. I'll be comparing a PC to a PC, a phone to a phone, and a tablet to a tablet.
I don't really care how my phone stacks up to my PC, it's not relevant to me.
I'll admit that it's interesting, but not enough to buy it (that's the point right?) and I wouldn't call it useful.

I understand that comparing massively different hardware grades is difficult even on the same platform. I think we all do here, we're benchers remember? We've all looked at say, 3d05 on 1x/2x/3x/4x GPUs. I think we're smarter than the average bear here, or at least better informed.

My train of thought has been derailed, I was going somewhere but I no longer remember where. Crap.
Guess I need more coffee.


What I personally would like to see, which I do not expect to see, and may well not be ideal for Futuremark's wider audience (we're vocal, but we're not large, and I expect a decent percentage of benchers don't actually buy the benchmark anyway. Certainly not the kilobuckish corporate version).

One test for mobile. Show a video and calculate off-screen. Hell show a video and pull frames out of it to simulate the low frame rate on slow stuff.

One test for mainstream PCs, something aimed at 5fps on a 7770. If you're benching something slower than a 7770 you should know that it's crap, because hey guess what? It's crap! In an ideal world a load aimed at 5fps on a 7770 (or GTSwhateverthebunny) will still be vaguely reasonable on a 7970/GTX780/titan/whatever. By reasonable I mean double digits. I'd run this at 1080p, with a crapload of polygons and textures and such.

One test for OMFG PCs. Fire Strike Extreme is nice for this. Got a single GPU lower than a 7950? Say hello to jittery (and crap) scores, or the thing bunnyextractionting itself on load, you're not supposed to be running this on a 7770 anyway. IMO the ability to run the Top Benchmark Anywhere ought to be a selling point. Being able to do it in a double digit frame rate even more so, and being able to actually manage 100fps? OMFG!

Now before you get the idea that I don't like 3DMark (please put a number on these things, calling it 3dmark is really awkward, 3dmark13, 3dmark2013, whatever), I really like 3DMark. I didn't expect to, but it's very cool. Firestrike and FSx are awesome. They aren't that far from what I just described, either.

To finalize: I like the direction 3DMark2013 took from 3d11 (I did not like 3d11), the offscreen rendering to allow me to run a 1440p benchmark on a 1080p monitor is awesome. The OMFG hard test I can run is awesome. Comparing my phone to my PC is useless, but fun. (Comparing it to my HTPC is depressing). What I don't see a need for is a bazillion presets. I guess I don't see much need in comparing a GMA3100 to a GT210 to a HD7200. Maybe I'm in the minority here, but I feel like that market segment either A) doesn't care, or B) sucks at doing research anyway. If they cared, they wouldn't buy a bunnyextraction GPU/bunnyextraction onboard. If they could do research, see previous answer :P


tl:dr version: I don't think I can make a tl:dr version, K404 did a pretty good job of tl:dring this post though, IN THE PAST. Dude can time travel, impressive.


EDIT: (the modern version of P.S.) I would absolutely LOVE to see more realistic space stuff, rather than the classic "let's pretend there is atmosphere to turn with in space" operation. That might actually be the #1 most annoying thing about the 3DMark series in my book. I'm that shallow.

Belgium Massman says:

FM_Jarnis said: but knowing what (some of) you guys said last time when we retired more than ten year old 3DMark 2001 SE, we're reluctant to do such a thing.


My thoughts on this are quite simple. 3DMark2001 SE is, together with SuperPI 32M, the benchmark on which competitive overclocking and benchmarking is founded. It is such a significant part of this community's history. For a lot of hardcore overclockers (even today), that is the benchmark they grew up with. Having that one retired is for a lot of folks just an emotional problem.

Completely off topic: what I have the most issues with regarding the retiring of 3dmark01 is the fact that you have erased such a big and important part of our history. Back when the decision was taken, HWBOT reached out to maybe take over a part of that result database for the sake of history, but we received a "nay". Basically, we don't have any overclocking history from before May 2006 (which is when we switched to our current site). You had the data to re-build that history.

K404 says:

Bobnova said:

tl:dr version: I don't think I can make a tl:dr version, K404 did a pretty good job of tl:dring this post though, IN THE PAST. Dude can time travel, impressive.



? :)

United States Bobnova says:

K404 said: ? :)


This:
K404 said: bunnyextraction devices get bunnyextraction scores. It is not for Futuremark to release software that runs in such a way to make people feel better about their purchase :) It's not Futuremarks problem :)

"Be who you are"

United States BeepBeep2 says:

Massman said: My thoughts on this are quite simple. 3DMark2001 SE is, together with SuperPI 32M, the benchmark on which competitive overclocking and benchmarking is founded. It is such a significant part of this community's history. For a lot of hardcore overclockers (even today), that is the benchmark they grew up with. Having that one retired is for a lot of folks just an emotional problem.

Completely off topic: what I have the most issues with regarding the retiring of 3dmark01 is the fact that you have erased such a big and important part of our history. Back when the decision was taken, HWBOT reached out to maybe take over a part of that result database for the sake of history, but we received a "nay". Basically, we don't have any overclocking history from before May 2006 (which is when we switched to our current site). You had the data to re-build that history.

+1 for bunnys sake...

...hop hop hop
