Non-K Skylake Overclocking Hurts AVX2 Performance, Problem Related to "256-bit Vector Warm-up" (Hardware.fr)

In an interesting article on overclocking the non-K Core i5 6400, Hardware.fr takes a closer look at the poor AVX2 performance of overclocked non-K processors. They ran a couple of tests, including the Intel LINPACK suite and an x265 encoding benchmark, and found performance to be significantly lower with the OC BIOS than with the regular BIOS. Going from 3.1GHz to 4.5GHz, LINPACK performance drops from 178 to 62 Gflops and x265 performance drops from 5.89 to 4.68 images/s. Why? Power saving.

The site references an entry on Agner's CPU blog which details how the Skylake processors deal with 256-bit vector instructions. Quote: "I observed an interesting phenomenon when executing 256-bit vector instructions on the Skylake. There is a warm-up period of approximately 14 µs before it can execute 256-bit vector instructions at full speed. Apparently, the upper 128-bit half of the execution units and data buses is turned off in order to save power when it is not used. As soon as the processor sees a 256-bit instruction it starts to power up the upper half. It can still execute 256-bit instructions during the warm-up period, but it does so by using the lower 128-bit units twice for every 256-bit vector. The result is that the throughput for 256-bit vectors is 4-5 times slower during this warm-up period. If you know in advance that you will need to use 256-bit instructions soon, then you can start the warm-up process by placing a dummy 256-bit instruction at a strategic place in the code. My measurements showed that the upper half of the units is shut down again after 675 µs of inactivity."
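To get a feel for what the quoted numbers mean for a short burst of 256-bit work, here is a minimal Python sketch of a toy timing model. It only uses the figures from the quote (14 µs warm-up, shutdown after 675 µs of inactivity); the slowdown factor of 4.5 is an assumed midpoint of the quoted "4-5 times", and the function name `burst_time_us` is made up for illustration. It is a rough model, not a measurement.

```python
# Toy model of the 256-bit warm-up behaviour described in the quote.
WARMUP_US = 14.0          # warm-up period from the quote
IDLE_SHUTDOWN_US = 675.0  # upper half powers down after this much inactivity
SLOWDOWN = 4.5            # ASSUMPTION: midpoint of the quoted "4-5 times slower"


def burst_time_us(full_speed_us, idle_before_us):
    """Estimate wall time (µs) for a burst of 256-bit work.

    full_speed_us  -- time the burst would take with the upper half powered
    idle_before_us -- time since the last 256-bit instruction
    """
    if idle_before_us < IDLE_SHUTDOWN_US:
        # Upper half is still powered: no warm-up penalty.
        return full_speed_us
    # Amount of work (in full-speed µs) that fits into the slow warm-up window.
    work_in_warmup = WARMUP_US / SLOWDOWN
    if full_speed_us <= work_in_warmup:
        # The whole burst runs at the reduced rate.
        return full_speed_us * SLOWDOWN
    # The warm-up window is spent slowly, the remainder runs at full speed.
    return WARMUP_US + (full_speed_us - work_in_warmup)


if __name__ == "__main__":
    for work, idle in [(2.0, 1000.0), (100.0, 1000.0), (100.0, 0.0)]:
        print(f"work={work} µs, idle={idle} µs -> {burst_time_us(work, idle):.2f} µs")
```

In this model the penalty is capped at roughly 11 µs per cold burst, which is why a dummy 256-bit instruction placed ahead of time, as the quote suggests, can hide it on a normally functioning processor.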

Since the non-K overclocking BIOS disables all forms of power management, it appears that the overclocked Skylake processors are never able to power up the upper 128-bit half of the execution units.
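As a rough sanity check, the benchmark figures quoted above can be normalised per GHz and compared with the "4-5 times slower" figure from Agner's blog. The Python sketch below does only that arithmetic; the helper name `per_ghz` is made up for illustration, and the comparison is suggestive rather than proof, since real workloads mix 256-bit and other instructions.

```python
# Per-clock comparison of the benchmark numbers quoted in the article.
def per_ghz(score, ghz):
    """Score normalised by clock frequency."""
    return score / ghz


# LINPACK: 178 Gflops at 3.1GHz (regular BIOS) vs 62 Gflops at 4.5GHz (OC BIOS)
linpack_ratio = per_ghz(178, 3.1) / per_ghz(62, 4.5)

# x265: 5.89 images/s at 3.1GHz vs 4.68 images/s at 4.5GHz
x265_ratio = per_ghz(5.89, 3.1) / per_ghz(4.68, 4.5)

if __name__ == "__main__":
    print(f"LINPACK per-clock slowdown: {linpack_ratio:.2f}x")
    print(f"x265 per-clock slowdown:    {x265_ratio:.2f}x")
```

The LINPACK per-clock slowdown comes out at a little over 4x, which is in line with the quoted warm-up penalty. The x265 figure is under 2x, which would fit an encoder that spends only part of its time in 256-bit code.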

For more detailed information, check out the original article in French at Hardware.fr.



Belgium Massman says:




Well, that's that cleared up.

Now the question is ... which motherboard vendor is going to figure out how to solve this one :D

United States xxbassplayerxx says:

I knew it was an artificial Intel limitation......... Bad Intel!!!! :mad:

Ukraine johnwaynr says:

But for gaming it doesn't matter right?

Belgium Massman says:

xxbassplayerxx said: I knew it was an artificial Intel limitation......... Bad Intel!!!! :mad:


What do you mean by artificial limitation?

johnwaynr said: But for gaming it doesn't matter right?


Very few software applications make use of AVX and AVX2. For gaming this doesn't matter since I don't know of any game that uses it.

United States xxbassplayerxx says:

Massman said: What do you mean by artificial limitation?



Very few software applications make use of AVX and AVX2. For gaming this doesn't matter since I don't know of any game that uses it.


They're throttling to save power. Once you go over their pre-determined BCLK, it automatically throttles. Sounds like one of two things. 1. They programmed that to keep people from bypassing their multiplier limit 2. They didn't expect people to get past the limit and there is a hole in the code that relied on a BCLK below 103.

Belgium Massman says:

Hmm, not sure that's entirely accurate.

Using the AVX and AVX2 instruction sets has always come with an increase in power consumption. This was also the case with Haswell and Haswell-E processors. Here is Raja from ASUS in his Maximus VI series guide:

There is one issue with Offset and Adaptive Mode that needs to be taken into account. The processor contains a power control unit which requests voltage based upon software load. When the PCU detects AVX instructions, it will ramp Vcore automatically beyond normal load voltage. There is no way to lock Vcore to prevent this if using Offset or Adaptive Mode. This is pre-programmed by Intel into the PCU.

As an example, a CPU is perfectly stable at 1.25V using a manual voltage (static), if Adaptive or Offset Mode is used instead, it is impossible to lock the core voltage when running software that contains AVX instruction sets – stress tests such as AIDA and Prime contain AVX instruction sets. When the AVX instructions are detected by the PCU, the core voltage will be ramped an additional ~0.1V over your target voltage – so 1.25V will become ~1.35V under AVX load. If you intend to run heavy load AVX software, we recommend using Manual Vcore, NOT Adaptive or Offset Mode.


On Skylake, the BCLK frequency is artificially limited for non-K processors using the PCU. Simply put: if the PCU detects >103 BCLK, it shuts down the processor. The way the BCLK is 'unlocked' for the non-K Skylake processors is by disabling the entire PCU in the CPU. This way there is no BCLK detection and thus no logic to shut down the CPU if you're over 103 MHz.

Disabling the PCU causes all the side-effects listed by the motherboard vendors: no IGP, no power management and poor AVX performance. What the Hardware.fr article now explains is why the AVX performance is so poor. In order to enable the upper 128-bit half of the 256-bit execution units for AVX2, the PCU needs to 1) detect the instructions and 2) enable the higher power consumption mode. Because the PCU is disabled, it cannot accommodate the AVX2 instructions.

The AVX performance for non-K processors is not artificially limited by Intel. It's a side-effect of disabling the PCU to enable non-K overclocking.

United States Schmuckley says:

Massman said: Hmm, not sure that's entirely accurate.

Using the AVX and AVX2 instruction sets has always come with an increase in power consumption. This was also the case with Haswell and Haswell-E processors. Here is Raja from ASUS in his Maximus VI series guide:


On Skylake, the BCLK frequency is artificially limited for non-K processors using the PCU. Simply put: if the PCU detects >103 BCLK, it shuts down the processor. The way the BCLK is 'unlocked' for the non-K Skylake processors is by disabling the entire PCU in the CPU. This way there is no BCLK detection and thus no logic to shut down the CPU if you're over 103 MHz.

Disabling the PCU causes all the side-effects listed by the motherboard vendors: no IGP, no power management and poor AVX performance. What the Hardware.fr article now explains is why the AVX performance is so poor. In order to enable the upper 128-bit half of the 256-bit execution units for AVX2, the PCU needs to 1) detect the instructions and 2) enable the higher power consumption mode. Because the PCU is disabled, it cannot accommodate the AVX2 instructions.

The AVX performance for non-K processors is not artificially limited by Intel. It's a side-effect of disabling the PCU to enable non-K overclocking.



It lowers multi when I try to OC..such fun..NOT.
I put that thing up for a minute.
Thank you for the homework, though. :)
Happy New Year.

France Niuulh says:

Hi, does anyone know if the AVX2 issue impacts the performance of streaming with software like OBS?

France Niuulh says:

So... I tested game streaming and it works well.
My i3 6100 @ 4.5G can play in 1080p and stream via OBS to Twitch.tv in 720p at 30fps.

Germany Hyperhorn says:

Has anybody noticed a performance decrease in programs that don't make use of AVX(2)? I'm curious because I noticed an obvious performance drop with the "Ushio" binary of Y-Cruncher (download source). I expected to see a large drop with the "Airi" binary that makes use of AVX2 and well ... that happened. But what I didn't expect was a smaller, but still noticeable, performance drop with the "Ushio" binary, as it (according to my understanding) makes use of SSE4.1 only, not AVX. I'm talking about ~20 % longer calculation times for 100M (multi-threaded Pi) at the same clock frequencies with Non-K overclocking BIOS 0001 vs. regular BIOS 1001. (Sys: i5-6500, Maximus VIII Ranger, Win 8.1 Pro x64) It would be interesting if someone could confirm this with a Skylake setup. Maybe I've overlooked something (pretty tired :o), but at the moment I'm not aware of any difference other than the Non-K overclocking BIOS.
