It’s not every day that we find the regular tech media willing to pull back the curtain and delve a little deeper into the technology driving our modern PCs, which is why it is pretty refreshing to see Gamers Nexus chief Steve Burke take the time to interview two of AMD’s most technically adept employees. A little while back Steve managed to get some quality time with Sam Naffziger and Michael Clark, a Chief Architect at AMD who is intimately familiar with the processes and strategies that were used to create the new Ryzen architecture. The whole interview is captured on video, but for those of you who prefer to read, here’s a sample of their conversation regarding uOp caching, power optimizations, shadow tags and more:
Michael Clark: “One of the hardest problems of trying to build a high-frequency x86 processor is that the instructions are a variable length. That means trying to get a lot of them to dispatch in a wide form is a serial process. To do that, generally we’ve had to build deep pipelines, which are very power hungry. We actually call it an op-cache because it stores [instructions] in a more dense format than the past; what it does is, having seen [the instructions] once, we store them in this op-cache with those boundaries removed. When you find the first one, you find all its neighbors with it. We can actually put them in that cache 8 at a time so we can pull 8 out per cycle, and we can actually cut two stages off that pipeline of trying to figure out the instructions. It gives us that double-whammy of a power savings and a huge performance uplift.”
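For the programmers among us, the idea Clark describes can be boiled down to a simple memoization pattern: pay the expensive, serial decode once, cache the resulting micro-ops with their boundaries already resolved, and serve every later fetch of that address from the cache. Here’s a deliberately toy Python sketch of that idea; the class and method names are illustrative, not anything from AMD’s actual design:

```python
# Toy model of an op-cache: variable-length instructions are expensive
# to decode serially, so cache the decoded micro-ops and skip the
# decoder entirely on a hit. Purely illustrative, not AMD's real design.

class OpCacheModel:
    def __init__(self):
        self.op_cache = {}   # fetch address -> list of decoded micro-ops
        self.decodes = 0     # times the slow decoder had to run
        self.hits = 0        # times we skipped it

    def slow_decode(self, raw_bytes):
        # Stand-in for the serial, power-hungry x86 length-finding and
        # decode pipeline. Here we just split on a marker character.
        self.decodes += 1
        return ["uop:" + insn for insn in raw_bytes.split(";")]

    def fetch(self, addr, raw_bytes):
        # On a hit the boundaries are already resolved, so the whole
        # group of neighbouring micro-ops comes out together.
        if addr in self.op_cache:
            self.hits += 1
            return self.op_cache[addr]
        uops = self.slow_decode(raw_bytes)
        self.op_cache[addr] = uops
        return uops

# A hot loop body is decoded once, then served from the op-cache on
# every later iteration -- the "double-whammy" Clark mentions.
core = OpCacheModel()
for _ in range(10):
    core.fetch(0x400, "add;mov;cmp;jnz")

print(core.decodes, core.hits)  # 1 decode, 9 hits
```

The real hardware caches groups of micro-ops (up to 8 wide, per the quote) rather than Python lists, but the payoff is the same shape: the expensive work happens only on a miss.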
Sam Naffziger: “x86 decode, the variable length instructions, are very complex -- it requires a ton of logic. I mean, guys make their career doing this sort of thing. So you pump all these x86 instructions in there, it burns a lot of power to decode them all, and in our prior designs every time you encounter that code loop you have to go do it again. You have this expensive logic block chunking away. Now we just stuff those micro-ops into the op-cache, all the decoding done, and the hit-rate there is really high [Clark: up to 90% on a lot of workloads], so that means we’re only doing that heavy-weight decode 10% of the time. It’s a big power saver, which is great. The other thing we did is the write-back L1 cache. We aren’t consistently pushing the data through to the L2 -- there are some simplifications if you do that, but we added the complexity of a write-back so now we keep stuff way more local. We’re not moving data around, because that wastes power.”
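The write-back point is worth unpacking: in a write-through L1, every store is also pushed down to the L2, while a write-back L1 only marks the line dirty and writes it out once, when the line is evicted. A quick toy Python comparison makes the traffic difference concrete; no real cache geometry or replacement policy here, just the two store policies side by side:

```python
# Toy comparison of write-through vs write-back store policies.
# Write-through pushes every store to the L2; write-back keeps stores
# local and writes a dirty line out only on eviction. Illustrative only.

class L1:
    def __init__(self, write_back):
        self.write_back = write_back
        self.lines = {}       # line address -> (value, dirty flag)
        self.l2_writes = 0    # data movement we want to avoid

    def store(self, addr, value):
        if self.write_back:
            self.lines[addr] = (value, True)   # just mark the line dirty
        else:
            self.lines[addr] = (value, False)
            self.l2_writes += 1                # every store hits the L2

    def evict(self, addr):
        value, dirty = self.lines.pop(addr)
        if dirty:
            self.l2_writes += 1                # write out once, at the end

# Hammer one hot line with 100 stores, then evict it.
for policy in (False, True):
    cache = L1(write_back=policy)
    for i in range(100):
        cache.store(0x80, i)
    cache.evict(0x80)
    print("write-back" if policy else "write-through", cache.l2_writes)
```

With 100 stores to the same hot line, write-through generates 100 L2 writes while write-back generates just one at eviction -- exactly the “keep stuff way more local” behavior Naffziger describes, at the cost of tracking dirty state.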
Catch the full interview from Gamers Nexus here on their YouTube channel, plus a written article here, which is perhaps more suited to the readers amongst us.