Bulldozer

TrainTrackHack · Nov 10, 2011

First of all, you said that I made it sound simple to run a single thread on both sets of pipelines on a module. You said that was something you can't do with a simple stepping or two. If you were not talking about Zambezi, what is this mysterious unknown processor you're talking about? You said you were not talking about Zambezi either? Piledriver hasn't been released yet, so it can't get a new stepping. Zambezi and Bulldozer are the same processor. So you were talking about Zambezi! I wasn't!

I wasn't talking about Zambezi. I wasn't talking about any processor in particular. By "can't be done in a stepping or two" I simply meant that it's not a quick fix; I didn't imply that you said they should do it for the next stepping. It's like saying it can't be done in a day or two (and then you jumped at me saying that "I never claimed it could be done in exactly two days", which was NOT my point). And since the idea you presented is just not sound AND would require major changes, which in turn require major resources, I merely suggested that they'd be better off improving other things instead of some weird one-thread-two-pipeline-sets fix.

Then you said that a module should get double the performance or it would not be worth it. Then you said that 70-80% was a more realistic figure.

"Nearly double", to be specific. I don't think it's inappropriate to call 70-80%, a realistic figure for most workloads, "nearly double". I didn't mean "nearly" as in "within several percent". Alright, the choice of words may have been poor, but I simply couldn't think of a better word just then for something that's ~3/4 of something. The exact percentage isn't relevant to the validity of my point (unless it's actually consistently lower than 60%, in which case I'm happy to admit that I was wrong on that count; I simply wouldn't have been aware).

Then you asked, "80% of what?" AMD claimed that, by using a module with only around 12% more die space, you would get around 80% of the performance of two threads running on two full cores. What do you not get about that?

I do get that. The thing is, though, like I said, your proposed fix is pointless. I'm fairly sure you said that a 10% performance increase would be realistic for this one-thread-two-pipelines fix, which I agree with. But then you'd be using all the extra die space the other core takes up to run one thread, and that "extra die space" is more than 10%. It would also require a vastly reworked front-end, which would take even more die space and resources. In the end, you'd get a performance increase that can't even be noticed under most circumstances, and even worse performance per unit of silicon, when all that time could've been spent coming up with a better fix. I mean, if you could get ~20% more performance out of an integer core that does not share any components, would it not make more sense to simply drop the other integer core altogether to improve the IPC and make the chip smaller, rather than trying to force one thread to run on both for a ~10% performance increase?
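To make the performance-per-area argument concrete, here is a back-of-the-envelope sketch. Every number is an assumption taken loosely from this thread, not a measurement: one full core counts as 1.0 area and 1.0 throughput, a module is about 1.12 area, a second thread adds about 80% (the "nearly double" reading), and the hypothetical one-thread-on-both-pipelines mode adds about 10%. The extra front-end area that mode would need is unknown, so it is a parameter defaulting to zero, the most generous case for that idea. The names `Config` and `compare` are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Config:
    name: str
    area: float        # die area, relative to one full core (assumed 1.0)
    throughput: float  # total throughput, relative to one thread on one full core

    def per_area(self) -> float:
        """Throughput delivered per unit of die area."""
        return self.throughput / self.area


def compare(module_area: float = 1.12,
            second_thread_gain: float = 0.80,
            single_thread_gain: float = 0.10,
            extra_frontend_area: float = 0.0) -> List[Config]:
    # All figures are illustrative assumptions from the discussion, not data.
    return [
        Config("one full core", 1.0, 1.0),
        Config("two full cores", 2.0, 2.0),
        Config("module, two threads", module_area, 1.0 + second_thread_gain),
        # Hypothetical mode: one thread spread over both sets of pipelines,
        # paying for the module's area plus any extra front-end logic.
        Config("module, one thread on both pipelines",
               module_area + extra_frontend_area, 1.0 + single_thread_gain),
    ]


if __name__ == "__main__":
    for c in compare():
        print(f"{c.name:40s} throughput/area = {c.per_area():.2f}")
```

With the defaults this prints 1.00, 1.00, 1.61 and 0.98. Even with zero extra front-end area, the one-thread mode comes out below a plain single core in throughput per area, and any front-end rework only pushes it lower.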

Then you act like you don't know what I am talking about when I say "both sets of pipelines on a module" instead of "integer cores on a module". It's arguable whether you can call them integer cores or not, which is why I say both sets of pipelines.

No, I don't. I simply wasn't sure and asked for clarification. I already made it clear that I posted under the assumption that you were indeed referring to the integer cores.

Then you seemed confused because I said "scheduler" instead of "OS scheduler".

To be fair, you did say "I said that by running a single thread on a whole module would be a scheduling nightmare when you started hitting above 4 threads because of having to shift threads around on the second set of pipelines" in those exact words. It simply doesn't make any sense in the context of OS schedulers. If one module appears as a single hardware thread to the OS (which I assume was the idea behind one thread per module), the OS scheduler simply doesn't play any part in which particular pipeline a thread runs on. I have no idea why you think there's a need to "shift threads around on the second set of pipelines". And having more than 4 threads has nothing to do with BD scheduling issues. Even so, I don't see how this is relevant. My point, in essence, is that trying to make a single thread run on both sets of pipelines is not sound. That's what I originally said. It doesn't matter whether we're talking about Zambezi or Piledriver (or any realistic x86 CPU); trying to make more than one hardware thread run a single software thread is simply not viable.

StrangleHold · Nov 10, 2011

OK, for the first part:
I'll take your word for it. But to me, saying you can't fix it in a few steppings or with a quick fix would apply to an existing product, not one that doesn't exist yet.

For the third part:
Say we go with the 10%. We both know that's a guess; it could vary from nothing to more than that. I don't think it would take as much of a hardware change as you say. Considering that most of the extra die space was there to begin with so it could run 8 threads, being able to run a single thread on both sets of pipelines on a module would just be a benefit.

Fifth part:
There are all kinds of problems with how it should run threads. Two threads that share a lot of data could share a module and not take much of a performance hit at all. If they don't, it's better to run them on different modules. Even the OS has a problem with that; it just sees an 8-core. It doesn't know what a module is and doesn't even consider which core to run a thread on. Because of this, you could run the same benchmark twice and get different performance. This problem gets larger if you're running more than 4 threads. Add to that a module that is already running a single thread on both sets of pipelines. That's the nightmare I was talking about.
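The placement problem described above can be sketched as a toy module-aware policy: threads that share a lot of data go together on one module, independent threads each get a module to themselves while one is free, and only then do they start doubling up. This is a hypothetical illustration, not how any real OS scheduler works. It assumes an 8-core Zambezi exposed as 4 modules of 2 cores, with logical CPUs (0,1), (2,3), ... sharing a module; `place_threads` and `module_of` are made-up names.

```python
from typing import Dict, List

CORES_PER_MODULE = 2  # assumption: two integer cores per Bulldozer module
NUM_MODULES = 4       # assumption: 8-core Zambezi = 4 modules


def module_of(cpu: int) -> int:
    """Hypothetical mapping: logical CPUs (0,1), (2,3), ... share a module."""
    return cpu // CORES_PER_MODULE


def place_threads(groups: List[List[str]]) -> Dict[str, int]:
    """Assign each thread a logical CPU.

    Each inner list is either one independent thread or a pair of threads
    that share a lot of data.
    """
    for group in groups:
        if len(group) not in (1, 2):
            raise ValueError("groups must hold one or two threads")

    free = {m: [m * CORES_PER_MODULE + i for i in range(CORES_PER_MODULE)]
            for m in range(NUM_MODULES)}
    placement: Dict[str, int] = {}

    # Data-sharing pairs: put both threads on the same, still empty, module.
    for group in groups:
        if len(group) == 2:
            m = next((m for m, cpus in free.items()
                      if len(cpus) == CORES_PER_MODULE), None)
            if m is None:
                raise RuntimeError("no empty module left for a data-sharing pair")
            for thread in group:
                placement[thread] = free[m].pop(0)

    # Independent threads: pick the module with the most free cores, so they
    # only share a module once every module is already in use.
    for group in groups:
        if len(group) == 1:
            candidates = [m for m, cpus in free.items() if cpus]
            if not candidates:
                raise RuntimeError("more threads than cores")
            m = max(candidates, key=lambda mod: len(free[mod]))
            placement[group[0]] = free[m].pop(0)
    return placement


if __name__ == "__main__":
    print(place_threads([["a", "b"], ["c"], ["d"]]))
    print(place_threads([[f"t{i}"] for i in range(5)]))
```

The data-sharing pair `a`/`b` ends up on CPUs 0 and 1 (one module), while `c` and `d` each get a module of their own. With five independent threads, the fifth one lands on CPU 1, next to the first thread, which is where the more-than-4-threads problem starts. On Linux, a placement like this could be applied with `os.sched_setaffinity`, but the real CPU-to-module numbering has to be checked on the actual system rather than assumed.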

2048Megabytes · Nov 11, 2011

I hope AMD drops backwards compatibility on their next socket and goes with a processor that uses at least 1050 pins. They can do it at least once. They haven't done it with the AM sockets since Socket 939.

Hopefully some better processors await Socket AM3+ in the near future before AMD moves on.

linkin · Nov 11, 2011

AMD should bring something like Llano to AM3+, but with six Phenom II cores and a graphics chip.

Instant success while they fix the suckiness that is Bulldozer.

StrangleHold · Nov 11, 2011

Llano already uses the Athlon II core at 32nm. It's the same as Phenom II without L3. I doubt they will release a 6-core. In the first quarter of the year they are supposed to release the Llano/Trinity with the Piledriver core.

jonnyp11 · Nov 11, 2011

StrangleHold said:
Llano already uses the Athlon II core at 32nm. It's the same as Phenom II without L3. I doubt they will release a 6-core. In the first quarter of the year they are supposed to release the Llano/Trinity with the Piledriver core.

Really??? Sorry if I'm correcting you.

(Although I do think Trinity is supposed to be on Piledriver.)

EDIT: Although it is an older article, it should be right:

"Trinity will be AMD’s first Bulldozer based APU, combining some variation on Bulldozer with some as-yet-unseen AMD GPU architecture. Trinity has already been in AMD’s labs for a few weeks now and will launch in 2012 as the follow-up to Llano."

StrangleHold · Nov 11, 2011

Bulldozer is the name of the architecture. Zambezi is Bulldozer. The new Bulldozer Next/Enhanced architecture on the desktop will be called Piledriver, and the new desktop APU, Trinity, will use the Piledriver core without L3 cache. Both are the Next/Enhanced Bulldozer.

Bulldozer and Next/Enhanced Bulldozer are the names of the architectures. Zambezi and Piledriver are the names of the cores. Just like Deneb is the name of the core but Phenom II is the name of the processor.

The APU will feature up to four x86 cores powered by enhanced Bulldozer architecture, with a Piledriver core
http://www.xbitlabs.com/news/cpu/di...driver_x86_Cores_Radeon_HD_7000_Graphics.html

jonnyp11 · Nov 11, 2011

So Zambezi and Piledriver are both Bulldozers, like Deneb and Thuban are both Phenom IIs? I knew Piledriver was the next version of Zambezi, but I thought it would be in another line of CPUs, like Phenom would be Zambezi and Piledriver would be Phenom II. Considering that the Phenom launch was a failure (wasn't it?) but the Phenom IIs were pretty good, hopefully that will happen here.

StrangleHold · Nov 11, 2011

jonnyp11 said:
So Zambezi and Piledriver are both Bulldozers, like Deneb and Thuban are both Phenom IIs? I knew Piledriver was the next version of Zambezi, but I thought it would be in another line of CPUs, like Phenom would be Zambezi and Piledriver would be Phenom II. Considering that the Phenom launch was a failure (wasn't it?) but the Phenom IIs were pretty good, hopefully that will happen here.

Well, there is a little more to it than that.

Back to the start:
AMD came out with the Barcelona architecture. The desktop CPU was called Phenom and the quad-core's core name was Agena. So the architecture is Barcelona, the desktop CPU itself was named Phenom, and the core itself is called Agena.

Next came what I guess you could call the Barcelona Next/+/Enhanced architecture, whatever you want to call it. The desktop CPU itself is called Phenom II, the quad-core's core name is Deneb, and the X6's core name is Thuban.

Next came the Bulldozer architecture. The desktop CPU itself is called FX and the core name is Zambezi.

Next will come the Bulldozer Next/+/Enhanced architecture, whatever they end up calling it. The desktop CPU was going to be called Komodo, but I think they have changed it to Vishera; who knows, they could end up calling it the FX II. The core name will be Piledriver.

jonnyp11 · Nov 11, 2011

Love how complicated they make that part and how simple their chipset names have been, at least since the 700s. But if I remember correctly, didn't the original Phenom launch with issues like these? I looked it up quickly: it had speed issues and a bug that was later fixed. Ironically, the original Phenoms seem to have been on the B2 stepping too, and B3 was much better. In this case, I'm hoping history is an indicator of what is in our future.

StrangleHold · Nov 11, 2011

It had a TLB erratum. In the L3 cache, two arbiters would try to overwrite the same data and the chip would lock up. It was really nowhere near being a big problem on a desktop CPU; it would mostly just happen on the server side. It was only reported in testing and never happened in real-world use. A big stink came out about it, and AMD released a BIOS update to fix it, even though it never happened. The fix, if you enabled it, would cut performance by up to 20%, and it wasn't really even needed. But they released the B3 stepping about 3 months later, which fixed it and could also reach a little more clock speed.
