^ this.
The sad thing is that if AMD had been able to deliver on its promises, we would all have some bloody good talking points.
As it stands however, this is a real game changer. A small(ish) company like AMD/ATi cannot afford to have a failed architecture. It costs too much, and with Ivy Bridge just around the corner, I am seriously doubting the impact PD will have too.
Great chip if you run a server, however this was not the intended consumer so it was a major fail.
This has the potential to take AMD out of the performance race. My view is unless PD is a performance/dollar champ with a premium enthusiast part, they will stick to low powered mobile technologies.
I don't care if AMD is making technology that is not as good as Intel right now. I just want AMD to get better. If AMD produces a processor that can give 30 percent better performance than what a Phenom II 980 Processor can do now with less power consumption before 2013 I will be satisfied they are making some strides forward.
> let a single thread run on both sets of pipelines on a module
I doubt that will ever happen. The integer cores are just like any other x86 cores, they just lack some dedicated hardware, and splitting a single thread across several cores is a nightmare and would require a totally revamped, much bigger and probably consequently slower front-end for very little performance gain.
> I never said several cores, which would be a nightmare. I said a single thread on a single whole module. The only nightmare would be with the scheduler when you hit 5 or more threads.
Well, you said a single thread on both pipelines of a module, I assumed you referred to the integer cores... what exactly did you mean?
> You said a single thread over (several) cores. To me a few is 2 or 3 and several is 6 or 7. I said a single thread on a single module. Up to 4 threads on 4 modules.
I meant several as in "more than one".
> You said (integer cores are just like any other x86 cores, they just lack some dedicated hardware). To me it is just the opposite: it's two sets of pipelines that share 90% of the hardware. Even the 128bit FP Scheduler can be combined as a single 256bit. The Integer Scheduler and the L1 are the only separate parts.
I have no idea how that's different - two cores without certain dedicated hardware or two sets of pipelines with certain shared hardware sound more or less the same to me.
> There is no reason a single thread could not be run on both sets of pipelines on a module.
There are a whole lot of reasons. Regardless of whether you talk about pipelines, integer cores, cores or separate physical CPUs altogether, they're still distinct execution units and the fundamental problems of running a single thread over several execution units remain the same. Data hazards (and at least to some extent the semantic atomicity of x86 instructions) make it pointless to just keep giving more and more pipelines to run a single thread on - AFAIK Bulldozer already has 4 for each integer core, I think that's about the practical limit for an x86 CPU, and I sincerely doubt it quadruples the performance of a hypothetical single-pipeline integer core (in theory, yes, but in practice I'm fairly certain only 2 and sometimes 3 get used for most common workloads). As you keep giving more pipelines for a single thread to run on, you just keep realising smaller and smaller performance gains and an exponentially more complex front-end. There's a reason why we stopped the single-core performance race years ago - it's just much more economical and practical to design a reasonably fast core and have several of them in a single CPU. The exact same reason applies to any unit within a CPU that is capable of executing instructions, regardless of what you call it. For running a single thread on one module to be viable, the single-threaded performance would have to almost double - something that is, in practice, impossible.
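To make the diminishing-returns argument a bit more concrete, here is a toy model in Python. It is only a sketch under loud assumptions: every instruction takes one cycle, the issue window is unlimited, and the dependency pattern is random. The names `simulate_ipc` and `random_stream` and the parameters `dep_prob` and `window` are made up for illustration and do not describe how Bulldozer (or any real front-end) actually schedules instructions.

```python
import random


def simulate_ipc(deps, width):
    """Greedy, oldest-first issue of unit-latency instructions.

    deps[i] lists the indices of earlier instructions that instruction i
    has to wait for (a stand-in for data hazards). Returns the achieved
    instructions per cycle for the given issue width.
    """
    n = len(deps)
    issue_cycle = [None] * n      # cycle in which each instruction issued
    pending = list(range(n))      # program order, oldest first
    cycle = 0
    while pending:
        issued = []
        for i in pending:
            if len(issued) == width:
                break
            # Ready only if every producer issued in an EARLIER cycle.
            if all(issue_cycle[d] is not None and issue_cycle[d] < cycle
                   for d in deps[i]):
                issued.append(i)
        for i in issued:
            issue_cycle[i] = cycle
        issued_set = set(issued)
        pending = [i for i in pending if i not in issued_set]
        cycle += 1
    return n / cycle


def random_stream(n, dep_prob, window, seed=0):
    """Hypothetical workload: each instruction depends on each of the
    previous `window` instructions with probability `dep_prob`."""
    rng = random.Random(seed)
    deps = []
    for i in range(n):
        lo = max(0, i - window)
        deps.append([d for d in range(lo, i) if rng.random() < dep_prob])
    return deps


if __name__ == "__main__":
    stream = random_stream(n=1000, dep_prob=0.3, window=8)
    for width in (1, 2, 3, 4, 8):
        print(f"width {width}: IPC {simulate_ipc(stream, width):.2f}")
```

In this model a pure dependency chain stays at 1 instruction per cycle no matter how wide the machine is, while a fully independent stream scales with the width. Real code sits somewhere in between, which is the point being argued: past a certain width the extra pipelines mostly sit idle.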
> Come on man, what are you trying to prove here? I never said running a thread on both sets of pipelines would give you double performance.
I never said nor implied you did. I merely said that for running a single thread on a module to be worth it, it would have to nearly double the performance. I mean, running two threads on one module as it currently is gives you nearly double the performance, but running a single thread on one module would probably improve the performance of most workloads by only a fraction. Sure, anything AMD could do to boost the IPC at this point would help, I agree with that. But the fact is, if we were to go with the "single thread per module" idea, you'd essentially have to be willing to trade nearly half of the potential throughput for minimal single-threaded performance gains, and considering that massively multi-threaded applications are Bulldozer's biggest selling point that's not something they can afford to do right now. They'd be far better off just improving the slow front-end and fixing the latencies instead of some ad-hoc patch to make the architecture do something it wasn't designed for.
> But I am at a loss as to what your point is here. To me it sounds like you're exaggerating what I said to make some kind of unknown point.
My point was the above and the fact that it's not anywhere near as easy as you make it sound. The hypothetical ability to run a single thread on a module would require major changes in the architecture, not something one just does in a stepping or two. And it's simply not worth it.
> Double the performance to be worth it! Man, you're really stretching your point here. It would be worth it if you just got 10% or even less, since the IPC sucks.
I think you're really missing my point, and I have no idea what makes you think I'm stretching it. If you have two threads running on a single two-core module, they will perform better than a single thread on a single module doing the same job. That's basically what I'm getting at. I do admit I might have overestimated the performance (as far as I was aware ~70-80% was a realistic figure for most workloads), but there's simply no way to get more performance per watt/silicon/clock out of a single module with this single-thread-on-both-pipelines trick you were talking about.
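For anyone who wants to put rough numbers on this trade-off, here is a back-of-the-envelope sketch in Python. The function name `throughput_tradeoff` and both inputs are assumptions for illustration: `two_thread_scaling` is the throughput of a module running two threads relative to one thread having the module to itself, and `fused_gain` is the hypothetical speedup from letting one thread use both sets of pipelines. The example values only echo the figures thrown around in this thread, they are not measurements.

```python
def throughput_tradeoff(two_thread_scaling, fused_gain):
    """Compare per-module throughput in the two modes.

    Baseline 1.0 = one thread alone on a module as it works today.
    two_thread_scaling: throughput with two threads on the module
                        (e.g. 1.8 means the second thread adds 80%).
    fused_gain:         hypothetical single-thread gain from using both
                        sets of pipelines (e.g. 0.10 for 10%).
    """
    if two_thread_scaling <= 0:
        raise ValueError("two_thread_scaling must be positive")
    two_threads = two_thread_scaling
    one_fused_thread = 1.0 + fused_gain
    return {
        "two_threads": two_threads,
        "one_fused_thread": one_fused_thread,
        # Share of the module's peak throughput you give up by running
        # one fused thread instead of two ordinary ones.
        "throughput_given_up": 1.0 - one_fused_thread / two_threads,
    }


if __name__ == "__main__":
    # Assumed figures: second thread adds 70-80%, fused mode adds 10%.
    for scaling in (1.7, 1.8):
        result = throughput_tradeoff(scaling, fused_gain=0.10)
        print(scaling, result)
```

Under these assumed figures the fused mode only breaks even on throughput when the second thread would have added no more than the fused gain itself. That is the trade being argued about here: a small single-thread win against a large loss in multi-threaded throughput.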
> No it doesn't. AMD only claimed that a second thread on a module would get on average at most 80% compared to 2 full cores. And it fails even on that. Most really got a lot less than that.
80% of what, a single integer core with a dedicated front-end and caches? If that's the case, it would be more economical to simply drop the other integer core to save die space and get rid of the 20% sharing penalty instead of some let's-make-a-single-thread-run-on-both-pipeline-sets patchwork. I know you said that you weren't talking about integer cores, but since you still haven't clarified as to what you do mean by the "two sets of pipelines" you keep bringing up, I really have no idea what else you could be referring to.
> That's all I ever said it would.
Alright. I'm reading really, REALLY hard. I can't find it. Where did you say that?
> If you look at what I said, that is exactly what I said. That they should not waste time and money on Zambezi and maybe come out with a B3 stepping to improve IPC and wattage and drop it. Put most of the money and changes on Piledriver.
Again, I'm reading through the discussion between us really hard and I haven't found anything that says this, even in other words. Maybe I'm just really stressed or tired, call me dumb if it makes you feel better, but I just don't see where that was brought up.
> As I said above, I never said they should do that with Zambezi. Plus I said that running a single thread on a whole module would be a scheduling nightmare when you started hitting above 4 threads because of having to shift threads around on the second set of pipelines. My whole point was, since single threaded performance was so bad, that anything they could do would be an improvement. Plus I will say again, I never said this should be done with Zambezi.
I never said it should be done with Zambezi either. And as for the scheduling bit, I really don't know what you mean. The schedulers on the CPU aren't responsible for shifting threads around, that's a responsibility of the OS scheduler. And yes, even though anything will be an improvement, throwing more pipelines at the problem definitely isn't the wisest move. In the time it would take to get your hypothetical 1-thread-2-pipeline-sets concept working they might as well have worked out the latency and starved pipelines instead, which would make much more sense.
> You're twisting what I said. You're claiming that a module has better performance with 2 threads than it really does. You're underestimating the performance of running a single thread on a whole module. You're saying I said it should be done with Zambezi, which I didn't. Claiming I said it sounded easy, where did I say that?
I never claimed they should do that with Zambezi. I never said that you said that. Where are you getting this from? And I don't see how I'm underestimating the single-thread-on-module performance either. I was specifically questioning the validity of the idea of running one thread on both sets of pipelines (as you refer to them). It's simply not sound. I never did argue against the concept of a single thread per module in general, and if that's how you understood me, my bad, I'm not always the best wielder of words if you will, but that was not my point.
> Seems like you're fabricating/twisting what I said just to rant.
Hmm. Every time you pointed out that I claimed you said something you never did, I've gone back and read really carefully, and I just have no clue where I've actually made any of those claims. Weird. And no, I'm not just ranting either. I do have better things to do with my time, believe me, when I visit forums it's usually for the purpose of intelligent, productive discussion, not senseless ranting.
> I would not put much more effort in Zambezi, maybe put out a B3 stepping to save some face. But dump all the money in Piledriver and release better IPC and wattage processor and let 4 or less threads run on both sets of pipelines on the modules.
Neither would I, I absolutely do agree with that. But that's not what we were discussing.
> If they can get Piledriver performance up at least 15% and let a single thread run on both sets of pipelines on a module
Just out of curiosity and to avoid confusion, what exactly do you mean by "both sets of pipelines"? Are you or are you not referring to integer cores? I've responded to your posts under the assumption that you are since, like I said before, I just have no idea what else you could be referring to and I haven't come across a single source using that terminology. If this is not the case, just tell me what you do mean - I'm aware that my replies may well no longer apply, there would be little point in arguing against something I said that we both know doesn't hold any more.