Quote:
|
Originally Posted by http://www.pcmech.com/show/processors/715/
AMD Athlon XP’s have 3 X86 decoders, 3 floating-point pipelines, and 3 integer pipelines. This is compared with Intel’s Pentium 4, which has only one X86 decoder, 2 floating-point pipelines, and 1 more integer pipeline than AMD’s Athlon. This leads to AMD being able to decode more instructions than Intel at the same time, and being able to perform floating-point operations quicker than Intel. Overall, AMD Athlon XP processors are able to perform 9 operations per clock cycle while Intel can only manage 6. It doesn’t sound like much, but in processors every operation is crucial. This is why I said AMD are more about getting more done per clock cycle in my AMD processor buying guide.
|
Unfortunately that's quite inconsequential, if not partially incorrect:
- You cannot break an opcode -- even when an interrupt is called, the currently executing opcode will complete before the interrupt is dealt with. This holds true of 8086, 68xx and 68K assembly. I don't know specifically about other ASM platforms, but it would make sense that they would not be able to split an opcode either.
- Do make sure you are aware of the distinction between an operation and an instruction, as they are not necessarily the same.
- Now, what the comment in the quote was referencing is superscalar architecture, which was introduced to deal with the question "why are we limited to a maximum of 1 operation per cycle?" -- something you didn't mention. Regardless, superscalar architecture simply means "a crapload of extra silicon was added to the chip in the form of decoders, execution units, etc." The funky thing about this is that, like all neato inventions ... there is a catch (you didn't think it'd be that easy, did you? hehe), and the catch is that all this fancy hardware only kicks in if the multiple instructions in the prefetch queue can be executed independently.
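To make "can be executed independently" a bit more concrete, here is a rough Python sketch of the kind of register-dependency check that decides whether two instructions could go down separate pipelines at once. The Instr type and the independent() helper are made up for illustration, and the read/write sets are filled in by hand. Real chips do this in hardware, and out-of-order designs can also rename registers to get rid of the last two kinds of conflict.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical, hand-annotated instruction: text plus registers touched."""
    text: str
    reads: frozenset
    writes: frozenset


def independent(first: Instr, second: Instr) -> bool:
    """Return True if `second` has no register dependency on `first`."""
    raw = first.writes & second.reads   # second reads what first writes
    war = first.reads & second.writes   # second overwrites what first still reads
    waw = first.writes & second.writes  # both write the same register
    return not (raw or war or waw)


a = Instr("mov ax, bx", frozenset({"bx"}), frozenset({"ax"}))
b = Instr("mov cx, dx", frozenset({"dx"}), frozenset({"cx"}))
c = Instr("add si, ax", frozenset({"si", "ax"}), frozenset({"si"}))

print(independent(a, b))  # True: no shared registers, could issue together
print(independent(a, c))  # False: c needs the ax that a is still writing
```

The second pair is the catch in action: no matter how many execution units are sitting idle, the add has to wait for the mov.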
Getting back to the point I'm making about execution times:
- The x86 processor doesn't execute an opcode within a single cycle. Consider something simple and stupidly common like mov ax, bx and we'll break it down in a bit more detail:
- Fetch instruction. 1 cycle
- Update the IP. 1 cycle
- Decode instruction. 1 cycle
- If needed, fetch a 2-byte operand. 0-2 cycles
- If such an operand was required, update the IP again. 0-1 cycles.
- Translate the address of that operand if needed. 0-2 cycles
- Fetch that operand now that we know its address. 0-3 cycles
- Save the value to the destination register. 1 cycle
The total here is anywhere from 4-12 cycles for a stinkin mov op
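For anyone who wants to check the tally, a quick Python sketch that adds up the per-step ranges exactly as listed above. The step names and the total_cycles() helper are just labels for this example, and the cycle counts are the ones from the list, not measurements. Summed literally, the listed ranges come to 4 cycles at best and 12 at worst.

```python
# (step, min cycles, max cycles), copied from the breakdown above
steps = [
    ("fetch instruction", 1, 1),
    ("update IP", 1, 1),
    ("decode instruction", 1, 1),
    ("fetch 2-byte operand, if needed", 0, 2),
    ("update IP again, if needed", 0, 1),
    ("translate operand address, if needed", 0, 2),
    ("fetch the operand itself", 0, 3),
    ("save value to destination register", 1, 1),
]


def total_cycles(steps):
    """Sum the best-case and worst-case cycle counts across all steps."""
    best = sum(lo for _, lo, _ in steps)
    worst = sum(hi for _, _, hi in steps)
    return best, worst


best, worst = total_cycles(steps)
print(f"{best}-{worst} cycles")  # 4-12 cycles
```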
This is a good analysis of the MOV instruction
That being said, only one serious correction is needed to that first link you provided:
Quote:
|
Overall, AMD Athlon XP processors are able to perform 9 operations per clock cycle while Intel can only manage 6
|
It may just be a case of semantics, but it's an important one. When talking about executions or efficiencies or such, the unit being dealt with is an "operation" or an "opcode" (when we break down an opcode, sometimes a piece of it is called an 'instruction', but that is explicitly noted). Any other time 'instruction' is used, it's synonymous with 'opcode'. What that quote should have said was "AMD Athlon XP processors are able to prefetch 3 op-instructions concurrently and deal with three such FP and INT instruction streams, and the Intel Pentium 4 is only able to prefetch a single instruction and deal with two FP and a single INT instruction stream".
That being said, perhaps you should have a look at page two of the link you gave me. Contrary to what you're trying to pass off, the number of decoders and execution units is not the defining characteristic of a processor's performance. (That, and the fact that not all instructions are able to be executed independently, as noted above -- then again, that's what OOOE optimization is for hehe)
-----
Quote:
(AMD) 9 opcc multiplied by 2000 (2ghz) x 1,000,000 = 18,000,000,000 ops (eighteen billion)
(INTEL) 6 opcc multiplied by 3000 (3ghz) x 1,000,000 = 18,000,000,000 ops
realistically, the exact same amount of real world work, regardless of overrated megaherts hype....
|
If you're trying to suggest that either of those processors will execute 18 billion opcodes per second, then let me suggest to you that the year is 2005, not 3005 hehe
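To put numbers on that: the 18 billion figure is just issue width times clock, i.e. a ceiling that assumes every slot is filled on every cycle. The rough Python sketch below reproduces the quoted arithmetic and then scales it by a utilization factor. The function names and the 0.25 utilization are made up purely for illustration; they are not measured values for either chip.

```python
def peak_ops_per_second(ops_per_cycle: int, clock_hz: int) -> int:
    """Theoretical ceiling: every execution slot filled on every cycle."""
    return ops_per_cycle * clock_hz


def sustained_ops_per_second(ops_per_cycle: int, clock_hz: int,
                             utilization: float) -> float:
    """Ceiling scaled by the fraction of slots actually kept busy.

    `utilization` is a hypothetical knob for this sketch, not a real spec.
    """
    return peak_ops_per_second(ops_per_cycle, clock_hz) * utilization


athlon_peak = peak_ops_per_second(9, 2_000_000_000)  # 9 opcc at 2 GHz
p4_peak = peak_ops_per_second(6, 3_000_000_000)      # 6 opcc at 3 GHz
print(athlon_peak, p4_peak)  # 18000000000 18000000000

# Assumed 25% utilization, purely to show how fast the ceiling shrinks.
print(sustained_ops_per_second(9, 2_000_000_000, 0.25))  # 4500000000.0
```

The two peaks matching is exactly what the quoted math shows; what it leaves out is that neither chip gets anywhere near its peak on real code.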
----
Hehe
The link gave me quite the chuckle .... it's a collection of both AMD and Intel fanboys at a convention. One quote that stood out, hehe:
Quote:
|
Originally Posted by amdtel
The worst thing that can happen to AMD is that if they degrade their product by putting more pipelines.....if they do it in hammer they are screwed
|
<sarcasm> Well, I guess they're screwed, aren't they? </sarcasm>
-----
Quote:
|
AMD Athlon CPU make up its disadvantage in memory bandwidth by providing three Full x86 decoders (while Pentium 4 has only 1) and performing 9 operations per clock cycle (while the Pentium 4 has 4). Consider the much more expense on the latter system, we still believe the former one is worth to report here.
|
While true at the hardware level (again, the wording is quite wrong -- I wish my procs could be that effective), they failed to note that the reason any old-generation Pentium 4 loses out to Athlon XPs and Pentium IIIs is that its clock speed is not scaled high enough to counter the ineffectiveness of the individual cycle
----
Quote:
|
Also, the Athlons perform 9 operations per clock cycle, compared to 6 for the P4 (non HT).
|
Again with the wording .... just because you find forum comments about the number of "operations" being done doesn't mean that's actually how it is. Because if it was, I'd settle for that 18 billion opcodes/sec setup you 'calculated' earlier. Yer a bright kid, I think, and I'll assume you get my point about the wording
----
Quote:
Simply put, AMD's do more calculations, and have less bottlenecks when gaming.
I could write you a book about it, but I'd just be quoting AMD...and even I would get lost in the technical explanation
|
Well a few things about that:
- The distinction for gaming has only really shone through with the K8-based processors and their on-die memory controllers ... having a memory controller that operates in sync with the CPU rather than the NB is a hell of a boon, and that is what gives AMD processors their killer advantage over Intel silicon in the gaming market
- Just because a processor is suited for gaming does not mean it's suited for everything. This is beginning to change (though only with the top-end processors), but real work is still an Intel silicon job
Quote:
So AMDs or Intels?
Let me out an example of the XP 3200. It only goes up to 2.2ghz. On the other hand the P4 goes up to 3.2ghz and over. *ok more ghz, im happy*.
Unforuntely, no. the Intel does only 6 floating points per cycle meaning that it can only carry out 6 operations per cycle. The AMD does 3 more so it can do 9 points.
|
Well, in that case, with those two processors, explain to me why
- Intel processors dominate the Photoshop sector? To a degree that's an INT bound application, and since the AMD proc supposedly has 3x the INT capacity ... why does the Intel one win out?
- Intel processors dominate the Premiere sector? That's an FP arena, and the AMD chip should have an advantage there since it has an extra FP pipeline?
- Some other supposedly FP bound stuff ends up in Intel land?
- And animation, an FP bound application for CPUs, ends up in Intel land?
.... continues