This might explain a little of it.
Much of Core 2 Duo's performance advantage over its Pentium predecessors comes from an additional execution unit on each CPU core. (Core 2 Duo chips have four such units per CPU core versus the Pentium D's three per core.) The additional unit per core, plus some clever coding that lets the chip fuse common groups of instructions into single instructions, allows Core 2 Duo chips to outperform Pentium D chips that run at higher clock speeds.
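To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It only computes a theoretical peak (units per core times clock speed); real programs never keep every unit busy, and the 2.4 GHz and 3.0 GHz clocks are illustrative assumptions of mine, not figures from the article.

```python
def peak_throughput(units_per_core: int, clock_ghz: float) -> float:
    """Upper bound on instructions per nanosecond for a single core.

    Assumes every execution unit completes one instruction each cycle,
    which real workloads do not achieve. Illustrative only.
    """
    return units_per_core * clock_ghz


# Hypothetical clock speeds, chosen only to illustrate the point.
core2_peak = peak_throughput(units_per_core=4, clock_ghz=2.4)
pentium_d_peak = peak_throughput(units_per_core=3, clock_ghz=3.0)

print(f"Core 2 Duo peak per core: {core2_peak:.1f} instructions/ns")
print(f"Pentium D peak per core:  {pentium_d_peak:.1f} instructions/ns")
```

Even with a 600 MHz clock deficit in this made-up comparison, the wider core has the higher ceiling. Instruction fusion would push the gap further, since a fused pair occupies one slot instead of two.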
A staggering 4MB of L2 cache keeps the higher-end Core 2 Duo chips supplied with the data they need in order to run at full speed, and Intel has carefully tuned its prefetching algorithms, which preemptively cache the appropriate data before the CPU needs it.
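If the prefetching part sounds abstract, a toy simulation shows the idea. This is a simple "next line" prefetcher of my own invention, not Intel's actual algorithm (which the article doesn't describe), and it assumes an unlimited cache and 64-byte cache lines to keep things short.

```python
def count_misses(addresses, line_size=64, prefetch=False):
    """Count cache misses for a sequence of byte addresses.

    Toy model: the cache never evicts anything. With prefetch=True,
    every access also pulls in the following cache line ahead of time.
    """
    cached_lines = set()
    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line not in cached_lines:
            misses += 1
            cached_lines.add(line)
        if prefetch:
            # Guess that the program will want the next line soon.
            cached_lines.add(line + 1)
    return misses


# Walk through 100 consecutive cache lines, touching each one once.
stream = list(range(0, 64 * 100, 64))

misses_without = count_misses(stream, prefetch=False)
misses_with = count_misses(stream, prefetch=True)
print(misses_without, misses_with)
```

On a purely sequential walk like this, every access misses without prefetching, while the prefetching version only misses on the very first line. Real access patterns are messier, which is why the tuning matters.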
While most dual-core chips, including AMD's Athlon 64 line and Intel's Pentium D CPUs, dedicate a certain amount of cache to each CPU core, the Core 2 Duo provides shared access to its entire 4MB of cache. And the chip can distribute that cache between its cores as needed. If one core is churning away at a particularly complex task, it can use most of the L2 cache, while the other core runs a simple task that demands less cache memory.
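Here's a rough sketch of why that matters. The working-set sizes (3.5MB and 0.5MB) are made-up numbers, and real hardware shares the cache through its replacement policy rather than an explicit calculation like this, so treat it as an illustration only.

```python
def resident_split(working_sets_mb, total_cache_mb):
    """Each core gets a fixed, equal slice of the cache (dedicated design)."""
    per_core = total_cache_mb / len(working_sets_mb)
    return [min(ws, per_core) for ws in working_sets_mb]


def resident_shared(working_sets_mb, total_cache_mb):
    """All cores draw from one pool (shared design).

    If everything fits, each core keeps its whole working set in cache.
    Otherwise the pool is divided in proportion to demand, which is a
    simplifying assumption rather than how hardware actually decides.
    """
    demand = sum(working_sets_mb)
    if demand <= total_cache_mb:
        return list(working_sets_mb)
    scale = total_cache_mb / demand
    return [ws * scale for ws in working_sets_mb]


# Hypothetical workload: one heavy task, one light task, 4MB of L2 in total.
working_sets = [3.5, 0.5]
split = resident_split(working_sets, total_cache_mb=4.0)
shared = resident_shared(working_sets, total_cache_mb=4.0)

print("dedicated 2MB per core:", split)
print("shared 4MB:", shared)
```

With dedicated caches the heavy task can keep only 2MB of its 3.5MB in cache while 1.5MB of the other core's slice sits idle. With the shared pool both tasks fit entirely.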