Section 1 -- Definitions
RAID, or redundant array of inexpensive disks, is a hard drive configuration similar to a JBOD (just a bunch of drives); however, it expands on this by adding performance and/or reliability 'enhancements' (the specific enhancements depend on the type, or level, of RAID in question).
Striping refers to the process of splitting data across multiple drives. The advantage of doing this is that the access time for the overall data is reduced, because consecutive operations (whether reading or writing) will occur on alternating/different drives. A quick example of this:
Task -- access five data "units" (chunks). All drives involved are assumed to have an access time of 10ms.
Scenario 1 -- all five data units are stored on a single drive. In this case, accessing the five data units requires 50ms.
Chunk 1 is found off drive a - 10ms
Chunk 2 is found off drive a - 10ms
Chunk 3 is found off drive a - 10ms
Chunk 4 is found off drive a - 10ms
Chunk 5 is found off drive a - 10ms
Scenario 2 -- data is striped across two drives. In this scenario, while data chunk x is being read, data chunk x+1 is simultaneously being sought on the other drive. For simplicity's sake, suppose the overlap brings the time for each following chunk down to 5ms (50% of the original access time). Thus, in this highly simplified example, the total access time is 30ms.
Chunk 1 is found off drive a - 10ms
Chunk 2 is found off drive b - 5ms
Chunk 3 is found off drive a - 5ms
Chunk 4 is found off drive b - 5ms
Chunk 5 is found off drive a - 5ms
Scenario 3 -- data is striped across three drives. This is an extension of the second example, except that now, while data chunk x is being accessed (requiring 10ms), data chunk x+1 is being sought (with an overlap of 5ms) and data chunk x+2 is being sought (with an overlap of 2.5ms, taken as half of the previous value, again for simplicity's sake). The net access time will be 22.5ms.
Chunk 1 is found off drive a - 10ms
Chunk 2 is found off drive b - 5ms
Chunk 3 is found off drive c - 2.5ms
Chunk 4 is found off drive a - 2.5ms
Chunk 5 is found off drive b - 2.5ms
It's not hard to see that, as you increase the number of drives you stripe across, the net access time is reduced (up to a limiting point, obviously). Although I've used a 50% scaling factor here, real-world scaling is nothing quite like that, and there are diminishing returns as more drives are added due to overhead.
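The toy arithmetic in the three scenarios can be reproduced with a short Python sketch. This is only the simplified halving model from the example, not a real performance model: the function names (chunk_costs, total_access_ms, drive_for_chunk) and the round-robin placement rule are assumptions made for illustration.

```python
def drive_for_chunk(chunk_index, num_drives):
    """Round-robin placement: consecutive chunks land on alternating drives."""
    return "abcdefghijklmnopqrstuvwxyz"[chunk_index % num_drives]


def chunk_costs(num_chunks, num_drives, base_ms=10.0):
    """Per-chunk cost under the toy model: each extra drive that can seek
    ahead halves the visible cost, up to (num_drives - 1) halvings."""
    costs = []
    for k in range(num_chunks):
        overlap_depth = min(k, num_drives - 1)
        costs.append(base_ms / 2 ** overlap_depth)
    return costs


def total_access_ms(num_chunks, num_drives, base_ms=10.0):
    """Total time to access all chunks under the toy model."""
    return sum(chunk_costs(num_chunks, num_drives, base_ms))


for drives in (1, 2, 3):
    print(f"{drives} drive(s): {total_access_ms(5, drives)}ms")
# 1 drive(s): 50.0ms
# 2 drive(s): 30.0ms
# 3 drive(s): 22.5ms
```

Adding a fourth or fifth drive to this model shows the diminishing returns directly: each extra drive saves less than the one before it.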
Spanning. Windows, for instance, has a 26-drive limit (one drive for each letter of the alphabet). Now suppose you have, for whatever reason, a requirement for 30 hard drives. This can be done via spanning. What this means is that, at a hardware level, multiple hard drives are grouped together as a single logical drive (so as far as the OS is concerned, there are no more than 26 drives, yet the user still has the available capacity of all 30 drives).
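As a rough illustration of how a spanned set presents several drives as one, here is a minimal Python sketch that maps a block number on the logical drive to a physical drive and an offset on it. The function name locate and the drive sizes are hypothetical; a real controller does this in firmware and its exact layout may differ.

```python
def locate(logical_block, drive_sizes):
    """Map a logical block to (drive index, block offset on that drive).

    drive_sizes lists the capacity of each physical drive in blocks, in the
    order the drives are concatenated. Drives are filled one after another.
    """
    if logical_block < 0:
        raise ValueError("logical block must be non-negative")
    remaining = logical_block
    for index, size in enumerate(drive_sizes):
        if remaining < size:
            return index, remaining
        remaining -= size
    raise ValueError("logical block is beyond the end of the spanned set")


# Three drives of different sizes appear as one 350-block logical drive.
sizes = [100, 50, 200]
print(locate(0, sizes))    # (0, 0)
print(locate(120, sizes))  # (1, 20)
print(locate(349, sizes))  # (2, 199)
```

Note that, unlike striping, consecutive blocks here stay on the same drive until it is full, so spanning adds capacity but not speed.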
Parity. This generically refers to some scheme for creating recovery data. In the event of a data failure, the parity information can be used in conjunction with existing data to reconstruct the 'lost' data.
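One common way to build such recovery data is a byte-wise XOR across the data blocks. The definition above does not commit to any particular scheme, so treat the following Python sketch as one assumed example; the helper name xor_blocks and the sample data are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if stored on three separate drives.
data = [b"RAID", b"disk", b"demo"]

# The parity block would live on a fourth drive.
parity = xor_blocks(data)

# Suppose the drive holding data[1] dies. XOR-ing the surviving data
# blocks with the parity block reconstructs the lost block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'disk'
```

This works because XOR-ing a value with itself cancels out, so everything except the missing block drops away. It also shows why a single parity block only covers one lost block at a time.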
Mirroring. Drive/data mirroring is just that: data stored on drive a is duplicated on drive b. The difference between mirroring and a parity configuration is that, in the event of a drive failure, a mirrored setup does not require any "reconstruction", since any data stored on a given drive is also stored on the other drive.
Fault tolerance. Generally expressed in degrees/levels, fault tolerance, in the context of RAID, refers to the number of simultaneous critical drive failures a configuration can sustain without loss of user data. Take mirroring, for instance: since any data on drive a is also stored on drive b, either drive can fail without loss of data (meaning the fault tolerance is 1 drive); should both drives fail, the user will lose their data.
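The 1-drive fault tolerance of a mirror can be seen in a toy Python model. The Mirror class below is purely hypothetical (in-memory dicts standing in for drives) and only sketches the idea; it is not how any particular controller works.

```python
class Mirror:
    """Toy two-drive mirror: every write goes to all healthy drives."""

    def __init__(self, copies=2):
        self.drives = [dict() for _ in range(copies)]
        self.failed = set()

    def write(self, block, data):
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                drive[block] = data

    def fail(self, index):
        """Simulate a critical failure: the drive and its contents are gone."""
        self.failed.add(index)
        self.drives[index].clear()

    def read(self, block):
        # No reconstruction needed: just read from any surviving drive.
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                return drive[block]
        raise OSError("all mirrored drives have failed; data is lost")


mirror = Mirror()
mirror.write(0, b"user data")
mirror.fail(0)          # one failure: still readable from the other drive
print(mirror.read(0))   # b'user data'
mirror.fail(1)          # second failure: mirror.read(0) would now raise OSError
```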
Controller. RAID doesn't just "happen" by plugging hard drives into a standard controller; it requires a special drive controller that provides the RAID functionality (naturally, this does not apply to software-RAID). The types of RAID available depend on the controller.
Software-RAID. At the OS level, the OS can 'reroute its I/O operations' to simulate a hardware RAID; such implementations are called software-RAID. The advantage here is that there is no longer a requirement for fancy (and potentially expensive) RAID controllers; the downsides are that (a) you cannot 'simulate' all possible RAID levels and (b) the performance of a software-RAID will never be able to match that of a hardware implementation.
Sequential vs Random. As the word suggests, sequential access refers to accessing data whose blocks are arranged sequentially (with respect to their physical location on the platters) on the disk. Random access refers to non-sequential access.
Hotswap. Hotswapping refers to the ability to remove a hard drive from a system while the system is still powered up. If you have to power down before swapping out a HDD, then your drive does not support hot-swapping.
Matrix/nForce RAID
Intel Matrix RAID --
http://www.intel.com/design/chipsets/matrixstorage_sb.htm
nVidia RAID --
http://www.nvidia.com/object/feature_raid.html