RAID Primer: What's in a number?
by Dave Robinet on September 7, 2007 12:00 PM EST - Posted in Storage
RAID 0
RAID 0 takes two or more disk drives and writes data in a "stripe" across each disk. Data is accessed by requesting the stripe from the array, resulting in the disks more or less simultaneously feeding their portion of the data back to the controller. The overall capacity of the array is equal to the sum of the formatted capacities of all drives, and disk usage is more or less spread evenly among all drives in the array.
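The striping described above can be sketched in a few lines. This is only an illustration under simple assumptions: fixed-size stripe units dealt out round-robin, with disks modeled as in-memory lists. The names `locate_block`, `stripe`, and `read_back` are hypothetical and do not correspond to any particular controller's firmware.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical stripe unit to (disk index, unit index on that disk)."""
    if num_disks < 2:
        raise ValueError("RAID 0 needs at least two disks")
    return logical_block % num_disks, logical_block // num_disks


def stripe(data: bytes, num_disks: int, unit_size: int) -> list[list[bytes]]:
    """Split data into stripe units and deal them out round-robin across disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    units = [data[i:i + unit_size] for i in range(0, len(data), unit_size)]
    for logical_block, unit in enumerate(units):
        disk, _ = locate_block(logical_block, num_disks)
        disks[disk].append(unit)
    return disks


def read_back(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by visiting stripe units in logical order."""
    total_units = sum(len(d) for d in disks)
    out = []
    for logical_block in range(total_units):
        disk, offset = locate_block(logical_block, len(disks))
        out.append(disks[disk][offset])
    return b"".join(out)
```

Because consecutive units land on different disks, a long sequential read keeps every disk busy at once, which is where the streaming gain comes from. A single small read still touches only one disk, so access time does not improve.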
The net result is that the system will see much faster sustained transfer rates for both read and write operations compared to a single drive. File access time, however, is not measurably improved by leveraging multiple disks in a RAID 0 set, which means that systems which require frequent access of small, non-contiguous files (as is often the case in desktop configurations) generally do not benefit from RAID 0.
RAID 0 is an excellent choice for video editing and large-scale "solving" applications, where large files need to be read and written in a continuous manner.
Perhaps the greatest drawback to RAID 0 is that the array is rendered inaccessible when a single drive in it fails. In that sense, RAID 0 isn't actually RAID at all, as it lacks the "Redundant" part of the equation. Because the array survives only if every drive survives, data reliability drops with each drive added to a RAID 0 setup, so unless frequent backups are made - or the data is not regarded as even remotely important - RAID 0 should be approached with caution.
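A back-of-the-envelope calculation shows how quickly the risk compounds. The sketch below assumes drives fail independently and with the same probability over some period; the 3% figure is purely hypothetical and is not a measured failure rate.

```python
def array_survival_probability(drive_failure_prob: float, num_drives: int) -> float:
    """Chance that a RAID 0 array survives, given independent drive failures.

    The array is lost as soon as any one drive fails, so it survives only
    if every drive survives: (1 - p) ** n.
    """
    if not 0.0 <= drive_failure_prob <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if num_drives < 1:
        raise ValueError("need at least one drive")
    return (1.0 - drive_failure_prob) ** num_drives


if __name__ == "__main__":
    p = 0.03  # hypothetical per-drive failure probability over the period
    for n in (1, 2, 4, 8):
        print(f"{n} drive(s): {array_survival_probability(p, n):.1%} chance of survival")
```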
Pros:
- Excellent streaming performance
- Maximum capacity available for users (sum of all disks)
Cons:
- No redundancy of data
- Negligible performance benefits for many users
RAID 1
RAID 1 sits at the other extreme of the spectrum. It makes a continuous copy of all data from one disk (which the system reads from and writes to) onto a second physical disk that sits in "standby" mode. The controller holds this "standby" disk in reserve until it detects a failure on the first disk. At that point, the controller "fails over" to the second disk, with all data still available to the user.
While RAID 1 usually offers no performance benefits (and indeed, it often slightly degrades performance in some situations), it does increase the uptime of the host computer by allowing it to remain online even after a disk in the system has failed. This makes it an extremely popular option for mirroring operating systems on enterprise-class servers, and for small office users without the need for massive amounts of data storage but a requirement for constant uptime.
Higher-quality RAID 1 controllers can outperform single-drive implementations by making both drives active for read operations. This can in theory reduce file access times (requests are sent to whichever drive is closer to the desired data) as well as potentially doubling data throughput on reads (both drives can read different data simultaneously). Most consumer RAID 1 controllers do not provide this level of sophistication, however, resulting in performance that is at best slightly worse than what would be achieved with a single drive. Many software RAID 1 solutions also lack support for reading from both drives in a RAID 1 set simultaneously.
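The difference between a plain mirror and one that balances reads can be sketched with a toy model. Everything here is an assumption made for illustration: `MirrorSet` is a hypothetical class, disks are plain dictionaries, and reads simply alternate between healthy disks. A real controller would more likely pick the drive whose head is closest to the data, which this sketch does not attempt.

```python
class MirrorSet:
    """Toy RAID 1: writes go to every healthy disk, reads rotate between them."""

    def __init__(self, num_disks: int = 2) -> None:
        self.disks: list[dict[int, bytes]] = [{} for _ in range(num_disks)]
        self.healthy = [True] * num_disks
        self._next = 0  # round-robin pointer used for read balancing

    def write(self, block: int, data: bytes) -> None:
        # Every write lands on all healthy mirrors, so writes gain nothing.
        for i, disk in enumerate(self.disks):
            if self.healthy[i]:
                disk[block] = data

    def fail(self, disk_index: int) -> None:
        # Simulate the controller detecting a dead disk.
        self.healthy[disk_index] = False

    def read(self, block: int) -> tuple[int, bytes]:
        """Return (disk used, data), skipping any failed disk."""
        n = len(self.disks)
        for step in range(n):
            i = (self._next + step) % n
            if self.healthy[i]:
                self._next = (i + 1) % n
                return i, self.disks[i][block]
        raise IOError("all mirrors have failed")
```

With both disks healthy, successive reads are served by alternating disks. After `fail(0)`, every read falls through to the surviving disk, which is the failover behavior described above.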
Pros:
- Redundancy of data
- Lowest cost data redundancy available (one additional disk)
- Simple operations make it easy to implement solution using software only
Cons:
- Poor usage of drive capacity (only 50% of purchased hard drive capacity available)
- Typically no performance benefit over a single hard disk
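The capacity trade-off between the two levels comes down to simple arithmetic. The helper below is a hypothetical sketch that assumes equal-sized drives and models RAID 1 only as a two-drive mirror.

```python
def usable_capacity_gb(level: int, num_drives: int, drive_size_gb: float) -> float:
    """Usable capacity for RAID 0 or a two-drive RAID 1, assuming equal-sized drives."""
    if level == 0:
        if num_drives < 2:
            raise ValueError("RAID 0 needs at least two drives")
        return num_drives * drive_size_gb  # sum of all disks
    if level == 1:
        if num_drives != 2:
            raise ValueError("this sketch models only a two-drive mirror")
        return drive_size_gb  # half of what was purchased
    raise ValueError("only RAID 0 and RAID 1 are modeled here")


if __name__ == "__main__":
    for level in (0, 1):
        usable = usable_capacity_gb(level, 2, 500.0)
        print(f"RAID {level}: {usable:.0f} GB usable of 1000 GB purchased")
```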
41 Comments
tynopik - Saturday, September 8, 2007
> So I'm looking for a solution which stores my data in a "normal" way on the discs + one extra disk with the parity (somewhat like RAID 3 but without the striping).
unRAID
http://www.lime-technology.com/
tynopik - Saturday, September 8, 2007
i should point out that
1. it does NOT join your drives together into one volume, each drive is separate (this is basically necessary for what you want unless you go the WHS route)
2. it has to be run on a dedicated system that it turns into NAS (you can't run it on your main desktop for instance)
that said, i really like the idea, almost all of the advantages of the WHS mechanism but much more space efficient (in most cases, i assume the largest drive will always be 'lost' to parity data)
Dave Robinet - Saturday, September 8, 2007
Really, you're looking for something that is several RAID 1 mirrors of single volumes.
I can think of nothing off-the-shelf that fits all those needs, though "rolling your own" may help:
- Buy two drives. Create one large partition (say, D:) on drive 1. Mirror that.
- Buy two more drives. Create another large partition (say, E:) on drive 3. Mirror that.
Etc, etc.
It's still the same volume, but if you do it using software, the two drives won't be dependent on each other in any way.
If you tear one of those drives out of your computer and slap it onto another one (USB connector, etc), then it'll come up just fine, with or without the mirror.
It's inelegant, and really not something I'd ever push on someone - but you've come up with a kind of oddball request, there. :) Might I ask what it's for? Maybe your criteria can be adjusted in some way.
tynopik - Saturday, September 8, 2007
> but you've come up with a kind of oddball request, there. :) Might I ask what it's for? Maybe your criteria can be adjusted in some way.
i understand what he's getting at
he wants protection from drive failure, so a 'parity drive' that can rebuild any one drive that fails is handy
but he's also concerned about losing more than 1 drive simultaneously
having just a plain filesystem on the disk is far more robust than any sort of striping system as worst comes to worst you can just yank any surviving drives and recover what's on them
- a series of raid 1 arrays (like what you described) works but isn't particularly flexible (need equal sized drives)
- WHS is more flexible and powerful but it still requires double the amount of storage (EXPENSIVE)
- this only requires 1 extra drive and allows it to backup any number of other drives
it comes from a desire for some protection but not being able or willing to spend enough for true duplication plus wanting something that fails gracefully (ie not raid5)
i would actually like something like that for my system, there's a chance of recovering everything, but if it hits the fan i'll be able to at least recover something plus it's not that expensive
don't forget there may be physical limitations. if you have 4 physical drives filled with data, you might have enough room and power connectors for a 5th drive, but not for 4 more
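As a rough illustration of the single-parity-drive idea discussed in this thread, the sketch below XORs several independent "drives" (modeled as byte strings) into one parity image and uses it to rebuild any one lost drive. It assumes plain XOR parity and zero-padding for smaller drives; it is not a description of how unRAID is actually implemented, and the function names are hypothetical.

```python
def build_parity(drives: list[bytes]) -> bytes:
    """XOR all drives together byte by byte.

    Shorter drives are treated as zero-padded, so the parity image has to be
    as large as the largest data drive.
    """
    parity = bytearray(max(len(d) for d in drives))
    for drive in drives:
        for i, byte in enumerate(drive):
            parity[i] ^= byte
    return bytes(parity)


def rebuild(surviving: list[bytes], parity: bytes, lost_size: int) -> bytes:
    """Recover one lost drive by XOR-ing the parity with every surviving drive."""
    recovered = bytearray(parity)
    for drive in surviving:
        for i, byte in enumerate(drive):
            recovered[i] ^= byte
    return bytes(recovered[:lost_size])
```

Each data drive keeps its own plain contents, so surviving drives stay readable on their own even if two drives are lost and the parity can no longer rebuild anything. The zero-padding also shows why the parity drive needs to match the largest data drive.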
Sudder - Sunday, September 9, 2007
tnx, this goes a big step in the right direction
since unraid uses slackware there should be (at least in theory) a possibility to do this with the linux "Logical Volume Manager" (although one would probably have to do some work so that the TOC is saved on every disc, to still be able to access the data if some of the other discs are gone)
But even without that, separate volumes and the option to access them one by one, by mounting the ReiserFS, is good enough for me.
and that's a big downside.
When I have some time I'll probably try to run it in a virtual machine (the "use a physical disc" option in VM should reduce the performance penalties significantly), but I'm not that optimistic that this will also work with the "bigger", non-free versions that can handle more than 3 discs (e.g. handling of the registration key, since I already ran into some pre-boot USB issues with VM when I tried to test the BitLocker feature of Vista in a VM - although it just might have been my old stick or my USB controller ..)
yes (that's kind of a given) - the option to use discs of different sizes is a nice bonus though, since the array can now grow more "organically" over time (you just buy the disc with the best cost/gig ratio at the moment you need it without limiting yourself to one size like with RAID 5)
with the port-multiplier Option of SATA II (up to 15 drives per cable) and an external casing I think there are ways to cope (and if you plan in advance to have X bays/connectors available, you just have to start a new array if the old one is full - which might be a good idea anyway as soon as you come close to double-digit disc numbers - although that might take some time with modern disc sizes ;-) )
Again: I don't necessarily want to be able to access my data all the time, I just want to switch from my current "DVD storage" to "HD storage".
So what I'm looking for now is the functionality of unRAID (without the limitation on the number of drives), being able to run in a VM, and for free ;-).
I already checked FreeNAS and NASLite but they all seemed to be fixed on either RAID and/or JBOD without parity .. any suggestions?
tynopik - Sunday, September 9, 2007
> with the port-multiplier Option of SATA II (up to 15 drives per cable)
and which consumer level products support port-multipliers?
it's an optional part of the spec and most don't implement it
if you're willing to do a lot of extra work and hassle and really want offline storage you can fill a bunch of external drives with a virtual filesystem (like truecrypt for instance) and then with them all connected run par to build a par file across all your virtual disk files
disadvantages are numerous, have to be able to connect all disks at once, if you update one little piece of one drive have to recalculate the par file across all of them, etc
Sudder - Sunday, September 9, 2007
most e-sata ports support it (although some controllers, like the ones using JMB36X, just support "Command-based Switching"). all solutions with Sil 3132 chips (e.g. many notebook S-ATA PCMCIA adapters) even support "FIS-based Switching", which works a little like SCSI: the command is sent to one disc and then the bus is free again, so you can get "close" to the theoretical 300 MB/sec with multiple discs and the bus is not blocked by one working disc (with 4 discs connected, a test showed still 40 MB/sec transfer from each of the 4 parallel working discs ..)
so take, AFAIKR, one of the many new Gigabyte mo-boards with 2 ports that can be used as e-sata, put e.g. a "Dawicontrol DC-6510 PM" on the other end of the cable (one is about 100 bucks), add a power supply and a housing and you are good for 10 extra discs ..
look at more recent motherboards (e-sata slowly shows up on more and more boards) and you'll find that it's supported more and more
well, I'm kind of too lazy to do the par thing each and every time I just change one little file - or to be more practical, I can very much imagine myself pushing the rebuild further and further into the future as long as I can foresee that I will add more stuff in the very near future, which then again will require a rebuild .. (if I don't find a usable solution which does it "on the fly" I'll probably end up doing it "by hand" (eventually by adding a small RAID 5 "file buffer" to my system to stretch the write/par intervals) but I _really_ would prefer an automatic solution without a most likely multi-hour rebuild process (reading all discs, calculating and writing the whole par disk) after each little change ..)
Witling - Saturday, September 8, 2007
Something I don't usually see mentioned in articles on RAID is the complete lack of protection from failure due to a virus or installation of a bad driver. Both disks get corrupted.
I am a home user of RAID 1 through a controller built into the motherboard, using a popular Redmond, Washington operating system.
Dave Robinet - Saturday, September 8, 2007
Yep - I touched briefly on this in the last part of the article.
Users need to look closely at whether an ARCHIVAL system (tape, etc) is better for their needs than RAID 1.
Let's face it - RAID 1 is for "(almost) always on / critically needed to be working when powered up" configurations ONLY. How many home computers fall into this category... really?
kobymu - Saturday, September 8, 2007
In certain cases RAID 1 will give you better read performance.