RAID Primer: What's in a number?
by Dave Robinet on September 7, 2007 12:00 PM EST - Posted in Storage
RAID 0
RAID 0 takes two or more disk drives and writes data in a "stripe" across each disk. Data is accessed by requesting the stripe from the array, resulting in the disks more or less simultaneously feeding their portion of the data back to the controller. The overall capacity of the array is equal to the sum of the formatted capacities of all drives, and disk usage is more or less spread evenly among all drives in the array.
The net result is that the system will see much faster sustained transfer rates for both read and write operations compared to a single drive. File access time, however, is not measurably improved by using multiple disks in a RAID 0 set, which means that systems that frequently access small, non-contiguous files (as is often the case in desktop configurations) generally do not benefit from RAID 0.
RAID 0 is an excellent choice for video editing and large-scale "solving" applications, where large files need to be read and written in a continuous manner.
Perhaps the greatest drawback to RAID 0 is that the array is rendered inaccessible when a single drive fails. In that sense, RAID 0 isn't actually RAID at all, as it lacks the "Redundant" part of the equation. The likelihood of data loss also grows with every drive added to a RAID 0 set, since the failure of any one member destroys the whole array, so unless frequent backups are made - or the data is not regarded as even remotely important - RAID 0 should be approached with caution.
Pros:
- Excellent streaming performance
- Maximum capacity available for users (sum of all disks)
Cons:
- No redundancy of data
- Negligible performance benefits for many users
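For readers who want the striping idea spelled out, here is a rough sketch of how a controller might translate a logical address into a member disk and an offset on that disk. The two-disk array, the 64KB stripe unit, and the function itself are illustrative assumptions for this article, not a description of any particular controller.

```python
# Minimal RAID 0 address-mapping sketch (illustrative assumptions only).
STRIPE_UNIT = 64 * 1024   # 64KB stripe unit, a typical default chosen for the example
NUM_DISKS = 2             # two-disk array for the example

def map_raid0(logical_byte):
    """Return (disk_index, byte_offset_on_that_disk) for a logical byte address."""
    stripe_index = logical_byte // STRIPE_UNIT    # which stripe unit overall
    offset_in_unit = logical_byte % STRIPE_UNIT   # position inside that unit
    disk = stripe_index % NUM_DISKS               # stripe units rotate across the disks
    unit_on_disk = stripe_index // NUM_DISKS      # how many units already sit on that disk
    return disk, unit_on_disk * STRIPE_UNIT + offset_in_unit

# A large sequential transfer lands on both disks almost evenly, which is
# exactly the streaming workload where RAID 0 delivers its throughput gains.
for logical in range(0, 4 * STRIPE_UNIT, STRIPE_UNIT):
    print(logical, map_raid0(logical))
```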
RAID 1
RAID 1 sits at the other extreme of the spectrum. It makes a continuous copy of all data from one disk (which is written to and read from by the system) onto another physical disk which is in "standby" mode. This "standby" disk is held in reserve by the controller for when a failure is detected on the first disk. At that point, the controller "fails over" to the second disk, with all data still available to the user.
While RAID 1 usually offers no performance benefits (and indeed, it often slightly degrades performance in some situations), it does increase the uptime of the host computer by allowing it to remain online even after a disk in the system has failed. This makes it an extremely popular option for mirroring operating systems on enterprise-class servers, and for small office users who don't need massive amounts of data storage but do require constant uptime.
Higher quality RAID 1 controllers can outperform single-drive implementations by making both drives active for read operations. This can in theory reduce file access times (requests are sent to whichever drive is closer to the desired data) as well as potentially double data throughput on reads (both drives can read different data simultaneously). Most consumer RAID 1 controllers do not provide this level of sophistication, however, resulting in performance that is at best on par with - and often slightly worse than - a single drive. Software RAID 1 solutions also typically lack support for reading from both drives in a RAID 1 set simultaneously.
Pros:
- Redundancy of data
- Lowest cost data redundancy available (one additional disk)
- Simple operations make it easy to implement solution using software only
Cons:
- Poor usage of drive capacity (only 50% of purchased hard drive capacity available)
- Typically no performance benefit over a single hard disk
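To illustrate what the higher quality controllers described above do differently, the sketch below duplicates every write to both members and alternates reads between them. The round-robin read policy and the in-memory "disks" are simplifying assumptions for the example; a real controller would typically send each read to whichever drive can reach the data sooner.

```python
# Minimal RAID 1 sketch (simplifying assumptions only): writes go to both
# members, reads can be balanced across them when the controller supports it.
class Raid1:
    def __init__(self):
        self.disks = [dict(), dict()]  # two in-memory "disks": block number -> data
        self._next_read = 0            # simple round-robin read policy (an assumption)

    def write(self, block, data):
        for disk in self.disks:        # every write lands on both members (the mirror)
            disk[block] = data

    def read(self, block):
        disk = self.disks[self._next_read % len(self.disks)]  # alternate between members
        self._next_read += 1
        return disk[block]

    def fail_disk(self, index):
        self.disks.pop(index)          # the surviving member keeps serving all data

array = Raid1()
array.write(0, b"operating system image")
array.fail_disk(0)
print(array.read(0))                   # data is still available after a member "fails"
```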
41 Comments
Brovane - Friday, September 7, 2007 - link
Personally we use Raid (0+1) at my work for our Exchange Cluster, SQL cluster and the home drives for our F&P cluster. Were Raid 0+1 is great is in a SAN environment. We have the drives mirror between SAN DAE so we could have a entire DAE fail on our SAN and for example Exchange will remain up and running. Also if you have a drive failure in one of our RAID 0+1 drives the SAN automatically just grabs the hot spare and starts rebuilding the array and pages the the lan team and alerts Dell to ship a new drive. Of course no matter what RAID you have setup you should always have daily tape backups with a copy of those tapes going offsite.Bladen - Friday, September 7, 2007 - link
Might be asking a bit too much, especially in the case of RAID 5, 6, 0+1, and 1+0, but some SSD RAID performance would be nice. They would need more than 2 drives, wouldn't they? However, if we could see some RAID 0 figures from a pair of budget SSDs and a pair of performance SSDs, that would be awesome.
tynopik - Friday, September 7, 2007 - link
in addition to a WHS comparison i hope it covers:
1. software raid (like built into windows or linux)
2. motherboard raid solutions (nvraid and intel matrix)
3. low end products (highpoint and promise)
4. high end/enterprise products
5. more exotic raids like raid-z and raid 5ee
6. performance of mixing raids across same disks like you can with matrix raid and some adaptecs
and in addition to features/cost/performance i hope it really tries to test how reliable/bulletproof these solutions are
for instance a ton of people have had problems with nvraid
http://www.nforcershq.com/forum/image-vp511756.htm...
what happens if you yank the power in the middle of a write?
how easy is it to migrate an array to a different controller?
can disks in raid1 be yanked from the array and read directly or does it put header info on the disk that makes this impossible?
yyrkoon - Saturday, September 8, 2007 - link
That would be because "a ton of people are idiots". I have been using nvRAID for a couple of years without issues, and most recently I even swapped motherboards, and the array was picked right up without a hitch once the proper BIOS settings were made. I would suspect that these people who are 'having problems' are the type who expect/believe that having a RAID0 array on their system will give them another 30 frames per second in the latest first person shooter as well . . .
tynopik - Saturday, September 8, 2007 - link
> I would suspect that these people who are 'having problems' are the type who expect/believe that having a RAID0 array on their system will give them another 30 frames per second in the latest first person shooter as well . . .
the link is in the very top comment
they were all actually using raid1 and had problems with it constantly splitting the array
tynopik - Friday, September 7, 2007 - link
http://storageadvisors.adaptec.com/
great site with lots of potential topics like:
desktop vs raid/enterprise drives - is there a difference
http://storageadvisors.adaptec.com/2006/11/20/desk...
Picking the right stripe size
http://storageadvisors.adaptec.com/2006/06/05/pick...
Different types of RAID6
http://storageadvisors.adaptec.com/2005/11/07/a-ta...
other features to consider:
handling dissimilar drives
morph online from one RAID level to another
easily add additional drives/capacity to an existing array
can you change which port a drive is connected to without messing up the array?
maybe create a big-honkin features matrix that shows which controllers are missing what?
performance:
- cpu hit between software raid, low-end controllers, enterprise controllers (some have reported high cpu usage with highpoint controllers even when using raid-1 which shouldn't cause much load)
- cpu hit with different busses (PCI, PCI-X, PCIe) and different connections (firewire, sata, scsi, sas, usb)
maybe even a corruption test. (write terabytes of data out under demanding situations and read back to ensure there was no corruption)
But most of all I WANT A TORTURE TEST. I want these arrays pushed to their limits and beyond. What does it take to make them fail? How gracefully do they handle it?
tynopik - Friday, September 7, 2007 - link
an article from the anti-raid perspective
http://www.pugetsystems.com/articles?&id=29
tynopik - Saturday, September 8, 2007 - link
another semi-anti-raid piece
http://www.bestpricecomputers.co.uk/reviews/home-p...
"Why? From our survey of a sample of our customers here's how it tends to happen:
The first and foremost risk is that the RAID BIOS loses the information it stores to track the allocation of the drives. We've seen this caused by all manner of software particularly anti-virus programs. Caught in time a simple recreation of the array (see last page) resolves the problem in over 90% of the cases.
BIOS changes, flashing the BIOS, resetting the BIOS, updating firmware etc can cause an array to fail. BIOS changes happen not just by hitting delete to enter setup. Software can make changes to the BIOS.
Disk managers, hard disk utilities, imaging and partitioning software etc. can often confuse a RAID array."
-------------------------
http://storagemojo.com/?p=383
. . . . the probability of seeing two drives in the cluster fail within one hour is four times larger under the real data . . . .
Translation: one array drive failure means a much higher likelihood of another drive failure. The longer since the last failure, the longer to the next failure. Magic!
(perhaps intentionally mixing the manufacturers of drives in a raid is a good idea?)
------------------
http://www.lime-technology.com/
unRAID
-----------------
http://www.miracleas.com/BAARF/
an amusing little page
-----------------
it would also be cool if you had a failing drive that behaved erratically/intermittently/partially to test these systems
-----------------
if a drive fails in a raid array and you pull the wrong drive, can you stick it back in and still recover or does the controller wig out?
------------------
some parts from the thread at the top that you might have missed
http://www.nforcershq.com/forum/3-vt61937.html?pos...
> Someone claims that the nv sata controler (or maybe raid controler) doesn't work properly with the NCQ function of new hard drives (or the tagged queing or whatever WD calls it).
> if the drives are SATA II drives with 3 G/bps speed and NCQ features NVRAID Controller has know problems with this drives.
> the first test trying to copy data from the raid to the external firewire drive resulted in not 1 but 2 drives dropping out.
Luckily the 2 were both 1 half of the mirror meaning i could rebuild the raid. So looks like trying to use the firewire from the raid is the problem. This may stand to reason as the firewire card is via an add-on card in a PCI slot so maybe there is some weird bottleneck in the bus when doing this causing the nvraid to malfunction.
(so like check high pci bus competition)
http://www.nforcershq.com/forum/4-vt61937.html?sta...
> I have read that its best to disable ncq and also read cache from all drives in the raid via the device manager. This may tie in with someone else’s post here who says the nvraid has issues with ncq drives.
http://www.nforcershq.com/forum/image-vp591021.htm...
NF4 + Vista + RAID1 = no NCQ?
------------------------------------
RAID is dead, all hail the storage robot
http://www.daniweb.com/blogs/printentry1399.html
Drobo - The World's first storage robot
http://www.datarobotics.com/
"Drobo changes the way you think about storage. In short, it's the best solution for managing external storage needs I have used." - JupiterResearch
"It is the iPod of mass storage" - ZDNet
"...the most impressive multi-drive storage solution for PCs I've seen to date" - eHomeUpgrade
sucks that it's $500 without drives and usb only though
Dave Robinet - Saturday, September 8, 2007 - link
Good posts. A topic you're obviously interested in. :)
Let me try and hit a few of the points in random order:
- Stress/break testing is a GOOD idea, but very highly subjective. You can't GUARANTEE that you'll be writing (or reading) EXACTLY the same data under EXACTLY the same circumstances, so there's always that element of uncertainty. Even opening the same file can't guarantee that the same segments are on the same disk, so... I'll have to give some thought to that. Definitely worthwhile, though, to pursue that angle (especially in terms of looking at how array controllers recover from major issues like that).
- Your other points pretty much all hit on a major argument: Software versus Hardware RAID (and versus proprietary hardware). I actually know an IT Director in a major (Fortune 500) company who uses software RAID exclusively, including in fairly intensive I/O applications. His argument? "I've been burned by "good" hardware too often - it lasts 7 years, I forget to replace it, and when the controller cooks, my array is done." (Make whatever argument you like about him not being on the ball enough to replace his 7-year-old equipment, but I digress). I do find the majority of the decent controllers write header information in fairly documented (and retrievable) ways - look at IBM's SmartRAID series as a random example of this - so I don't see that being a huge deal anymore.
You're dead on, though. *CONSUMERS* who are looking at RAID need to be very, very sure they know what they're getting themselves into.
tynopik - Saturday, September 8, 2007 - link
> You can't GUARANTEE that you'll be writing (or reading) EXACTLY the same data under EXACTLY the same circumstances, so there's always that element of uncertainty
that's true, but i don't think it's that important
have a test where you're copying a thousand small files and yank the power in the middle
run this test 5-10 times and see how they compare
controller 1 never has a problem
controller 2 required a complete rebuild 5 times
maybe you can't exactly duplicate the circumstances, but it's enough to say controller 2 has problems
(actually requiring a complete rebuild even once would be a serious problem)
similarly, have a heavy read/write pattern with random data while simultaneously writing data out over a pci firewire card and maybe even a usb drive and have audio playing and high network traffic (as much bus traffic and contention as you can generate) that runs for 6 hours
controller 1 has 0 bit errors in that 6 hours
controller 2 has 200 bit errors in that 6 hours
controller 2 obviously has problems even if you can't exactly duplicate it
i think it's sufficient to merely show that a controller could corrupt your data
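something like this rough outline would be enough to put a number on it - the mount point, file count and file size below are just placeholders, and generating the competing bus traffic is left as a comment:

```python
# rough outline of a read-back corruption check (placeholders, not a finished benchmark)
import hashlib, os

ARRAY_PATH = "/mnt/raid_under_test"  # placeholder mount point for the array being tested
FILE_COUNT = 1000
FILE_SIZE = 1024 * 1024              # 1MB per file for the example

def write_phase():
    expected = {}
    for i in range(FILE_COUNT):
        data = os.urandom(FILE_SIZE)
        path = os.path.join(ARRAY_PATH, "test_%d.bin" % i)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())     # push the data out so the controller actually handles it
        expected[path] = hashlib.sha256(data).hexdigest()
    return expected

def verify_phase(expected):
    errors = 0
    for path, digest in expected.items():
        with open(path, "rb") as f:
            if hashlib.sha256(f.read()).hexdigest() != digest:
                errors += 1          # file read back differently than it was written
    return errors

expected = write_phase()
# ... generate competing traffic here: firewire copy, usb transfer, network load, audio ...
print("corrupted files:", verify_phase(expected))
```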