This note is a brief summary of RAID disk technology and terminology, along with some personal experiences with RAID systems and a bibliography of RAID references. It is not Linux specific, and it is intended to be neither a HOWTO nor a step-by-step guide. It is by no means the definitive work on the subject. To make such a claim would be a disservice to the reader, and to the many fine references cited here. As with anything else, there are a myriad of choices to make when configuring RAID systems. The decisions to be made depend on your needs, requirements, and budget. Hopefully this paper and the references cited will aid you in making choices you can live with.
What is RAID? - RAID "(Redundant Array of Inexpensive Disks) is a method whereby information is spread across several disks, using techniques such as disk striping (RAID Level 0) and disk mirroring (RAID level 1) to achieve redundancy, lower latency and/or higher bandwidth for reading and/or writing, and data recoverability."
Inexpensive is relative, of course. There is a balance between the cost of the solution you choose and the cost of replacing your data. At the same time, the most expensive solution, if not configured properly, can provide less data security than a well-designed cheaper solution. In addition to the drives, a fully redundant RAID system would incorporate redundant controllers, cabling, and power supplies, and in some cases extend to separate power sources as well. System redundancy as a whole is beyond the scope of this note, though some of the references give good pointers on the issues involved.
Commonly used RAID designations run from RAID 0 through RAID 10, with RAID 0, RAID 1, RAID 3, and RAID 5 being the most readily available (and used).
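To make the two simplest levels concrete, here is a minimal sketch of how a logical block number might be placed on disk under striping (RAID 0) and mirroring (RAID 1). The function names and the one-block stripe unit are assumptions made for illustration; real controllers usually stripe in larger chunks and their layouts differ by vendor.

```python
def stripe_location(logical_block, num_disks):
    """RAID 0 (striping): spread consecutive blocks round-robin over the disks.

    Returns (disk index, block offset on that disk).
    Assumes a stripe unit of one block, which is a simplification.
    """
    return logical_block % num_disks, logical_block // num_disks


def mirror_locations(logical_block, num_disks):
    """RAID 1 (mirroring): every disk holds a full copy at the same offset."""
    return [(disk, logical_block) for disk in range(num_disks)]


# Logical blocks 0..5 striped over three disks land on disks 0, 1, 2, 0, 1, 2.
for block in range(6):
    print(block, stripe_location(block, 3))

# The same block on a two-disk mirror is written to both disks.
print(mirror_locations(4, 2))
```

Striping lets several disks work on one large transfer at once but adds no redundancy; mirroring does the reverse, paying for redundancy with a full duplicate of the data.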
Current hardware RAID systems usually have hot swap capability. This allows you to replace a failed drive without taking the system off line, which does a lot for system uptime.
A word of caution regarding RAID 5: While a RAID 5 array will continue to operate if a drive fails, NEVER, EVER pull more than one drive from an array. When a drive fails, replace it ASAP. A RAID 5 array carries only enough parity information to reconstruct one missing drive, so if you lose or remove more than one drive from the array, your data is effectively lost. You may be able to recover some of it, but most of it will be suspect. This is independent of the number of drives in your array. This may apply to RAID 3 as well, but I have no experience with it.
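The reason one failure is survivable and two are not can be shown with a small sketch of XOR parity, the technique RAID 5 is built on. This is a simplified illustration, not any controller's actual code: the helper names are hypothetical, it handles a single stripe, and it ignores the way RAID 5 rotates the parity block across the drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Rebuild the one missing block of a stripe.

    XORing all survivors (data plus parity) cancels everything except
    the block that was lost.
    """
    return xor_blocks(surviving_blocks)


# One stripe across three data drives plus its parity block.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)

# One drive lost: the survivors plus parity give the missing block back.
recovered = rebuild_missing([stripe[0], stripe[2], parity])
print(recovered == stripe[1])  # True

# Two drives lost: the survivors only yield the XOR of the two missing
# blocks, which is not enough to separate one from the other.
leftover = xor_blocks([stripe[2], parity])
print(leftover == xor_blocks([stripe[0], stripe[1]]))  # True
```

With two blocks missing there is one equation and two unknowns, and that holds no matter how many drives are in the array.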
My own experience with RAID is limited to RAID 1 (mirroring) and RAID 5, on VMS, NT and Unix. I tend to favor hardware implementations over software. This is a personal preference. One of the references cited claims that software RAID is superior. I have used both external RAID towers and systems that housed the RAID tower internally. RAID enclosures can stack the drives vertically or horizontally.
The RAID systems I am familiar with supply configuration software in the controller firmware and/or as an application program. On internal, Intel-based systems the firmware is generally accessible through the system BIOS at boot time. External towers generally use an ASCII terminal port for running the software. The utilities provided allow you to specify the RAID type to use, the drives to include, the caching strategy, etc. There usually are also utilities to fail a drive, test a drive and rebuild the array. Most current controllers will recognize when a drive has been replaced and rebuild the array without manual intervention.
Hardware RAID generally uses fast cache memory on the controller board to speed read/write operations to the disks, using a write-back caching methodology. The better controllers will have a battery backup on the cache so that data still held in the cache is preserved if there is a system failure. In the event of battery failure, most controllers will revert to a write-through mechanism to protect against data loss. Write-through caching is inherently slower, and there will be a noticeable degradation in throughput.
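The difference between the two caching policies can be sketched in a few lines. This is a toy model under stated assumptions, not a description of any real controller: the class and method names are hypothetical, the "disk" is a dictionary, and flushing the cache before switching policies on battery failure is an assumption about reasonable behavior rather than something the references specify.

```python
class CachedDisk:
    """Toy model of a controller cache in front of a disk."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.disk = {}        # block number -> data; stands in for the drives
        self.cache = {}       # block number -> data held in controller memory
        self.dirty = set()    # cached blocks not yet written to disk
        self.disk_writes = 0  # count of (slow) physical writes

    def _write_disk(self, block, data):
        self.disk[block] = data
        self.disk_writes += 1

    def write(self, block, data):
        self.cache[block] = data
        if self.write_back:
            # Write-back: acknowledge now, write to disk later.
            self.dirty.add(block)
        else:
            # Write-through: the disk is updated before the write completes.
            self._write_disk(block, data)

    def read(self, block):
        if block in self.cache:
            return self.cache[block]
        return self.disk.get(block)

    def flush(self):
        """Write every dirty block to disk."""
        for block in sorted(self.dirty):
            self._write_disk(block, self.cache[block])
        self.dirty.clear()

    def battery_failed(self):
        """Assumed policy: flush, then fall back to write-through."""
        self.flush()
        self.write_back = False


wb = CachedDisk(write_back=True)
wt = CachedDisk(write_back=False)
for value in ("v1", "v2", "v3"):
    wb.write(0, value)
    wt.write(0, value)
wb.flush()
print(wb.disk_writes, wt.disk_writes)  # 1 3
```

Three updates to the same block cost one physical write under write-back and three under write-through, which is where the throughput difference comes from. The price is the window in which dirty blocks exist only in cache memory, which is why the battery matters.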
The bibliography section contains a number of references to SCSI disk controllers. If you use hardware RAID you will use SCSI controllers. A basic understanding is a good thing.
Heat and Disk Drives:
Disk drives generate heat, and cooling in an enclosure is as
important for the drives as it is for the CPU. In three years
of running an internal, vertical tower configuration on four
similar systems, we have lost 6 or 7 drives. All of these have
been the top drive in the tower, which leads me to believe this
is a cooling problem. These systems are ALR/Gateway servers
with internal RAID towers. These include both the Revolution
series and the 9000 series systems. To bolster this theory, I
have also used DEC Storageworks external RAID enclosures where
the drives are arranged in a horizontal array configuration.
The drive failures with these systems were fewer, and involved
random drives. One vertical tower external RAID system we use
has individual fans on each drive in the enclosure. If the
controller senses that a fan has failed, it will fail the drive
and take it off line.
(Note: the links below are current as of this writing - June, 2000)
SunWorld* - has run a number of good articles on RAID in its Storage series.
"RAID: What does it mean to me?"*, Brian Wong (September 1995)
"0,1, 0+1 ... RAID basics, Part 1"*, Chuck Musciano (June 1999)
"RAID basics, Part 2: Moving on to RAID 3 and RAID 5"*, Chuck Musciano (July 1999)
"RAID basics, Part 3: Understanding the implementation of hardware and software-based solutions"*, Chuck Musciano (August 1999)
"RAID basics, Part 4: Uniting systems and storage"*, Chuck Musciano (September 1999)
"RAID basics, Part 5: The power of storage area networks"*, Chuck Musciano (October 1999)
"What is RAID"*, Rawn Shah (June 1999)
"Storage Beyond RAID"*, Rawn Shah (July 1999)
Other storage-related articles listed in the SunWorld Topical Index*
The vendor literature cited below is pretty generic in nature. There is some information specific to their own products, but it is generally low key and does not cloud the technical content. The vendors' technical libraries and white paper libraries do include product-specific information as well as more generic technical information. The reader is left to sort this out on an as-needed basis. References (or lack thereof) to any specific vendor are entirely due to the fact that these are what I have gathered and found useful.
nStor/Andatco - an offshoot of the Seagate/Conner merger. There are a number of good references to RAID and other storage related subjects in their technical library.
RAID An Introduction to the Technology*
SCSI Challenge WhitePaper*
RAID Disk Array Technical Information*
nStor white papers*
nStor Technical Library*
DPT/Adaptec - this includes current links to the reference library in the "Linux DPT Hardware RAID HOWTO" referenced below.
Understanding RAID* (pdf)
Benefits of Intelligent Hardware RAID and Caching* (pdf)
Glossary of Terms*
Hardware- or Software-Based RAID: Which Solution is Best for You?*
Adaptec - Some technical information is available here.
The Adaptec Array Guide: Choosing the right RAID solution for your business.*
Understanding I/O Subsystems*: W. David Schwaderer, Andrew W. Wilson, Jr., 1996.
This book describes PC I/O subsystems in some detail. The chapter on SCSI controllers, SCSI Primer*, is available online.
Linux specific references:
Linux DPT Hardware RAID HOWTO* - Updated: April 1999. How to set up hardware RAID under Linux. The links to the DPT RAID information are obsolete. Use the links above under DPT/Adaptec. The Buslogic links now point to Mylex, which was recently bought by IBM. The references section has links to a number of good sources as well. An exception here is the reference to Storage Computer - these papers do not appear to be available.
Software-RAID-HOWTO* - Updated: January 2000 (latest version). How to use Software RAID under Linux. It addresses a specific version of the Software RAID layer, namely the 0.90 RAID layer made by Ingo Molnar and others. This is the RAID layer that will be standard in Linux-2.4, and it is the version that is also used by Linux-2.2 kernels shipped from some vendors. The 0.90 RAID support is available as patches to Linux-2.0 and Linux-2.2, and is considered by many to be far more stable than the older RAID support already in those kernels.
Root RAID HOWTO cookbook* - Updated: March 1998. A cookbook for creating a root-mounted RAID filesystem and companion fallback rescue system using Linux initrd. Step-by-step instructions for both raid1 and raid5 md0 devices.
Linux High Performance SCSI & RAID* - Mike Neuffer. This is a fairly comprehensive discussion of both topics. His contributions to the Linux kernel are mainly SCSI related. He wrote the eata-dma driver for DPT's SCSI controllers, assisted with the eata-pio driver, wrote the /proc/scsi code, and added support for WIDE SCSI, multiple channels and hot plugging.
What is RAID* - Mike Neuffer. A discussion of RAID basics, background and terminology.
Purely academic references and information:
These are generally not available on-line as far as I can tell.
Gibson, Garth; Hellerstein, Linda; Karp, Richard; Katz, Randy;
Patterson, David. (1989). Coding Techniques for Handling
Failures in Large Disk Arrays. Third International Conference
on Architectural Support for Programming Languages and
Operating Systems (ASPLOS III), Boston MA.
Flynn, Michael. (1995). Computer Architecture: Pipelined and
Parallel Processor Design. Jones and Bartlett Publishers,
Inc., London UK
Patterson, D.; Gibson, G.; Katz, R. (1987). A Case for Redundant
Arrays of Inexpensive Disks (RAID). University of California,
Berkeley, Report No. UCB/CSD 87/391
RAID - Redundant Arrays of Independent Disks* - This is a link to some of Garth Gibson's and others' works on RAID technology.
Other miscellaneous RAID related links:
RAID OVERVIEW* - A nice graphic from Ciprico, showing the different RAID levels. No other documentation available.
RAID Tutorial* - From Baydel.
Overview of RAID* - Another short RAID overview.
Gary's Encyclopedia - RAID* - A page with some RAID related references and links. Part of a larger on-line reference site mostly devoted to Linux.