Solaris software RAID

Contributed by bofh468, November 13, 2003
Tagged: Solaris system administration

Add RAID to your Solaris system


This describes how to set up software RAID using Solstice DiskSuite. First, you need to grab the DiskSuite packages from Sun. It’s a free download, but you do need to create a SunSolve account to get it.

You can find a download link for the software at http://wwws.sun.com/software/solaris/get.html. It’s about half-way down that page.

Install the packages. There’s an installer included, but don’t bother: just pkgadd the individual packages. There are only five or so.

Once you’ve done that, you’ll need to decide how you want to lay out your disks. The following assumes that:

1 - You have 2 disks - c0t0d0 (disk0) and c0t1d0 (disk1).
2 - The system is installed only on disk0, and disk1 is unused.
3 - Each disk has the following slices:

0 - /
1 - swap
2 - whole-disk
3 - unassigned 64-MB
4 - unassigned 64-MB

Adjust the above to match your preferred layout; this is only a simple example. Slices 3 and 4 will hold the DiskSuite state database replicas (the meta-databases). If you don’t have 128MB of free space to spare, try to make some (i.e., sacrifice some swap if you have to).


Disclaimer: before you continue, I can’t stress this enough: make backups. I always do this procedure during system installation, and I do it routinely. Although your data is supposed to survive and migrate without incident, I have seen DiskSuite eat a system once or twice, usually due to operator error.

Make backups.

You need to duplicate your partition layout from disk0 to disk1. The disk geometry should match: metadevices work at the block level of the disk, and if one disk has fewer blocks than the other you’ll wind up making a mess. Once you’re sure you’re ready to proceed, copy the layout (the VTOC) from disk0 to disk1 like this:


prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2
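Before going further, it’s worth confirming that the copy worked. The following is a minimal sketch, not part of DiskSuite: slice_table and same_layout are hypothetical helper names that compare two saved prtvtoc dumps. It assumes prtvtoc’s usual output, where header lines start with "*" and each slice line begins with partition, tag, flags, first sector, sector count and last sector. The mount-directory column is ignored because it only appears for the disk that is mounted. It uses $(...), so on Solaris run it under ksh or /usr/xpg4/bin/sh rather than the old Bourne /bin/sh.

```shell
# Hypothetical helpers (not DiskSuite commands): compare two prtvtoc dumps.

# Print only the slice lines of a prtvtoc dump, keeping the first six
# columns (partition, tag, flags, first sector, sector count, last sector).
# Header lines start with "*"; the mount-directory column is dropped.
slice_table() {
    awk '$1 !~ /^\*/ && NF >= 6 { print $1, $2, $3, $4, $5, $6 }' "$1"
}

# Succeed (status 0) if both dumps describe identical slices.
same_layout() {
    [ "$(slice_table "$1")" = "$(slice_table "$2")" ]
}
```

On the Solaris box it might be used like this:

prtvtoc /dev/rdsk/c0t0d0s2 > /tmp/disk0.vtoc
prtvtoc /dev/rdsk/c0t1d0s2 > /tmp/disk1.vtoc
same_layout /tmp/disk0.vtoc /tmp/disk1.vtoc && echo "slice tables match"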

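Second, you need to create your meta-databases, properly called state database replicas. They hold DiskSuite’s record of the configuration and state of every metadevice, and DiskSuite won’t work without them. (They aren’t a file system log; the logging that spares you an fsck after a dirty shutdown is the UFS logging option in /etc/vfstab, covered below.) Do the following: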


metadb -af -c 2 /dev/dsk/c0t0d0s3 /dev/dsk/c0t0d0s4
metadb -af -c 2 /dev/dsk/c0t1d0s3 /dev/dsk/c0t1d0s4

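This adds (-a) two replicas (-c 2) to each of the listed slices, so the two commands together create eight replicas, four on each disk. The -f flag forces the operation, which metadb requires when you create the very first replicas. If you have more disks, you can spread the replicas across them for better performance and fault tolerance.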
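To check the result, you can count the replicas per slice in metadb’s output. This is a minimal sketch that assumes each replica line ends with its /dev/dsk path, which is how metadb lists them; replica_counts is a hypothetical helper name. With the two metadb commands above, you’d expect each of the four slices to show 2.

```shell
# Hypothetical helper: read metadb output on stdin and print
# "<slice> <number of replicas>" for every slice, sorted by slice name.
replica_counts() {
    awk '/\/dev\/dsk\// { n[$NF]++ }
         END { for (d in n) print d, n[d] }' | sort
}
```

Usage: metadb | replica_counts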

The next step is to create your RAID devices. In a two-disk system, you’re stuck with RAID 0 and RAID 1. Since RAID 0 is almost pointless here (you’re doing this for redundancy, remember?!), we’ll go with RAID 1: mirrored disks.

We’ll deal with the following raid devices and members:

d0 - / mirror
d10 - /dev/dsk/c0t0d0s0
d20 - /dev/dsk/c0t1d0s0

d1 - swap
d11 - /dev/dsk/c0t0d0s1
d21 - /dev/dsk/c0t1d0s1

The device names are somewhat arbitrary. In a simple setup like this, I use d0 to match up with a mirrored slice0, and d10 to indicate member 1 of d0 (member 1 d0 = d10, member 2 d0 = d20).

So create the raid devices and members:


metainit -f d10 1 1 /dev/dsk/c0t0d0s0
metainit -f d20 1 1 /dev/dsk/c0t1d0s0
metainit d0 -m d10

metainit -f d11 1 1 /dev/dsk/c0t0d0s1
metainit -f d21 1 1 /dev/dsk/c0t1d0s1
metainit d1 -m d11

This initializes the devices. The first two lines of each group create the single-slice submirrors (-f is required on d10 and d11, because root is mounted and swap is active on those slices). The third line creates the mirror itself: -m attaches d10 (or d11) as its first half, so d0 and d1 start out as one-way mirrors. The command “metastat” will show you that the devices exist.

Yes, those first halves are the disk you’re currently running on. Your data is still there. Don’t attach the second halves (d20 and d21) yet; that happens after the reboot.
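The naming scheme above is mechanical enough to capture in a small dry-run helper. This is a sketch, not part of DiskSuite: mirror_cmds is a hypothetical function that only prints the commands for one slice (it runs nothing), assuming the two disks are c0t0d0 and c0t1d0 unless you pass others. Reviewing the printed commands before pasting them is a cheap guard against the operator errors mentioned in the disclaimer.

```shell
# Hypothetical dry-run helper: print (do not run) the commands that build
# a two-way mirror for slice N with this recipe's naming:
#   dN = mirror, d1N = submirror on disk0, d2N = submirror on disk1.
# Usage: mirror_cmds SLICE [DISK0] [DISK1]
mirror_cmds() {
    slice=$1
    disk0=${2:-c0t0d0}
    disk1=${3:-c0t1d0}
    # -f is needed when the slice is in use (mounted root, active swap).
    echo "metainit -f d1$slice 1 1 /dev/dsk/${disk0}s$slice"
    echo "metainit d2$slice 1 1 /dev/dsk/${disk1}s$slice"
    # -m creates the mirror with its first submirror already attached.
    echo "metainit d$slice -m d1$slice"
    # The second half is attached only after rebooting onto the metadevice.
    echo "# after the reboot:"
    echo "metattach d$slice d2$slice"
}
```

For example, mirror_cmds 0 prints the commands for the root mirror (d0, d10, d20), and mirror_cmds 1 does the same for swap.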

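Next, you need to make sure the system will use the metadevices. The root file system is easy; metaroot updates /etc/system and /etc/vfstab so that the system boots with d0 as its root device: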


metaroot d0

Next, edit /etc/vfstab and change the swap device to /dev/md/dsk/d1. While you’re in there, turn on UFS logging for the root file system (d0): the last field of each vfstab line holds the mount options, so change the root entry’s “-” to “logging”. Double-check that you haven’t screwed up, then save and exit if it all looks good.
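If you’d rather script the edit than do it by hand, here is a minimal sketch of the same two changes. fix_vfstab is a hypothetical helper; it assumes the swap slice is /dev/dsk/c0t0d0s1 (this recipe’s layout) and that metaroot has already pointed the root entry at d0. It reads vfstab on stdin and writes the edited copy to stdout, so you can diff it before replacing the real file. Lines it modifies are rewritten tab-separated, which vfstab accepts.

```shell
# Hypothetical helper: read vfstab on stdin, print an edited copy.
# Assumes this recipe's layout: swap on /dev/dsk/c0t0d0s1, root on d0.
fix_vfstab() {
    awk 'BEGIN { OFS = "\t" }
        # Pass comments and short or blank lines through unchanged.
        /^#/ || NF < 7 { print; next }
        # Point the swap entry at the mirrored metadevice.
        $1 == "/dev/dsk/c0t0d0s1" && $4 == "swap" { $1 = "/dev/md/dsk/d1" }
        # Turn on UFS logging for root (field 7 holds the mount options).
        # Options that already mention logging, or nologging, are left alone.
        $3 == "/" && $4 == "ufs" {
            if ($7 == "-") { $7 = "logging" }
            else if ($7 !~ /logging/) { $7 = $7 ",logging" }
        }
        { print }'
}
```

Usage:

fix_vfstab < /etc/vfstab > /tmp/vfstab.new
diff /etc/vfstab /tmp/vfstab.new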

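Once you’re done, flush every mounted UFS file system to disk and reboot. lockfs -fa forces a synchronous flush and waits for it to finish, whereas sync only schedules the writes: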


lockfs -fa
init 6

Watch your system come up. There will be some new messages, most notably the kernel complaining about not being able to forceload three raid modules:


forceload of misc/md_trans failed
forceload of misc/md_raid failed
forceload of misc/md_hotspares failed

You can ignore these messages. They’re harmless. Basically, you haven’t created any raid-devices that require those modules so they’re refusing to load.

Now that your system is up (you didn’t mess up vfstab, did you?!), you need to finish the process. Log in and attach the second half of each mirror:


metattach d0 d20
metattach d1 d21

Both commands take a moment to return, and you’ll notice that your system is now a little slower and your disks are going nuts. Look at the output of “metastat” and you’ll see why: the new submirrors are resyncing.
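If you want to wait for the resync to finish before moving on (you should, before relying on disk1), a small loop around metastat does it. This is a sketch that assumes metastat reports unfinished mirrors with “Resync in progress” lines, as it does while syncing; resync_done and resync_progress are hypothetical helper names. It uses “!”, so on Solaris run it under ksh or /usr/xpg4/bin/sh, and it avoids grep -q because the old /usr/bin/grep lacks it.

```shell
# Hypothetical helpers that read metastat output on stdin.

# Succeed (status 0) when no mirror reports a resync in progress.
resync_done() {
    ! grep 'Resync in progress' > /dev/null
}

# Show only the progress lines, e.g. "Resync in progress: 15 % done".
resync_progress() {
    grep 'Resync in progress'
}
```

Usage:

until metastat | resync_done; do metastat | resync_progress; sleep 60; done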

You’ll need to install a boot block on your second disk so that you can boot from it. This is fairly easy to do:


installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0

You might also want to set the OBP to boot from disk1 if it can’t boot from disk0. Wait until metastat shows the resync has finished first, since disk1 isn’t a usable copy until then. Then bring the machine to the OBP (ok) prompt via init 0 and enter the following:


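setenv boot-device disk disk1
boot disk1

This sets up a failover boot to disk1; setenv saves the change to NVRAM right away (nvstore is only needed after editing nvramrc with nvedit). The last command boots from disk1, proving to you that this works. Be sure to substitute the correct device alias for “disk1”; devalias at the ok prompt lists the aliases your machine knows.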

You now have your root file system and swap space sitting on RAID 1 volumes. Losing a disk no longer means rebuilding your system; now you just need to replace a disk.
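How many state database replicas survive matters when a disk dies. DiskSuite’s documented rules are: the system keeps running while at least half of the replicas are available, panics if fewer than half are, and needs a majority (half plus one) to reboot into multiuser mode. The sketch below just encodes that arithmetic; replica_quorum is a hypothetical helper, not a DiskSuite command. With this recipe’s eight replicas, losing one disk leaves four, exactly half, so the system keeps running but you’ll need to delete the dead disk’s replicas with metadb -d before it will reboot normally. It uses $((...)), so on Solaris run it under ksh.

```shell
# Hypothetical helper: classify the state database quorum.
# Usage: replica_quorum TOTAL AVAILABLE
# Prints "majority" (keeps running and reboots normally),
#        "half"     (keeps running; delete the dead replicas with
#                    metadb -d before it will boot to multiuser), or
#        "minority" (the system panics).
replica_quorum() {
    total=$1
    alive=$2
    if [ $((alive * 2)) -gt "$total" ]; then
        echo majority
    elif [ $((alive * 2)) -eq "$total" ]; then
        echo half
    else
        echo minority
    fi
}
```

For example, replica_quorum 8 4 prints half, which is the two-disk situation right after one disk fails.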

I strongly suggest that you read through the DiskSuite documentation on Sun’s online documentation site (docs.sun.com). There’s a lot more you can do with it that’s not covered here. Oh, and you’ll probably want to read up on how to actually replace a failed disk. :)

(If there’s enough interest, I might be coerced into posting a howto on that)


Comments

    • Shouldn't this:

    metadb -af -c 2 /dev/dsk/c0t0d0s3 /dev/dsk/c0t0d0s4
    metadb -af -c 2 /dev/dsk/c0t1d0s3 /dev/dsk/c0t0d0s4

    Be this:

    metadb -af -c 2 /dev/dsk/c0t0d0s3 /dev/dsk/c0t0d0s4
    metadb -af -c 2 /dev/dsk/c0t1d0s3 /dev/dsk/c0t1d0s4

    bubba AT bubba.org
    • Great catch! You are correct. The recipe has been fixed.

    Thank you for helping to make Tech-Recipes better!
    • Got me out of a hole.

    Two comments:
    Why do you not put the metadb across 2 physical disks? Would that not be part of the ideal of mirroring? Or do the -af -c 2 done twice actually give you a -af -c 4?

    You say edit vfstab and turn on logging. For a linux person on solaris I ain't too sure about that. Any clues?

    I would appreciate 10 seconds on lockfs -fa - why not just sync?

    Again nice article - it just works, thanks
    • on sol9:

    bash-2.05# metainit -f d0 d10
    metainit: srv-e4500: d0: "d10": syntax error

    ideas?
    • Anonymous wrote:
        bash-2.05# metainit -f d0 d10
        metainit: srv-e4500: d0: "d10": syntax error

    from the metainit man page...

    metainit d0 -r d10
    • Anonymous wrote:
        bash-2.05# metainit -f d0 d10
        metainit: srv-e4500: d0: "d10": syntax error

    metainit d0 -m d10
    metainit d0 -m d10

    the metattach further down should be left without the -f switch, since it doesn't exist in the metattach command

    should just be

    metattach d0 d20
    metattach d1 d21

    not metattach -f d0 d20

    otherwise it works like a charm ;)
    • sry
    metainit d0 -m d10
    metainit d1 -m d11

    that's it

    reg
    • Why do you use 64MB? Is it enough for every disk size? I want to build RAID 1 with two 400GB disks; can I still use 64MB?
    • The quick way of making a stripe:

    Partition each of your disks with a 10MB slice (I use slice 3) to hold your metadbs, and assign the rest of the space to slice 7.

    Create your metadbs:
    metadb -a -f -c 3 /dev/dsk/c1t1d0s3 /dev/dsk/c1t2d0s3 /dev/dsk/c1t3d0s3

    Add the following to /etc/lvm/md.tab (d0 is one stripe, three slices wide):

    d0 1 3 /dev/dsk/c1t1d0s7 /dev/dsk/c1t2d0s7 /dev/dsk/c1t3d0s7

    Build the stripe:

    metainit d0

    create the file system:

    newfs /dev/md/rdsk/d0

    mount the file system to check it works:

    mount /dev/md/dsk/d0 /mnt

    write to it, df it, fsck it, etc.

    then edit /etc/vfstab

    /dev/md/dsk/d0 /dev/md/rdsk/d0 /mnt ufs 2 yes quota

    This mounts the file system on /mnt at boot (the "yes" field). The 2 is the fsck pass: a value of 1 means the file system is checked sequentially, and a value greater than 1 lets it be checked in parallel with others.

    Note: it's useful to keep the output from newfs somewhere, because if your disk ever dies, you need to know where your next superblock is (for use with fsck), and the output from newfs is the easiest way of knowing this.

    Note also that if you ever screw up a mirror metadevice, you can still mount an individual submirror slice manually (this won't work for a stripe, whose data is interleaved across the disks):

    mount /dev/dsk/c1t1d0s7 /mnt

    It is frequently quicker to do this, back up the data, rebuild your metadevice from scratch, and then restore the data than it is to try to recover the metadevice.

    To build a mirror, the same rules apply for partitions.

    Edit /etc/lvm/md.tab

    d0 -m /dev/md/dsk/d1 /dev/md/dsk/d2
    d1 1 1 /dev/dsk/c1t1d0s7
    d2 1 1 /dev/dsk/c1t2d0s7

    The rest is the same.

    If ever a mirror disk goes bad, you can do a soft replace with the following command:

    metareplace -e d0 c1t2d0s7

    It will then try to resync the disk.

    If a mirrored disk dies and has to be replaced use the following:

    metadb -d /dev/dsk/c1t1d0s3 (to delete the state databases)
    replace physical disk (c1t1)
    prtvtoc /dev/rdsk/c1t2d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2 (to mirror the partition table, working drive -> new drive)
    metadb -a -c 3 /dev/dsk/c1t1d0s3 (to rebuild the state databases)
    metareplace -e d0 c1t1d0s7 (to replace the failed mirror)

    Note:
    You can replace the disk on the fly, repartition and metareplace. *THEN* metadb -d & metadb -a -c 3 with the system up...
 