Brownie RAID

Brownie RAID Upgrade Mini HOWTO

Our old Transtec SCSI RAID (actually rebranded Brownie boxes) started to experience increasing disk failures. The original RAID consisted of sixteen 200GB hard drives. We had the idea to upgrade the storage by replacing all of the drives with 750GB drives (the largest currently available). It does work, but there are some caveats.

Before You Start

Replacing the disks will destroy any data in the RAID. Once you have the new disks in place there is no going back. Make a backup or two. Also, this will certainly void any warranty you may have. You assume any and all risks. All I can say is that this worked for me.

Caveats

There are a few things of which you should be aware.

  • You are replacing the disks. The data will be effectively gone as the old disks will be sitting in a box somewhere. Make a backup. Or two.
  • The only configuration I was able to get to work was four RAID-5 arrays of four disks each. That gives a total of 9 TB (see note 1) (4 x 2.25 TB) of available storage out of a possible 12 TB (16 x 750GB). The problem was that it seemed impossible to configure the slice sizes. The default is to put 2TB in the first slice and the remainder in the second. On older firmware, it seemed to put 2TB in the first slice and ignore the rest. This is the reason that 750GB hard drives are pretty much the limit anyway.
  • The rebuild takes a long time. The RAID will be offline for at least two days.
  • You need to physically disconnect the RAID from your server if you intend to reboot the server while the rebuild is in progress. The RAID does report the new disks, but refuses any requests (spin up, etc.). This really confuses Linux (I believe it would eventually time out, but I wasn’t patient enough to find out).
  • I used LVM to create one large volume of 8TB. This is the limit for an ext2/3 type file system. In theory, you can increase the block size of the ext file system, or use a kernel that uses unsigned block numbers to increase the limit to 32TB, but from what I have read this is not recommended. In other words, 8TB is the current safe limit for an ext2/3 file system. If you turn all four arrays into a single file system, you will be around 50MB under the limit (Yes, MB; that isn’t a typo).
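The capacity figures in the list above can be checked with a little arithmetic. The following is only a sketch: the function names are my own, it uses the decimal units from the footnote, and it assumes the ext2/3 limit comes from signed 32-bit block numbers with 4 KiB blocks, which is the usual explanation but is not something stated on this page.

```python
# Rough capacity check for the configuration described above.
# Names are illustrative; disk sizes use decimal GB as the vendors do.

DISK_GB = 750
DISKS_PER_ARRAY = 4
ARRAYS = 4


def raid5_usable_gb(disks, disk_gb):
    """RAID-5 gives up one disk's worth of space to parity."""
    return (disks - 1) * disk_gb


def ext3_limit_bytes(block_size=4096, block_bits=31):
    """Assumed limit: 2**31 addressable blocks (signed 32-bit block numbers)."""
    return (2 ** block_bits) * block_size


per_array_gb = raid5_usable_gb(DISKS_PER_ARRAY, DISK_GB)
total_gb = ARRAYS * per_array_gb
raw_gb = ARRAYS * DISKS_PER_ARRAY * DISK_GB

print(f"per array: {per_array_gb} GB")
print(f"usable:    {total_gb} GB of {raw_gb} GB raw")
print(f"ext2/3 limit (assumed): {ext3_limit_bytes() // 2**40} TiB")
```

Keep in mind that only the first 2TB slice of each array is mapped to a LUN later on, so the space actually presented to the server is closer to 8 TB than 9 TB.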

Time Required

The Brownie will be out of service for at least two days.

  • Replacing the disks takes approximately two hours
  • Rebuilding the RAID arrays takes approximately 40 to 48 hours

Required Tools

  • An anti-static mat and wrist strap
  • A Phillips screwdriver
  • Needle-nose pliers to move the drive jumper
  • A flat screwdriver to flip the safety lock
  • Espresso, Espresso machine (not depicted) and cup

Back Up the Data

The first step can be a little tricky. Backing up several terabytes of data requires someplace to put it. In our case we used the distributed storage on our supercomputer, the zBox2. To minimize the downtime, we ran rsync to copy all of the data while the system was live. When we took the drives off the network, we ran rsync again to transfer anything that had changed.
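The two-pass copy might look roughly like the sketch below. The paths and host name are placeholders, not the ones we used, and the flags (-a, -H, --delete) are a common choice rather than a record of the original command line. By default it only prints the commands.

```python
import subprocess

# Placeholder locations; substitute your own source and backup target.
SOURCE = "/export/raid/"
DEST = "backuphost:/backup/raid/"


def rsync_command(source, dest, final_pass):
    """Build an rsync argv. -a preserves metadata, -H keeps hard links."""
    cmd = ["rsync", "-aH"]
    if final_pass:
        # On the last pass, also remove files deleted since the first copy.
        cmd.append("--delete")
    cmd += [source, dest]
    return cmd


def run_backup(dry_run=True):
    # Pass 1 runs while the system is live; pass 2 after it is taken offline.
    for final_pass in (False, True):
        cmd = rsync_command(SOURCE, DEST, final_pass)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_backup(dry_run=True)
```

The second pass is normally much quicker than the first, since rsync only transfers what changed in between.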

Replace the disks

Shut down the server and the Brownie RAID unit (i.e., power off). Replacing the drives is pretty straightforward.

Safety Latch

  • The first step is to unlock the safety latch. A small flat screwdriver will help here, although it can be done with really strong fingernails. Just click the safety latch to the left. The direction will change if you don’t have a rack mount Brownie.

Latch

  • Next press the release latch down. The door will pop open and the drive can be pulled from the chassis.

Remove screws

  • Turn over the drive and locate the four screws holding the drive to the tray. Use a standard Phillips screwdriver to remove all four screws.

Remove the old drive

  • Carefully disconnect the IDE and power connectors from the old drive. You can now take the old drive out and find it a new home.

Drive Jumper

  • Verify that the master, slave or cable-select jumper is set correctly. The original disks were set for master, so I set the replacement disks the same way. Cable-select might work, but why take the chance when you only have to move a simple jumper?

Install the new drive

  • At this point you are ready to install the replacement drive. Attach the IDE and power cables to the new drive and fasten it to the tray with the four screws.
  • Replace the drive in the Brownie unit and close the latch. Make sure the drive is fully inserted. Now flick the safety switch back to the locked position.

Congratulations, you have just replaced your first disk. Now you have to repeat the process fifteen more times. I found that the coffee was optimally used between drive eight and nine, but this is highly variable.

I also found it useful to label each disk as I removed it with the slot number it came from. There really is no going back once the RAID realizes the disks have changed, but I found it comforting nevertheless.

Reconfigure the RAID

Initialized for the first time

  • After the disks have been replaced you can turn on the RAID. You should see something similar to the following on the serial console.

All the Disks are okay

  • Scroll up and make sure that the disks are all recognized correctly. You will notice that they are all recognized as JBOD (Just a Bunch Of Disks).

Erase NVRAM

  • I like to reset the NVRAM, although this step may not be strictly necessary. Remember to note any custom settings you have (such as the SCSI ID) as you will have to put them back.

Reset the RAID

  • Now reset the RAID. The device will reboot and after about a minute will be back up.

Reconfigure RAID

  • Now we want to reconfigure the RAID. You need to follow these instructions to the letter as the Brownie seems to be somewhat confused by large disks. In particular, don’t configure the slice; it doesn’t seem to work anyway. First, set reconfigure RAID to Yes. This is a safety switch to prevent accidental erasure.

RAID Level

  • Set the RAID level to 5 (Block-level striping with distributed parity).

Number of disks

  • Set the number of disks to four. This will create a slice of 2TB, with around 60GB left over in slice 2.
  • Repeat the process for each of the remaining three arrays (Array 2 through 4).

SCSI ID

  • Verify that the SCSI ID is set correctly. If you reset the NVRAM, it will be zero. If this isn’t the first device on the SCSI bus, a different ID would be appropriate. You may want to set both the primary and secondary channel (even if you only use the primary).

Set the LUN Mapping

  • Now the LUN Mapping must be set. Again, both the primary and secondary channel can be configured. Set the following.
    • LUN 0 to Array 1, Slice 0
    • LUN 1 to Array 2, Slice 0
    • LUN 2 to Array 3, Slice 0
    • LUN 3 to Array 4, Slice 0

Save settings to NVRAM

  • Save the settings to NVRAM.

Restart the RAID

  • Now restart the array. I hope you did all the steps above correctly.

Rebuild

  • When the RAID finishes rebooting, it will start to build all four arrays. This takes around 40 hours, so go get some well-deserved rest.

If you find that something is wrong with your configuration, it is possible to start over without waiting for the build to finish, but not in the obvious ways. If you power the RAID off and on, it will just continue the build process. Similarly, if you try to delete the RAID, it will complain that it is busy.

The only way I know to cancel the process is to configure each array for “NONE” as the RAID level (instead of 5) and save it to NVRAM. Now if you power off the RAID and turn it back on, it won’t continue the build process and you can start over.
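Once the build has finished and the server can see the four LUNs, they can be joined into the single LVM volume mentioned under Caveats. The sketch below only prints the commands it would run. The device names (/dev/sdb through /dev/sde) and the volume group and volume names are assumptions for illustration; check which devices the Brownie LUNs actually appear as before running anything like this.

```python
import subprocess

# Hypothetical device names for LUN 0-3; verify them on your own system.
LUN_DEVICES = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]
VG_NAME = "brownie_vg"    # illustrative volume group name
LV_NAME = "brownie_lv"    # illustrative logical volume name


def lvm_commands(devices, vg, lv):
    """Return the commands that turn the LUNs into one ext3 volume."""
    cmds = [["pvcreate", dev] for dev in devices]              # one PV per LUN
    cmds.append(["vgcreate", vg, *devices])                    # group all PVs
    cmds.append(["lvcreate", "-l", "100%FREE", "-n", lv, vg])  # use all space
    cmds.append(["mkfs.ext3", f"/dev/{vg}/{lv}"])              # make the file system
    return cmds


def apply(cmds, dry_run=True):
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply(lvm_commands(LUN_DEVICES, VG_NAME, LV_NAME), dry_run=True)
```

Leaving dry_run set to True is a cheap way to review the commands first, since pvcreate and mkfs.ext3 will happily overwrite whatever is on the devices you name.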

What's With the LUN?

This is a problem I encountered even before the upgrade, but is worth mentioning here. It seems that the Brownie RAID devices don’t handle LUN reporting correctly. You will get an error similar to the following when Linux scans the SCSI bus. The actual LUN reported will be different and you may get several such messages.

scsi: host 0 channel 0 id 0 lun 0xffffffffffffffff has a LUN larger than currently supported.

From what I can tell by reading, the Brownie claims to support the SCSI REPORT_LUNS command, but returns garbage instead of valid data. This isn’t necessarily restricted to the Brownie devices. Upgrading to the latest firmware didn’t fix the problem for us.

Fortunately, there is a simple workaround. Add the following line to /etc/modprobe.conf. You can change max_luns to whatever is most appropriate for your system. In our case, we never have more than four LUNs on a single device.

options scsi_mod default_dev_flags=0x40000 max_luns=4

The value 0x40000 sets the flag that disables REPORT_LUNS scanning (BLIST_NOREPORTLUN in the kernel source) in the driver. Instead of using the REPORT_LUNS command, it will scan for LUNs. In theory this is slower, but it’s all relative. It takes very little time in the end.
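To make the option line a little more concrete, here is a small sketch that assembles it from its parts. The constant and function names are purely illustrative, and nothing here talks to the kernel; it only formats the line that goes into /etc/modprobe.conf.

```python
# The flag value quoted above: tells the SCSI layer not to use REPORT_LUNS.
NOREPORTLUN_FLAG = 0x40000


def scsi_mod_options(flags, max_luns):
    """Format an 'options scsi_mod' line for /etc/modprobe.conf."""
    return f"options scsi_mod default_dev_flags={flags:#x} max_luns={max_luns}"


# 0x40000 is a single bit (bit 18) in the flags word.
assert NOREPORTLUN_FLAG == 1 << 18

print(scsi_mod_options(NOREPORTLUN_FLAG, max_luns=4))
```

If you need more than four LUNs per device, only the max_luns argument changes; the flag value stays the same.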

1) 1 TB = 1,000,000,000,000 bytes. Don’t blame me; that’s what the disk manufacturers use.
 
howto/brownie.txt · Last modified: 2006/08/06 17:59 by dpotter