RAID.IT – Adaptec Raid Controller, continued – part 3.

This is part 3 of the saga of the Adaptec Raid Controller – the ASR-5405 that suddenly decided that my array was no longer there. Due to a bug on the Gigabyte motherboard of my test machine, four of my 1TB disk drives from the array were somehow reconfigured to be 32MB in size. While researching the problem, I came across this great website that had a utility to fix the problem.

Restoring Factory Hard Drive Capacity

The utility they provide restores the hard drive's factory capacity. It only runs on Windows – which isn't a problem as such – and eventually I ran it and was able to turn each of my 1TB drives back into a 1TB drive – does that make sense? Or rather, turn my 32MB drives back into the original 1TB drives – yes, much more like it.

(Photos: dsc_0201 – dsc_0204)

Now, I tried the drives back in the machine, however the Adaptec controller still insists that there are no logical drives found, even though it sees all four 1TB drives – including the one that was failing, which I thought I should put back in. The data isn't really lost, since it is still sitting on the three drives that are working. A RAID-5 array can tolerate one disk failure, so I have 3 out of 4 working drives from the array. All I have to do is determine some parameters about the disk array. Sounds simple?

Most RAID-5 arrays use a distributed parity block, so effectively, we put data blocks on three of the disks, with the remaining disk holding a parity block. The next three data blocks then go onto three disks, with the parity block on the other disk – except that it doesn't go onto the disk that held the last parity block – not sure if I am explaining it properly. Anyway, what I have to do is determine the block size, the order of the data disks, and where the parity block goes first, where it goes next, and so on.

Once I work that out, then I can explain the layout a bit better. You will find terms like stripe factor, blocking factor, parity rotation order etc. What it means is that the parity block moves around the disks in a particular order. Last time, I had copied 10MB from each disk into files that I called arraydisk1, arraydisk2, arraydisk3 and arraydisk4. The number refers to the physical connection order on the array controller.
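
Grabbing those samples doesn't need anything special – a little perl along these lines would do it. This is only a sketch: the /dev/sdb to /dev/sde device names are placeholders for wherever the disks happen to land on your machine, and it needs to run as root.

#!/usr/bin/perl
# Sketch only: copy the first 10MB of each member disk into arraydisk1..4.
# The /dev/sdX names below are placeholders - substitute the real devices.
use strict;
use warnings;

my @disks  = ('/dev/sdb', '/dev/sdc', '/dev/sdd', '/dev/sde');
my $sample = 10 * 1024 * 1024;    # 10MB from the start of each disk

for my $i (0 .. $#disks) {
    open my $in,  '<:raw', $disks[$i]             or die "open $disks[$i]: $!";
    open my $out, '>:raw', 'arraydisk' . ($i + 1) or die "open output: $!";
    read $in, my $buf, $sample;
    print {$out} $buf;
    close $in;
    close $out;
}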

After some examination of the arraydisk files using a hex editor (in this case HexEdit), I was able to find some regular data structures in the files that allowed me to work out the size of the block – which was 256KB. Once I knew the block size, I could then look at the data just before and after each boundary and try to match it up. It is like a jigsaw puzzle – except we are working with data instead of shapes, but the same sort of thing.
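
If you want to do the same sort of boundary matching, a rough perl sketch like this helps – it just prints a few bytes either side of every 256KB boundary in one of the sample files, so the end of one block and the start of the next can be compared by eye. The 16-byte peek size and the default file name are arbitrary choices of mine.

#!/usr/bin/perl
# Sketch only: dump the bytes either side of each 256KB boundary in a sample file.
use strict;
use warnings;

my $blk  = 256 * 1024;
my $peek = 16;                            # bytes shown on each side of a boundary
my $file = shift // 'arraydisk1';         # one of the 10MB sample files

open my $fh, '<:raw', $file or die "open $file: $!";
my $size = -s $file;

for (my $off = $blk; $off < $size; $off += $blk) {
    seek $fh, $off - $peek, 0;
    read $fh, my $before, $peek;
    read $fh, my $after,  $peek;
    printf "%s 0x%08x  %s | %s\n", $file, $off,
        unpack('H*', $before), unpack('H*', $after);
}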

Last night, I was able to work out, to my own satisfaction, that the physical disk order and the data disk order were the same, and that the parity block order was left asymmetric – which is nice and easy to explain. I also found some documentation on the internet indicating that Adaptec uses the left asymmetric parity block order.

My logical disk drive is 3TB, so just consider the following. The Adaptec controller writes the first 256KB onto the first disk, then the next 256KB onto the second disk, then another 256KB onto the third disk. A parity block, which is the XOR of the three previously written 256KB blocks, is then written to the fourth disk. So now, we have 256KB of data or parity written to each disk. Now, the left asymmetric method says that the next parity disk will be the third disk. So the next lot of data after what has already been written will be 256KB to the first disk, 256KB to the second disk, 256KB to the fourth disk, and then the parity block generated will be written to the third disk. And so on – the next parity disk is the second disk, after that it will be the first disk, then back to the fourth.
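
To make that concrete, here is a throwaway perl snippet that prints the layout for the first few stripes, using exactly the left asymmetric rule described above: the parity disk steps backwards through the disks, and the data fills the rest in ascending order.

#!/usr/bin/perl
# Sketch only: print the left-asymmetric layout for a 4-disk array, 256KB stripes.
use strict;
use warnings;

my $ndisks = 4;
for my $stripe (0 .. 7) {
    my $parity = ($ndisks - 1) - ($stripe % $ndisks);      # parity steps backwards
    my @data   = grep { $_ != $parity } 0 .. $ndisks - 1;  # data fills the rest
    printf "stripe %d: data on disks %s, parity on disk %d\n",
        $stripe, join(', ', map { $_ + 1 } @data), $parity + 1;
}

For stripe 0 that prints data on disks 1, 2, 3 with parity on disk 4; for stripe 1, data on disks 1, 2, 4 with parity on disk 3 – which matches the description above.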

Got that? Ok, the next thing to do is to reverse this: since I have the first three disks, I can write my data out to a new 3TB disk drive. I have a perl program that I wrote many years ago to do just this – I just have to tailor it to this situation. So, how do I do it?

Just imagine that this is what will happen. I will read 256KB from each of disks 1, 2 & 3 – the first block of each disk is data, so these are written to the destination disk. The next 256KB blocks from the disks will be data, data and parity – so the data blocks are written, then I XOR the data blocks and the parity block together, and the result is a data block that I write. So far, I have written six blocks, ok?

The next block from each disk will be data, parity, data – so again, write the data blocks, then XOR everything together and write that as data – now I have nine blocks. The next block from each disk will be parity, data, data – so I write the data blocks first, then XOR the blocks to get the new data block that is written to the disk. I have now written twelve blocks. The next block from each disk will be data, data, data – so we are back where we were at the beginning of the disk – we just write out the data – and continue, ok?

So we keep going, and eventually we will have read all three disks in their entirety and written 3TB or so of data – which I should then be able to connect up, and the computer should recognize the drive. Well, that is for another day, or maybe the weekend. Wasn't I lucky that it was the last disk that failed? Actually, it really doesn't matter which disk failed, as long as we can determine the order of the drives.

As an example, what if it was the second disk that failed, and we have the first, third and fourth disks available? Since we know where the parity block is, we would know that the first block from each disk is data, data and parity – so as the missing disk is the second disk, we have to write data, XOR, data – where XOR is the result of XORing the two data blocks and the parity block. The next block we read would be data, parity, data, so we would write data, XOR, data. For the third block, we would be reading data, data, data, so that is exactly what we write, as the block from the missing second disk would be parity, which we don't need. Makes sense? Ok, I am glad it makes sense to someone. See you next time.
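
In case anyone wants to try this themselves, a cut-down sketch of that rebuild logic looks something like the perl below. It is not the actual program, just an illustration of the same idea: 4 disks, 256KB blocks, left asymmetric parity, one disk missing. The file names and the rebuilt.img output name are placeholders – for a real rebuild you would feed it full raw images of the surviving disks, not the 10MB samples.

#!/usr/bin/perl
# Sketch only: rebuild a 4-disk left-asymmetric RAID-5 with one disk missing.
#   usage: rebuild.pl <missing disk number 1..4> <image of each surviving disk, in physical order>
use strict;
use warnings;

my $blk     = 256 * 1024;                 # block (chunk) size
my $ndisks  = 4;
my $missing = shift(@ARGV) - 1;           # zero-based index of the failed disk
my @images  = @ARGV;                      # raw images of the surviving disks

my @fh;
for my $img (@images) {
    open my $h, '<:raw', $img or die "open $img: $!";
    push @fh, $h;
}
open my $out, '>:raw', 'rebuilt.img' or die "open rebuilt.img: $!";

# Which physical disk position each surviving image belongs to.
my @pos = grep { $_ != $missing } 0 .. $ndisks - 1;

for (my $stripe = 0; ; $stripe++) {
    my (%buf, $full);
    for my $i (0 .. $#pos) {
        my $n = read $fh[$i], $buf{ $pos[$i] }, $blk;
        $full++ if defined $n && $n == $blk;
    }
    last unless $full && $full == @pos;   # stop at the end of the images

    my $parity = ($ndisks - 1) - ($stripe % $ndisks);   # left asymmetric rotation

    # XOR of the blocks we hold reconstructs the missing disk's block.
    my $xor = $buf{ $pos[0] };
    $xor ^= $buf{ $pos[$_] } for 1 .. $#pos;

    # Write the data blocks in ascending disk order, skipping the parity block.
    for my $disk (0 .. $ndisks - 1) {
        next if $disk == $parity;
        print {$out} ($disk == $missing ? $xor : $buf{$disk});
    }
}

Run as 'perl rebuild.pl 4 disk1.img disk2.img disk3.img' for the situation here, or 'perl rebuild.pl 2 disk1.img disk3.img disk4.img' for the failed-second-disk example just described – the same XOR does double duty either way, because when the missing disk held parity for a stripe, the result simply never gets written.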

RAID.IT – Adaptec Raid Controller, continued.

I finally got a response back from Microsemi. Essentially, my complimentary support ended in 2012 and my warranty ended in 2013, so if I wish to proceed, I would have to pay US$80 or 65 Euro per support incident – with no guarantee of resolution.

Ok, so last night I got the drives out, including the failed drive, and read 10MB from each one – this is so that I can work out the striping factor and the parity rotation order. It was at that time that I noticed that my Hitachi 1TB drives were initially seen in my test machine as 32MB and then the capacity increased to 1TB – this is interesting. It happened on all of the drives, which is strange – so I did some further looking on the internet.

It turns out that there have been quite a few incidents of this happening to drives from different manufacturers – with a seemingly common factor: a Gigabyte motherboard. I checked my test machine and, sure enough, it has a Gigabyte board. So now I have a possible cause – my drives were checked out on my test machine, which is a Gigabyte. It seems that due to a motherboard bug, something may have been written to each disk that somehow makes it think it is 32MB instead of 1TB.

Ok, more research now, to see about fixing this problem – and I guess I need to change motherboards on my test machine.

 

Raid.IT – Adaptec Raid Controller ASR-5405 sees 1TB disks as 32MB, what?

Ok, what is this about? Oh yes, you probably have read previously that I do some data recovery from disk drives. Just recently I received a laptop to do some recovery of deleted photos. Usually the first thing that I do is to make a raw image of the disk. I do this on an Ubuntu Linux machine onto my network storage, however in this case my network storage was a little full and could not handle another 500GB. So I then used another machine that is set up with an Adaptec Array Controller – the ASR-5405, with four 1TB disk drives configured as Raid-5, which gives me a usable 3TB or so.

I connected up the laptop disk and duly made a copy of it. It was getting late, so the machine was turned off. When I turned it on again, the array controller started beeping, very loudly – which generally signifies a failed disk, so it didn't make sense to continue until the disk failure was resolved. Once powered off, I removed the disks one by one and connected them to my other test machine. I ran 'smartctl -a' on each disk drive and eventually found that the last one had SMART errors indicating that it was failing. I replaced this disk drive with another 1TB disk that I had on hand.

To my surprise, when I powered on and let it boot up, I could not find my Data disk, which should be 2TB. Nor was my Temp disk visible, which was the remaining 1TB or so. No beeps on powering up, so what gives?  I powered down, then this time I watched it boot up – this is what I saw…

(Photo: DSC_0198)

This doesn't make sense – my 1TB disks are now seen as 31MB. That is why my logical drive was missing – it thought it did not have the capacity for my array. Now what? I did a Google search but did not find anything like this happening – so why did it happen to me? Don't know, so I decided to contact Adaptec, which by this time had been bought out by Microsemi. I opened a ticket with Microsemi, for which I needed to supply my TSID number, which is some number that shows that you are a valid owner of the controller.

Later I got an automated reply saying that I needed to create a support archive and upload it. On Ubuntu, I managed to run the Storage Manager GUI, but when I tried to connect to the controller, I got a Java exception error which crashed the GUI – bummer. Back to Google then, and Adaptec's support page, where I found out how to get the support archive by running the command-line utility 'arcconf'. For my version of the software, the command

arcconf savesupportarchive

would generate the support archive, then I could zip it up and upload it to them. That is where I am now, so I will wait and see what happens because it is a public holiday in the US right now.

P.S. I could recover the data from the disks, since the disks are still intact – I have three of them, which is the minimum needed, but I don't quite have enough storage. How would I do this? Firstly, by making a raw image of each disk. The beginning of each disk should contain information that Adaptec uses to determine the disk and array configuration, since it will know the position of each disk in the array. Then the data will be striped across the four disks (of which I have three) with a block size that I would have to determine, and the striping factor, which is the way the parity block is distributed. Once that is determined, it is a simple matter of running a little perl script that I had written once before, to generate the array as a single file.

With this array, which is like a raw 3TB disk, I could copy the data to a 3TB disk which could possibly be usable straight away and recognized by Ubuntu. Hence, in order to do this I need a minimum of 3TB for the raw data and 3TB for the final array – 6TB in total, or two 3TB drives – which I do happen to have on hand. I might have to do this, but let's give Microsemi a chance to come back, because maybe they can tell me to run some magical command that will let the controller recognize the disks at their actual size, and then I can continue my work.

Retask.IT, Replace.IT – Cryptomining & VMware ESXi 5.5 Update 2 Host Server

So, what has cryptomining got to do with VMware?

Late last year, when Bitcoin was around the US$600 mark, I embarked on cryptocurrency mining.  This was where I used my desktop together with some software like cgminer and began scrypt number crunching using my video card. During a couple of months of trial, I was mining Anoncoin, then moved on to Novacoin, and dabbled briefly in Peercoin, which really didn't work out. There was enough justification to go into this in a bigger way, i.e. 5 mining computers instead of one. I bought a few video cards – actually not a few: 3x Radeon 7950 cards, 7x Radeon 7850 cards, and a Radeon 7870 card. I even pressed into service my older Radeon 5850 card, which gave up the ghost after its fan failed one day, but I replaced the fan and heatsink with an after-market cooler and kept it working.  I played around with a lot of other cryptocoins – that is, until the returns from mining would not cover the cost of our expensive electricity.  In addition, the room was getting quite hot, and having to have the aircon running during summer was just not acceptable. Okay – basically everything was shut down in June this year, so now I have this hardware sitting around essentially doing nothing.

“Retask.it” – the mining hardware, of course. My VMware ESXi 4.0 host server was getting old, having run for several years, and perhaps now was an opportunity to “Replace.it”. The current version of VMware ESXi is 5.5 Update 2. I put together some hardware to test this version – and had lots of issues installing it because some previously working hardware was no longer supported. There is another story there that I might tell another day. Anyway, after creating my customized installation CD that contains the Realtek 8168 network drivers and updated Adaptec array controller drivers, I was ready to install the production server.

The current configuration for my server contains the following parts:

Asrock 970 Extreme 4 AMD AM3+ motherboard with an AMD Athlon X3 420e triple-core CPU and 8GB of RAM.  The motherboard can handle up to 64GB of RAM, so is sufficient for future expansion. There is no onboard video, so I had to buy a single-slot Gigabyte GV-R545 video card, which houses a Radeon 5450, for $33. I don't need a high-performance card, just a low-power one.  The disk storage is an Adaptec 5805 SATA array controller (found for $300, normally $700+) – initially with 3x WD Red 3TB NAS drives, configured as Raid Level 1E.  I chose the NAS Red drives because they are designed for 24-hour operation – a little more expensive, but hopefully worth it; only time will tell.


My server needs multiple network cards: one for onboard management, one for the internet connection, one for the general network and one for the backbone network.  The backbone is where I plan to have multiple host servers communicating – not implemented as yet.  The motherboard only has two PCI slots, so I could only install two network cards.  I have a couple of PCI-e network cards on order – one of those will be for the backbone network.  I know from experience that having a few drives running 24 hours a day generates a bit of heat, which requires a bit of cooling.  To that end, I have reused the Antec 1100 case to house all of these items.  This case is very good for gaming and has lots of cooling, apparently better than the Corsair 500R that was also available.

One more thing is missing – the power supply.  I have two FSP Aurum Pro 1000W power supplies left over; one of these was pressed into service and should easily handle another half a dozen drives for future expansion.  Almost forgot – add a CD/DVD-ROM drive – I need one in order to install from my customized CD.  To save power, I can always disconnect it after installation – a good idea as this will be running 24×7, since one of the virtual machines is a firewall that protects my internal network from the world wide web.

Current capacity is 4.1TB, of which I have used just 80GB, so still another 4000GB or so to go. If I add five more 3TB drives in Raid 5, this will give me 12TB of additional capacity.  In comparison, the old ESXi 4.0 server had 5x 1TB drives in Raid 1 and Raid 5 configurations, giving me a total of 3TB.  I didn't know at the time that if I had upgraded the firmware on the Adaptec 5405 SATA array controller, I could have achieved this capacity with only 4 drives in Raid 5.  The older firmware at that time only allowed a maximum array size of 2TB to be created.  This was one of the benefits that came out of my testing of ESXi 5.5 – working out what can be improved.

Anyway, still more work to do. Need to sort out all of the virtual machines, work out which to keep and migrate those to the new server. Better get on with it, I guess.

[PS]  There is a very good reason for including the updated Adaptec array controller drivers in my customized installation CD.  During testing, while installing the Adaptec monitoring software, I found that the Adaptec drivers bundled with VMware ESXi 5.5 Update 2 did not allow array monitoring, so I had to install some updated drivers from Adaptec.  After doing this, each time I rebooted the server, the datastore went missing.  The datastore houses all of the virtual machines – if it is missing, no virtual machines can run.  It turns out that upgrading the driver caused VMware to think that the datastore was now a snapshot.  We cannot run from a snapshot (which is like a copy or an image), so the only way to fix this permanently is to resignature the storage, but that meant I would need to relink every virtual machine (like 20 of them) – what a headache. So it is best not to upgrade the drivers unless absolutely necessary, which means: use the right drivers from the start.
The technical document is here:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1011387
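
For what it's worth, the resignature itself is done from the ESXi command line with esxcli – something along these lines on 5.5, where "datastore1" is just a placeholder for the actual volume label (check the KB article above rather than trusting my memory):

esxcli storage vmfs snapshot list
esxcli storage vmfs snapshot resignature -l "datastore1"

The list command shows which volumes are being treated as snapshots, and the resignature brings the datastore back under a new signature – after which every virtual machine registered from it has to be re-added, which is exactly the relinking headache mentioned above.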