RAID.IT – Adaptec Raid Controller, continued.

I finally got a response back from Microsemi. Essentially, my complimentary support ended in 2012 and my warranty ended in 2013, so if I wish to proceed – I would have to pay 80USD or 65 Euro per support incident. Plus no guarantee of resolution.

Ok, so last night I took the drives, including the failed drive, and read 10MB from each one so that I can work out the striping factor and the parity rotation order. It was at that time that I noticed that my Hitachi 1TB drives were initially seen in my test machine as 32MB, and then the capacity increased to 1TB. This is interesting. It happened on all of the drives, which is strange, so I did some further looking on the internet.
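As a rough illustration of what those 10MB samples can be used for, here is a minimal Python sketch. It relies on one RAID-5 property: at any offset inside the array data, the blocks from all members XOR to zero. Blocks that do not XOR to zero are probably controller metadata or sit outside the array. The function names, the 4096-byte block size and the idea of scanning this way are my own assumptions for illustration, not anything Adaptec documents.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together."""
    length = len(blocks[0])
    acc = 0
    for block in blocks:
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(length, "big")


def read_head(path, size=10 * 1024 * 1024):
    """Read the first `size` bytes of a disk or image file (10MB by default)."""
    with open(path, "rb") as handle:
        return handle.read(size)


def consistent_blocks(samples, block_size=4096):
    """Return the indices of blocks where all members XOR to zero.

    `samples` holds one byte string per array member, all taken from
    the same starting offset. Note that blocks which are all zeros on
    every disk also pass this check, so a hit is a hint, not proof.
    """
    usable = min(len(sample) for sample in samples) // block_size
    zero = bytes(block_size)
    hits = []
    for index in range(usable):
        start = index * block_size
        parts = [sample[start:start + block_size] for sample in samples]
        if xor_blocks(parts) == zero:
            hits.append(index)
    return hits
```

Usage would be something like `consistent_blocks([read_head(p) for p in paths])`, where `paths` is a hypothetical list of the four device or image paths. The first run of consistent, non-zero blocks suggests where the array data begins. This only works when all four members are readable, which is the case here since the failed drive could still be sampled.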

It turns out that there have been quite a few incidents of this happening to drives from different manufacturers, with a seemingly common factor: a Gigabyte motherboard. I checked my test machine and, sure enough, it is a Gigabyte. So now I have a possible cause: my drives were checked out on my test machine, which has a Gigabyte motherboard. It seems that due to a motherboard bug, something might have been written to the disk that somehow makes it report 32MB instead of 1TB.

Ok, more research now, to see about fixing this problem – and I guess I need to change motherboards on my test machine.

 

Raid.IT – Adaptec Raid Controller ASR-5405 sees 1TB disks as 32MB, what?

Ok, what is this about? Oh yes, you probably have read previously that I do some data recovery from disk drives. Just recently I received a laptop to do some recovery of deleted photos. Usually the first thing that I do is to make a raw image of the disk. I do this on an Ubuntu Linux machine onto my network storage, however in this case my network storage was a little full and could not handle another 500GB. So I then used another machine that is set up with an Adaptec array controller, the ASR-5405, with four 1TB disk drives configured as RAID-5, which gives me a usable 3TB or so.

I connected up the laptop disk and duly made a copy of the disk. It was getting late, so the machine was turned off. When I turned it on again, the array controller started beeping very loudly, which generally signifies a failed disk, so it didn’t make sense to continue with it until the disk failure was resolved. Once it was powered off, I removed the disks one by one and connected each to my other test machine. I ran ‘smartctl -a’ on each disk drive and eventually found that the last one had SMART errors that indicated that it was failing. I replaced this disk drive with another 1TB disk that I had on hand.

To my surprise, when I powered on and let it boot up, I could not find my Data disk, which should be 2TB. Neither was my Temp disk visible, which was the remaining 1TB or so. No beeps on powering up, so what gives?  I powered down, then this time I watched it boot up, and this is what I saw…

DSC_0198

This doesn’t make sense: my 1TB disks are now seen as 31MB. That is why my logical drive was missing; the controller thought it did not have the capacity for my array. Now what? I did a Google search but did not find anything like this happening, so why did it happen to me? I didn’t know, so I decided to contact Adaptec, which by this time had been bought out by Microsemi. I opened a ticket with Microsemi, for which I needed to supply my TSID number, which is a number that shows that you are a valid owner of this controller.

Later I got an automated reply saying that I needed to create a support archive and upload it. On Ubuntu, I managed to run the Storage Manager GUI, but when I tried to connect to the controller, I got a Java exception error which crashed the GUI. Bummer. Back to Google, then, and Adaptec’s support page, where I found out how to get the support archive by running the command line utility ‘arcconf’.  For my version of the software, the command

arcconf savesupportarchive

would generate the support archive, then I could zip it up and upload it to them. That is where I am now, so I will wait and see what happens because it is a public holiday in the US right now.

P.S. I could recover the data from the disks, since the disks are still intact. I have three of them, which is the minimum needed, but I don’t quite have enough storage. How would I do this? Firstly, by making a raw image of each disk. The beginning of each disk should contain information that Adaptec uses to determine the disk and array configuration, since it needs to know the position of each disk in the array. Then the data will be striped across the four disks (of which I have three) with a block size that I would have to determine, along with the parity rotation, which is the way the parity block is distributed. Once that is determined, it is a simple matter of running a little Perl script that I had written once before, to generate the array as a single file.
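To make the reassembly step concrete, here is a minimal Python sketch of the idea (not my Perl script). It rebuilds the missing member by XORing the surviving ones, then writes out the data blocks of each stripe row in order. Everything specific in it is an assumption for illustration: the function names, the "left-symmetric" parity rotation (the layout Linux md uses by default, which the Adaptec controller may or may not share), and array data starting at offset 0 of each image with no metadata in front.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together."""
    length = len(blocks[0])
    acc = 0
    for block in blocks:
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(length, "big")


def assemble_raid5(members, unit):
    """Rebuild the logical array contents from RAID-5 member images.

    members: one byte string per disk, in array order; use None for
             the single missing disk.
    unit:    stripe unit (block) size in bytes.
    Assumes a left-symmetric layout, which is a guess, not a fact
    about the ASR-5405.
    """
    if members.count(None) > 1:
        raise ValueError("RAID-5 can only survive one missing member")
    count = len(members)
    present = [m for m in members if m is not None]
    rows = min(len(m) for m in present) // unit
    out = bytearray()
    for row_index in range(rows):
        start = row_index * unit
        row = [None if m is None else m[start:start + unit] for m in members]
        if None in row:
            # The missing block is the XOR of all surviving blocks.
            missing = row.index(None)
            row[missing] = xor_blocks([b for b in row if b is not None])
        # Parity starts on the last disk and moves one disk left per row.
        parity = (count - 1) - (row_index % count)
        # Data blocks follow the parity disk and wrap around.
        for step in range(1, count):
            out += row[(parity + step) % count]
    return bytes(out)
```

This version works on byte strings held in memory to keep it short. For real 1TB images the same loop would have to read one stripe row at a time from each file and append to the output file. If the output does not show a sensible partition table or filesystem, the unit size, disk order or rotation is wrong and needs another guess.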

With this array, which is like a raw 3TB disk – I could copy this to a 3TB disk which could possibly be usable straight away and recognized by Ubuntu. Hence, in order to do this I need a minimum of 3TB for the raw data, and 3TB for the final array – 6TB in total, or two 3TB drives – which I do happen to have on hand. I might have to do this, but let’s give Microsemi a chance to come back because maybe they can tell me to run some magical command that will let the controller recognize the disks for the size that they actually are, and then I can continue my work.

Recover.IT – HP EX490 MediaSmart Server – Part 3

Ok, where was I?  Yes, the faulty disk in my MediaSmart Server. Eventually the removal process came to an end with only a few files left over, about 10, which were old and unnecessary. I tried to delete them, but each time I deleted them, they came back. So, I had to log onto the server through the Windows console and run a chkdsk command on the disk. I had to run it twice, and after that I was able to do the removal again, and finally the disk was blinking, indicating that it was ready for removal.

As this disk had some bad sectors in one area, sometimes we can do a security erase on the disk. The security erase is an internal function of the disk drive firmware and we have to go through a little process to do this. The security erase will effectively perform a factory format of the disk surface which in general should rewrite the entire data surface and we should (theoretically) end up with no bad sectors.

ScreenShot074

The first hdparm command interrogates the disk and writes the output to a file; we need to check that the drive isn’t frozen. The second hdparm command sets a master password, which I have called llformat (meaning low level format). Then we check the drive again to confirm that it is enabled for erasing, which it was.
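The frozen check described above can also be done by a script instead of by eye. Here is a small Python sketch that reads the saved interrogation output and pulls out the security flags plus the estimated erase time. It assumes the output came from `hdparm -I` and that the Security section uses the usual tab-separated lines such as `not frozen`. The function name and the dictionary keys are my own.

```python
import re


def parse_security(hdparm_output):
    """Extract security flags from the text of an `hdparm -I` report.

    Returns a dict with booleans for 'enabled', 'locked' and 'frozen'
    (when those lines are present) and 'erase_minutes' as an int, or
    None if no erase time estimate is found.
    """
    state = {}
    in_section = False
    for line in hdparm_output.splitlines():
        if line.strip().startswith("Security:"):
            in_section = True
            continue
        if in_section and line and not line[0].isspace():
            # A non-indented line means the Security section is over.
            in_section = False
        if not in_section:
            continue
        words = line.split()
        if words and words[-1] in ("enabled", "locked", "frozen"):
            # "not frozen" means False, a bare "frozen" means True.
            state[words[-1]] = words[0] != "not"
    match = re.search(r"(\d+)min for SECURITY ERASE UNIT", hdparm_output)
    state["erase_minutes"] = int(match.group(1)) if match else None
    return state
```

If `frozen` comes back True, the erase commands will be refused until the drive is unfrozen, which usually means a suspend and resume or a hot replug of the drive.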

The final hdparm command tells the drive to commence a security erase, which would take approximately 322 minutes, so it was time to leave it to run and check it in the morning, which I did. Afterwards, I ran a diagnostic on the drive but got a strange error: it seems that the drive now thinks that it is a 1TB drive and not a 2TB drive. I checked the hdparm output from the initial command.

ScreenShot072

Definitely it shows that it is 2TB (2000GB) as indicated by the device size with M = 1000*1000 – i.e. M = 1 million bytes.  The hdparm command I ran after the erase had finished is shown here.

ScreenShot073

Definitely, here again, it says it is a 1TB = 1000GB drive – what is going on? The serial number is the same, so I am not dreaming – and definitely, it thinks it is 1TB. I also ran a smartctl command to check the status of the SMART data on the drive.

ScreenShot071

It shows that parameter 184 End-to-End_Error is FAILING_NOW, so basically the drive is failing and should not be used for anything critical, as it could stop working at any time. A pity, because a 2TB drive could still be handy to play around with, but now it is 1TB.
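When checking several drives, it is easy to miss a line like that in the wall of SMART output. Here is a small Python sketch that scans the attribute table and reports any attribute whose WHEN_FAILED column is not a dash. It assumes the standard ten-column table printed by `smartctl -A` (or `smartctl -a`). The function name is my own.

```python
def failing_attributes(smartctl_output):
    """Return (id, name, when_failed) for every flagged SMART attribute.

    Expects the usual attribute table columns:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    failing = []
    for line in smartctl_output.splitlines():
        cols = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(cols) >= 10 and cols[0].isdigit() and cols[8] != "-":
            failing.append((int(cols[0]), cols[1], cols[8]))
    return failing
```

For the drive above this would report attribute 184 with FAILING_NOW. An attribute that failed earlier but has since recovered shows up as In_the_past, which is still worth knowing about.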

I performed an erase again, and it seems that it still is a 1TB afterwards, so definitely there is a problem somewhere – maybe in the firmware. Suspiciously, there is nothing like this when I do a Google search. Maybe if someone knows how this happened, they can let me know and we can try to reverse it. I have performed security erase on numerous drives without this happening, so it would be good to know.