Rejuvenate.IT – Old Sofa

Since it was the festive season, after the celebrations of Christmas 2016 and the New Year of 2017, I thought it was time to tackle one of those jobs that have been sitting around, or lying around in this case.

We have an old wooden sofa with cloth-covered foam cushions. Under the cushions is an arrangement of springs and spring wire, and over the years (25+) the spring wire has stretched and the springs have loosened (or we have gotten heavier), such that it sags when we sit on it, and is uncomfortable to lie down on for a nap.

I had been meaning to try out the Kreg pocket-hole system and this was a good opportunity to do so. It would mean removing the spring supports and replacing them with 19mm (3/4in) wooden boards. I had some 240mm-wide pine boards from a long time ago, and a long 235mm-wide piece that I kept when our old kitchen was renovated some years ago. This piece was 3.2m long, so by calculating the number and length of boards I would need, I worked out I could get 5 pieces from this ex-kitchen board. Then I needed one more piece of the pine.

Removing the springs took a few hours, as the cloth cover had to come off first – unfortunately I didn’t take a photo of this. Then I measured up the gap to determine the board lengths – mostly 580-581mm, except that the pieces at the sides were not parallel – anyway, I cut them to fit the gap.

dsc_0262

I chose the Kreg R3 Kit – because it was reasonably priced at $79 – bought a Kreg face clamp for $39, then a box of 100 1.25in coarse-thread screws. The boards had to have a chamfer cut on the ends, so I had to adjust the Kreg jig accordingly to my satisfaction. Then it was a matter of drilling each board with four pocket holes, clamping them in place and screwing them in. Time to cut, drill and screw the boards was about 4 hours, plus about 4 hours to remove the spring system in the first place. Not bad for a day’s work.

dsc_0266

Each board sits on a ledge, which means the screws only hold the boards in place without needing to carry the weight. The smaller wooden piece is the spreader left over from the spring system, and was what I used to determine what board thickness to use. I could have removed it but decided to leave it as a reminder of what the sofa used to be. And the final result?

dsc_0263

Looks like new (almost)! So, the final test is how it feels – very firm – just don’t plop down on it, or you will definitely feel it bottoming out. I think I will add a layer of high-density foam to help with the cushioning. But when I try it out for a nap, it feels fantastic: in the past, the two wooden supports between the cushions got in the way – now it is all flat. This old sofa has been rejuvenated.

RAID.IT – Adaptec Raid Controller, continued – part 3.

This is part 3 of the saga of the Adaptec RAID controller – the ASR-5405 that suddenly decided my array was no longer there. Due to a motherboard bug on the Gigabyte motherboard of my test machine, four of my 1TB disk drives in the array were somehow reconfigured to be 32MB in size. While researching the problem, I came across this great website that had a utility to fix the problem.

Restoring Factory Hard Drive Capacity

The utility they provide restores a hard disk drive’s factory capacity. It only runs on Windows – which isn’t a problem as such – and eventually I ran it and was able to turn each of my 1TB drives back into a 1TB drive – does that make sense? Or rather, turn my 32MB drives back into the original 1TB drives – yes, much more like it.

dsc_0201

dsc_0202

dsc_0203

dsc_0204
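As an aside for Linux users: my guess – and it is only a guess about the mechanism – is that this kind of shrinkage is a Host Protected Area or Device Configuration Overlay limit set on the drive, which hdparm can also inspect and undo. hdparm -N shows the current and native max sector counts (a mismatch means an HPA is set), the p prefix sets the visible count permanently, and --dco-identify reports any DCO. The device name and sector count here are illustrative only:

hdparm -N /dev/sdb
hdparm -N p1953525168 /dev/sdb
hdparm --dco-identify /dev/sdb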

Now, I put the drives back in the machine; however, the Adaptec controller still insists that there are no logical drives found, even though it sees all four 1TB drives – including the one that was failing, which I thought I should put back in. The data isn’t really lost, since it is still sitting on the three drives that are working. A RAID-5 array can tolerate one disk failure, and I have 3 out of 4 working drives from the array. All I have to do is determine some parameters about the disk array. Sounds simple?

Most RAID-5 arrays use distributed parity: effectively, data blocks are put on three disks, with the fourth disk holding a parity block. Then the next three data blocks go onto three disks, with the parity block on the remaining disk – except the parity doesn’t go onto the disk that held the last parity block – not sure if I am explaining it properly. Anyway, what I have to do is determine the block size, the order of the data disks, where the parity block goes first, where it goes next, and so on.

Once I work that out, I can explain the layout a bit better. You will find terms like stripe factor, blocking factor, parity rotation order, etc. What it means is that the parity block moves around the disks in a particular order. Last time, I copied 10MB from each disk into files that I called arraydisk1, arraydisk2, arraydisk3 and arraydisk4, where the number refers to the physical connection order on the array controller.
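For what it’s worth, grabbing those samples is a one-liner per disk – something like this, with the device name adjusted for each drive (device names are illustrative):

dd if=/dev/sdb of=arraydisk1 bs=1M count=10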

After some examination of the arraydisk files using a hex editor (in this case HexEdit), I was able to find some regular data structures in the files that allowed me to work out the block size – which was 256KB. Knowing the block size, I can then look at the data just before and after each boundary and try to match it up. It is like a jigsaw puzzle – except we are working with data instead of shapes, but the same sort of thing.

Last night, I was able to work out, to my own satisfaction, that the physical disk order and the data disk order were the same, and that the parity block order was left asymmetric – which is nice and easy to explain. I also found some documentation on the internet indicating that Adaptec uses the left asymmetric parity block order.

My logical disk drive is 3TB, so just consider the following. The Adaptec controller writes the first 256KB onto the first disk, the next 256KB onto the second disk, then another 256KB onto the third disk. A parity block is computed as the XOR of the previously written 256KB blocks, and the result is written to the fourth disk. So now we have 256KB of data or parity written to each disk. The left asymmetric method says that the next parity disk will be the third disk. So the next lot of data after what has already been written will be 256KB to the first disk, 256KB to the second disk, 256KB to the fourth disk, and then the generated parity block is written to the third disk. And so on: the next parity disk is the second disk, after that it will be the first disk, then back to the fourth.
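To put that into a picture: with D0, D1, D2… as the 256KB data blocks in logical order and P as the parity of each stripe, the first four stripes look like this (after stripe 4 the pattern repeats):

Stripe 1:  Disk1=D0   Disk2=D1   Disk3=D2   Disk4=P
Stripe 2:  Disk1=D3   Disk2=D4   Disk3=P    Disk4=D5
Stripe 3:  Disk1=D6   Disk2=P    Disk3=D7   Disk4=D8
Stripe 4:  Disk1=P    Disk2=D9   Disk3=D10  Disk4=D11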

Got that? Ok, the next thing to do is to reverse this: since I have the first three disks, I can write my data to a new 3TB disk drive. I have a perl program that I wrote many years ago to do exactly this – I just have to tailor it to this situation. So, how do I do it?

Just imagine that this is what will happen: I read 256KB from each of disks 1, 2 and 3. The first block of each disk is data, so these are written to the destination disk. The next 256KB blocks from the disks will be data, data, and parity – so the data blocks are written, then I XOR the data blocks and the parity block together, and the result is the missing data block, which I also write. So far, I have written six blocks, ok?

The next block from each disk will be data, parity, data – so again, write the data blocks, then XOR everything together and write that as data – now I have nine blocks. The next block from each disk will be parity, data, data – so write the data blocks first, then XOR the blocks to get the new data block that is written to the disk. That makes twelve blocks written. The next block from each disk will be data, data, data – the same as at the beginning of the disk – so we just write out the data, and continue, ok?
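Something along these lines captures the loop – a minimal sketch rather than my actual script, assuming 256KB blocks, left asymmetric parity, the fourth disk missing, and made-up file names (arraydisk1.img etc. for the raw images, logical.img for the result):

#!/usr/bin/perl
use strict;
use warnings;

# Sketch: rebuild the logical volume of a 4-disk RAID-5 with
# left-asymmetric parity and 256KB stripe units, given raw images of
# physical disks 1-3 (disk 4 is the one that failed).
my $BLOCK  = 256 * 1024;
my $NDISKS = 4;
my @in;
for my $f ('arraydisk1.img', 'arraydisk2.img', 'arraydisk3.img') {
    open my $fh, '<:raw', $f or die "open $f: $!";
    push @in, $fh;
}
open my $out, '>:raw', 'logical.img' or die "open logical.img: $!";

STRIPE: for (my $stripe = 0; ; $stripe++) {
    # Left asymmetric: parity starts on the last disk and moves left.
    my $pdisk = ($NDISKS - 1) - ($stripe % $NDISKS);

    my @blk;    # one 256KB block from each surviving disk, in physical order
    for my $fh (@in) {
        my $n = read($fh, my $buf, $BLOCK);
        last STRIPE unless $n;      # ran off the end of the images
        push @blk, $buf;
    }

    if ($pdisk == 3) {
        # The missing disk held this stripe's parity, so the three blocks
        # we read are the data blocks, already in logical order.
        print {$out} @blk;
    } else {
        # The missing disk held a data block: XOR the two surviving data
        # blocks with the parity block to regenerate it (Perl's ^ operator
        # XORs strings byte by byte), then write the stripe's data blocks
        # in ascending disk order, skipping the parity slot.
        my $rebuilt = $blk[0] ^ $blk[1] ^ $blk[2];
        for my $disk (0 .. $NDISKS - 1) {
            next if $disk == $pdisk;
            print {$out} ($disk == 3 ? $rebuilt : $blk[$disk]);
        }
    }
}
close $_ for @in, $out;

Reading 256KB at a time straight off the raw devices instead of image files would work the same way.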

So we keep going, and eventually we have read through all three disks and written 3TB or so of data – at which point I should be able to connect the new drive up and have the computer recognize it. Well, that is a job for another day, or maybe the weekend. Wasn’t I lucky that it was the last disk that failed? Actually, it really doesn’t matter which disk failed, as long as we can determine the order of the drives.

As an example, what if it was the second disk that failed, and we have the first, third and fourth disks available? Since we know which block is parity, we know that the first block from each of those disks is data, data and parity – so, as the missing disk is the second one, we write data, XOR, data – where XOR is the result of XORing the two data blocks and the parity block. The next block we read would be data, parity, data, so again we would write data, XOR, data. For the third block, we would be reading data, data, data, and that is exactly what we write, because the block on the missing second disk would be parity, which we don’t need. Makes sense? Ok, I am glad it makes sense to someone. See you next time.

RAID.IT – Adaptec Raid Controller, continued.

I finally got a response back from Microsemi. Essentially, my complimentary support ended in 2012 and my warranty ended in 2013, so if I wish to proceed – I would have to pay 80USD or 65 Euro per support incident. Plus no guarantee of resolution.

Ok, so last night I got the drives out, including the failed drive, and read 10MB from each one – so that I can work out the striping factor and the parity rotation order. It was then that I noticed that my Hitachi 1TB drives were initially seen in my test machine as 32MB, with the capacity then increasing to 1TB – this is interesting. It happened on all of the drives, which is strange – so I did some further looking on the internet.

It turns out that there have been quite a few incidents of this happening to drives from different manufacturers – with a seemingly common factor: a Gigabyte motherboard. I checked my test machine and sure enough, it is a Gigabyte. So now I have a possible cause: my drives were checked out on my test machine, which is a Gigabyte. It seems that due to a motherboard bug, something might have been written to the disk that somehow makes it think it is 32MB instead of 1TB.

Ok, more research now, to see about fixing this problem – and I guess I need to change motherboards on my test machine.


Raid.IT – Adaptec Raid Controller ASR-5405 sees 1TB disks as 32MB, what?

Ok, what is this about? Oh yes, you have probably read previously that I do some data recovery from disk drives. Just recently I received a laptop to recover some deleted photos. Usually the first thing I do is make a raw image of the disk. I do this on an Ubuntu Linux machine onto my network storage; however, in this case my network storage was a little full and could not handle another 500GB. So I used another machine that is set up with an Adaptec array controller – the ASR-5405, with four 1TB disk drives configured as RAID-5, which gives me a usable 3TB or so.

I connected up the laptop disk and duly made a copy of it. It was getting late, so the machine was turned off. When I turned it on again, the array controller started beeping, very loudly – which generally signifies a failed disk, so it didn’t make sense to continue until the disk failure was resolved. Once powered off, I removed the disks one by one and connected them to my other test machine. I ran ‘smartctl -a’ on each disk drive and eventually found that the last one had SMART errors indicating that it was failing. I replaced this disk drive with another 1TB disk that I had on hand.

To my surprise, when I powered on and let it boot up, I could not find my Data disk, which should be 2TB. Neither was my Temp disk visible, which was the remaining 1TB or so. No beeps on powering up, so what gives?  I powered down, then this time I watched it boot up – this is what I saw…

DSC_0198

This doesn’t make sense – my 1TB disks are now seen as 31MB. That is why my logical drive was missing: the controller thought it no longer had the capacity for my array. Now what? I did a Google search but did not find anything like this happening – so why did it happen to me? I don’t know, so I decided to contact Adaptec, which by this time had been bought out by Microsemi. I opened a ticket with Microsemi, for which I needed to supply my TSID number – a number that shows you are a valid owner of the controller.

Later I got an automated reply saying that I needed to create a support archive and upload it. On Ubuntu, I managed to run the Storage Manager GUI, but when I tried to connect to the controller, I got a Java exception error which crashed the GUI – bummer. Back to Google then, and Adaptec’s support page, where I found out how to get the support archive by running the command line utility ‘arcconf’.  For my version of the software, the command

arcconf savesupportarchive

would generate the support archive, which I could then zip up and upload to them. That is where I am now – I will wait and see what happens, because it is a public holiday in the US right now.

P.S. I could recover the data from the disks myself, since the disks are still intact – I have three of them, which is the minimum needed – but I don’t quite have enough storage. How would I do this? Firstly, by making a raw image of each disk. The beginning of each disk should contain information that Adaptec uses to determine the disk and array configuration, since the controller must know the position of each disk in the array. Then the data will be striped across the four disks (of which I have three) with a block size that I would have to determine, plus the striping factor, which is the way the parity block is distributed. Once that is determined, it is a simple matter of running a little perl script that I had written once before, to generate the array as a single file.
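The raw images themselves would be made with the usual dd routine – something like this per disk, with device and file names adjusted to suit (these are illustrative):

dd if=/dev/sdb of=arraydisk1.img bs=1M conv=noerror,sync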

With this array file, which is like a raw 3TB disk, I could copy it to a 3TB disk, which could possibly be usable straight away and recognized by Ubuntu. Hence, to do this I need a minimum of 3TB for the raw data and 3TB for the final array – 6TB in total, or two 3TB drives – which I do happen to have on hand. I might have to do this, but let’s give Microsemi a chance to come back, because maybe they can tell me to run some magical command that will let the controller recognize the disks at the size they actually are, and then I can continue my work.

Recover.IT – HP EX490 MediaSmart Server – Part 3

Ok, where was I?  Yes, the faulty disk in my MediaSmart Server – eventually the removal process came to an end where there were only a few files left over – about 10 – which were old and unnecessary. I tried to delete them, but each time I deleted them, they came back. So I had to log onto the server through the Windows console and run a chkdsk command on the disk. I had to run it twice, and after that I was able to do the removal again; finally the disk light was blinking, indicating that it was ready for removal.

As this disk had some bad sectors in one area, sometimes we can do a security erase on the disk. The security erase is an internal function of the disk drive firmware, and we have to go through a little process to invoke it. It effectively performs a factory format of the disk surface, which in general should rewrite the entire data surface, so we should (theoretically) end up with no bad sectors.

ScreenShot074

The first hdparm command interrogates the disk and writes the output to a file – we need to check that the drive isn’t frozen. The second hdparm command sets a master password, which I have called llformat (meaning low level format); then we check the drive again to confirm that it is enabled for erasing – which it was.
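The screenshot says it better, but roughly, the sequence (including the erase command described below) looks like this – the device name is illustrative, and the exact flags may vary with the hdparm version:

hdparm -I /dev/sdb > drive-info.txt
hdparm --user-master u --security-set-pass llformat /dev/sdb
hdparm -I /dev/sdb
hdparm --user-master u --security-erase llformat /dev/sdb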

The final hdparm command tells the drive to commence the security erase, which would take approximately 322 minutes, so it was time to leave it running and check it in the morning, which I did. Afterwards, I ran a diagnostic on the drive but got a strange error – it seems the drive now thinks it is a 1TB drive and not a 2TB drive. I checked the hdparm output from the initial command.

ScreenShot072

It definitely shows that it is 2TB (2000GB), as indicated by the device size with M = 1000*1000 – i.e. M = 1 million bytes.  The hdparm command I ran after the erase had finished is shown here.

ScreenShot073

And here, definitely, it says it is a 1TB = 1000GB drive – what is going on? The serial number is the same, so I am not dreaming – it really does think it is 1TB. I also ran a smartctl command to check the status of the SMART data on the drive.

ScreenShot071

It shows that attribute 184 End-to-End_Error is FAILING_NOW, so basically the drive is failing and should not be used for anything critical, as it could stop working at any time. A pity, because a 2TB drive could still be handy to play around with – but now it is 1TB.

I performed an erase again, and it still comes up as 1TB afterwards, so there is definitely a problem somewhere – maybe in the firmware. Suspiciously, a Google search turns up nothing like this. Maybe if someone knows how this happened, they can let me know and we can try to reverse it. I have performed security erases on numerous drives without this happening, so it would be good to know.


Recover.IT – HP EX490 MediaSmart Server – Part 2

This is part 2 of the recovery of the HP EX490 MediaSmart Server – which is a Windows Home Server machine. The second drive on this server was seen to be offline, so I had shut down this server to investigate the problem.

Last Saturday, I ran some tests on the drive as previously mentioned. On Wednesday night, I decided to copy the disk to a new 3TB disk that I had just bought – a Toshiba 3TB drive with a 3-year warranty, at $127 each from a local computer shop. I thought that was a good price.

Anyway, as you might have guessed, I connected this disk to a Linux machine – Ubuntu in this case. I used the dd command (which I have previously mentioned) to copy raw data from the disk directly to the new disk.

dd if=/dev/sdb of=/dev/sdc conv=noerror,sync 2>&1 | tee -a ./logfile.txt

Now, of course, I ran smartctl -a /dev/sdb first to check the source disk, and then smartctl -a /dev/sdc to confirm the destination disk. The source disk is a Seagate – correct – and the destination disk is a Toshiba – also correct, so I was good to go. It pays to check: don’t assume that because the Seagate is connected to SATA0 and the Toshiba to SATA1 that the disk designations will be in the right order.

Ok, so on Wednesday the copy was started – then I went back to the machine some time later to check on its progress, and saw these errors on the display.

dd: error reading ‘/dev/sdb’: Input/output error
6364552+0 records in
6364552+0 records out
3258650624 bytes (3.3 GB) copied, 346.982 s, 9.4 MB/s
dd: error reading ‘/dev/sdb’: Input/output error
6364552+1 records in
6364553+0 records out

6,364,552 sectors were read and copied before an error occurred. The noerror parameter means that dd will continue after a read error, and sync means that each unreadable sector is replaced on the destination with a zero-filled sector. I stopped the copy at that point, since it is not a good idea to keep hammering bad sectors in case the drive decides to quit permanently.

Then last night, I decided to copy from a point after this sector. This time I used this command line and let it run overnight after it seemed to start without throwing up any errors.

dd if=/dev/sdb of=/dev/sdc conv=noerror,sync bs=1M skip=4000 seek=4000 2>&1 | tee -a ./logfile.txt
1903728+1 records in
1903729+0 records out
1996204539904 bytes (2.0 TB) copied, 24148.3 s, 82.7 MB/s

For that command, I set a block size (bs) of 1MB, then used the skip and seek parameters to begin at a point 4000MB into both the source and the destination. I checked this morning when I woke up and found that it had completed successfully – the time taken works out to about 6.7 hours.

This evening, I also bought a Toshiba 2TB disk drive on my way home – more about this later. Ok, so I had copied about 3.3GB on Wednesday before hitting the bad sectors, and last night I copied from the 4GB mark or thereabouts to the end. I then ran a few more copying commands – I won’t bore you with all of the details – to copy the remaining good sectors in between, using the count parameter to specify how many blocks to copy.
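As an indication of what those looked like: the bad patch starts just after sector 6,364,552, and the overnight copy started at the 4000MB mark, which is sector 8,192,000 (at 512 bytes per sector), so the gap in between could be filled with something like this – the sector numbers are my own arithmetic, not the exact command I ran:

dd if=/dev/sdb of=/dev/sdc conv=noerror,sync skip=6364569 seek=6364569 count=1827431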

Eventually, I had copied every sector that could be copied. It turns out that sectors 6,364,553 to 6,364,568 – 16 of them – were unable to be read; not too bad. I also copied a couple of blocks from before and after the bad sectors and had a look at the data – it seems to be file information, most likely parts of the Master File Table – which means that a few files are potentially lost.

Ok, this is where my new 2TB drive comes in. I put the faulty Seagate drive back into the EX490, and then added the new Toshiba drive into the top-most bay. After powering up the MediaSmart Server and waiting, I was eventually shown two solid green lights – meaning the Seagate drive was back online together with the main WD drive – and one blinking green light, which was the new Toshiba drive. I logged onto my Windows Home Server console, went into Server Storage, and proceeded to add the new drive.

Screenshot 2016-08-05 19.45.01

The idea is to add the new Toshiba drive, so that WHS knows that it is available for storage, and then tell WHS that I want to remove the Seagate drive.

Screenshot 2016-08-05 19.45.51

You might ask, why am I doing this? The drive has bad sectors – it isn’t a good idea to keep using it. Also, WHS allows me to remove this disk by moving and redistributing the files on it to other available disks, like the new one that I just added.

Screenshot 2016-08-05 19.46.11

Great, it says that I have sufficient storage space to have this drive removed.

Screenshot 2016-08-05 21.42.40

Ok, I am not actually going to sit here and wait for it, but eventually it will (hopefully) tell me that the drive is ready to be removed. Depending on how full the disk drive was, this can definitely take many hours. Windows Home Server is actually really good here, because most storage systems don’t allow you to remove disk drives once they have been used for storing data.

What about the 3TB drive, you are thinking? That is insurance – in case the disk stops working during the removal, I have a copy of it that I can use to recover files from. If this removal works successfully, then my 3TB drive can be retasked. By the way, Windows Home Server cannot use disk drives larger than 2TB without major surgery. The reason is that WHS uses partitioning based on the Master Boot Record, which can address at most 2^32 sectors of 512 bytes – about 2.2TB. To use drives larger than that, it is necessary to use GPT partitioning – but that is another story.

What about the 16 bad sectors on this Seagate drive? Once I take it out, I plan to do a factory (security) erase on it – this should rewrite every sector on the disk, including the bad ones, and I should end up with a disk drive without bad sectors. I can then use it either for temporary storage of non-critical data, or run lots of diagnostics on it to see if it is continuing to fail. If it holds up to the diagnostics, maybe it gets a second chance at life.

In the meantime, I am off to bed!

Recover.IT – HP EX490 MediaSmart Server

Yesterday, I noticed that the Windows Home Server icon in my taskbar was red.  I opened it up and saw some file conflicts – that is strange.  I could still access the files on the server, so what is going on? Then the penny dropped: it says that a disk drive is missing. I went out to the computer area and could see that only one disk was lit up; the second one was not – meaning it is offline. I went back to the console and shut down the server – which it eventually did, albeit slowly, because it had stopped responding for a long time before I could hit the Shutdown button.

DSC_0098

Some of you may have heard about Windows Home Server; many probably haven’t. WHS was a great product for its time – a semi-redundant network storage device that could be packaged like a NAS. I bought this HP EX490 MediaSmart Server back when it was available in 2009. That is the box on the right in the photo above – ok, a little dusty, even though it sits on a shelf 2m above the floor.  It came with a single Seagate 1TB disk drive, and over the next few years went to 4x1TB drives, then eventually to 2x2TB drives. Files are stored in folders that are shared out, and each folder/share can be configured to be redundant or not.

Ok – back to the problem at hand: one of the two drives – the Seagate 2TB – had apparently stopped working.  After the server had shut down, I pulled out the second drive and connected it to my test/recovery machine. It was able to spin up, and I ran a few commands on it to determine what the issue was, then shut it down. I didn’t want to keep the drive running until I had a way to copy its contents – having temporarily run out of disk storage space recently.

One of the commands that I run is “smartctl -a /dev/sdb”, which on Linux will display the SMART data from the disk drive physically connected as /dev/sdb. The interesting things I look for are the Reallocated Sector Count and whether any of the SMART attributes show that the drive has failed. None of them did, and the Reallocated Sector Count was 14760, which is a little high – but this can be normal for the drive. The Power On Hours was 34,235, which equates to nearly 4 years – the drive itself is 5 years old, so if it wasn’t put to use straight away, this might be ok.

Of course, there were other values to consider. Attribute 187 Reported Uncorrectable was 0, 188 Command Timeout was 1, 197 Current Pending Sector Count was 216, and 198 Offline Uncorrectable Sector Count was also 216. Those last two are concerning – generally a non-zero number on these indicates that the drive is having issues and we should plan to replace it.

Smartctl also reports SMART errors that the drive has recorded – the main one occurred at 34,227 hours, about 8 hours before I noticed the problem and shut it down. This was error 8170 – WP at LBA = 0x00611d8f = 6364559 – which probably means that it couldn’t access this particular sector; a concern. What I need to do now is obtain a spare disk of at least 2TB and make a disk-to-disk copy, to ensure that my data is preserved. I have a few 3TB disks lying around – maybe I can free one up for a little while. I think I will do that during the week.

Remember that I mentioned we can mark some folders or shares as redundant – meaning the contents of those folders have copies residing on the other disk? Well, not all folders were marked redundant, so any of those folders residing on this particular disk might well be inaccessible. Fortunately, Windows Home Server creates an NTFS file system on each drive, so these drives can be connected to any Windows machine and be accessible – unlike some versions of RAID, where the data is striped across each disk.

The other thing I want to think about is what I would replace this WHS with. I currently run a virtual FreeNAS on an ESXi server, but I was thinking about building a new standalone network storage appliance. FreeNAS is great if we can get the right hardware – ECC memory, plus a CPU and motherboard that support it – and run ZFS, but then I was reading about issues with ZFS, which caused me to look at what other people are using.

I could stay with Linux and run something like MergerFS and SnapRAID, or I could go the Windows way with Storage Spaces, which is looking very tempting – except I don’t have a spare Windows 10 machine to play with, since the free upgrade from Windows 7/8.1 ended a couple of days ago. Decisions, decisions…