Rectify.IT – Fujitsu Lifebook P8110 Scroll Lock flashing

On Saturday, while doing a few things around the house, I was checking whether I could get a second drive caddy for my Fujitsu Lifebook P8110 notebook. I was trying to remove the DVD drive and, at the same time, decided to take all of the covers off the bottom of the notebook. One cover hid a mini PCIe socket, which I believe is for an optional wireless card – or perhaps even a small SSD. Another slightly larger cover hid the memory socket, in which was installed a Kingston 4GB DDR3 1333 SODIMM, and of course the much larger cover was for the internal hard disk drive.

In due course, I put everything back together and put the notebook back into my backpack, since I use it for work as a Windows 10 machine from time to time. To my surprise this morning, it failed to power up – well, actually the power light came on, the disk light came on, then the Scroll Lock light started blinking. I could still hear the hard disk drive spinning. There was no display at all; not even the BIOS POST screen came on. I held the power button to force it to power off. I did this a few times to confirm that I was not imagining it, and eventually put it back into my backpack and went on with my tasks using my work laptop.

After coming home and watching a short movie, I got my Lifebook back out and tried it again – the same flashing Scroll Lock light. A quick check on the internet showed no solutions; however, one site did say that it may be power related. I eventually got the battery removed and connected the power adapter, and still the same. One site did suggest memory – and yes, I did remove the memory module on Saturday, so could this be it?

I opened the cover and removed the memory – a 4GB module as described above. I had the laptop screen down and the main body up – i.e. the laptop was open so that I could reach the power button and see the screen but could also access the memory slot. I pressed the power button, and the Lifebook came to life: the BIOS screen came up, then it proceeded to boot to Windows. I powered off and went in search of some memory. I had another Kingston 4GB so tried that – no go. Then I found two new Kingmax 4GB SODIMMs and tried one – and yes, it worked. Afterwards I decided to try the original memory – the first Kingston 4GB – and while putting it in, I latched and unlatched it a few times since it could be just a contact problem.

So did it work, I can hear you all asking?

Yes, the notebook booted up!  I shut it down, then put the cover back on, then put it right side up and powered on.  Still working, so it appears that the flashing Scroll Lock light is indicating a memory problem.  The motherboard has 2GB of inbuilt memory, so the notebook will boot from this, and my 4GB brought it up to 6GB – more memory of course, is better for Windows 10.  At least I know what to do if this happens again.

Reheat.IT – Dell XPS M1530 laptop

This is about the Dell XPS M1530 laptop that I had worked on at length back in 2015. At the time, I had heated the graphics chip to a high temperature in order to get it back in operation. The laptop had failed again and was not powering up except for a couple of lights – so the same failure condition as last time.

Now, we cannot keep reflowing the graphics chip since the heat used may also damage other components, and if the problem continued to occur, then eventually the graphics chip would need replacement. After some further research on this problem, I came across an article that talked about the failure mechanism. It appears that there is some conjecture about why it fails, and most of the talk is that it fails because the contacts where it attaches to the motherboard become disconnected.

Ok, what are we talking about, you might ask? A typical modern graphics chip is housed in what is called a BGA package. BGA stands for ball grid array, meaning that the chip, instead of having lots of pins sticking out the bottom like a lot of the older CPUs, has little solder balls on the bottom in an array or matrix. Most sites say that they need to reball the graphics chip – meaning to remove the GPU from the motherboard, clean off all of the solder balls, put flux and new solder balls on it, and reheat it so that the balls are melted and firmly attached. The chip is then placed back on the motherboard and reheated so that the solder balls melt (flow) and join to the contacts on the motherboard.

How do we get all the tiny solder balls onto the chip? We use a stencil – it is made from stainless steel and looks like this.

DSC_0298.JPG

This is one for an ATI 215-0719090 GPU that has 0.5mm balls. Shown with it are a small bottle of 25,000 solder balls and a coin for comparison. However, the Dell XPS M1530 laptop has an nVidia 8600M graphics chip and I don’t have a stencil for it.

Let’s get back to the failure mechanism – a number of other sites say that reballing the chip doesn’t really fix the problem, since it will happen again.  Some also say that reflowing the chip also doesn’t fix the problem and that the solution is to replace the graphics chip.

Now, a number of sites also talked about wrapping the laptop in a towel and letting it run until it overheats, effectively reflowing the chip. Unfortunately, the temperature would never get high enough to reflow, since the battery would have exploded by then, so why does it seem to work and to fix the problem, albeit temporarily?

It seems that further research is required and I may have found some. If we consider how the graphics chip is put together, lots of actual graphics processors are manufactured on a silicon wafer, which is then cut up into individual pieces, each of which is called a die. This is the part on the top of a chip that touches the heatsink. On one side of the die, there are lots of contacts, so there are either small leads attached or we might also have little tiny solder balls. These solder balls connect to the substrate, which is the greenish material, and then the die is coated all around with some sort of filler material which is meant to keep the die in place and to help protect it against the sort of thermal and mechanical stress that will occur. The underside of the substrate is where the reballing mentioned above happens. This is of course a very simplified description of the BGA chip and can be very confusing, especially when the die is also called a flip chip, since it is upside down, i.e. flipped…

So where were we?  Ok, there is some disagreement about why these chips fail. Either the chip is detached from the motherboard or the die is detached from the substrate.  Any one of these hundreds of connections failing would cause the failure.  If the failure was on the die itself, apparently this could be confirmed by heating the die itself – apparently to about 130-150 degrees Celsius is sufficient to get the connections working again.  Now this temperature is well below the melting point of leaded and non-leaded solder so what does this do?

I thought that I would try it out, since I had nothing to lose – I used a hot air rework station to heat the top of the chip at around 100 degrees for 5 minutes or so, and checked the laptop afterwards. Nothing happened, so I tried the other laptop, since I have two of them here. This time I went to 130 degrees, and afterwards that laptop did power up enough to show that the screen display was very inconsistent – with a lot of breakage and flickering.

dsc_0300

I went back to the first laptop and this time heated at 140 degrees. I measured the temperature using a temperature probe. This time, I had success – the laptop afterwards would power up and I could log in and run some diagnostics – at the same time using a monitor program to show the temperature of the graphics chip. The diagnostics ran fine and the maximum temperature reached by the graphics chip was about 75 degrees – ok, fantastic!

Alas, it was not to be – after shutting down, the laptop would not power up.  Ok, so try again by heating to 150 degrees.  I did this and sure enough the laptop came on as normal, and with a notebook cooler attached – blowing air onto the bottom of the laptop.  It has now been a week, with powering on and leaving it running for a few hours then shutting down and checking again on the next day.  As of today, it is still working, and I think that it could go back to its owner finally.

So, how do we explain this – the temperature is not hot enough to melt solder as such, but maybe by doing this we are equalizing thermal stresses on the die to substrate connections. One explanation is that some of the connections are not actually properly soldered (we call these dry joints or cold joints), but do make contact when the die is in the right place. Meaning of course that the chip is faulty and by heating it up, we get it working again. So cross your fingers, and hope that the owner is happy and is working again with his laptop for some time to come.

Repair.IT – Asus Taichi 21 Notebook

This follows on from my data recovery of the D: drive of the Asus Taichi 21 notebook.  Actually, it wasn’t really so much data recovery as just copying files and folders from the SSD drive once I had it mounted, but this is about the repair of the notebook.  After I returned the notebook and the data, I was told that the owner would order a replacement motherboard and let me know when it came in so I could then fix the notebook.

In due course (a week and a half later), the motherboard arrived from the US and I got the notebook back. By the way, I didn’t mention opening the case – there are 10 little Torx screws to be removed and then the two plastic feet near the hinges can be removed to uncover two more Phillips screws. Then I disconnected and removed the battery, and then removed the heatsink/fan assembly.

dsc_0292

Once the heatsink came off, I could see that the cpu was covered with excessive amounts of thermal interface material – actually only the top of the cpu that contacts the heatsink needs the thermal interface material.  Then it was a matter of disconnecting and removing the wireless card, and the other connectors – then put in the replacement motherboard and reconnect everything.  For the heatsink, I used Arctic Silver thermal material to cover the top of the cpu as a thin film, then put the heatsink on top, jiggled it around a little, then screwed it on firmly.  The last thing was to install and connect the battery.

While powering on, I did notice that occasionally the screen would flicker but it stayed on most of the time, and when I closed the lid, the back screen came on as expected, so that was that – or was it?

The notebook went back to its owner the next day, and all was well – there was the occasion that the screen did not light up but after updating drivers, all appeared to be well.  At least until the owner tried to connect a couple of external monitors and somehow there was no display anymore.

I got the notebook back and I thought it would be strange if the motherboard was faulty again – but it is possible since they might only test it for a short time. After some examination and reconnecting of the two screen cables, I found that one of the connectors might have been a little dodgy, so I had to unplug it, then plug it in, unplug it and do this a few times – each time ensuring that it was lined up and would click back in securely. This seemed to fix it and I was able to get a working screen consistently, and I told the owner what I had done in case it happened again.

That was a couple of weeks ago and nothing has been heard of it since, which I guess is good news.

Recover.IT – Asus Taichi 21 Notebook

I haven’t been writing much lately so it is time to get a few out of the way. Some weeks ago, I was asked about an Asus Taichi 21 Notebook that had suddenly stopped working. The notebook is one that has a dual screen: open it up normally as a notebook, close the lid and the back screen comes up as a tablet. Neither screen was operating and it had been sent to Asus to look at. I suggested that I should be able to get his data off the notebook as Asus would not provide this service. Eventually a quotation was received which was quite high – you could buy a second hand Asus Taichi 21 on eBay for much less than the quote, so eventually it came to me to look at and get some very important files from it.

On inspection, the notebook has an internal SSD which at first glance looks like a normal mSATA or M.2 SSD; however, on closer inspection it is quite different. Further research indicated that there were adapters available that would convert this SSD to standard SATA – and I was fortunate enough to find a local Sydney supplier that had one of these in stock for $20 or so. I ordered one, and when it was ready – went for a short drive to pick it up. Now the adapter looked like it wasn’t the right one, but they assured me that it would work. The socket is much larger and is not quite the same as the socket on the motherboard, but after some further research, I decided that it should work. Of course, this can be a risk that could destroy much wanted data – but there were no indications on the internet that these adapters posed a problem.

dsc_0285

This shows the adapter with the SSD installed.  Note the size of the socket.  This adapter is used for the Asus Taichi and UX21/31 notebooks.  See below, for a photo of the motherboard with its socket.

dsc_0279

You can clearly see the difference as the motherboard socket has 6 and 12 pins, but the adapter socket has many more pins. Anyway, I connected the adapter to my recovery machine, and it was recognized by the BIOS and by my Ubuntu operating system. I went to mount the disk, but it complained that the partition had not been cleanly unmounted.

No real problem: the way to get around this is to mount it read-only, which ignores the dirty bit, as I only want to copy data from it. After doing an “fdisk -l” to list the partitions, I used the “mount -t ntfs -o ro” command to mount the partition and was then able to copy the required data to an external USB disk. I copied the D: drive folders and their contents, as this was what was required.
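As a rough sketch of that read-only mount, the helper below only builds the command line and prints it. The device node `/dev/sdb2` and the mount point `/mnt/taichi` are hypothetical stand-ins (the real partition name is whatever `fdisk -l` reports), and actually running the command needs root.

```python
import shlex


def readonly_ntfs_mount_cmd(device, mountpoint):
    """Build the argv for a read-only NTFS mount."""
    # -t ntfs selects the filesystem type; -o ro asks for a read-only mount
    return ["mount", "-t", "ntfs", "-o", "ro", device, mountpoint]


# Hypothetical device and mount point; substitute the ones from `fdisk -l`.
cmd = readonly_ntfs_mount_cmd("/dev/sdb2", "/mnt/taichi")
print(shlex.join(cmd))
```

Keeping the mount read-only means nothing on the recovered SSD can be changed while the files are copied off.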

After that, I reassembled the notebook and that was that, or was it?  A quick search of the internet showed that the motherboard “60-NTFMB1102-D07” was available for a few hundred dollars which would likely fix the notebook, but that is another story.

Rejuvenate.IT – Old Sofa

Since it was the festive season after celebrations of Christmas 2016 and the New Year of 2017, I thought it was time to tackle one of those jobs that have been sitting around, or lying around in this case.

We have an old wooden sofa with cloth covered foam cushions. Under the cushions is an arrangement of springs and spring wire and with the years (25+) the spring wire has stretched and the springs have loosened (or we have gotten heavier) such that it sags when we sit on it, or is uncomfortable if we lie down on it for having a nap.

I had been meaning to try out the Kreg pockethole system and this was a good opportunity to do this. It would mean removing the spring supports then replacing them with 19mm or 3/4in wooden boards. I had some 240mm width pine boards from a long time ago, and I had a long piece of 235mm width that I kept when our old kitchen was renovated some years ago. This piece was 3.2m long so by calculating the number of boards I would need and the length, I could get 5 pieces from this ex-kitchen board. Then I needed one piece of the pine.

The removal of the springs took a few hours, needing to remove the cloth cover first – unfortunately I didn’t have a photo of this. Then measuring up the gap to determine the board lengths – mostly 580-581mm in length except that the pieces on the sides were not parallel – anyway, I cut them to fit the gap.
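The board count can be checked with a few lines of arithmetic. This is only a sketch: it uses the 3.2m board and the roughly 581mm piece length from above, and the 3mm saw kerf is my own assumption, not a measured figure.

```python
def pieces_from_board(board_mm, piece_mm, kerf_mm=3):
    """Largest number of pieces of piece_mm that fit in board_mm.

    n pieces need n * piece_mm plus (n - 1) saw cuts of kerf_mm each,
    so n * (piece_mm + kerf_mm) must not exceed board_mm + kerf_mm.
    """
    return (board_mm + kerf_mm) // (piece_mm + kerf_mm)


# 3.2m ex-kitchen board, 581mm pieces, assumed 3mm kerf
print(pieces_from_board(3200, 581))
```

That gives five pieces from the long board, which is why one more had to come from the pine.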

dsc_0262

I chose to use the Kreg R3 Kit – because it was reasonably priced at $79 – and bought a Kreg face clamp for $39, then a box of 100 1.25in coarse thread screws. Now the boards had to have a chamfer cut on the ends, so I had to adjust the Kreg jig accordingly to my satisfaction. Then drill each board with four pocket holes – clamp them in place and screw them in. Time to cut, drill and screw the boards was about 4 hours. Plus removing the spring system in the first place was also about 4 hours. Not bad for a day’s work.

dsc_0266

Each board sits on a ledge which means that the screws will hold the boards in place without needing to hold the weight. The smaller wooden piece was the spreader that was left over from the spring system and was what I used to determine what board thickness to use. I could have removed it but decided to leave it as a reminder of what it used to be. And the final result?

dsc_0263

Looks like new (almost)! So, the final test is how it feels – very firm – just don’t plomp down on it, you will definitely feel it bottoming out. I think I will add a layer of high density foam to help with the cushioning – but when I try it out for a nap, it feels fantastic. In the past, the two wooden supports which were between the cushions got in the way – now it is all flat. This old sofa has been rejuvenated.

RAID.IT – Adaptec Raid Controller, continued – part 3.

This is part 3 of the saga of the Adaptec Raid Controller – ASR-5405 that suddenly decided that my array was no longer there. Due to a motherboard bug on the Gigabyte motherboard of my test machine, four of my 1TB disk drives in the array were somehow configured to be 32MB in size. While researching the problem, I came across this great website that had a utility to fix the problem.

Restoring Factory Hard Drive Capacity

The utility they provide is to restore the hard disk drive factory capacity. It only runs on Windows – which isn’t a problem as such, and eventually I did this, and was able to turn each of my 1TB drives back into a 1TB drive – does that make sense? Or rather, turn my 32MB drives back into the original 1TB drives – yes, much more like it.

dsc_0201

dsc_0202

dsc_0203

dsc_0204

Now, I tried the drives back in the machine; however, the Adaptec controller still insists that there are no logical drives found, even though it sees all four 1TB drives – including the one that was failing – I thought I should put that back in. The data isn’t really lost, since it is still sitting on three of the drives that are working. A RAID-5 array can tolerate one disk failure, so I have 3 out of 4 working drives from the array. All I have to do is to determine some parameters about the disk array. Sounds simple?

Most RAID-5 arrays use distributed parity, so effectively, we put data blocks on three disks with one disk having a parity block. Then the next three blocks go onto three disks, with a parity block on the other disk, except that it doesn’t go onto the disk that had the last parity block – not sure if I am explaining it properly. Anyway, what I have to do is to determine the block size, then the order of the data disks, and determine where the parity block goes first, then where the parity block goes next, and so on.
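To make the parity idea concrete, here is a toy sketch of why one missing block can always be recovered. The four-byte blocks are made up for illustration; on the real array each block is far larger.

```python
def xor_blocks(*blocks):
    """XOR equal-length byte blocks together."""
    acc = 0
    for block in blocks:
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(len(blocks[0]), "big")


# Three toy data blocks, one per data disk
d1, d2, d3 = b"AAAA", b"BBBB", b"CCCC"

# The parity block is the XOR of the three data blocks
parity = xor_blocks(d1, d2, d3)

# Pretend the disk holding d2 died: XOR the survivors with the parity
rebuilt = xor_blocks(d1, d3, parity)
print(rebuilt == d2)
```

Because XOR undoes itself, XORing the surviving blocks with the parity block leaves exactly the block that went missing.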

Once I work that out, then I can explain the layout a bit better. You will find terms like stripe factor, blocking factor, parity rotation order etc. What it means is that the parity block moves around the disks in a particular order. Last time, I had copied 10MB from each disk into files that I called arraydisk1, arraydisk2, arraydisk3 and arraydisk4. The number refers to the physical connection order on the array controller.

After some examination of the arraydisk files using a hex editor (which in this case is HexEdit), I was able to find some regular data structures in the files that allowed me to work out the size of the block – which was 256KB. Once I know the block size, I can then look at data just before and after the boundary and try to match it up. It is like a jigsaw puzzle – except we are working with data instead of shapes, but same sort of thing.

Last night, I was able to work out to my own satisfaction that the physical disk order and the data disk order were the same, and that the parity block order was left asymmetric – which is nice and easy to explain. I also found some documentation on the internet that also indicated that Adaptec uses the left asymmetric parity block order.

My logical disk drive is 3TB, so just consider the following. The Adaptec controller writes the first 256KB onto the first disk, then the next 256KB onto the second disk, then another 256KB onto the third disk. A parity block, which is the XOR of those three 256KB blocks, is then written to the fourth disk. So now, we have 256KB of data or parity written to each disk. Now, the left asymmetric method says that the next parity disk will be the third disk. So the next lot of data after what has already been written will be 256KB to the first disk, 256KB to the second disk, 256KB to the fourth disk, and then the parity block generated will be written to the third disk. And so on: the next parity disk is the second disk, after that it will be the first disk, then back to the fourth.
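The rotation just described can be written down as a small table generator. This is a sketch of the left asymmetric layout as explained above, with disks numbered from 0 in the code (so disk 3 is the "fourth disk"); the function names are my own.

```python
def parity_disk(stripe, n_disks=4):
    """0-based index of the disk holding parity in a given stripe."""
    # Parity starts on the last disk and moves one disk left per stripe
    return (n_disks - 1) - (stripe % n_disks)


def stripe_layout(stripe, n_disks=4):
    """Per-disk labels for one stripe: 'P' or the logical block number."""
    p = parity_disk(stripe, n_disks)
    block = stripe * (n_disks - 1)  # first logical block in this stripe
    layout = []
    for disk in range(n_disks):
        if disk == p:
            layout.append("P")
        else:
            layout.append(str(block))
            block += 1
    return layout


for s in range(5):
    print(s, stripe_layout(s))
```

Each printed row is one 256KB-deep stripe across the four disks; the fifth row has parity back on the last disk, so the pattern repeats every four stripes.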

Got that? Ok, next to do will be to reverse this, since I have the first three disks, I can write my data to a new 3TB disk drive. I have a perl program that I wrote many years ago, just to do this – I just have to tailor it to just this situation. So, how do I do this?

Just imagine that this is what will happen: I will read 256KB from each of disks 1, 2 & 3 – the first block on each of these disks consists of data, so this will be written to the destination disk. The next 256KB blocks from the disks will be data, data, and parity – so the data blocks are written, then I do an XOR of the data blocks and parity block, and the result will be a data block that I write. So far, I have written six blocks, ok?

The next block from each disk will be data, parity, data – so again, write the data blocks, then XOR everything together, and write that as data – now I have nine blocks. The next block from each disk will be parity, data, data – so now I write the data blocks first, then XOR the blocks to get the new data block that is written to the disk. I now have written twelve blocks. The next block from each disk will be data, data, data – so we are the same as we were at the beginning of the disk – we just write out the data – and continue, ok?

So we keep going and eventually we have read the entire three disks and written 3TB or so of data – which I should be able to connect up and the computer should recognize the drive. Well, that is for another day to do, or maybe on the weekend. Wasn’t I lucky that it was the last disk that failed? Actually it really doesn’t matter which disk has failed, as long as we can determine the order of the drives.
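The whole procedure can be sketched in Python on an in-memory toy array. This is not the Perl program mentioned above, just an illustration under the same assumptions (four disks, left asymmetric parity); `make_array` exists only to build test data, and a real run would read 256KB blocks from the disk devices instead of byte strings.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    acc = 0
    for block in blocks:
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(len(blocks[0]), "big")


def parity_disk(stripe, n_disks=4):
    """Left asymmetric: parity starts on the last disk and moves left."""
    return (n_disks - 1) - (stripe % n_disks)


def make_array(data, block_size, n_disks=4):
    """Test helper: stripe `data` across n_disks with rotating parity."""
    disks = [bytearray() for _ in range(n_disks)]
    per_stripe = block_size * (n_disks - 1)
    for stripe in range(len(data) // per_stripe):
        base = stripe * per_stripe
        blocks = [data[base + k * block_size: base + (k + 1) * block_size]
                  for k in range(n_disks - 1)]
        p = parity_disk(stripe, n_disks)
        remaining = iter(blocks)
        for d in range(n_disks):
            disks[d] += xor_blocks(blocks) if d == p else next(remaining)
    return [bytes(d) for d in disks]


def rebuild(disks, missing, block_size, n_disks=4):
    """Reassemble the logical data with disk index `missing` unavailable."""
    survivors = [d for d in range(n_disks) if d != missing]
    out = bytearray()
    for stripe in range(len(disks[survivors[0]]) // block_size):
        start = stripe * block_size
        p = parity_disk(stripe, n_disks)
        chunk = {d: disks[d][start:start + block_size] for d in survivors}
        if missing != p:
            # The lost block held data: XOR of the three surviving blocks
            chunk[missing] = xor_blocks(list(chunk.values()))
        for d in range(n_disks):
            if d != p:  # parity blocks are never written to the output
                out += chunk[d]
    return bytes(out)


data = bytes(range(48))             # 8 stripes of 3 two-byte blocks
disks = make_array(data, block_size=2)
broken = list(disks)
broken[3] = b""                     # the fourth disk is the failed one
print(rebuild(broken, missing=3, block_size=2) == data)
```

The `missing` argument is the only thing that changes if a different disk has failed; the parity rotation and the write order stay the same.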

As an example, what if it was the second disk that failed, and we have the first, third and fourth disk available. Since we know which is the parity block, we would know that the first block from each disk is data, data and parity – so as the missing disk is the second disk, we have to write data, XOR, and data – where XOR is the result of XOR on the two data blocks and the parity block. The next block we read would be data, parity, data, so we would write data, XOR, data. The third block, we would be reading data, data, data, so that is what we write as the block from the missing second disk would be parity, which we don’t need. Makes sense? Ok, I am glad it makes sense to someone. See you next time.

RAID.IT – Adaptec Raid Controller, continued.

I finally got a response back from Microsemi. Essentially, my complimentary support ended in 2012 and my warranty ended in 2013, so if I wish to proceed – I would have to pay 80USD or 65 Euro per support incident. Plus no guarantee of resolution.

Ok, so last night, I got the drives including the failed drive, and read 10MB from each one – this is so that I can work out the striping factor and the parity rotation order. It was at that time that I noticed that my Hitachi 1TB drives were initially seen in my test machine as 32MB and then the capacity increased to 1TB – this is interesting. It happened on all of the drives, which is strange – so I did some further looking on the internet.
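Grabbing the first 10MB of a drive into a file can be sketched as below. The demo runs against a scratch file rather than a real disk; on the recovery machine the source would be a device node such as `/dev/sdb` (a hypothetical name) and the script would need root.

```python
import os
import tempfile


def copy_head(src_path, dst_path, nbytes=10 * 1024 * 1024, chunk=1024 * 1024):
    """Copy the first nbytes of src_path to dst_path; return bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while copied < nbytes:
            buf = src.read(min(chunk, nbytes - copied))
            if not buf:  # source was shorter than nbytes
                break
            dst.write(buf)
            copied += len(buf)
    return copied


# Demo on a 5120-byte scratch file instead of a real disk
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "fake_disk")
    dst = os.path.join(tmp, "arraydisk1")
    with open(src, "wb") as f:
        f.write(bytes(range(256)) * 20)
    n = copy_head(src, dst, nbytes=4096, chunk=1000)
    with open(dst, "rb") as f:
        head = f.read()
print(n, len(head))
```

Working on small copies like this means the hex editor never has to touch the original drives.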

It turns out that there have been quite a few incidents of this happening to different manufacturer drives – with a seemingly common factor, a Gigabyte motherboard. I checked my test machine and certainly, this one is Gigabyte. So now I have a possible cause, my drives were checked out on my test machine, which is a Gigabyte. It seems that due to a motherboard bug, something might have been written to the disk, that somehow makes it think it is 32MB instead of 1TB.

Ok, more research now, to see about fixing this problem – and I guess I need to change motherboards on my test machine.