Reheat.IT – Dell XPS M1530 laptop

This is about the Dell XPS M1530 laptop that I worked on at length back in 2015. At the time, I heated the graphics chip to a high temperature to get it working again. The laptop has now failed again and will not power up except for a couple of lights – the same failure condition as last time.

Now, we cannot keep reflowing the graphics chip, since the heat used may also damage other components, and if the problem keeps recurring, the graphics chip will eventually need replacement. After some further research on this problem, I came across an article that talked about the failure mechanism. It appears that there is some conjecture about why the chip fails, and most of the talk is that it fails because the contacts where it attaches to the motherboard become disconnected.

Ok, what are we talking about, you might ask? A typical modern graphics chip is housed in what is called a BGA package. BGA stands for ball grid array, meaning that instead of having lots of pins sticking out of the bottom like many older CPUs, the chip has little solder balls on the bottom arranged in an array or matrix. Most sites say that the graphics chip needs to be reballed. This means removing the GPU from the motherboard, cleaning off all of the old solder balls, applying flux and new solder balls, and heating the chip so that the balls melt and attach firmly. The chip is then placed back on the motherboard and heated again so that the solder balls melt (flow) and join to the contacts on the motherboard.

How do we get all the tiny solder balls onto the chip? We use a stencil – it is made from stainless steel and looks like this.

[Photo DSC_0298.JPG: reballing stencil with a small bottle of solder balls and a coin for comparison]

This one is for an ATI 215-0719090 GPU that uses 0.5mm balls. Also in the photo are a small bottle of 25,000 solder balls and a coin for comparison. However, the Dell XPS M1530 laptop has an nVidia GeForce 8M-series graphics chip (the M1530 shipped with either the 8400M GS or the 8600M GT), and I don’t have a stencil for it.

Let’s get back to the failure mechanism. A number of other sites say that reballing the chip doesn’t really fix the problem, since it will happen again. Some say that reflowing the chip doesn’t fix the problem either, and that the only solution is to replace the graphics chip.

Now, a number of sites also talked about wrapping the laptop in a towel and letting it run until it overheats, effectively reflowing the chip. In reality, the temperature would never get high enough to reflow the solder, since the battery would have exploded by then. So why does it seem to work and fix the problem, albeit temporarily?

It seems that further research is required, and I may have found some. Consider how the graphics chip is put together. Lots of actual graphics processors are manufactured on a silicon wafer, which is then cut up into individual pieces, each of which is called a die. This is the part on the top of the chip that touches the heatsink. On one side of the die there are lots of contacts, which either have small leads attached or carry tiny solder balls. These solder balls connect to the substrate, which is the greenish material, and the die is then surrounded with a filler material that is meant to keep the die in place and to help protect it against the thermal and mechanical stress that will occur. The underside of the substrate is where the reballing mentioned above happens. This is of course a very simplified description of a BGA chip, and it can be confusing, especially since the die is also called a flip chip because it is mounted upside down, i.e. flipped…

So where were we? Ok, there is some disagreement about why these chips fail. Either the chip has become detached from the motherboard or the die has become detached from the substrate. Any one of these hundreds of connections failing would cause the fault. If the failure is at the die itself, apparently this can be confirmed by heating the die – about 130-150 degrees Celsius is apparently sufficient to get the connections working again. Now this temperature is well below the melting point of both leaded and lead-free solder, so what does this do?
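To put those numbers in perspective, here is a rough sketch in Python that compares the 130-150 degree range with typical solder melting points. The figures are the commonly quoted ones (about 183 degrees C for eutectic tin-lead solder and about 217 degrees C for a common lead-free SAC alloy). I don't know which alloy is actually inside this chip, so treat both the values and the names as assumptions.

```python
# Commonly quoted melting points in degrees Celsius.
# These are assumed reference values, not something measured on this laptop.
SOLDER_MELTING_POINT_C = {
    "leaded Sn63/Pb37": 183.0,
    "lead-free SAC305": 217.0,
}


def margin_below_melting(temp_c):
    """Return how many degrees temp_c sits below each solder's melting point.

    A positive margin means the solder stays solid at that temperature.
    """
    return {name: melt_c - temp_c for name, melt_c in SOLDER_MELTING_POINT_C.items()}


if __name__ == "__main__":
    for temp_c in (130, 150):
        for name, margin in margin_below_melting(temp_c).items():
            print(f"{temp_c} C is {margin:.0f} C below the melting point of {name}")
```

Even at the top of the range, the margin stays positive for both alloys, which is why the heating cannot be working by melting the solder.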

I thought that I would try it out, since I had nothing to lose. I used a hot air rework station to heat the top of the chip at around 100 degrees for 5 minutes or so, and checked the laptop afterwards. Nothing happened, so I tried the other laptop, since I have two of them here. This time I went to 130 degrees, and afterwards that laptop powered up far enough to show that the screen display was very inconsistent – with a lot of breakup and flickering.

[Photo: dsc_0300]

I went back to the first laptop and this time heated it to 140 degrees, measuring the temperature with a temperature probe. This time, I had success – the laptop would power up afterwards, and I could log in and run some diagnostics while using a monitoring program to show the temperature of the graphics chip. The diagnostics ran fine and the maximum temperature reached by the graphics chip was about 75 degrees – ok, fantastic!

Alas, it was not to be – after shutting down, the laptop would not power up. Ok, so try again by heating to 150 degrees. I did this and sure enough the laptop came on as normal, this time with a notebook cooler attached, blowing air onto the bottom of the laptop. It has now been a week of powering it on, leaving it running for a few hours, then shutting it down and checking again the next day. As of today, it is still working, and I think that it can finally go back to its owner.

So, how do we explain this? The temperature is not hot enough to melt solder as such, but maybe by doing this we are equalizing thermal stresses on the die-to-substrate connections. One explanation is that some of the connections are not actually properly soldered (we call these dry joints or cold joints), but do make contact when the die is in the right place. That means of course that the chip is faulty, and that heating it up simply gets it working again. So cross your fingers, and hope that the owner is happy and can keep working with his laptop for some time to come.


Reflow.IT – Dell XPS M1530 Laptop

This Dell XPS M1530 laptop came in a while ago. I was busy at the time, so it got put on the back burner. Anyway, I was asked about it recently, and of course I hadn’t forgotten about it because I see it every few days, but I had left it for quite a long time.

The problem is that it doesn’t boot, or even get to a bios screen. It sits there with the fan and hard drive running, and just doesn’t boot. It will, however, do a couple of things. If I press and hold the D button, then press the power button, it will come up and run a diagnostic on the lcd screen, mainly cycling through a few different colors: white/grey, red, green, etc.

If I press the Fn button, then press Power, it will go into a diagnostic. It shows a code using the three blue keyboard status leds: NumLock, CapsLock & ScrollLock. The left one is flashing, and the middle and right ones are solidly lit. From the internet, this indicates a CPU (processor) fault. The solution is to: 1 Reseat the CPU, 2 Replace the CPU, 3 Replace the system board.
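As a small illustration of how such a light code can be read, here is a lookup-table sketch in Python. It only contains the one pattern described above; the function name and the state labels ("flashing", "on", "off") are my own invention for the example, and I have deliberately not filled in other codes rather than guess at them.

```python
# Diagnostic LED patterns keyed by (left, middle, right) light state.
# Only the pattern seen on this laptop is listed; the labels are made up
# for this sketch and other codes are intentionally left out.
KNOWN_PATTERNS = {
    ("flashing", "on", "on"): "CPU (processor) fault",
}


def decode_leds(left, middle, right):
    """Look up a light pattern; returns None if it is not in the table."""
    return KNOWN_PATTERNS.get((left, middle, right))


if __name__ == "__main__":
    print(decode_leds("flashing", "on", "on"))
```

Anything not in the table comes back as None, so an unknown pattern is reported as unknown instead of being mapped to the wrong fault.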

Now this laptop had come in previously and I had replaced the system board. This was because one of the heatsink mounting tabs had come off the system board, which meant that the heatsink was not making good contact with the graphics chip, allowing it to overheat and therefore fail. Replacing the system board again is no guarantee that the problem won’t recur, so this time I decided to try a reflow.

Some background information – a number of nVidia graphics chips in laptops had problems whereby they would fail prematurely. It was known to occur in Compaq, HP, Dell and other brands, so it wasn’t the brand but the chip itself. It seems that these chips, being BGA (ball grid array) packages, had the wrong alloy of solder balls on them, so that some time after being soldered onto the system boards, the solder balls stopped making contact. The only permanent solution was to remove the graphics chip and install an upgraded chip that did not use that alloy. Of course, it is difficult to find out what alloy we need, and certainly the manufacturers won’t tell us, but that is what we hear.

An alternative is to remove the chip, clean all the solder alloy off it, then reball it by melting new solder balls onto the grid array. I haven’t been successful in doing this, either because my cheap equipment isn’t up to the job or because I don’t have the experience. So plan B is to perform a reflow. A reflow means heating the graphics chip to a high enough temperature, around 250 degrees, to melt the existing solder balls under the BGA chip, then allowing it to cool and hoping (fingers crossed) that it cools down with all of the joints intact. Again, I haven’t successfully performed a reflow as yet, but maybe now I might get lucky.

I removed the bottom memory and cpu cover, removed the memory, then unscrewed the 7 screws that hold the heatsink on, remembering to unplug the heatsink fan power cable. I removed a small piece of plastic from near the graphics chip, then cleaned the thermal material from both the cpu and the graphics chip. The cpu was easy to clean, but the graphics chip has lots of chip capacitors on it, so we don’t want to scrub too hard. It is often easier to leave the residue around the capacitors and just remove it from the contact surface.

I hooked up a temperature probe (a thermocouple) to the side of the graphics chip, then it was time to “reflow.IT”. I have an smd rework station, which is a hot air machine, and I set it to 300 degrees C. I started by heating the graphics chip and the surrounding area for a little while, then moved in closer to the graphics chip and went around and around it. Basically, just aim the hot air onto the chip to get it evenly heated. The thermometer reading would go up when the hot air was in that area, then drop down. I did this for about five minutes, gradually getting the chip hotter and hotter – the probe only got to about 200 or so, but the chip itself was probably hotter than that. Anyway, once I thought it was ready, I stopped and let the chip cool down on its own.

I put some Arctic Silver 5 thermal paste onto the cpu and graphics chip – spread it around with a small piece of plastic (cut from an old credit or loyalty card). Mounted the heatsink, attached the fan cable, installed the memory and put the cover back on. Lastly, which I didn’t mention before, insert the battery. Oh, by the way, I had the laptop open with the keyboard facing down on the work bench when I was doing the heating up – you don’t want the heat to get to the lcd screen otherwise that will be damaged. A better way is to take the system board out of the laptop, but it was easier to leave it in.

Ok, acid test time – open up the laptop, press the power button…  My goodness, I see a bios screen, then shortly it starts booting up – success! My first successful reflow. I realize that I didn’t take a photo. By now, Windows had started and was asking for a password – ok, I messaged the owner for the password. In the meantime, could I try this with the old laptop which I still have lying around?

[Photo: the area around the graphics chip]

This time, I took a photo to show the area around the graphics chip. The graphics chip is an nVidia chip, the one on the left. The cpu is the one at the bottom right. To the top left of the graphics chip, there is a hole where a heatsink mounting nut should be. If this works, I might have to use some epoxy and put that nut back on.

I set the hot air rework station to 350 degrees, then started heating the graphics chip – this time only for about 4 minutes or so. The temperature probe was showing that when the air was directed at the thermocouple, it was hitting 245 degrees, so this time it should be hot enough – but of course, we don’t want it too hot for too long. Ok, done – leave it to cool down, and check whether a password had come through. Yes, it had – enter the password and log on.
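Since the probe reading jumps around as the hot air moves, it can help to write the readings down and summarize them afterwards. Here is a minimal sketch in Python of how that could be done. I did not actually log anything during this repair, so the sample readings are made up purely for illustration, and the 217 degree threshold is an assumed figure for a common lead-free solder, not a known property of this board.

```python
def summarize_probe_log(readings, threshold_c=217.0):
    """Summarize (seconds, temp_c) readings listed in increasing time order.

    Returns (peak_c, seconds_at_or_above). The time is approximated by
    counting every interval whose starting reading is at or above the
    threshold.
    """
    peak_c = max(temp for _, temp in readings)
    seconds_above = 0.0
    for (t0, temp0), (t1, _) in zip(readings, readings[1:]):
        if temp0 >= threshold_c:
            seconds_above += t1 - t0
    return peak_c, seconds_above


# Made-up sample log for illustration only, not measurements from this repair.
sample_log = [(0, 25.0), (60, 150.0), (120, 210.0), (150, 245.0), (180, 230.0), (240, 120.0)]

if __name__ == "__main__":
    peak, above = summarize_probe_log(sample_log)
    print(f"peak {peak:.0f} C, about {above:.0f} s at or above the threshold")
```

The two numbers match the two worries in the text: the peak says whether it got hot enough, and the time above the threshold says whether it stayed too hot for too long.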

Success! This laptop comes up with a popup message each time you log in – it is a throwback from when it used to be on a Windows domain. I had a look and found that it was a script running from the Startup folder in the Program groups – deleted it, and this should fix this particular problem.

I left it to reboot, then reassembled the other laptop, put some memory in – didn’t have a hard drive to put in, but won’t need it. Plugged in a battery, then fingers crossed, pressed the power button, and… voila! I get a bios screen – that is two successful reflows today. Ok, now I need to dig out a few more laptops and try this out some more.

[Note] Previously, I had been trying to perform reflows with an infrared rework station, however I could never get the chips hot enough to be able to remove a BGA chip. Then a while ago, I had the need to work on some surface mount boards, to remove chips which needed a hot air rework station – that was when I got this rework station and the infrared station got relegated to the garage.

[Edit] Updated diagnostic light indicators.