This post is about the Dell XPS M1530 laptop that I worked on at length back in 2015. At the time, I heated the graphics chip to a high temperature to get it working again. The laptop has now failed again and will not power up apart from a couple of lights, which is the same failure condition as last time.
Now, we cannot keep reflowing the graphics chip, since the heat may also damage other components. And if the problem keeps recurring, the graphics chip will eventually need replacing. After some further research, I came across an article that discussed the failure mechanism. There is some conjecture about why these chips fail, but most of the discussion says that the contacts where the chip attaches to the motherboard become disconnected.
OK, what are we talking about, you might ask? A typical modern graphics chip is housed in what is called a BGA package. BGA stands for ball grid array: instead of lots of pins sticking out of the bottom like many older CPUs, the chip has little solder balls on its underside arranged in an array or matrix. Most sites say that the fix is to reball the graphics chip. This means removing the GPU from the motherboard, cleaning off all of the old solder balls, applying flux and new solder balls, and heating the chip so that the new balls melt and attach firmly. The chip is then placed back on the motherboard and reheated so that the solder balls melt (flow) and join to the contacts on the motherboard.
How do we get all the tiny solder balls onto the chip? We use a stencil. The stencil is made from stainless steel and looks like this.
This one is for an ATI 215-0719090 GPU that uses 0.5 mm balls. Next to it are a small bottle of 25,000 solder balls and a coin for comparison. However, the Dell XPS M1530 laptop has an nVidia 6800M graphics chip, and I don’t have a stencil for it.
Let’s get back to the failure mechanism. A number of other sites say that reballing the chip doesn’t really fix the problem, since the failure will come back. Some also say that reflowing the chip doesn’t fix it either and that the only real solution is to replace the graphics chip.
A number of sites also talk about wrapping the laptop in a towel and letting it run until it overheats, supposedly reflowing the chip. Unfortunately, the temperature would never get high enough to reflow the solder, since the battery would have exploded long before then. So why does this seem to work and fix the problem, albeit temporarily?
It seemed that further research was required, and I may have found something. Consider how the graphics chip is put together. Many graphics processors are manufactured together on a silicon wafer, which is then cut into individual pieces, each called a die. The die is the part on top of the chip that touches the heatsink. One side of the die carries lots of contacts, attached either by small leads or by tiny solder balls. These connect the die to the substrate, which is the greenish material, and the die is then surrounded by a filler material (called underfill) that is meant to hold the die in place and help protect it against the thermal and mechanical stress it will experience. The underside of the substrate is where the reballing described above takes place. This is, of course, a very simplified description of a BGA chip, and it can get confusing, especially since the die is also called a flip chip because it is mounted upside down, i.e. flipped.
So where were we? There is some disagreement about why these chips fail: either the chip’s solder balls have detached from the motherboard, or the die has detached from the substrate. A failure in any one of these hundreds of connections would be enough to stop the chip working. If the failure is between the die and the substrate, this can apparently be confirmed by heating the die alone; reportedly, about 130-150 degrees Celsius is sufficient to get the connections working again. This temperature is well below the melting point of both leaded and lead-free solder (about 183 degrees Celsius for common tin-lead solder and roughly 217-220 degrees Celsius for typical lead-free alloys), so what does the heating actually do?
I thought I would try it out, since I had nothing to lose. I used a hot air rework station to heat the top of the chip to around 100 degrees for 5 minutes or so, then checked the laptop. Nothing happened, so I tried the other laptop, since I have two of them here. This time I went to 130 degrees, and afterwards that laptop powered up far enough to show a very unstable display, with a lot of breakup and flickering.
I went back to the first laptop and this time heated it to 140 degrees, measuring the temperature with a temperature probe. This time I had success: afterwards, the laptop powered up, and I could log in and run some diagnostics while a monitoring program showed the temperature of the graphics chip. The diagnostics ran fine, and the maximum temperature the graphics chip reached was about 75 degrees. OK, fantastic!
Alas, it was not to be. After shutting down, the laptop would not power up again. OK, so I tried again, this time heating to 150 degrees. Sure enough, the laptop came on as normal, and I have been running it with a notebook cooler blowing air onto its underside. It has now been a week of powering it on, leaving it running for a few hours, shutting it down and checking again the next day. As of today it is still working, and I think it can finally go back to its owner.
So how do we explain this? The temperature is not high enough to melt the solder, but perhaps the heating relieves or equalizes the thermal stresses on the die-to-substrate connections. One explanation is that some of the connections are not properly soldered (we call these dry joints or cold joints) but still make contact when the die sits in the right position. That would mean, of course, that the chip itself is faulty, and that heating it up simply gets it working again. So cross your fingers, and let’s hope the owner is happy and gets plenty more use out of his laptop.