Hotspot problem of AMD’s Radeon RX 7900 XTX (MBA): Even several batches of the vapor chamber affected … – igorslab.de

I deliberately waited until I had consistent and plausible information from several channels, which, after two to three weeks, can now shed some light on the matter. While AMD’s support initially refused even to acknowledge the hotspot problem and described a 110 °C hotspot temperature as completely normal, the situation is now much more pleasant from the customer’s point of view.

I would also like to take this opportunity to thank all those who have provided me with information over the last two or three weeks but who, for understandable reasons, do not want to or cannot be named. However, I explicitly point out that all of this information should be considered preliminary until AMD issues an official statement, and that the figures mentioned could still be corrected.

I had actually been waiting for yesterday, because a statement from AMD had been announced internally (I wrote several times that I wanted to wait for it for the time being), but it never came. I am therefore taking this incomprehensible delay as an opportunity to bring some transparency to the matter today.

The problem has been known for more than two weeks and is being communicated (internally) between AMD and the distributors … How exactly the problem will be communicated to the end customer has not yet been finally decided. … AMD’s statement was originally supposed to be made on January 3, 2023 at 6 p.m. CET, but it was postponed. (quote from email)

Let’s first summarize what is known so far. In the following, I will also quote from my mail correspondence to provide a few figures and to clear up some of the confusion caused by speculation.

Problems with the vapor chamber of the RX 7900 XTX and the number of affected cards

I already wrote in the forum that I was also able to ask sources at manufacturers of cooling components, who fully and independently confirmed the initial assumption that the fault lies in the vapor chamber. In the meantime, however, one has to distinguish between the general finishing problems, as I already described them for a Radeon RX 7900 XT (the article is linked at the end), and a genuine production defect, which affects only the Radeon RX 7900 XTX. Both problems exist (see my measurement), but only the functionally impaired vapor chamber of the RX 7900 XTX is serious enough to require an RMA. This is because it is not only the hotspot but also the memory that can reach up to 110 °C and thus operate well above the permissible temperature limit.

If the four to six batches of vapor chambers I mentioned really are affected, then the number of affected cards is in the high five figures in the worst case. However, not a single board partner design is affected, only the so-called MBA cards (Made by AMD). These cards are sold both by AMD directly and through board partners, who can buy them and sell them under their own label. Colloquially, such a card is also called a reference card, even if AMD (like NVIDIA with the Founders Edition) wants to distance itself a bit from the image of the bread-and-butter card.

As assumed, the cause is the vapor chamber… Several batches are affected. Currently, 4-6 batches and thousands of graphics cards are assumed. Only MBA cards are affected. (quote from email)

Of course, one can ask why all this was not noticed much earlier and why quality control evidently failed so spectacularly. This also has something to do with the fragmentation of production and supply chains. Both the chamber and the complete cooler are supplied by a third-party manufacturer well in advance and are of course very difficult to test without the board assembled. Using QR codes, however, all components can be assigned to their production time and tracked down afterwards.

If the PVT (Production Validation Test) samples worked, then at most samples are taken from the running MP (Mass Production) later on. In the best case, such cards are tested in special hot boxes, for example at PC-Partner, but they are installed vertically there. And that leads to exactly the case that has occurred, because the affected vapor chambers are not completely non-functional, only limited in their function to a greater or lesser degree. This, however, is almost impossible to catch with granular tests.

Feedback from distributors and system integrators

Since AMD decided to take back and exchange the cards, several hundred MBA graphics cards have already been returned, according to various sources. Unfortunately, this also affects system integrators, who had to disassemble complete systems again in order to exchange the cards. As an end customer, you only ever see your own individual card, but the group of people hit much harder by such problems is much larger. In addition to the costs of the exchange itself, the responsible parties will also have to bear further repair and logistics costs, not to mention compensation for damages and the harm to their image. The jabs at NVIDIA’s 12VHPWR adapter in the Radeon presentation suddenly look like bad satire.

We had to return 300+ graphics cards of Asus MBA, Sapphire MBA, PowerColor MBA, XFX MBA to the retailers/warehouse/wholesale…
Complete systems also had to be disassembled for this procedure. (quote from email)

Exchange or return? AMD offers both

The employees concerned have now also been instructed to address the problem actively and accommodatingly if the customer can plausibly explain or prove a corresponding defect. The first cards are already being exchanged, and AMD offers both a refund and an exchange for a working graphics card. However, the return shipping for a refund is at the customer’s expense, which is unattractive but not unusual.

End customers have to contact the vendor or AMD support directly. Distributors and stores have to send the affected graphics cards to wholesalers and warehouses. (quote from email)

The relevant mail to the persons concerned then looks like this (names of the parties involved redacted):

Here, too, my thanks go to the many active readers and silent consumers who have turned to me with confidence. Some things could already be solved accommodatingly in advance via small unofficial channels; for others, you really had to wait for AMD’s unofficial go-ahead to the support. So far, however, I am not aware of any case that could not be resolved to my satisfaction. From that point of view, it’s good news for the customer. As for the cost and scope of the whole operation, however, I’d rather not even know.

Yes, mistakes can happen. This is due to the nature of the business and the highly complex processes involved. But it is also the manufacturer’s duty to communicate openly and transparently. Simply sitting out an announced statement is never a good solution. The usual salaries in well-paid marketing departments should actually suggest capable and competent people who can also deal with such situations proactively. The ostrich is the completely wrong role model from the animal kingdom, otherwise you quickly look like a donkey (may the dear odd-toed ungulates forgive me this comparison).

The other problem, which is not quite as serious, is the final finishing of the heatsink on some cards. My Radeon RX 7900 XT had various machining errors (crooked grinding, bumps). These can no longer be dismissed as acceptable tolerances when the chamber, including the attached fin stack, is rigidly connected to the body rather than mounted floating (as, for example, with NVIDIA and older Radeons), where well-designed screw tension can compensate for unevenness by itself. Here, the engineers would do better to follow working role models (including ones from their own production). Form follows function, and not vice versa.

RDNA3 and too high hotspot temperatures on some AMD Radeon RX 7900 XT(X) – Cause research