Transistor aging research can keep chips working longer, reduce early breakdowns

Team of electrical engineers studies what goes wrong to detect the early warning signs of silicon chip failure

Everyone knows that electronics become obsolete (What’s that quaint old thing? A non-3G iPhone?) but far less well known is that they physically age. It’s not just a matter of a keyboard getting dirty or a touch screen getting scratched, either. Under the stressful ebbs and flows of electrical current, the transistors within a silicon chip can gradually slow down until they stop functioning altogether. Similarly, chips that come off the manufacturing line with defects will fizzle out before their time, leading to “early-life” failures that inconvenience consumers and incur warranty or recall costs for manufacturers.

Eager to stave off such breakdowns, a team of electrical engineers is studying what goes wrong so that they can detect the early warning signs of trouble. The research could enable manufacturers to do better quality testing before they ship their products. It could also allow them to design chips that could spot the signs of aging and then “self-heal,” for example by routing around the failing transistors or by re-distributing the workload away from them before errors occur.

If the problem doesn’t seem epidemic yet, it might in a couple of years as transistors continue to become smaller and smaller. Even now, a key part of every transistor called the gate oxide has become so thin that it is particularly susceptible to defects and wear.

“We’re talking about glass that is four atomic spacings thick but we’re slamming electrons back and forth through it,” says electrical engineering Professor Bob Dutton, whose research group is working with that of electrical engineering Assistant Professor Subhasish Mitra on the research. “Some of them change the chemical nature of the [silicon] bonds [in the glass].”

Ultimately, as the bonds are battered and broken, the transistors switch on and off more and more slowly until the delays they create in the circuits they are part of become a hindrance to the performance of the system.

Stressing out transistors

Rather than despair about the problem, the engineers devised an experiment to study it. Their goal was to gather enough data about the exact nature of early-life gate oxide failures to create a model of them. Because it maps out the behavior of weakening transistors over time, the model could enable early predictions of eventual failure.

The value of the model, Mitra notes, is that detecting gate oxide defects just from the overt symptoms of early-life failure can be tricky.

“These early-life failure chips fail intermittently – like cars,” he says. “You have some trouble while driving but nothing shows up when you bring it to the mechanic.”

To conduct the experiment, Tze Wee Chen, a doctoral student in the Dutton group, and Kyunglok Kim and Young Moon Kim, students in the Mitra group, acquired a test chip made by a commercial foundry. The transistors, made using the 90-nanometer technology generation (about a step behind today’s most advanced transistors), have an oxide thickness of only 1.6 nanometers. Because the researchers could not count on the transistors to have oxide defects that would result in early-life failure, they had to induce them by torturing the transistors with an extra-high voltage for 10 minutes, a process called “soft breakdown.”

After soft breakdown, the researchers continued to stress the transistors with 10-minute bursts of high voltage. At each break they would measure each transistor’s characteristic curve of current vs. voltage. Normally transistors yield a curve with a quick rise and then a quick leveling off (like a lowercase letter r), but after enough stress, use and abuse, they will eventually experience a “hard breakdown” that turns their curve into a slash (“/”). Along the way, the current that passes through them becomes less and less, translating to slower and slower operation.
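To make the measurement loop concrete, here is a minimal sketch in Python of how readings taken after each stress burst might be compared with a baseline. It is not the researchers' model: the function names, the sample currents and the 10 percent threshold are all hypothetical, chosen only to show the idea of watching current fall between bursts.

```python
def relative_drive_loss(baseline_current, measured_currents):
    """Return the fractional loss in current after each stress burst."""
    return [(baseline_current - current) / baseline_current
            for current in measured_currents]


def first_burst_exceeding(losses, threshold):
    """Return the index of the first burst whose loss reaches the threshold.

    Returns None if no burst reaches it.
    """
    for index, loss in enumerate(losses):
        if loss >= threshold:
            return index
    return None


# Hypothetical readings (in microamps) taken after successive 10-minute bursts.
baseline = 100.0
readings = [98.0, 95.0, 91.0, 84.0, 70.0]

losses = relative_drive_loss(baseline, readings)
# Hypothetical threshold: flag a transistor once it has lost 10% of its current.
flagged = first_burst_exceeding(losses, threshold=0.10)
print(flagged)
```

With these made-up numbers, the transistor is flagged at the fourth burst (index 3), well before its current collapses entirely.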

Predicting failure

From the data they collected, Chen and his fellow students were able to derive a model that quantifies the performance decay and makes it recognizable (and therefore predictable) at an early stage. This is somewhat akin to a stress test that a cardiologist might run on a patient suspected of having heart trouble. Chen says it could be the basis of a much less costly and quicker test for transistor early-life failure than the “burn-in” test currently used by manufacturers to test newly made chips. Burn-in is a more brutal, energy-intensive process that can actually age the chips it is meant to merely diagnose.

“The cost of testing is a very high cost for manufacturers, and the cost of a recall is even higher,” Chen says. “But burn-in isn’t as cheap or as effective as it used to be.”

Chen presented a paper on the experiment at the 26th IEEE VLSI Test Symposium at the end of April in San Diego. The team’s original test chip in that research had only 10 transistors, but the researchers have since created a chip with 32,000 transistors to repeat the experiment and improve their understanding of the phenomenon’s statistical behavior.

Beyond improving assessments of early-life failure, the model could also be used to detect signs of aging in chips that have been running seemingly just fine for years. In the days when computers had only one processor, the information might not have been so useful, but modern computers have multiple processor cores. Taking the idea that aging can be detected well before breakdowns occur, Professor Mitra’s group is looking at whether computers can be designed to be “self-healing.” One idea is to exploit the parallelism of multiple cores so that if one core starts to weaken, more processing work might be shifted to other, healthier cores. Such designs could ensure a longer, more reliable life for the system overall.
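The workload-shifting idea can be illustrated with a short Python sketch. This is only an assumption about how such a scheme might look, not a description of the Mitra group's design: the health scores (1.0 for a fresh core, lower for an aging one), the core names and the `distribute_work` function are all hypothetical.

```python
def distribute_work(total_units, core_health):
    """Split whole units of work across cores in proportion to their health.

    core_health maps a core name to a hypothetical health score, where 1.0
    means fully healthy and lower values mean the core shows signs of aging.
    """
    total_health = sum(core_health.values())
    if total_health <= 0:
        raise ValueError("no healthy cores available")

    # Give each core a whole-number share proportional to its health.
    shares = {core: int(total_units * health / total_health)
              for core, health in core_health.items()}

    # Rounding down can leave a few units over; hand them to the healthiest core.
    leftover = total_units - sum(shares.values())
    healthiest = max(core_health, key=core_health.get)
    shares[healthiest] += leftover
    return shares


# Hypothetical four-core chip in which core2 has begun to weaken.
health = {"core0": 1.0, "core1": 1.0, "core2": 0.5, "core3": 1.0}
plan = distribute_work(70, health)
print(plan)
```

In this made-up case the weakening core receives half the work of its healthier neighbors, while the total amount of work assigned stays the same.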

“With multicore you’ve got some redundancy,” Dutton says. “If things are not working properly or starting to go bad, use another part of the chip.”

Having a means for early detection of these problems should allow for finding the lowest cost solutions, Mitra adds.

The rapid obsolescence of technology (sometimes more a matter of marketing-driven perception than reality) has probably often meant that users replaced machines before the signs of aging were apparent. But as technology advances and transistors become even more susceptible to breakdowns, research into transistor aging may become essential to preserving the benefits of information technology.