Lean machines: Research aims to produce more efficient computer chips

Chip companies have taken important steps to improve their power performance

There was a time when giving computers more “power,” that is to say “performance,” was an unabashed mantra of the semiconductor industry. No chip seemed fast enough. But a more ominous meaning of power has taken over: wattage and heat. The industry is now looking for a long-term solution to a troublesome problem.

“The cost of energy for a server exceeds the purchase price of that machine in a year and a half,” says Bill Dally, chair of the computer science department and also a professor of electrical engineering. “It’s also a significant amount of carbon that’s being released into the atmosphere because people are powering servers.”

In fact, computer servers and the infrastructure required to cool them now account for more than 1 percent of total electricity usage in the United States, according to Jonathan Koomey, a consulting professor of civil and environmental engineering. Meanwhile, people are depending increasingly on mobile devices that can only last so long on a given battery charge. More efficient cell phones and laptops would last longer.

Hardly oblivious to the problem, chip companies have taken important steps to improve their power performance, but achieving more than a stopgap or near-term solution, say Dally and electrical engineering and computer science Professor Mark Horowitz, could require substantially rethinking how chips are designed.

For example, chip companies have made chips more efficient by optimizing them for just one application, such as handling video encoding and decoding, or handling packets in a high-speed network. But these application-specific integrated circuits (ASICs) are prohibitively expensive to make for all but the most popular uses.

Eager to provide chipmakers with a better solution, each professor is now pursuing innovative ideas to reduce the financial and environmental footprint of future computers. Dally has made a very efficient general-purpose processor, while Horowitz is investigating how new design methods could slash the cost of creating ASICs.

Location, location, location

In describing his new processor, which is, on average, 32 times more energy efficient than a standard chip with comparable function, Dally readily makes analogies to real estate. The reason is that one of the biggest electrical problems facing chips today is the distance that signals must traverse to make them work. About 70 percent of a chip’s energy is expended pushing bits from distant memory banks to the logic units that must process them, and then hauling that output to its next destination.

A key innovation in his EEC (efficient embedded computing) processor is, in a sense, the same kind of shift that environmentalists call for when they advocate greater consumption of locally grown produce. By putting smartly managed storage closer to the logic units, Dally is greatly reducing data transport, much as people buying vegetables at a farmer’s market would greatly reduce the need for long-haul boat, plane, or train transport of food.

Bringing data closer to logic is helpful, but not if it’s the wrong data. Dally therefore employs sophisticated optimization techniques in his processor’s compiler to ensure that the most deserving data is given the best proximity as often as possible. Here the best analogy is to a neighborhood convenience store chain that employs demand forecasting to ensure that the products in highest demand are always in stock.

To understand the difference optimization can make, consider how typical processors “decide” what to keep in a cache. Say such a cache can hold four units of information. It would simply keep the most recently used four units. If the program the chip is executing, however, cycles through a loop that uses five units, the cache will never have the data that is needed next. It will always have to send away, at great energy cost, for the missing unit.
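
To make this concrete, here is a minimal Python sketch of the scenario described above; it is an illustration of the general principle, not a model of Dally’s hardware. A four-entry cache using the standard least-recently-used policy, fed a loop over five units, misses on every single access:

```python
from collections import OrderedDict

class LRUCache:
    """A tiny least-recently-used cache: evicts the entry untouched the longest."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, key):
        """Return True on a hit, False on a miss (the costly fetch from afar)."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        self.entries[key] = True
        return False

cache = LRUCache(capacity=4)
accesses = list(range(5)) * 100  # the five-unit loop: 0,1,2,3,4, 0,1,2,3,4, ...
hits = sum(cache.access(unit) for unit in accesses)
print(f"hits: {hits} of {len(accesses)}")  # prints: hits: 0 of 500
```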

Dally’s compiler looks ahead in the code of the programs written for his chip and anticipates what data will likely be needed next. The compiler also analyzes what the flow of data around the chip will look like, and tries to plan the most efficient paths for that flow.
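
A classic way to exploit such foresight, shown here as an illustrative sketch rather than the EEC compiler’s actual algorithm, is Belady’s clairvoyant replacement policy: when space runs out, evict the unit whose next use lies farthest in the future. A compiler that looks ahead knows the access sequence in advance, so it can approximate exactly this. On the same five-unit loop, the miss rate collapses from every access to roughly one per iteration:

```python
def optimal_misses(accesses, capacity):
    """Belady's policy: evict the resident unit reused farthest in the future.
    Requires the whole access sequence up front, which is what look-ahead gives."""
    cache, misses = set(), 0
    for i, unit in enumerate(accesses):
        if unit in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            def next_use(u):
                for j in range(i + 1, len(accesses)):
                    if accesses[j] == u:
                        return j
                return float("inf")  # never used again: the ideal victim
            cache.discard(max(cache, key=next_use))
        cache.add(unit)
    return misses

accesses = list(range(5)) * 100
print(optimal_misses(accesses, capacity=4))  # 104 misses of 500, vs. 500 under LRU
```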

The results of this and other design advances are to be published in an upcoming paper. They represent a major improvement over the energy efficiency of a conventional embedded Reduced Instruction Set Computer (RISC) processor. Dally’s group tested each chip’s power usage while running standard software tasks, such as encryption, signal processing, image encoding and mathematical computation. On average, the EEC processor used 32 times less power than the RISC processor (the median factor was a little more than 20).

The technology embodied in the EEC processor promises both to reduce the development time of demanding embedded applications and to enable applications where the sales volume unfortunately does not justify the high development cost of an ASIC, such as scientific instrumentation and assistive devices, Dally says.

Virtual virtues?

“The best way I know to create an efficient design is to create an application optimized design,” says Horowitz, who will become chair of the electrical engineering department this summer. “But creating this design has got to be cheaper.”

The reason ASIC design is expensive is that modern chips are very complex, and it takes a lot of work to ensure that such a complex system really does what you want it to do. Horowitz’s approach is to make the result of this expensive design process more useful than a single chip. He wants designers to create a chip generator rather than a chip. In other words, rather than creating a flexible video processor, create a tool that can create a family of video processors, in effect a virtual, extremely flexible video chip. To create an optimized processor, application experts would configure, or program, the generator to run their application. The generator would then use this information to create an implementation whose energy and performance are optimized for that application.
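
As a loose software analogy, with all names and parameters hypothetical rather than taken from Horowitz’s tool, a chip generator behaves like a parameterized template: application experts feed in their requirements, and the generator elaborates one concrete member of a chip family tuned to them.

```python
from dataclasses import dataclass

@dataclass
class VideoChipSpec:
    """One concrete member of a hypothetical video-processor family."""
    pixel_bits: int        # bits per pixel component
    num_decode_units: int  # parallel decode pipelines
    sram_kb: int           # on-chip buffer, sized to the workload

def video_chip_generator(resolution, frame_rate, battery_powered):
    """Elaborate a family member from application requirements. A real
    generator would emit verified logic plus its testing collateral;
    this sketch only picks structural parameters."""
    pixels_per_sec = resolution[0] * resolution[1] * frame_rate
    units = max(1, pixels_per_sec // 60_000_000)  # assumed per-unit throughput
    return VideoChipSpec(
        pixel_bits=8 if battery_powered else 10,
        num_decode_units=units,
        sram_kb=64 * units,
    )

# The same generator yields a phone-class and a studio-class design:
print(video_chip_generator((1280, 720), 30, battery_powered=True))
print(video_chip_generator((3840, 2160), 60, battery_powered=False))
```

The expensive verification effort then amortizes across every member the generator can emit, rather than being sunk into a single chip.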

To some, the generator may seem a little magical, but Horowitz has thought intently about the challenges he’ll have to overcome to succeed.

“Can I build a generator that is flexible enough to be usable?” he asks. “Can I generate not only the design of the chips but also all the testing collateral that’s needed to validate that it works? And can I take logic design, optimize it and create an efficient silicon implementation?”

The research is young, but he is optimistic that it can succeed.

“I don’t know that I can build a chip generator, but at least right now I don’t know of any reason why I can’t,” he says. His group is working to demonstrate a prototype chip generator this year.

Stopgap measures and small, careful steps, after all, are not going to be enough to change the basic nature of the power problem. That will more likely come about either by massive leaps in the efficiency of general processors or large reductions in the cost of designing ASICs. Or both.