Consistency, completeness and sharing of data can provide insights about a chip’s design, health and interactions, but it’s not that simple.
(Originally posted on semiengineering.com)
The semiconductor industry is becoming more reliant on data analytics to ensure that a chip will work as expected over its projected lifetime, but that data is frequently inconsistent or incomplete, and some of the most useful data is being hoarded by companies for competitive reasons.
The volume of data is rising at each new process node, where there are simply more things to keep track of, as well as in advanced packages, which contain multiple dies. Alongside that, the value of the data is increasing for several key reasons:
- Increased transistor density and thinner dielectrics have multiplied the potential physical effects and interactions caused by running circuits harder and longer, and those problems cannot be discovered in enough detail using traditional testing alone. Finding them requires a combination of testing and data analytics.
- That, in turn, has led to an explosion of sensors inside chips and packages, as well as inside fab and test equipment. Each of those sensors can generate large quantities of data, and applying machine learning to that data can reveal patterns that improve yield and reliability.
- On top of that, the use of advanced-node chips in safety-critical and mission-critical applications has ratcheted up the pressure to predict when those chips will fail, exactly what will fail, and to give ample warning of a pending problem. That requires more data from more parts of the design-through-manufacturing flow.
Some, but not all, of that data is useful where it is collected. It becomes much more useful when it is made available across multiple points in the flow. That significantly boosts reliability because it provides a traceability path back to the source of a problem, which can then be used to fix existing problems and prevent future ones.
“The key is having the right data and making the right decisions based on that data,” said Scott Kroeger, chief marketing officer at Veeco. “Semiconductor players traditionally have been protective of data because there is a certain amount of IP in there. There is a fine line between what can be shared. But for sub-5nm logic, you need feed-forward analysis of quality and yield.”
The starting point for feeding that data may be failures 10 years after a device is manufactured, or it may be a test chip for a complex SoC or IP block within that chip.
“You need data to determine how a product is performing so that you can go back and look at different recipes and create predictive models, put in place predictive maintenance, and know what to do to prevent a device from failing,” said Doug Elder, vice president and general manager of OptimalPlus. “But as you get into the lower geometries, what we’re finding is the ability to collect that data is becoming much harder. At 5nm, defect and metrology data is much harder to collect. Therefore, your ability to take action on that is going to be delayed until you can get to wafer-level data to feed that back to say whether you have a recipe problem, an etch problem, and where is that occurring. The access or availability is becoming harder.”
One of the key issues is just being able to inspect all of the structures at advanced nodes.
“In some cases there’s also too much data from the advanced nodes, and people aren’t sure what to do with it all,” Elder said. “So some folks are having trouble getting the data. One of the things we’ve been doing is taking machine vision images, digitizing those, and running machine learning algorithms on them to determine whether something is good or bad. We can digitize a very sophisticated image. The frame grabbers in the machine vision are such that you can take a very well defined image, look at what points are important to the manufacturing process, and then determine whether you have a good item or a bad one.”
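The good/bad image classification Elder describes can be sketched in miniature. This is an illustrative stand-in, not OptimalPlus's actual pipeline: the features, image sizes, and defect model below are invented, and a simple nearest-centroid classifier stands in for whatever machine learning algorithm is used in production.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(img):
    # Reduce a digitized grayscale frame to simple statistics at the points
    # that matter to the process: mean intensity and mean edge energy.
    gy, gx = np.gradient(img.astype(float))
    return np.array([img.mean(), np.hypot(gx, gy).mean()])

# Synthetic training frames: "good" dies are uniform noise, "bad" dies carry
# a bright defect blob that raises both mean intensity and edge energy.
good = [rng.normal(100, 2, (32, 32)) for _ in range(20)]
bad = []
for _ in range(20):
    img = rng.normal(100, 2, (32, 32))
    img[10:16, 10:16] += 80  # simulated defect
    bad.append(img)

# Train a nearest-centroid model: one centroid per class in feature space.
c_good = np.mean([features(i) for i in good], axis=0)
c_bad = np.mean([features(i) for i in bad], axis=0)

def classify(img):
    f = features(img)
    return "good" if np.linalg.norm(f - c_good) < np.linalg.norm(f - c_bad) else "bad"
```

In practice the frame-grabber images are far richer and the models correspondingly heavier, but the structure is the same: digitize, extract the features that matter to the process, and let a trained model make the good/bad call.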
That produces its own set of data, which can be analyzed in detail using a combination of approaches, some new and some not so new. Consider what is happening in high-speed optical screening, for example.
“Optical inspection is basically the only way you can do that because you need so much coverage,” said Chet Lenox, senior director of industry and customer collaboration at KLA. “For conventional inline process control you can get away with sampling a couple of wafers per lot, for example. But you need 100% inspection for screening, so it has to be fast. The challenge is that if you’re going to do it on a more leading-edge node, there are still going to be relatively small defects that are a latent reliability risk. So you need a capable inspection, and to utilize the fact that often the same latent reliability defect mode that is the biggest risk often has a range of sizes. We can catch the bigger ones with our screening inspection and use those as a proxy for smaller defects. When you combine that with machine learning algorithms that allow us to do defect classification based on the optical inspection results only, without review required, it’s impressively predictive.”
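The proxy idea Lenox describes, using the catchable larger defects of a mode to reason about the smaller ones below the detection limit, can be sketched numerically. The exponential size distribution and the detection limit below are assumptions for illustration, not KLA's method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic ground truth: defect sizes of one mode, falling off exponentially.
true_sizes = rng.exponential(scale=30.0, size=5000)  # nm, invented numbers
limit = 40.0                                          # assumed detection limit
detected = true_sizes[true_sizes >= limit]            # only the bigger defects

# For an exponential tail, the excess over the limit is exponential with the
# same scale, so the mean excess of the detected defects estimates the scale.
scale_hat = (detected - limit).mean()

# Extrapolate below the limit: fraction of defects too small to detect, and
# an estimate of the total count from the detected count alone.
undetected_frac = 1.0 - np.exp(-limit / scale_hat)
est_total = len(detected) / np.exp(-limit / scale_hat)
```

The real value comes from pairing this kind of extrapolation with the defect classification Lenox mentions, so the size distribution is fit per defect mode rather than over all defects at once.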
With increasing frequency, that data also is being paired with in-circuit types of sensors, which add yet another dimension to the data.
“We’re collecting data and applying it in ways that previously were not do-able,” said Jeff David, vice president of AI Solutions at PDF Solutions. “There’s a lot of work going on to collect data above and beyond what our customers are collecting. We’re generating data sets from that. The obvious next step is how to incorporate that into machine learning algorithms to make better predictions about different use cases. One of those use cases is smart testing, where we can predict whether a chip will fail based on expensive tests. If we determine that a chip will pass that test, then you can skip that test altogether. Data also can be used to predict whether a chip will fail out in the field, so you’re reducing RMAs.”
The key here is boosting the accuracy of those predictions, particularly for early-life failures, and that requires a deep understanding of the data coupled with domain expertise. But it also requires access to the data in the first place, and not all companies are willing to give that up.
“All of the fabs are very protective of their data,” said Subodh Kulkarni, CEO of CyberOptics. “Our customers give us enough data to improve our sensors, and we offer tools in software where we can do more than what they can do with raw data.”
Sharing data varies by customer. “The customer owns the data,” said Evelyn Landman, CTO of proteanTecs. “In chip production testing it’s the semiconductor company, in system test it’s the system vendor and when in the field it’s the service provider. They can share the data and this will augment the value even more. The semiconductor vendor will be able to see the performance and health of their chips in the field, the same for the system vendor regarding their systems. Data can be shared back and forth, from chip design all the way to fleets in the field. So when there is an issue in-field, you don’t only get a predictive maintenance alert, before the failure happens, but also instead of sending material (systems) back to the vendor, you only need to send data.”
Storing and accessing data
Utilizing data from different sources also can create other problems. It often is stored in different ways and in different formats, and those discrepancies increase as the number of different companies involved in developing a chip increases, whether those include design, IP, manufacturing, packaging or test. Moreover, all of those elements can change from one market to the next, and from one chip to the next.
“There is variation in the ways in which data is stored, and there is variation in the raw data,” said John O’Donnell, CEO of yieldHUB. “The key is to make sure customers can see clean data. In final test, for example, you don’t have x and y, so you need the underlying clean data sorted out to figure out one parameter versus two or three other parameters. When you clean that data you should have a single representation of a die for each test.”
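One cleaning step O'Donnell alludes to, collapsing retest records so each die ends up with a single representation per test, can be sketched simply. The record fields and die identifiers below are illustrative, and keeping the most recent result is one possible policy, not yieldHUB's documented behavior.

```python
# Raw test records may contain multiple entries per die and test (retests).
records = [
    {"die": "W1-D07", "test": "idd_leakage", "ts": 1, "value": 3.2, "pass": False},
    {"die": "W1-D07", "test": "idd_leakage", "ts": 2, "value": 1.1, "pass": True},
    {"die": "W1-D08", "test": "idd_leakage", "ts": 1, "value": 1.0, "pass": True},
]

def clean(records):
    # Keep one record per (die, test) pair: the latest by timestamp.
    latest = {}
    for r in records:
        key = (r["die"], r["test"])
        if key not in latest or r["ts"] > latest[key]["ts"]:
            latest[key] = r
    return list(latest.values())
```

Once every die has exactly one clean record per test, comparing one parameter against two or three others, as O'Donnell describes, becomes a straightforward join rather than a guess about which of several retest results to trust.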
Companies that provide on-chip sensors do collect data in a consistent way. “Our Agents can work from structural test to functional test to system test, so they ‘speak’ the same language,” said proteanTecs’ Landman. “We are providing the same data, whatever state the system is in, regardless of whether it’s the tester or the system or an evaluation board.”
While that is an important development, there are many other pieces in this complex puzzle. There also is variation in the tools themselves and in the wafers being processed by that equipment. The most effective way to minimize that variation is by applying machine learning algorithms, because a large fab can contain dozens or even hundreds of pieces of manufacturing equipment. To offset that problem, equipment makers have been adding sensors to detect every step of the process, and those sensors generate data that is used to do things like balance etch chambers to achieve consistent results.
“During wafer manufacturing, every tool and every wafer is in a unique starting state,” said David Fried, vice president of computational products at Lam Research and CTO of Coventor, a Lam Research Company. “The performance of any piece of equipment is dependent upon the quality of the wafers that were placed onto that equipment. Every equipment manufacturer suffers from this problem. If you clean up the distribution of wafers going into a process tool, the distribution coming out is obviously going to be better. Instead of fighting that fact, you can use machine learning to adaptively process your wafers as the distribution of the incoming wafers changes. This can eliminate the requirement to clean up the input wafer distribution in order to obtain a clean output distribution by having a tool that adapts as the input distributions change.”
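The adaptation Fried describes can be sketched with a simple run-to-run feedback controller: instead of tightening the incoming wafer distribution, the recipe adjusts per run as that distribution drifts. An EWMA estimate of incoming film thickness stands in here for the machine-learning adaptation, and all the targets, rates, and drift numbers are invented.

```python
import random

random.seed(3)

TARGET = 100.0        # desired remaining film thickness, nm (assumed)
ETCH_RATE = 2.0       # nominal etch rate, nm per second (assumed)
ALPHA = 0.3           # EWMA smoothing factor

incoming_est = 300.0  # running estimate of incoming thickness
outputs = []
for run in range(200):
    # The incoming distribution drifts upward over time (simulated).
    incoming = random.gauss(300.0 + 0.1 * run, 2.0)
    # The recipe adapts each run based on the current estimate.
    etch_time = (incoming_est - TARGET) / ETCH_RATE
    out = incoming - ETCH_RATE * etch_time
    outputs.append(out)
    # Feedback: the measured output reveals the true incoming thickness,
    # which updates the EWMA estimate for the next run.
    incoming_est = (1 - ALPHA) * incoming_est + ALPHA * (out + incoming_est - TARGET)
```

Even as the incoming mean drifts by 20nm over the campaign, the adapted recipe keeps the output centered near the target, which is the behavior Fried is pointing at: the tool absorbs the input variation rather than requiring it to be cleaned up first.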
Variation is endemic in chip manufacturing, and it becomes more important at advanced nodes and in advanced packages (in many cases, the most complex packages are developed at leading-edge nodes) because the tolerances are tighter. There is less room for error at 7nm than at 10nm, and far less at 5nm than at 7nm.
Accurate data is required here to adjust processes and to ensure quality over time. This is made more difficult by the fact that not all data is in the same format, and not all sensors perform exactly the same over time. Sensors age just like any other circuits, and data output from those sensors may need to be adjusted. Understanding the lifecycle of the sensors themselves, whether in-circuit or external, is a critical piece of this puzzle.
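Adjusting the output of an aging sensor can be as simple as periodically re-referencing it against a known stimulus and applying a gain/offset correction to subsequent readings. The calibration points and the linear drift model below are invented for illustration; real sensor aging can be nonlinear and temperature-dependent.

```python
# Calibration points: (true_value, raw_reading) pairs taken against a
# trusted reference after the sensor has aged. Numbers are made up.
cal = [(25.0, 26.1), (75.0, 78.3)]

# Fit a two-point linear correction: true = gain * raw + offset.
(t0, r0), (t1, r1) = cal
gain = (t1 - t0) / (r1 - r0)
offset = t0 - gain * r0

def corrected(raw):
    # Map an aged sensor's raw reading back to the true value.
    return gain * raw + offset
```

The harder part in practice is scheduling the re-calibration and deciding when a sensor has drifted beyond what a correction can honestly fix, which is why understanding the lifecycle of the sensors themselves matters.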
Add in multiple companies — particularly some that are reluctant to share data, such as those in the highly competitive automotive supply chain — as well as multiple data formats that need to be translated, and the picture becomes blurrier.
“You cannot test everything, but you can use data to test for one parameter or another parameter,” said yieldHUB’s O’Donnell. “So you get something out of the data that you couldn’t do otherwise. You also can check the database for trends and go beyond what’s otherwise available. This is important.”
Another key piece of the equation is traceability. Just detecting a problem in data is far less valuable without the context around that problem. For example, was it just one die on one wafer that was defective, or was every die in that same exact position on every wafer defective? And how long did the problem last? Was it due to the initial design, damage during the testing process, a problem in the wafer polishing, or was it just an aberration that never appeared again?
“You need to be able to search for a die in the database and see where it was in wafer sort and final test and packaging, and how that die performed versus the rest of the population,” said O’Donnell. “This is really big. Everybody is looking at this. They don’t necessarily even know how they’re going to do it, but they know they have to do it.”
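The lookup O'Donnell describes amounts to joining per-stage records on a die identifier so one query returns the die's full history. The schemas and identifiers below are illustrative, and real deployments must first solve the harder problem of carrying a consistent die ID across wafer sort, final test, and packaging databases.

```python
# Per-stage stores keyed by a die identifier (hypothetical schemas).
wafer_sort = {"W1-D07": {"x": 3, "y": 12, "bin": 1, "vmin": 0.62}}
final_test = {"W1-D07": {"bin": 1, "fmax_ghz": 3.1}}
packaging = {"W1-D07": {"lot": "PKG-0042", "substrate": "S77"}}

def trace(die_id):
    # One query, one die, its whole history; None marks a missing stage.
    stages = {"wafer_sort": wafer_sort,
              "final_test": final_test,
              "packaging": packaging}
    return {stage: data.get(die_id) for stage, data in stages.items()}
```

With the history assembled, comparing a die against the rest of the population, or checking whether every die at the same wafer position failed, becomes an aggregation over the same joined records.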
The big challenge there is broadening the data set to span the entire design through manufacturing flow. “A lot of these chipmakers have been trying to utilize data across the supply chain, from foundries to OSATs, but none of them has been fully successful yet,” said Jon Holt, worldwide fab applications solutions manager at PDF Solutions. “They’re using a data warehouse or a data lake, or incorporating the data via big data, deep learning technologies, but they all have vertical integration challenges — silos of data, OSATs, multiple warehouses, different data quality, control and speed. And their ability to bring all of that together is difficult. They’re working on it, but it hasn’t happened.”
Utilizing data more effectively is a work in progress. It requires rethinking some of the core information flows that have been developed over the past couple of decades. The semiconductor supply chain runs like a finely tuned instrument, but at the most advanced nodes it evolved largely for consumer electronics, where lifecycles were typically short and operating conditions were generally controllable.
As these chips end up in automotive and industrial applications, or in edge and cloud data centers, reliability at the leading edge becomes more important. Chipmakers are leveraging the entire arsenal of equipment they already have, but they also are developing some entirely new data-driven approaches, as well. This is resulting in some big changes across the supply chain, and it likely will have broad implications for chips developed at every node as these approaches become more mainstream.