Understanding how to utilize data from multiple sources is a major challenge and opportunity for chipmakers.
Originally published on Semiengineering.com
Data has become vital to understanding the useful life of a semiconductor — and the knowledge gleaned is key to staying competitive beyond Moore’s Law.
What’s changed is a growing reliance earlier in the design cycle on multiple sources of data, including some from further right in the design-through-manufacturing flow. While this holistic approach may seem logical enough, the semiconductor industry has been siloed since its inception. Expertise has developed around specific steps in the flow, and careers have been made based on successful execution in each of those silos.
The problem is that those demarcations no longer work for a data-driven design and manufacturing model. As time-to-market shrinks, complexity increases, and unique architectures become essential for tapping increasingly fractured market segments, traditional silos will need to be rethought. Increasingly, the glue between all these pieces is data, and the top challenge now is getting all that data to work together.
“The semiconductor industry has been very good at collecting data, but that data has not been brought together,” said Doug Elder, vice president and general manager of OptimalPlus, an NI Company. “That’s where the next order of magnitude of learning will come from.”
Standards are helping with this effort. In addition, third-party analytics and monitoring companies are amassing and organizing the data to connect the dots. These companies are making it easier to see patterns by ingesting and cleaning the data and putting it in accessible database formats to help their customers gain visibility, ideally in real time. By doing this, they are adding a level of visibility into chips that generations of semiconductor customers and design, manufacturing, and test engineers could only dream about in the past.
Challenged by design
One of the big challenges is understanding data in the context of analog and digital. This can vary greatly, depending upon whether the bulk of the design was developed at older nodes for most or all analog circuits, or whether those circuits are predominantly digital with an analog component. While most designers recognize that analog can drift and be susceptible to noise even at older nodes, at 5nm and below the digital circuitry becomes more analog-like and subject to similar disruptions from various types of noise. In addition, those nodes have less margin to buffer the effects of thinner wires and dielectrics, as well as the breakdown of those thinner dielectrics over time.
“Analog is not as easy to analyze,” said Elder. “We will see more of that, too, in traditional markets. People are using a lot more analog content in these systems.”
How that data matches up with other data from the digital side is still a work in progress. But the trend is toward an increasing amount of in-circuit monitoring for a couple of reasons. First, that data may not be accessible through regular testing.
“In a PMIC, there are a lot of nodes where it is not even testable,” said Carl Moore, yield management specialist at yieldHUB. “As you design these chips, you need to understand what are the testable nodes. There are hundreds of test nodes internally. If you can reach them, you can test them to determine if they are slow, average, or whether you need to amplify something to a reasonable level. But you also can design these chips so you can sense all of this from the outside.”
The second reason has to do with the growing emphasis on reliability in markets such as automotive, where chips are being used in safety-critical applications and where the anticipated lifespan may be as long as 20 years.
“Analog and MEMS are harder to test because you have tiny signals from the sensors and they’re more dense,” said Moore. “For designers who develop these chips, what matters now is what’s around the chip and how that chip is used. It’s not 1:1 for testing over time for things that you don’t need versus things that you do.”
All of that needs to be factored into the design. So rather than just making sure a chip is functionally correct, it needs to be data accessible using a format that can be correlated with other data being generated by a system or piece of equipment.
Semiconductor manufacturing equipment has largely been isolated from much of the data used to analyze a chip’s functionality. Efforts to correlate data across the entire supply chain are relatively new.
There are a number of factors behind this. First, much of the data being collected is considered proprietary and competitive, so sharing across the supply chain has been limited. Second, equipment vendors interface primarily with their customers rather than each other. Third, equipment vendors have been adding sensors for some time — but often those sensors have been add-ons because no one wants to sell advanced equipment these days without data analytics and machine learning. As a result, not all data formats are consistent, and the value of all that data isn’t always clear.
While the chip industry tends to view this from the context of increasing reliability through on-chip sensors or manufacturing processes, that data collection and analysis needs to permeate the supply chain. So rather than just temperature or voltage testing, foundries, OSATs and systems companies increasingly are concerned about the source of various technologies, when they were produced, in what size batches, and where and how they were burned in.
“If you have traceability on a batch, you’ve got to have equipment to read that data,” said Dave Huntley, who heads business development at PDF Solutions. “In addition, you need traceability for the manufacturer of that material. If one batch is bad, why is it bad? Being able to have pinpoint accuracy is essential. You need anomaly detection, and to do that you have to develop a good population of data around the timeframe for when a [manufacturing] lot starts and when it’s finished. And at that particular time, what was the toolset that was being used? There are lots of different elements that need to be tracked. It’s multi-variant Gaussian detection. You need all data from all factories that were involved.”
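The anomaly detection Huntley describes can be sketched in a few lines. This is a minimal illustration only, and it reads "multi-variant Gaussian detection" as multivariate Gaussian anomaly detection, which is an assumption. The two measurements per lot, the baseline values, and the tail probability are all hypothetical. The idea is to fit a Gaussian to known-good lots and flag a new lot whose Mahalanobis distance from that population is improbably large.

```python
import math
from statistics import mean

def fit_gaussian_2d(samples):
    """Estimate the mean and covariance of (x, y) samples from known-good lots."""
    n = len(samples)
    mx = mean(x for x, _ in samples)
    my = mean(y for _, y in samples)
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(point, mu, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular; need more varied data")
    dx, dy = point[0] - mu[0], point[1] - mu[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def is_anomalous(point, mu, cov, tail_prob=0.001):
    """Flag points in the outer tail of the fitted Gaussian.

    With two variables, the squared distance is approximately chi-square
    distributed with 2 degrees of freedom, whose tail is exp(-t / 2).
    """
    threshold = -2.0 * math.log(tail_prob)
    return mahalanobis_sq(point, mu, cov) > threshold

# Hypothetical per-lot measurements (two correlated test parameters).
BASELINE = [(1.0, 2.1), (1.2, 2.3), (0.9, 1.9), (1.1, 2.2), (1.3, 2.6),
            (0.8, 1.7), (1.0, 2.0), (1.2, 2.5), (0.9, 1.8), (1.1, 2.3)]
MU, COV = fit_gaussian_2d(BASELINE)

# Each value is inside its own historical range, but the pair breaks the
# correlation seen in good lots, so the joint test flags it.
print(is_anomalous((0.8, 2.6), MU, COV))
print(is_anomalous((1.0, 2.05), MU, COV))
```

The point of the joint model is that a lot can sit inside every single-parameter limit and still be unusual as a combination. A production system would use many more variables and far larger populations than this sketch.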
This coincides nicely with an effort to screen counterfeit chips and components, as well, which is particularly important with safety-critical and mission-critical applications. “We’ve been working on a supply chain standard for external traceability plus internal traceability,” said Huntley. “You want to trace that everything was done properly, but you also want to make sure it gets to where it’s supposed to go. And to be effective, you need to provide a way to link the silos in a factory.”
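One way to picture linking the silos is as a genealogy graph that can be walked in either direction. The sketch below is illustrative only: the identifiers and the `PARENTS` mapping are hypothetical, and it does not represent the data model of the SEMI traceability standard mentioned above.

```python
from collections import deque

# Hypothetical genealogy: each item maps to the upstream items it came from.
PARENTS = {
    "module-M1": ["die-W7-03", "substrate-S22"],
    "module-M2": ["die-W9-11", "substrate-S22"],
    "die-W7-03": ["wafer-W7"],
    "die-W9-11": ["wafer-W9"],
    "wafer-W7": ["lot-A100"],
    "wafer-W9": ["lot-A101"],
    "substrate-S22": ["lot-B200"],
}

def trace_upstream(item, parents):
    """Return every upstream item that went into `item`, breadth first."""
    found, visited, queue = [], {item}, deque([item])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in visited:
                visited.add(parent)
                found.append(parent)
                queue.append(parent)
    return found

def affected_by(batch, parents):
    """Return every item whose history includes `batch`."""
    return sorted(item for item in parents
                  if batch in trace_upstream(item, parents))

print(trace_upstream("module-M1", PARENTS))
print(affected_by("lot-B200", PARENTS))
```

Walking upstream answers "where did this part come from," while the reverse query answers "which parts are exposed if this batch turns out to be bad."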
Fig. 1: Die-level identification and traceability model. Source: SEMI
Behind all of this is rising complexity within chips, and the need for enough coverage to ensure that chips will work as expected throughout their lifetimes.
“Multiple power rails, multiple power domains are the primary thing driving these large probe counts in the probe cards,” said Mike Slessor, CEO of FormFactor. “You don’t just have a density problem where you have to get a tremendous amount of current in and out in local areas of the chip. You also have to do it at different voltages and at different frequencies. In some cases, these chips have a dozen different power supply domains that you have to supply, all of them populated in different spatial areas of the chip. For example, they’ll have us build a probe card that maybe tests one or two die early in production. And then as they build some level of confidence on what they really have to test, and how much power they really need for the test programs they’re going to run — and as we start to increase parallelism and move from that 1-die card to an 8- or 16-die card — the probe count per die goes down quite a bit as they start to depopulate and these different power supplies get supported in different ways by the tester. So, a lot of it becomes an adaptive process as customers learn how they’re going to test these things and begin to ramp up.”
This produces an enormous amount of data, though, and that needs to be supplemented and correlated with other volumes of data being generated from various processes in the fab. A lot of work goes into pulling good data and turning it into something that is usable.
“The test floor in itself is a gold mine of data, which is growing all the time, and in order to deal with it you have to effectively purify the gold, which has to be automated when you are dealing with the types of volume we’re dealing with these days,” said John O’Donnell, CEO of yieldHUB.
From the early days of Electronic Data Interchange (EDI) and IEEE’s Standard Test Interface Language (STIL), which began in the late 1990s, it was obvious more test compatibility was needed. Wafers and chip die are still tested on a variety of equipment from a variety of vendors. Test equipment from different manufacturers used to present data in different or incompatible formats, using different operating systems.
To fix that, Teradyne started — and other vendors co-developed — the Standard Test Data Format (STDF). STDF is a binary format agnostic to database architecture and operating systems, developed for automated test equipment (ATE) so that ATE from competing vendors can be connected and the data ported and stored more easily. STDF is a set of logical record types that provide underlying data abstraction.
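To make the idea of logical record types concrete, here is a minimal sketch of walking the records in an STDF V4 byte stream. It relies on the published V4 layout as understood here: a 4-byte header holding a 2-byte record length, a 1-byte record type, and a 1-byte subtype, with the first record being a File Attributes Record (FAR) whose first data byte gives the CPU type. Treating CPU type 1 as big-endian and everything else as little-endian is a simplifying assumption. The function name and the abbreviated name table are illustrative, and payloads are left undecoded.

```python
import io
import struct

# A few well-known (REC_TYP, REC_SUB) pairs; the real table is much longer.
RECORD_NAMES = {
    (0, 10): "FAR",   # File Attributes Record
    (1, 10): "MIR",   # Master Information Record
    (1, 20): "MRR",   # Master Results Record
    (5, 10): "PIR",   # Part Information Record
    (5, 20): "PRR",   # Part Results Record
    (15, 10): "PTR",  # Parametric Test Record
}

def iter_records(stream):
    """Yield (name, rec_typ, rec_sub, payload) for each record in the stream."""
    start = stream.tell()
    first = stream.read(6)
    if len(first) < 6 or (first[2], first[3]) != (0, 10):
        raise ValueError("stream does not start with a FAR record")
    # Assumption: CPU_TYPE 1 means big-endian; treat all others as little-endian.
    order = ">" if first[4] == 1 else "<"
    stream.seek(start)
    while True:
        header = stream.read(4)
        if not header:
            return
        if len(header) < 4:
            raise ValueError("truncated record header")
        rec_len, rec_typ, rec_sub = struct.unpack(order + "HBB", header)
        payload = stream.read(rec_len)
        if len(payload) < rec_len:
            raise ValueError("truncated record payload")
        name = RECORD_NAMES.get((rec_typ, rec_sub), "unknown")
        yield name, rec_typ, rec_sub, payload

# Synthetic two-record stream: a FAR (CPU type 2, STDF version 4) and an MRR
# carrying only a 4-byte timestamp field.
SAMPLE = (struct.pack("<HBB", 2, 0, 10) + bytes([2, 4])
          + struct.pack("<HBB", 4, 1, 20) + struct.pack("<I", 0))

for name, typ, sub, payload in iter_records(io.BytesIO(SAMPLE)):
    print(name, typ, sub, len(payload))
```

Because every record carries its own length and type, a reader can skip record types it does not understand, which is part of what lets tools from different vendors exchange the same file.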
The format consists of standard record types and global metadata, along with records for parametric (pass/fail and multipin) and functional tests. The binary format is converted to a more human-friendly ASCII format (ATDF, an ASCII version of STDF) and used in a database or an Excel spreadsheet. The process of conversion can be tricky. Lot IDs or header files may go missing, and the opportunities for bad data are numerous.
“Bad data is often caused by maybe not all the data being uploaded to the server, for example,” said O’Donnell. “And if you’re making a decision on yield where you don’t have all the data available, because maybe some testers aren’t networked correctly or there is some issue with the uploading, that’s going to cause a problem because you’re making a decision on less than the full amount of data available. Another example where you may lose data integrity is where maybe the lot ID is entered with a zero instead of an O, or vice-versa, or there’s some manual data entry issue. The actual test programs that run and test the chips on the floor may have issues that lend themselves to data integrity issues. So the quality of the test program and how it relates to generating data is so important.”
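Two of the failure modes O’Donnell mentions, incomplete uploads and look-alike lot IDs, lend themselves to simple automated checks. The following is a minimal sketch under stated assumptions: the record layout (a dict with a `tester` key), the tester names, and the choice of which characters to fold together are all hypothetical.

```python
from collections import defaultdict

# Characters that are easy to confuse at manual entry: letter O / zero, letter I / one.
_CONFUSABLES = str.maketrans("OI", "01")

def canonical_lot_id(lot_id):
    """Normalize case and whitespace, then fold confusable characters."""
    return lot_id.strip().upper().translate(_CONFUSABLES)

def find_lookalike_lots(lot_ids):
    """Group distinct lot IDs that differ only by confusable characters."""
    groups = defaultdict(set)
    for lot_id in lot_ids:
        groups[canonical_lot_id(lot_id)].add(lot_id.strip().upper())
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]

def missing_uploads(expected_testers, uploaded_records):
    """Return testers that were expected to report but have no uploaded data."""
    seen = {record["tester"] for record in uploaded_records}
    return sorted(set(expected_testers) - seen)

print(find_lookalike_lots(["LOT0A1", "LOTOA1", "LOT0B2"]))
print(missing_uploads(["T1", "T2", "T3"], [{"tester": "T1"}, {"tester": "T3"}]))
```

Checks like these do not fix bad data on their own, but they can hold a yield decision until the gaps are resolved or a person has confirmed which lot ID is correct.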
This has a big impact on time to yield, as well. “Good data — data you can rely on, trustworthy data — leads to faster yield improvement,” said O’Donnell. “It means less additional head count as your data grows. It means fewer lots in hold, and any lots in hold would be on hold for a shorter time, while adding the ability to relate your MES (manufacturing execution system) data with the production data log data on the floor.”
STDF is the most commonly used format in test, but it has limitations. “While STDF is widely used in the semiconductor industry, it does not directly support the new use models in today’s test environment, such as real-time or pseudo real-time queries, adaptive test and streaming access,” writes Paul Trio, SEMI. “The STDF V4 record format is not extendable and, because the standard itself can be imprecise, it tends to result in many interpretations. These limitations become apparent when there is a need for more efficient and flexible format to manage ‘big test data.’”
The next format may be the Rich Interactive Test Database (RITdb) standard, which the SEMI Collaborative Alliance for Semiconductor Test (CAST) Special Interest Group is working on now. CAST also is working on Tester Event Messaging for Semiconductors (TEMS), which enables real-time visibility into tests.
The wafer itself may not be able to provide the same test opportunities it once did. Hidden on the wafer are wafer acceptance tests (WATs). “Those are test structures placed within the wafer, between the dies themselves, in the scribe line,” said Nir Sever, senior director of product marketing at proteanTecs, in a video. “These are special test structures that can be tested by the foundry after the wafer processing is completed.” These WATs examine the wafer parameters and find wafer variations.
Understanding how that data correlates with other test data, particularly from areas such as built-in self-test and internal/external monitoring in the field, is as complicated as the devices themselves, compounded by all the possible use cases. In fact, companies are building entire business models around being able to process and correlate that data in useful ways.
“From our perspective it’s not a big problem because we have the same platform for all data,” said Evelyn Landman, CTO and co-founder of proteanTecs. “As long as the basis for correlation is there, there is no problem. We know how to correlate all of this data since it is generated by the same on-chip telemetry at every stage, whether it’s at ATE or system level test. Today you have different languages of data generated at different testing environments. Our end-to-end approach and deep data generated by the chips overcomes these difficulties.”
ProteanTecs is not alone in this market. This kind of capability has attracted interest from companies developing chips for data centers, as well as automakers and industrial operations. Being able to utilize data to predict failures and address them before problems arise has ramifications that go well beyond the chip.
Generating better data, gathering data from more sources, and being able to correlate all of that data opens up sweeping possibilities for the entire chip industry, including everything from improved reliability to predictive analytics. Rather than failing without warning, a device can be repaired or replaced as needed.
The chip industry is just at the beginning of this shift, but over the next decade it will likely define every step of design through manufacturing, improving everything from yield to reliability, security and safety.