Improving STDF processing into a database
A yield analysis system like yieldHUB usually requires a database, and a key step in populating it is processing the STDF datalogs and storing the processed data in the database.
The reason for this processing step is that STDF is designed for efficient writing: records are appended in near real time as the test program runs. That same structure is inefficient for reporting, so the data must be reorganized into a different structure designed for efficient reporting queries.
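As an illustration of that write-oriented design, here is a minimal sketch of walking STDF's sequential record stream. The four-byte header layout (2-byte length, 1-byte type, 1-byte subtype) comes from the STDF V4 specification; the file name is hypothetical, and a real parser would take the byte order from the FAR record instead of assuming little-endian. The point is that answering any reporting question means scanning the whole stream:

```python
import struct

def iter_stdf_records(path):
    """Yield (rec_typ, rec_sub, payload) for each record in an STDF V4 file.

    Assumes little-endian byte order for simplicity; a real parser reads
    the byte order from the FAR record's CPU_TYPE field.
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break  # end of file
            rec_len, rec_typ, rec_sub = struct.unpack("<HBB", header)
            payload = f.read(rec_len)
            yield rec_typ, rec_sub, payload

# Example: counting Parametric Test Record (PTR) entries, type (15, 10),
# requires reading every record in the file ("lot.stdf" is hypothetical):
# ptr_count = sum(1 for t, s, _ in iter_stdf_records("lot.stdf")
#                 if (t, s) == (15, 10))
```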
In implementing the yieldHUB database, the main goals are processing speed and minimal storage use. In most cases, database tables with table-level locking are sufficient for this purpose.
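As a concrete, hypothetical example, assuming a MySQL backend where the MyISAM engine provides table-level locking, a statistics table might be declared like this. The schema, names, and credentials are illustrative, not yieldHUB's actual tables:

```python
import mysql.connector  # assumes the mysql-connector-python package

# Illustrative schema only -- not yieldHUB's actual tables.
CREATE_STATS = """
CREATE TABLE IF NOT EXISTS test_stats (
    lot_id      VARCHAR(32) NOT NULL,
    test_num    INT         NOT NULL,
    site        SMALLINT    NOT NULL,  -- 0 = overall, 1..n = per site
    mean        DOUBLE,
    std_dev     DOUBLE,
    exec_count  INT,
    PRIMARY KEY (lot_id, test_num, site)
) ENGINE=MyISAM
"""

conn = mysql.connector.connect(host="localhost", user="yield",
                               password="...", database="yieldhub")
cur = conn.cursor()
cur.execute(CREATE_STATS)
```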
However, when datalogs contain several thousand tests, the sheer number of tests to save exposes the inefficiency of table-level locking, especially when more than 100 active users are generating large reports.
How slow processing can happen
The inefficiency is inherent in table-level locking under mixed insert/update and select workloads: a process performing an insert or update locks the whole table. If a reporting query that takes several seconds to finish is immediately followed by an insert or update on the same table, the insert has to wait until the report query completes, and any further select operations then queue behind the pending write. This can quickly degenerate into very slow processing.
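The queueing effect can be sketched with a toy simulation. This is not database code: a single mutex stands in for the table-level lock, which is a simplification (real table-level locking allows concurrent readers, but reads arriving after a pending write still have to wait), yet it shows how one slow report stalls the writer and the reader behind it:

```python
import threading
import time

START = time.monotonic()

def log(msg):
    print(f"[{time.monotonic() - START:4.1f}s] {msg}")

# One mutex models the table-level lock for the whole table.
table_lock = threading.Lock()

def long_report():
    with table_lock:
        log("report: SELECT started")
        time.sleep(3)  # a large report query holding the table for seconds
        log("report: SELECT finished")

def insert_stats():
    with table_lock:  # blocked until the report releases the table
        log("loader: INSERT done")

def later_select():
    with table_lock:  # queued behind the pending INSERT
        log("user:   SELECT done")

threads = []
for fn in (long_report, insert_stats, later_select):
    t = threading.Thread(target=fn)
    t.start()
    threads.append(t)
    time.sleep(0.1)  # stagger arrivals so the queueing is visible
for t in threads:
    t.join()
```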
On yieldHUB, a datalog with 12,000 tests on four test sites generates 60,000 insert operations for the test statistics: one row per test per site, plus a fifth row per test for the overall statistics. This alone can take several minutes to complete. In addition, processing is usually done using several instances, so 10 instances processing datalogs of the same program brings the number of insert operations to 600,000. The resulting table locks force other processes to wait and make the application appear slow, and if the storage device does not have high IOPS, the load also causes high I/O wait.
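To make the arithmetic explicit, the insert counts above work out as follows:

```python
tests = 12_000
sites = 4
rows_per_test = sites + 1   # one row per site, plus one row for overall stats
instances = 10

per_datalog = tests * rows_per_test         # 60,000 inserts for one datalog
across_instances = per_datalog * instances  # 600,000 with 10 parallel instances
print(per_datalog, across_instances)
```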
Recommended Solution
To solve this issue, it is recommended to use row-level locking for the tables that store test statistics and information. Row-level locking allows select queries to proceed even while an insert or update is in progress.
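Continuing the earlier MySQL assumption, converting the illustrative MyISAM table to the row-level-locking InnoDB engine is a single statement. Note that the conversion rebuilds the entire table, so it takes time and temporary disk space:

```python
import mysql.connector  # assumes the mysql-connector-python package

conn = mysql.connector.connect(host="localhost", user="yield",
                               password="...", database="yieldhub")
cur = conn.cursor()

# Rebuild the illustrative statistics table with a row-level-locking engine.
# The rebuild copies the whole table, so plan for the downtime and free space.
cur.execute("ALTER TABLE test_stats ENGINE=InnoDB")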
On yieldHUB’s database tables for the test statistics, row-level locking tables use 54% more storage. So before converting such tables, make sure the storage device has enough capacity for the additional requirement.
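One way to check capacity beforehand, still assuming MySQL and the illustrative table from earlier, is to read the current on-disk size from information_schema and scale it by the observed 54% overhead:

```python
import mysql.connector  # assumes the mysql-connector-python package

conn = mysql.connector.connect(host="localhost", user="yield",
                               password="...", database="yieldhub")
cur = conn.cursor()

# Current on-disk size (data + indexes) of the illustrative statistics table.
cur.execute("""
    SELECT (data_length + index_length) / 1024 / 1024 AS size_mb
    FROM information_schema.tables
    WHERE table_schema = 'yieldhub' AND table_name = 'test_stats'
""")
(size_mb,) = cur.fetchone()
print(f"current: {size_mb:.0f} MB, after conversion: ~{size_mb * 1.54:.0f} MB")
```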
Just by changing the tables that store test information to row-level locking, we have seen a 3x increase in overall processing speed as well as a more responsive application. However, because of the significant increase in storage requirement, this change should only be made if the extra storage is available and if there is usually a large volume of datalogs to process while users are active.