Tang Liu: Reflections on Product Quality - UT in TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 唐刘:关于产品质量的思考 - UT in TiDB

| username: TiDB社区小助手

Author: PingCAP Tang Liu

A long time ago, I noticed a risk in the TiDB codebase: many of our unit tests (UT) appear to be UT tests but are actually integration tests (IT). That is, many of our interfaces are tested using SQL. This has been a topic of extensive discussion within PingCAP. There are strong reasons for supporting SQL-based testing, as SQLite uses this method, and such tests are very easy to write. However, there are also strong reasons against this approach: it’s challenging to improve branch test coverage, and even simple functionalities need to be covered through SQL, making testing not straightforward.

The discussion isn’t about right or wrong. To me, both UT and IT are indispensable and must be done. However, for TiDB, we need to ensure that unit tests are genuinely unit tests. So, in mid-2023, I decided to migrate most of the SQL test code from the current UT code to a separate IT directory (related work can be seen at Split integration tests(IT) and unit tests(UT) in TiDB repo). At the same time, we started refactoring the code to make it easier for everyone to write UT tests. Here, I briefly record why I made this decision.

Before starting, there are a few declarations:

  • What I say may not be correct. I will periodically refresh my understanding.
  • This is merely my own thoughts on quality, a summary of my experience at PingCAP, and may not necessarily apply to other companies.

Basic Assurance of Quality

I love playing with LEGO and know that when I get a new LEGO toy, like a train or space shuttle, if I don’t follow the instructions and just randomly assemble it, I’m unlikely to get the final product. LEGO instructions and packaging are almost always organized by modules, meaning I need to open a bag of parts, assemble a module, and then iterate until the final product is assembled. If any module is assembled incorrectly, it will be a failure. So, after assembling a module, I carefully check if it matches the instructions and imagine if it fits the final product. If it does, I proceed to the next module.

This is the power of modularization. Developing TiDB follows the same principle. TiDB is built from modules, and if any module has issues, TiDB’s quality is at risk, leading to bugs. To reduce bugs, a good way is to test modules individually, which is what we call UT testing.

So, my first reason is - “UT is the basic assurance of software quality.” When designing modules, we need to clearly define each interface, its inputs, outputs, and expected behavior. To ensure our definitions are accurately implemented, I believe the best way is through UT. Of course, there are some things to note:

  • UT cannot guarantee finding all bugs. Even if a module achieves 100% code coverage through UT, it doesn’t ensure the module is bug-free. UT only ensures the module’s interface behavior meets our expectations, but it doesn’t cover the interaction between modules.
  • The higher the module level, the less benefit UT provides compared to lower-level modules. Higher-level modules are more complex, and for testing convenience, we often mock many dependent modules. Mocking is an abstraction, and since it’s an abstraction, we can’t fully cover all behaviors of the actual module, making it easy to miss bugs. However, this is not a reason to skip UT. UT helps us understand module interface definitions more clearly.
  • UT is not omnipotent; it needs to be combined with IT and other testing methods to ensure quality.

Prerequisite for Efficiency Improvement

Earlier, I mentioned the importance of modularization and UT for modularization from my LEGO journey. From another perspective, good modularization and UT can improve our development efficiency.

Here’s a specific example of why I initiated the separation of UT and IT in TiDB. One important reason is that running UT tests locally takes about 50 minutes, and it’s unlikely to pass all tests completely.

First, 50 minutes is a huge waste of development efficiency. Imagine making a change and having to wait 50 minutes to ensure all local UT tests pass before submitting the code. How can efficiency improve? So, developers cleverly avoided this issue by directly submitting PRs to GitHub, triggering our internal CI system to run these tests, while continuing to develop the next feature locally. This seems ideal, but it shifts the pressure to the internal CI system. We then need to solve the CI system bottleneck, including but not limited to buying more CI machines and introducing distributed cache to cache unchanged test results. However, as more developers submit PRs and add more tests, CI runs slower and slower.

Another point is that making a change and running UT tests locally often results in failures. This is mainly because many modules are tested through SQL. For SQL testing, we need to create a mock storage in the test, initialize some schemas with DDL, and start many additional features. This often leads to resource contention on local machines, causing many tests with timeout checks to fail. To solve this, developers again submit PRs and let the internal CI run the tests.

This is known as the tragedy of the commons in systems. To solve this, one way is to truly separate UT and IT, making UT simple and efficient, while SQL tests in UT are run separately on the actual TiDB. Although it’s not as simple as it sounds, we achieved significant results based on this direction. The overall UT test time has dropped to over 10 minutes, and UT can now run completely locally. This is a qualitative improvement in development efficiency.

Indirect Reflection of Code Complexity

Earlier, I mentioned that UT is a good way to verify if a module’s implementation meets its interface definition. So, I have an extreme view: “If a module can’t be well tested with UT, its complexity needs attention.”

If a module is complex, its maintainability will be poor, and any changes to it may introduce more risks, such as more bugs (of course, even simple modules can introduce bugs with changes, which is the fun of software engineering). However, in my view, being able to change a module is still good. If a module is complex, developers may find it hard to change the code, or even dare not change it, turning the module into a “legacy codebase,” severely affecting future development efficiency.

Unfortunately, TiDB’s current code has this issue. Whether it’s the session or the optimizer, it’s hard to test them with simple UT, which is why we need to test with SQL at a higher level. So fundamentally, even after separating UT and IT, we still have a long way to go. Fortunately, everyone has noticed this issue and started improving, such as splitting and refactoring the session to reduce dependencies on it, facilitating future UT tests. These tasks, like the UT and IT separation, will take years of investment, but I believe they will significantly benefit TiDB’s future.

Conclusion

TiDB is already a very complex system, and a good way to handle complexity is “modularization.” To ensure a module meets design expectations, UT is indispensable in my view. This is why I insist on writing real UT in TiDB. Another aspect is that the current code complexity has deterred many non-PingCAP community developers from contributing to TiDB, which I don’t want to see.

Hence, we have recently undertaken a series of refactoring and splitting tasks, hoping to gradually reduce TiDB’s complexity and make it easier to maintain and iterate.

Product Quality Thoughts Series

| username: TiDBer_iLonNMYE | Original post link

Got it.
In my humble opinion, besides UT and IT, there is another layer of testing, which is the Regression Test (let’s call it RT) for the entire version. It checks whether there are any bugs or SQL regressions in the new version. RT testing is different from UT interface testing and IT SQL testing. It can be done using a test system of an internal business system with a certain amount of data, observing the real business load.

| username: shigp_TIDBER | Original post link

Learned a lot, wish TiDB all the best!