My Story with TiDB | Brother Amao: No. 1 in TiDB Repository Commits, Over 1,000 PRs

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic (translated from Chinese): My Story with TiDB | Brother Amao, No. 1 in TiDB Repository Commits with Over 1,000 PRs

| username: tiancaiamao

Hello everyone, I am Mao Kangli, an R&D engineer at PingCAP. My GitHub ID is tiancaiamao. I currently rank first by number of commits in the TiDB repository, and I have contributed over 1,000 PRs across various TiDB-related projects.

Encounter

I joined PingCAP in April 2016, and my first PR to the TiDB repository was on June 21, 2016. At first I worked on TiKV for a while, but I found writing Rust a bit uncomfortable, so I switched to TiDB to write Go, which I personally find more comfortable. Since then I have kept contributing PRs to TiDB, and it has been going on for many years.

Choosing to Join PingCAP

I started using Go relatively early. At that time few companies used Go, and PingCAP was one of them. I also noticed TiDB, a “star” project written in Go.

The turning point came from the fact that there were few active Go users back then. I had a habit of blogging, mostly about Go at the time. Dongxu noticed me through my blog and invited me for a chat. That conversation sparked things, and I joined the company.

At the time, I was in Guangzhou while PingCAP was in Beijing, so I had some concerns about relocating. Fortunately, PingCAP’s culture supports remote work, which was very attractive. As a veteran, I have now worked remotely for many years and am used to it. Back then, though, it was quite eye-opening: a very open and free culture, a relaxed atmosphere, and a focus on results rather than process.

Today PingCAP has offices in several major cities across China, and many colleagues have joined from overseas. A single team may be spread across several places, so not everyone can sit in the same office. People are used to collaborating online through submitting PRs, reviewing PRs, and online meetings, which aligns well with the open-source collaboration model.

My team has members in several cities in China as well as in Europe and North America. Spanning multiple time zones makes it hard to gather everyone for one meeting. Our solution is to split meetings: for example, we meet with Europe today and sync with North America tomorrow. Because we live in different cities, some team members have never met in person. At team-building events we sometimes discover that we have reviewed each other’s PRs and discussed issues online but still don’t “know” each other, so we jokingly call it “meeting online friends.”

Talking About TiDB Contributors

TiDB contributors come from various industries and have different identities, but they all share a common identity: TiDB contributor.

Take students, for example. Students from China’s top universities are especially strong: they follow open-source projects and contribute PRs even before graduating. When I graduated, I had no concept of open-source projects. Now competition is fierce, and without open-source contributions or internships at big companies you might not even pass resume screening. I often joke that I’m glad I joined early; if they had come in first and then interviewed me, they would have crushed me.

For students, we sometimes organize university-oriented activities. For example, the TiDB-challenge-program attracted some university students: we released specific small projects, and I mentored some of the participants. They were really excellent, and the contributors we attract through open source are often of higher quality than applicants who simply submit resumes, so open source is also a good recruiting channel. In the early years, we sent small gifts, such as company culture T-shirts, to external contributors who contributed code to TiDB. If we spotted a “wild” (unaffiliated) contributor wearing one of our T-shirts, we would quickly ask for their resume. Since internal referrals come with a bonus, we jokingly called such people “walking iPhones.”

Besides students, some TiDB contributors come from academia. One impressive example is Samuel, a university professor abroad. He developed a testing tool called Squirrel, published a paper on it, and applied fuzz testing methods to TiDB, which helped us find many bugs. Later, his PhD student (GitHub ID bajinsheng) dug deeper, building on Samuel’s work with a tool called sqlancer, which also found many bugs for us. The tooling was so powerful and efficient that one person outperformed an entire team: during that period, he found more bugs than our whole QA team combined. You can see these records in the GitHub issues labeled fuzz/sqlancer. Contributions are not limited to code; finding bugs and identifying issues count too.

Another example is the TAOBench authors, who also published a paper. They built a performance benchmarking tool and compared TiDB with other database products in their experiments. Research like this helps us improve product performance, and we later added their workload to our benchmark suite, which we run regularly. Strictly speaking, the TAOBench authors are from Facebook, so they belong to industry, but because of the paper I still put them under academia.

Industry contributes even more, and many of our committers come from external companies. To clarify the terms: anyone who submits PRs to TiDB is a contributor. When a contributor keeps submitting PRs and gains a solid understanding of a module, we nominate them as a reviewer. Once they have been active long enough and made significant contributions to TiDB as a whole, we nominate them as a committer. For example, Li Yulai from SpeedCloud contributed the initial implementation of the plan cache and became a committer. Another example is Teacher Xiaoguang, who was a committer while working at Zhihu. The process is the same inside PingCAP: employees also have to submit PRs step by step to become a committer or maintainer, and there is no difference between internal employees and external contributors.

Many companies use TiDB and report bugs, some by filing issues and others by submitting PRs directly. Internet companies in particular have a “roll up your sleeves and fix it yourself” style, so these users become our contributors too.

Geographically, contributors come from all over the world, though most are from China. I remember a funny incident in which a Russian developer submitted a PR. As I mentioned, we used to send small gifts to contributors, so a colleague contacted him to arrange shipping. He was very skeptical and worried that we were scammers trying to steal his personal information. In the end we really did send the gift to Russia, and he was surprised and happy to receive it. It wasn’t anything valuable, but it is part of our company culture: we always value developers.

The Path of TiDB Code Contribution

All of our development happens on GitHub, openly and transparently. The daily workflow generally revolves around issues and PRs. For simple changes, you can submit a pull request (PR) directly. A PR usually needs approval from at least two reviewers before it can be merged into the master branch.

More complex work, such as developing a new feature, requires documentation, design, and discussion, and this material is also posted on GitHub as issues. For example, see the development process of temporary tables.

I want to emphasize that all of this is open to the public, a truly open-source collaboration model.

User Needs

GitHub issues are mainly for developers. Regular users can ask questions on AskTUG, the TiDB user group community.

We collect user feedback, and requirements generally come from three sources: high-demand issues from the community, feedback from enterprise customers in production, and capabilities that important PoC customers need but we don’t yet support.

PMs collect these requirements, prioritize them, and schedule them, and R&D continuously improves the product. The product iterates rapidly, with each minor version bringing new features or improvements.

Periodically, we release a major version, often with significant features or changes. We now have LTS (long-term support) versions for stability, while users wanting to try new features quickly can use non-LTS versions. This ensures both stability and rapid iteration, meeting different user needs.

Gains

I feel I have grown up alongside TiDB. In the early days, TiDB’s codebase was relatively small and the team wasn’t divided into fine-grained modules, so everyone worked on many modules and gained a comprehensive understanding of the overall architecture. You can find my code in almost every corner: the parser, transactions, partitioned tables, DDL, the optimizer, the executor, and more (haha). Today the team is more specialized and each member focuses on their own module, so newcomers may not get the same learning opportunities I had.

Beyond technical skills, a significant gain is my all-round ability as an engineer. This might sound vague, but only through experience do you learn how to maintain such a large open-source project, ensure product quality, and iterate in a controlled way. Working with others to push things forward is also crucial. Big companies may have standardized testing and review processes, but they rarely match the standards of a large open-source project, where every detail of each PR reflects the rigor of the work. In open source, everyone can see who did what in each commit, the whole world is watching, and any sloppy behavior is exposed publicly.

Open Source Culture

I believe open source must build a community. Just releasing the code and then ignoring it isn’t open source. Yet some companies play exactly this game: after a product loses an internal competition, they dump its source code on GitHub and call it open source. There are no follow-up PRs or updates, and no one answers questions or resolves issues.

Open source has become a popular concept, and some startups in infrastructure software command high valuations. Some people ride this wave and treat open source as a gimmick, which is fake open source. Even more extreme, some companies run campaigns to boost stars, offering gifts in exchange for starring their GitHub repository. Such stars mean nothing for the product itself. Of course, they might fool naive investors: “Look, our star count is growing even faster than TiDB’s.” Treating open source as a KPI won’t succeed.

Different people understand open-source culture differently. For example, the GPL is a “viral” license: if you distribute software built on GPL code, you must release it under the GPL as well. But that is exactly what makes it an effective weapon against technical monopolies and patent barriers. In contrast, what has a certain top database company, reportedly employing more lawyers than programmers, contributed to technological progress or to the common good? Having the source code means freedom, not lock-in. “Free” here means freedom, not free of charge. Permissive licenses such as MIT impose far fewer conditions than the GPL, but they share the same idea of freedom.

The main forces driving the open-source movement have changed over time. Early on, many projects were driven by individual heroes. Now institutions and large companies lead open-source projects, often driven by their own interests. Whether a company embraces open source depends largely on how it sees the world.

If a company believes its code is its core asset, then open source is a disadvantage because competitors can copy the code and overtake it, so it should stay closed source. If a company believes developers are its core asset, it should build an open community that attracts more contributors and grows more active and stronger, which in turn improves the product. Copying code is easy, but building an active community is hard.

Of course, some behavior splits communities. For example, a company may reason that since the code is open, it can fork it, modify it, and sell an enterprise version. This is entirely legitimate as long as it doesn’t violate the open-source license. In essence, it becomes a contest between that company and the entire community. If the company invests enough, iterates faster than the original project, and builds a better product, users will move to its camp. But this is costly: the company’s investment has to exceed the combined investment of every other company and individual contributor in the community.

The best way to prevent splits is to make the community more active, with more participants and multiple companies driving it. That makes splitting the community too expensive for any single company, so a would-be splitter either joins or fails in opposition. This is how open source works. A typical example is Chromium: the software is so complex that no company can fork it and start over on its own. Once it becomes the de facto standard, the leading company is the biggest beneficiary, which makes open-sourcing it a strategic move. Another example is Android: only by building a good ecosystem can countless contributors take part, writing drivers, apps, and much more, which makes the whole ecosystem stronger.

From this perspective, open-source culture reflects how people see things. In the end, whatever advances productivity will win. Compared with closed source or fake open source, those who commit firmly to open source from the start have a much broader vision.

Suggestions

What advice would I give to those who want to become TiDB contributors? The hardcore way is to read the f*cking source code. If you find any problems while reading, you can discuss them on GitHub or submit PRs to improve the code. This approach suits programmers who already have a solid foundation and enjoy this style.

A gentler way is to start with the documentation and understand the system’s overall architecture; the official blog is a good learning resource. Once you are familiar with it, you can start with small PRs by looking for TiDB issues labeled “help wanted”, which are generally suitable for newcomers. For how to submit PRs, see the “How to contribute” section of the project’s README.

The key is to get involved, accept feedback, and keep learning. I believe contributing to TiDB will reward you greatly.

| username: Kongdom

Kneeling in admiration of the expert~

| username: tidb菜鸟一只

Awesome. I can’t write code and can barely read it, so I’ll just admire it.

| username: YY-ha

First row, worship!

| username: hey-hoho

Looking up to the expert! :eyes:

| username: db_user

Worship the expert.

| username: ohammer

Worship the expert. Do you have any suggestions for DBAs who work in operations?

| username: ShawnYan

Respect to the expert!

| username: Ti青涩

Daily worship of the expert!

| username: benmaoer

Brother Amao speaks very truthfully.

| username: 会飞的土拨鼠

Respect to the expert! The TiDB database is really quite useful now. Thank you, expert.

| username: huanglao2002

Respect!