[Soul Searching] How to Determine if Issues Faced by Community Members Deploying TiDB with Lower-than-Recommended Configurations are Due to Insufficient Resources?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 【灵魂拷问】在社区中经常遇到一些低于官方要求配置部署 TiDB 的小伙伴,如何判断他们的问题是否是资源不够引起的?

| username: Billmay表妹

In the community, we often meet users who deploy TiDB on configurations below the official requirements.

Question 1:

How can we determine whether their issues are caused by insufficient resources?

Are there any key errors or symptoms that clearly indicate the problem is due to insufficient memory or other resources?

Question 2:

If the configuration is lower than the official requirements, should we first guide them to upgrade their resources according to the official requirements to see if the issue persists?

Question 3:

If an issue is clearly caused by insufficient resources but the user has no additional resources to allocate, how should we guide them?

| username: tidb菜鸟一只

Start with the Overview monitoring. If the system resource metrics look abnormal, the first thing to check is whether the system resources are sufficient. With databases, when resources are plentiful you can simply throw resources at the problem and even the worst SQL will run. When resources are tight, first tune some parameters. If that still doesn’t help, optimize at the SQL or application level, improving bit by bit until the database runs well.

| username: hecomlilong

I think it still depends on TiDB’s positioning. In my personal opinion, 90% of customers in China are still small to medium-sized businesses, and frankly, deploying TiDB requires a certain resource threshold. The situation you describe arises because the company did not plan to invest resources and just wanted to try TiDB out; with insufficient resources, the experience was poor. So wouldn’t it be easier for small to medium-sized customers to run MySQL or PostgreSQL on a single machine? Why try TiDB at all? Because TiDB comes from an open, technology-driven company headquartered in China, and these users also want to experience the benefits of a distributed database. The more such issues arise, the better it is for TiDB.

Therefore, if TiDB wants to solve this problem fundamentally, it needs to modularize its functions and offer simplified, standard, and flagship editions. Small to medium-sized customers who want to experience TiDB without investing much up front could install the simplified edition. That edition would have to be far less resource-hungry while still giving a good experience of TiDB’s core distributed features, with limits on data scale that are clearly communicated to customers. I think this would avoid such problems as far as possible: the resource configuration matched to a given data scale should deliver a reasonable experience.

| username: 数据小黑

From personal experience, apart from OOM, which shows up fairly consistently, insufficient resources can cause all kinds of problems. Broadly speaking, there are several escalating steps:

  1. If you suspect insufficient resources, try restarting and see whether the issue goes away.
  2. If restarting doesn’t help and you are in a virtualized or cloud environment, temporarily add resources and check whether the issue is resolved.
  3. If neither step works, resources are probably not the main factor, and other solutions should be tried.
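
OOM is usually the easiest of these cases to confirm, because the Linux kernel logs a line whenever the OOM killer terminates a process. Below is a minimal sketch that scans log lines for such entries. The line format is an assumption based on common kernel wording ("Kill process" on older kernels, "Killed process" on newer ones), and the sample lines are illustrative, not taken from a real cluster.

```python
import re

# Assumption: the kernel log uses the common "Out of memory: Kill(ed) process
# <pid> (<name>)" wording. IGNORECASE also catches the cgroup variant
# ("Memory cgroup out of memory: ...").
OOM_PATTERN = re.compile(
    r"out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)",
    re.IGNORECASE,
)


def find_oom_kills(log_lines):
    """Return (pid, process_name) pairs for every OOM-killer line found."""
    kills = []
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


# Illustrative lines only; PIDs and sizes are made up.
sample_log = [
    "[12345.678901] Out of memory: Killed process 4321 (tikv-server) "
    "total-vm:9000000kB, anon-rss:7000000kB",
    "[12346.000000] eth0: link up",
    "[12400.100000] Out of memory: Kill process 5555 (tidb-server) "
    "score 901 or sacrifice child",
]

for pid, name in find_oom_kills(sample_log):
    print(f"OOM killer terminated {name} (pid {pid})")
```

In practice the lines would come from `dmesg` output or the system journal. If a TiDB component's process name appears here, the restart in step 1 is likely to be only a temporary fix.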

Resource insufficiency is relative. Even with the officially recommended configuration, if the data volume is large or the SQL is complex relative to the available resources, that still counts as insufficient resources. In such cases, the priority should be splitting the data or optimizing the SQL.

| username: ohammer

Create an official best practices guide.

| username: TiDBer_Shuai

Why don’t the official resource configuration requirements include recommendations for virtualized environments, such as recommended pCPU models, suitable pCPU to vCPU ratios, and recommended vCPU allocations for each component?

| username: xfworld

Question 1:
Resources and usage scenarios need to be matched. Low-end resources can still handle suitable scenarios (@Kongdom has a lot of practical experience with this). In addition, an accurate description of the performance requirements makes it possible to size resources more precisely.
For example:

  • Requirement to handle 2000 data writes per second, with each data entry being approximately 50 KB.
  • Requirement to support 100 concurrent queries, with the data volume for each query being around 20 million… Query latency can be up to 2 seconds.
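
To show how a requirement like the first one turns into a resource figure, here is a rough back-of-the-envelope sketch. The replica count of 3 is an assumption (it is TiDB's default), and the estimate ignores compression, indexes, and storage-engine write amplification, so treat the numbers as an order-of-magnitude guide only.

```python
# Figures from the example write requirement above.
writes_per_second = 2000
entry_size_kb = 50

# Assumption: 3 replicas of every piece of data; adjust to your own settings.
replicas = 3

# Decimal units (1 MB = 1000 KB, 1 TB = 1,000,000 MB) keep the estimate round.
logical_mb_per_s = writes_per_second * entry_size_kb / 1000
replicated_mb_per_s = logical_mb_per_s * replicas
logical_tb_per_day = logical_mb_per_s * 86_400 / 1_000_000

print(f"logical write rate:    {logical_mb_per_s:.0f} MB/s")
print(f"with {replicas} replicas:       {replicated_mb_per_s:.0f} MB/s across the cluster")
print(f"logical data per day:  {logical_tb_per_day:.2f} TB")
```

Comparing figures like these with the disk and network throughput actually available quickly shows whether a low-spec deployment can plausibly meet the stated requirement.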

For issues related to insufficient resources, see the general documentation chapter on troubleshooting slow reads and writes. Comparing the relevant metrics from the dashboard or Prometheus makes the problem easy to spot.
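
As one possible way to do that comparison programmatically, the sketch below builds a Prometheus instant-query URL and flags hosts whose memory usage exceeds a threshold. It assumes node_exporter metrics are being scraped; the Prometheus address, the 0.9 threshold, and the sample response are all hypothetical and hand-written for illustration.

```python
import json
from urllib.parse import urlencode

# Assumption: node_exporter is scraped, so these two metrics exist.
# The expression yields the fraction of memory in use per host.
MEMORY_USED_RATIO = (
    "1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes"
)


def build_query_url(base_url, promql):
    """Build a URL for the Prometheus instant-query endpoint /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def instances_over(response_text, threshold):
    """Return {instance: value} for every series whose value exceeds threshold."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    over = {}
    for series in payload["data"]["result"]:
        value = float(series["value"][1])  # value is [timestamp, "number"]
        if value > threshold:
            over[series["metric"].get("instance", "unknown")] = value
    return over


# Hand-written response in the shape of an instant-vector result; values are made up.
sample_response = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "10.0.1.1:9100"}, "value": [1700000000, "0.93"]},
            {"metric": {"instance": "10.0.1.2:9100"}, "value": [1700000000, "0.41"]},
        ],
    },
})

# Hypothetical Prometheus address.
print(build_query_url("http://127.0.0.1:9090", MEMORY_USED_RATIO))
print(instances_over(sample_response, threshold=0.9))
```

Against a real cluster, the response text would be fetched from the printed URL. The same pattern applies to CPU or disk metrics by swapping the query expression.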

Question 2:
Once the scenario and requirements are refined, you can start from the preset parameters in the official documentation, adjust the resource-related configuration, and then run a POC (Proof of Concept). Troubleshoot and summarize the issues encountered during the POC; the troubleshooting process from Question 1 makes it straightforward to tell whether an issue is caused by insufficient resources.

Question 3:
A mismatch between scenario and resources means the desired outcome will not be achieved. This should be confirmed through the POC, letting the data speak for itself. Collecting and summarizing these issues also gives the community and TiDB useful references for improvement and optimization.
@Billmay

These are my views on these issues :upside_down_face: (a bit verbose :rofl:)
