[TiDBer Chat Session 93] TiDBer Speaks - Providing *** Suggestions for TiDB's Product Development

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 【TiDBer 唠嗑茶话会 93】TiDBer 有话说——为 TiDB 的产品发展提 *** 条建议

| username: Billmay表妹

This Topic:

Provide *** Suggestions for TiDB Product Development

After the event, the community members' needs will be summarized and passed on to the relevant product and R&D teams.

Requirements:

Do not propose new feature requests, including ones that have not yet been implemented.

Suggestions should target details overlooked during TiDB's design or development that led to technical problems that should never have occurred.

You can use the following template:

  • Suggestion Name:
  • Reason:
  • Case:
  • Asktug Link:

This post is co-created by TiDB community users. Please do not repeat suggestions that have already been listed; instead, you can refine them or add corresponding cases~

Reference Cases

Suggestion: If a new version adds new execution plan parameters, they should be disabled by default rather than enabled.

Reason:

Unlike MySQL, TiDB generally stores much larger amounts of data. When upgrading to a new version, execution plan parameters that are enabled by default often catch DBAs off guard and can easily cause OOM (out-of-memory) errors. Once OOM occurs, it is particularly troublesome to handle, which makes users reluctant to upgrade in the future.

Case:

In v6.5, tidb_ddl_enable_fast_reorg defaults to ON, but the temp-dir it depends on defaults to /tmp/tidb rather than the data-dir, which led to many forum posts reporting DDL errors after upgrading.

Asktug Link:

Suggestion: New feature parameters in new versions should be disabled by default (for example, enabling the execution plan cache parameter in v6.x carries an OOM risk).

Reason:

Starting from v6.1, tidb_enable_prepared_plan_cache enables the execution plan cache for Prepare/Execute statements by default. In a high-TPS write scenario, memory on the TiDB compute node quickly surged to around 20 GB, causing OOM.

Case:

Production environment case

Asktug Link:

Event Rewards:

Participation Award

Submit product suggestions that meet the requirements to receive 100 points~
No points will be awarded for new feature requests, casual chat or interaction, or suggestions that do not follow the template.

Event Time:

2023.11.10 - 2023.11.17

The Chat Tea Party is about to reach 100 sessions!

In the 101st session of the Chat Tea Party, the top xxx most active participants will be selected to receive a Ti badge + mystery merchandise.


| username: Soysauce520 | Original post link

Suggestion: The “others” category in the OPS panel of Grafana monitoring should be broken down and recorded in detail.

Reason: Any statement type not explicitly handled in the code is classified as “others.” In principle, every operation running on TiDB should be recorded in detail rather than lumped under “others,” which leaves people guessing. See tidb/parser/ast/ast.go at 533998e5921a8f662c878b60a5d0c9608d52d736 · pingcap/tidb · GitHub

Example: In one monitoring panel, the OPS for “others” reached 200, while ordinary operations such as select and update stayed below 100, so there was no way to tell what the cluster was actually executing.

| username: Kongdom | Original post link

Suggestion: Add a section under the [System Variables] and [Configuration File Parameters] documentation pages that lists deprecated system variables and configuration file parameters.

Reason: As new versions are released, some system variables or configuration file parameters may be deprecated. Currently, this information is scattered across the release notes of each version, so there is no way to see all deprecated system variables or configuration file parameters in one place.

Example: When upgrading across major versions or multiple minor versions, just looking at the new version’s documentation does not reveal which system variables and configuration file parameters have been deprecated. One has to check each version’s release notes individually, which is time-consuming and labor-intensive.

Asktug link: Have two tidb-server parameters changed? - TiDB Q&A Community

| username: heiwandou | Original post link

  • Suggestion Name: Add an option in tiup to deploy a load balancing component (such as HAProxy)
  • Reason: After the database is deployed, a load balancing component still has to be deployed separately. However, it cannot be monitored and managed like the other components, so unified management is impossible. It is recommended that tiup support installing, monitoring, and managing the load balancing component together with the cluster.
  • Example: During deployment, you can only choose tidb, tikv, pd, pump, etc.; there is no load balancing component, which makes configuration management error-prone.
  • Asktug link: Cannot connect to TiDB behind HAProxy - TiDB Q&A Community

| username: zhanggame1 | Original post link

| username: tidb菜鸟一只 | Original post link

Suggestion: Allow setting the concurrency and sampling rate for automatic statistics collection.

Reason: Currently, automatic collection runs in a single thread and cannot finish on large tables, so users have to write their own scripts to collect statistics manually. This should not happen in a mature database.

Example: There are many cases in the forum where statistics collection fails due to large tables.

| username: Kongdom | Original post link

Could you add a link to a related Asktug question~ :wink:

| username: 像风一样的男子 | Original post link

| username: h5n1 | Original post link

Suggestion: Allow a custom cluster name to be displayed in the top-left corner of the dashboard.

Reason: When there are multiple clusters running the same version, this would make it easier to tell which cluster you are looking at.

| username: TiDBer_小阿飞 | Original post link

  • Suggestion Name: TiDB+keepalived Version Issue
  • Suggestion: TiDB should provide a unified access layer and release its own, or a tested and pinned version of, the load balancing component.
  • Reason: TiDB's load balancing relies on the HAProxy + keepalived solution, but there is no detailed guidance on which versions are supported. The flexibility of depending on external components is a significant risk. The forum has reported problems caused by load balancing; for example, keepalived version 2.0.18 caused packet loss and business timeouts, which was resolved by downgrading to version 1.2.8.
  • Case: https://www.yisu.com/zixun/576604.html
  • Asktug Link: What does everyone use for tidb-server load balancing? - TiDB Q&A Community

| username: 托马斯滑板鞋 | Original post link

Suggestions: Keep improving and optimizing the CBO; improve TiFlash performance
Reasons:

  1. The first suggestion reflects expectations formed while learning from performance comparison tests against competing products and from performance-tuning posts by experts on the forum.
  2. The second suggestion is based on POC testing, which showed that TiFlash still lags somewhat behind Doris/StarRocks (I am hoping to eliminate the separate data warehouse by scaling out TiFlash horizontally, avoiding the hassle of CDC into the warehouse and simplifying operations). :upside_down_face:

| username: chenhanneu | Original post link

Or it could simply display the current cluster's name.

| username: Billmay表妹 | Original post link

Please also add links to the relevant Asktug questions so the evidence is solid hhhh

| username: Billmay表妹 | Original post link

@人如其名 Many of the issues you found have actually been resolved in v7.4.0.
Please provide more specific feedback; ideally, pick one point and elaborate on it in detail.

| username: xfworld | Original post link

  • Suggestion Name: During tiup installation, allow selecting more Grafana monitoring templates
  • Reason: Tools such as DM, BR, and Lightning have to be integrated into the same Prometheus setup manually. It would be helpful to automate this configuration and integration.
  • Example: Monitoring is scattered, and meeting this need requires a lot of manual work. Could a more convenient method be added?
  • Asktug Links:
    Migrate Data Using DM | PingCAP Docs
    TiDB Lightning Monitoring and Alerts | PingCAP Docs

| username: 芮芮是产品 | Original post link

Suggestion: Future versions should ideally support TiKV on S3

Because it saves energy and reduces emissions
By saving money, TiDB can attract more customers

| username: 芮芮是产品 | Original post link

Suggestion Name: More Grafana monitoring templates in tiup installation
Reason: Tools such as DM, BR, and Lightning have to be integrated into the same Prometheus setup manually. It would be great to automate this configuration and integration.
Example: Monitoring is scattered, and meeting this need requires a lot of manual work. Could a more convenient method be added…

Suggestion Name: TiKV on S3
Reason: Energy saving and emission reduction
Example: A 30 TB MySQL snapshot backup costs 300 RMB per day, and RDS databases also incur significant costs.

| username: 有猫万事足 | Original post link

| username: Billmay表妹 | Original post link

How exactly would it save money?