# Debunking Myths: Exploring TiDB and Testing the Differences Between OB and TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 破除迷信,研究了解tidb,测试ob和tidb之间的不同之处。

| username: tidb狂热爱好者

Breaking Superstitions, Studying and Understanding TiDB

Recently, I’ve been helping my mom manage her blood sugar with a continuous glucose monitor that records a reading every minute. It’s a minimally invasive sensor worn under the skin, and one transmitter lasts for 15 days. I’ve noticed that many people who buy these monitors worry constantly about the accuracy of the readings.

I also bought a blood pressure monitor for my mother-in-law, who has hypertension. The readings showed 170/130, which is extremely high. Her reaction was to reject and fear the results, claiming the monitor was wrong and questioning why her blood pressure was still so high despite taking medication. She avoids going to the hospital because she knows her blood pressure is abnormal and doesn’t want to spend money. This is a form of superstition. By not checking and treating it, she risks her health, which will cost even more in the long run.

The same applies in the workplace. Many developers don’t understand SQL or databases and find them mysterious. They rely on DBAs to solve problems without getting involved in database management themselves. This behavior is similar to my mother-in-law’s; humans fear the unknown. Our bodies and TiDB are similar systems. Maintaining health requires monitoring weight, diet, blood sugar, and blood pressure. Maintaining database health requires monitoring CPU, memory, read I/O, write I/O, tables with more than six indexes, unused indexes, the largest temporary tables, and read/write permissions. These are straightforward tasks that can help maintain any database system.
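
For example, the index-count check is easy to automate. Here is a minimal sketch against the MySQL-compatible `information_schema.statistics` catalog (which TiDB also exposes); the six-index threshold is just the rule of thumb from above:

```sql
-- List user tables carrying more than six indexes.
-- information_schema.statistics has one row per column per index,
-- so count distinct index names per table.
SELECT table_schema,
       table_name,
       COUNT(DISTINCT index_name) AS index_count
FROM information_schema.statistics
WHERE table_schema NOT IN ('mysql', 'INFORMATION_SCHEMA', 'PERFORMANCE_SCHEMA')
GROUP BY table_schema, table_name
HAVING COUNT(DISTINCT index_name) > 6
ORDER BY index_count DESC;
```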

However, maintaining TiDB is different. As a distributed database, it needs additional attention to read hotspots, write hotspots, I/O distribution, and network conditions, and TiDB provides visual interfaces for all of these. This convenience of operational tooling and visual monitoring is where OB falls short: a system that is too complex adds to the DBA’s mental load.
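
The same hotspot information is also queryable in SQL. A minimal sketch using TiDB’s `INFORMATION_SCHEMA.TIDB_HOT_REGIONS` table (the exact column set varies by version, so treat this as an illustration rather than a fixed recipe):

```sql
-- Show the hottest Regions right now, split into read and write traffic.
SELECT db_name,
       table_name,
       index_name,
       type,            -- 'read' or 'write'
       max_hot_degree,
       flow_bytes
FROM information_schema.tidb_hot_regions
ORDER BY flow_bytes DESC
LIMIT 20;
```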

Some drawbacks of OB include:

  1. It has log-only replica nodes that store the transaction log but not a full copy of the data.
  2. It requires careful design of partition keys to distribute data evenly (see the sketch after this list).
  3. It lacks intuitive visual management tools, making it a black box for users.
  4. Recently, it introduced columnar data, row data, column indexes, and row indexes. While this adds storage options, it also increases the user’s mental load. For example, if a table already uses column storage, do you still need a column index? Probably not. A column index is essentially a sub-table derived from a standard table that accelerates statistics over that column, which is yet another new concept to learn.
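
To illustrate point 2, here is a hypothetical DDL sketch in OB’s MySQL mode; the `orders` table and the choice of `user_id` as the hash key are made up for illustration. Pick the key badly and load concentrates on a few partitions, whereas TiDB splits and rebalances Regions automatically without any such declaration:

```sql
-- OceanBase (MySQL mode): the partition key must be chosen up front,
-- and every unique key must include it.
CREATE TABLE orders (
    id      BIGINT NOT NULL,
    user_id BIGINT NOT NULL,
    amount  DECIMAL(10, 2),
    PRIMARY KEY (id, user_id)
) PARTITION BY HASH(user_id) PARTITIONS 16;
```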

These are some rough thoughts from my time maintaining database systems. I’ve applied some of them at work, for example by using scripts to manage database permissions.

Here’s a script for managing TiDB passwords:

```python
import secrets
import string
import sys

# Usage: python grant.py <database> <password_item_name>
database = sys.argv[1]
rds = sys.argv[2]

# Generate two 20-character random passwords. The secrets module is
# used instead of random because these are credentials.
alphabet = string.ascii_letters + string.digits
r1 = ''.join(secrets.choice(alphabet) for _ in range(20))
r2 = ''.join(secrets.choice(alphabet) for _ in range(20))

# Emit the SQL to create the database plus a DDL-capable account
# ("data") and a read/write-only account ("opr").
print(f"create database if not exists {database} default character set utf8mb4 default collate utf8mb4_bin;")
print(f"create user '{database}data'@'172.%' identified by '{r1}';")
print(f"grant select, insert, update, delete on {database}.* to '{database}data'@'172.%';")
print(f"grant create, drop, alter, index, create view on {database}.* to '{database}data'@'172.%';")
print(f"create user '{database}opr'@'172.%' identified by '{r2}';")
print(f"grant select, insert, update, delete on {database}.* to '{database}opr'@'172.%';")
print(f"use {database};")

# This last line is a shell command, not SQL: it records both passwords
# in the 1Password CLI (op) item named by <password_item_name>.
print(f"eval $(op signin); op item get \"{rds}\" --format json | op item edit \"{rds}\" {database}data=\"{r1}\" {database}opr=\"{r2}\";")
```

Combined with a password manager (here the 1Password CLI, `op`), this script provisions the database and its accounts and records the generated passwords.

To query the largest tables:

```sql
SELECT table_schema,
       table_name,
       table_rows,
       TRUNCATE(data_length / 1024 / 1024 / 1024, 2)  AS table_gb,
       TRUNCATE(index_length / 1024 / 1024 / 1024, 2) AS index_gb
FROM information_schema.tables
ORDER BY data_length DESC, index_length DESC
LIMIT 50;
```

Next, an automated inspection script can pull the most time-consuming SQL statements that run without using any index:

```sql
SELECT FLOOR(UNIX_TIMESTAMP(MIN(summary_begin_time))) AS agg_begin_time,
       FLOOR(UNIX_TIMESTAMP(MAX(summary_end_time))) AS agg_end_time,
       ANY_VALUE(digest_text) AS agg_digest_text,
       ANY_VALUE(digest) AS agg_digest,
       SUM(exec_count) AS agg_exec_count,
       SUM(sum_latency) AS agg_sum_latency,
       MAX(max_latency) AS agg_max_latency,
       MIN(min_latency) AS agg_min_latency,
       CAST(SUM(exec_count * avg_latency) / SUM(exec_count) AS SIGNED) AS agg_avg_latency,
       CAST(SUM(exec_count * avg_mem) / SUM(exec_count) AS SIGNED) AS agg_avg_mem,
       MAX(max_mem) AS agg_max_mem,
       ANY_VALUE(schema_name) AS agg_schema_name,
       ANY_VALUE(plan_digest) AS agg_plan_digest,
       ANY_VALUE(query_sample_text) AS agg_query_sample_text,
       ANY_VALUE(index_names) AS agg_index_names
FROM `INFORMATION_SCHEMA`.`CLUSTER_STATEMENTS_SUMMARY_HISTORY`
WHERE index_names IS NULL AND query_sample_text > ''
GROUP BY schema_name, digest
ORDER BY agg_sum_latency DESC
LIMIT 10;
```

This query surfaces the no-index statements with the highest total latency. (The `ANY_VALUE` wrappers on `query_sample_text` and `index_names` keep it valid under the default `ONLY_FULL_GROUP_BY` SQL mode.)

To find the most time-consuming SQL statements overall, weighted by how often they run:

```sql
SELECT FLOOR(UNIX_TIMESTAMP(MIN(summary_begin_time))) AS agg_begin_time,
       FLOOR(UNIX_TIMESTAMP(MAX(summary_end_time))) AS agg_end_time,
       ANY_VALUE(digest_text) AS agg_digest_text,
       ANY_VALUE(digest) AS agg_digest,
       SUM(exec_count) AS agg_exec_count,
       SUM(sum_latency) AS agg_sum_latency,
       MAX(max_latency) AS agg_max_latency,
       MIN(min_latency) AS agg_min_latency,
       CAST(SUM(exec_count * avg_latency) / SUM(exec_count) AS SIGNED) AS agg_avg_latency,
       CAST(SUM(exec_count * avg_mem) / SUM(exec_count) AS SIGNED) AS agg_avg_mem,
       MAX(max_mem) AS agg_max_mem,
       ANY_VALUE(schema_name) AS agg_schema_name,
       ANY_VALUE(plan_digest) AS agg_plan_digest,
       ANY_VALUE(query_sample_text) AS agg_query_sample_text,
       ANY_VALUE(index_names) AS agg_index_names
FROM `INFORMATION_SCHEMA`.`CLUSTER_STATEMENTS_SUMMARY_HISTORY`
GROUP BY schema_name, digest
ORDER BY agg_sum_latency DESC
LIMIT 10;
```

To identify the slowest SQL query from the past hour for developers to optimize:

```sql
SELECT query_time, query
FROM information_schema.CLUSTER_SLOW_QUERY
WHERE is_internal = FALSE
  AND Time > DATE_ADD(NOW(), INTERVAL -1 HOUR)
  AND user <> 'root'
  AND query_time > 7
ORDER BY query_time DESC
LIMIT 1;
```

I enjoy reading and believe learning is a lifelong journey. If you don’t change yourself, you can’t progress; you’ll end up complaining about society’s unfairness, your low income, and your difficult life. Your work and income are the result of your own effort and struggle.

| username: TiDBer_q2eTrp5h | Original post link

Worth learning from; a very positive attitude toward learning.