Periodic Inspection Issues

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 阶段性巡检问题

| username: TiDBer_小阿飞

It’s the end of the year, and we need to inspect all databases and produce inspection reports. For TiDB, Dashboard and Prometheus are sufficient for daily checks, but for periodic (stage-based) inspections, does anyone have inspection scripts to share? Any help would be greatly appreciated!

PS: I found a few scripts, but they seem incomplete.

I’m thinking of including, but not limited to, the following major sections:
Basic database information (users, database names, database versions, total size, data file locations)
Cluster hardware information (CPU, disk usage, memory, network)
Status of each module (TiDB, PD, TiKV, TiFlash)
Top 10 largest tables
Instance configuration
Lock status
Thread status
Transaction status
SQL section (Top 10 SQL, slow queries, SQL summary, DDL statements)
Index section (usage, tables without indexes)
High availability status (network load, storage, compute nodes)
Database performance queries (load information, which tables need to be analyzed, backup and recovery)
Error information (cluster logs)
Summary (including optimization suggestions and error handling)
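
As a starting point, a few of the sections above could be sketched as a small Python report builder. This is only a minimal sketch, not a finished inspection script: the section titles, the `build_report` and `dbapi_runner` helpers, and the exact SQL are assumptions for illustration. The queries rely on `information_schema` tables such as `cluster_info` and `cluster_slow_query`, whose availability and columns should be verified against your TiDB version.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A query runner takes SQL text and returns (column names, rows).
QueryRunner = Callable[[str], Tuple[List[str], List[tuple]]]

SYSTEM_SCHEMAS = "('mysql', 'INFORMATION_SCHEMA', 'PERFORMANCE_SCHEMA', 'METRICS_SCHEMA')"

# Section title -> SQL. Table and column names are assumptions to verify
# against the TiDB version being inspected.
INSPECTION_QUERIES: Dict[str, str] = {
    "Component status": (
        "SELECT type, instance, version, uptime "
        "FROM information_schema.cluster_info"
    ),
    "Top 10 largest tables": (
        "SELECT table_schema, table_name, "
        "ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb "
        "FROM information_schema.tables "
        f"WHERE table_schema NOT IN {SYSTEM_SCHEMAS} "
        "ORDER BY data_length + index_length DESC LIMIT 10"
    ),
    "Tables without indexes": (
        "SELECT t.table_schema, t.table_name "
        "FROM information_schema.tables t "
        "LEFT JOIN information_schema.statistics s "
        "ON s.table_schema = t.table_schema AND s.table_name = t.table_name "
        "WHERE t.table_type = 'BASE TABLE' AND s.index_name IS NULL "
        f"AND t.table_schema NOT IN {SYSTEM_SCHEMAS}"
    ),
    "Slowest recent queries": (
        "SELECT time, query_time, db, LEFT(query, 80) AS query "
        "FROM information_schema.cluster_slow_query "
        "ORDER BY query_time DESC LIMIT 10"
    ),
}


def format_table(columns: Sequence[str], rows: Sequence[tuple]) -> str:
    """Render a result set as left-aligned, space-padded text columns."""
    if not rows:
        return "(no rows)"
    cells = [[str(c) for c in columns]]
    cells += [["" if v is None else str(v) for v in row] for row in rows]
    widths = [max(len(r[i]) for r in cells) for i in range(len(columns))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
        for r in cells
    )


def build_report(run_query: QueryRunner,
                 queries: Dict[str, str] = INSPECTION_QUERIES) -> str:
    """Run every section query and join the results into one text report.

    A failing query is recorded in the report instead of aborting the run.
    """
    parts: List[str] = []
    for title, sql in queries.items():
        parts.append(f"== {title} ==")
        try:
            columns, rows = run_query(sql)
            parts.append(format_table(columns, rows))
        except Exception as exc:
            parts.append(f"(query failed: {exc})")
        parts.append("")
    return "\n".join(parts)


def dbapi_runner(conn) -> QueryRunner:
    """Wrap any Python DB-API connection as a QueryRunner."""
    def run(sql: str) -> Tuple[List[str], List[tuple]]:
        cur = conn.cursor()
        try:
            cur.execute(sql)
            columns = [d[0] for d in cur.description]
            return columns, list(cur.fetchall())
        finally:
            cur.close()
    return run
```

Because TiDB speaks the MySQL protocol, a connection from a MySQL DB-API driver should work here, for example `print(build_report(dbapi_runner(conn)))`. Keeping the queries in a dictionary makes it easy to add the remaining sections (locks, transactions, configuration, and so on) one at a time.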

| username: Kongdom | Original post link

:yum: Marking this for later, waiting for the experts~

| username: 最强王者 | Original post link

Waiting for the expert.

| username: 最强王者 | Original post link

I think every aspect is important, such as table hotspots and instance configuration.

| username: Kongdom | Original post link

The key is turning it into a script. Checking these items by hand is straightforward, but automating them in a script is difficult. :joy:

| username: dba远航 | Original post link

I developed a script similar to Oracle’s AWR for MySQL-family databases, but it’s not convenient to make it public.

| username: zhang_2023 | Original post link

Awesome, big boss!

| username: zhanggame1 | Original post link

TiDB Dashboard has a cluster diagnostics feature that can generate a report similar to an Oracle AWR report, collecting cluster configuration and performance-related data.