Slow Joint Query in TiDB Cluster

translator_bot · June 23, 2024, 1:20am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB集群联合查询慢

| username: TiDBer_ps8e9OT5

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1.0
[Encountered Problem] TiDB cluster joint query is sometimes fast and sometimes slow
[Reproduction Path] Direct SQL query
[Problem Phenomenon and Impact]
When using a single table query, the response is normal:
SELECT infoid FROM push WHERE userid = 188902 ORDER BY id DESC LIMIT 5
When using a joint database and table query, the response time is between 2 seconds and 60 seconds
SELECT bsp.infoid, dfb.pname FROM base.push bsp
LEFT JOIN global.info dfb ON bsp.infoid = dfb.infoid
WHERE userid = 188902 AND dfb.toptype_code = ‘06’ ORDER BY bsp.date DESC LIMIT 5

Database base table push has 1.3 billion records
CREATE TABLE push (
id bigint(20) NOT NULL AUTO_INCREMENT,
userid int(11) NOT NULL COMMENT ‘User ID’,
infoid varchar(255) COLLATE utf8mb4_general_ci NOT NULL COMMENT ‘Record ID’,
keys varchar(255) COLLATE utf8mb4_general_ci DEFAULT NULL,
date bigint(20) NOT NULL,
isvisit tinyint(4) DEFAULT NULL COMMENT ‘Visited; 0: No 1: Yes’,
isv tinyint(4) DEFAULT NULL COMMENT,
type tinyint(4) DEFAULT NULL COMMENT,
PRIMARY KEY (id),
KEY userid (userid),
KEY date (date),
KEY infoid (infoid)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci

Database global table info has 150 million records
CREATE TABLE dws_f_bid_baseinfo (
id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
infoid varchar(32) NOT NULL COMMENT ‘Data ID’,
a_code varchar(10) DEFAULT NULL,
c_code varchar(10) DEFAULT NULL,
d_code varchar(10) DEFAULT NULL,
yx decimal(15,4) DEFAULT NULL,
zb decimal(15,4) DEFAULT NULL,
zk decimal(10,4) DEFAULT NULL,
names varchar(500) DEFAULT NULL,
toptype_code varchar(15) DEFAULT NULL,
s_code varchar(15) DEFAULT NULL,
pn varchar(500) DEFAULT NULL,
pc varchar(100) DEFAULT NULL,
be_code varchar(15) DEFAULT NULL,
ptime datetime DEFAULT NULL,
ctime datetime NOT NULL,
botime datetime DEFAULT NULL,
isvf tinyint(1) NOT NULL DEFAULT ‘0’,
ps_id varchar(100) DEFAULT NULL,
d_id varchar(45) DEFAULT NULL,
u varchar(5000) DEFAULT NULL,
p varchar(2000) DEFAULT NULL,
mp tinyint(1) NOT NULL DEFAULT ‘0’,
s1 varchar(100) DEFAULT NULL,
uptime datetime DEFAULT NULL,
crtime datetime DEFAULT NULL,
br_id varchar(32) DEFAULT NULL,
ag_id varchar(32) DEFAULT NULL,
PRIMARY KEY (id) /T![clustered_index] NONCLUSTERED/,
KEY infoid (infoid)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin

[Attachment]
Machine 8*32G, KV storage is SSD disk

Please provide the version information of each component, such as cdc/tikv, which can be obtained by executing cdc version/tikv-server --version.

translator_bot · June 23, 2024, 1:20am

| username: h5n1 | Original post link

Please upload the explain analyze results of the join query.

translator_bot · June 23, 2024, 1:20am

| username: TiDBer_ps8e9OT5 | Original post link

The load on the DB node is very high, with CPU and memory usage consistently above 90%.

translator_bot · June 23, 2024, 1:20am

| username: h5n1 | Original post link

The index on the date column for the bsp table is not being used correctly. Use SPM to temporarily bind it. It should use the index on the userid column. First, check if the execution plan is correct after adding the hint, and then bind it.

CREATE GLOBAL BINDING
FOR 
SELECT bsp.infoid, dfb.pname 
FROM base.push bsp
LEFT JOIN global.info dfb ON bsp.infoid = dfb.infoid
WHERE userid = 188902 AND dfb.toptype_code = '06' 
ORDER BY bsp.date DESC 
LIMIT 5
USING
SELECT /*+ use_index(bsp, `userid`)*/ bsp.infoid, dfb.pname 
FROM base.push bsp
LEFT JOIN global.info dfb ON bsp.infoid = dfb.infoid
WHERE userid = 188902 AND dfb.toptype_code = '06' 
ORDER BY bsp.date DESC 
LIMIT 5;

translator_bot · June 23, 2024, 1:20am

| username: TiDBer_ps8e9OT5 | Original post link

Okay, I’ll give it a try.

translator_bot · June 23, 2024, 1:20am

| username: 近墨者zyl | Original post link

Learning, learning.

translator_bot · June 23, 2024, 1:20am

| username: h5n1 | Original post link

Try using TiFlash.

translator_bot · June 23, 2024, 1:20am

| username: TiDBer_ps8e9OT5 | Original post link

Well, is the pressure of this scale of data on TiDB or TiKV now? Is there any use in expanding the DB nodes?

translator_bot · June 23, 2024, 1:20am

| username: h5n1 | Original post link

High CPU utilization in TiDB can have an impact.

translator_bot · June 23, 2024, 1:20am

| username: TiDBer_ps8e9OT5 | Original post link

Now there are two DB nodes. One of the high-usage nodes is also deployed with PD, Grafana, and Alertmanager, while the other DB node only has DB and PD. Expanding another DB now probably won’t help much, right?

translator_bot · June 23, 2024, 1:20am

| username: h5n1 | Original post link

You mentioned earlier that the DB CPU reached 90%. First, check if there are any SQL queries that can be optimized. If there’s nothing to optimize, then consider scaling.

translator_bot · June 23, 2024, 1:20am

| username: TiDBer_ps8e9OT5 | Original post link

Well, thank you very much.
There shouldn’t be any issues with the cluster deployment, right?

translator_bot · June 23, 2024, 1:20am

| username: 人如其名 | Original post link

Brother, post the explain analyze information for both the fast and slow queries in the slow query log, otherwise, we won’t know anything without the information. Or at least post an execution plan to take a look.

translator_bot · June 23, 2024, 1:20am

| username: TiDBer_ps8e9OT5 | Original post link

Thank you, the issue has been resolved. Due to the large amount of data and high concurrency, it is not suitable for join queries. Creating a wide table will suffice.

translator_bot · June 23, 2024, 1:20am

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.