Adding an Index to a 7 Million Row Table: ALTER TABLE `database_class`.`table_group` ADD INDEX `idx_id` (`myid`);

translator_bot · June 21, 2024, 3:13am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 在一个700w的表中添加索引 ALTER TABLE database_class.table_group ADD INDEX idx_id (myid);

| username: 天下无贼

Currently, there are 6 TiKV nodes, one of which is faulty and offline. There are also 9 TiDB nodes, with one node offline as well. When adding this index ALTER TABLE database_class.table_group ADD INDEX idx_id (myid);, there has been no response. I am not sure if it is related to the offline nodes or the size of the table. Which logs should I start with to find the error? I haven’t seen any errors related to adding the index in the TiDB or TiKV logs. Seeking guidance from experts.

translator_bot · June 21, 2024, 3:13am

| username: 天下无贼 | Original post link

The error above is from TiDB. Do I need to recover this TiKV or completely remove and exclude it from the cluster in order to add an index?

translator_bot · June 21, 2024, 3:13am

| username: changpeng75 | Original post link

Is 192.168.1.11 the disconnected TiKV node?

translator_bot · June 21, 2024, 3:13am

| username: 小龙虾爱大龙虾 | Original post link

It might be related. What’s the situation with this TiKV node? Without touching this TiKV for now, you can try turning off this parameter: 系统变量 | PingCAP 文档中心.

translator_bot · June 21, 2024, 3:13am

| username: 天下无贼 | Original post link

Yes, I just changed tikv13:20160, which is the IP 192.168.1.11. This tikv node went down.

translator_bot · June 21, 2024, 3:13am

| username: 天下无贼 | Original post link

I’ll give it a try, boss.

translator_bot · June 21, 2024, 3:13am

| username: 天下无贼 | Original post link

700 million records, adding an index keeps running for half an hour without any progress. [2024/02/13 13:45:28.280 +08:00] [INFO] [index.go:620] [“[ddl] run add index job”] [job=“ID:3864, Type:add index, State:running, SchemaState:none, SchemaID:88, TableID:2905, RowCount:0, ArgLen:6, start time: 2024-02-13 13:39:17.254 +0800 CST, Err:[ddl:-1][Lightning:KV:ErrCheckMultiIngest]check multi-ingest support error: rpc error: code = Unavailable desc = connection error: desc = “transport: Error while dialing: dial tcp 192.168.1.11:20160: i/o timeout”, ErrCount:59, SnapshotVersion:0, UniqueWarnings:0”] [indexInfo=“{“id”:6,“idx_name”:{“O”:” idx_id “,“L”:” idx_id “},“tbl_name”:{“O”:”“,“L”:”“},“idx_cols”:[{“name”:{“O”:“user_id”,“L”:“user_id”},“offset”:2,“length”:-1}],“state”:0,“backfill_state”:0,“comment”:”“,“index_type”:1,“is_unique”:false,“is_primary”:false,“is_invisible”:false,“is_global”:false,“mv_index”:false}”] Error, RowCount:0 always shows 0, not a single row has been affected.

translator_bot · June 21, 2024, 3:13am

| username: changpeng75 | Original post link

The usual approach is to scale down this TiKV, at which point it will enter the offline state. In this state, the TiKV will perform leader transfer and region balance. Once both leader_count/region_count show that the transfer or balance is complete, the TiKV will change from Offline to Tombstone. This state indicates that the TiKV is fully offline, and you can safely clean up the TiKV in this state using the remove-tombstone interface. After that, change the IP and then scale up this TiKV again.

translator_bot · June 21, 2024, 3:13am

| username: changpeng75 | Original post link

Is the region on this TiKV currently undergoing a Leader election?

translator_bot · June 21, 2024, 3:13am

| username: 江湖故人 | Original post link

Taking the faulty TiKV node completely offline to the Tombstone state should solve the issue. Confirm that the leader_count and region_count of the faulty node are 0:
select store_id, address, leader_count, region_count from information_schema.tikv_store_status;

translator_bot · June 21, 2024, 3:13am

| username: Jellybean | Original post link

“It seems that I just changed tikv13:20160 to 192.168.1.11, and this tikv node went down.”

May I ask if you need to change the IP? If you need to change the IP, you should scale down the old IP and scale up the new IP. You cannot directly change the IP of the tikv node, otherwise, some abnormal situations may occur.

Additionally, to check the progress of adding an index, you can use admin show ddl jobs to view the execution queue information. It is also possible that other tasks are blocking it. First, check the DDL execution queue situation.

translator_bot · June 21, 2024, 3:13am

| username: 春风十里 | Original post link

ADMIN SHOW DDL JOBS;
Check the status and other information of this index addition?

SELECT * FROM mysql.tidb_mdl_view
Check if there are any metadata locks?

translator_bot · June 21, 2024, 3:13am

| username: 数据库真NB | Original post link

Two approaches?

Take the faulty nodes of TiDB and TiKV offline, then try again.
Check the data distribution of the table you want to add the index to, and see if it’s an issue with the table itself.