Unable to add new index, TiDB log indicates "cannot get disk capacity at /tmp/tidb/tmp_ddl-4000: no such file or directory"

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 新增索引加不上,tidb日志提示cannot get disk capacity at /tmp/tidb/tmp_ddl-4000: no such file or directory

| username: TiDBer_EBJSNMUw

[TiDB Usage Environment] Production Environment / Test / PoC
Production Environment
[TiDB Version]
v7.1.0
[Reproduction Path] What operations were performed to encounter the issue
Added an index
[Encountered Issue: Problem Phenomenon and Impact]
The index cannot be added.
[Resource Configuration] Enter TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
global:
  user: tidb
  ssh_port: 22
  ssh_type: builtin
  deploy_dir: /tidb-deploy
  data_dir: /tidb-data
  resource_control:
    memory_limit: 8G
  os: linux
monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115
  deploy_dir: /tidb-deploy/monitor-9100
  data_dir: /tidb-data/monitor-9100
  log_dir: /tidb-deploy/monitor-9100/log
server_configs:
  tidb:
    log.slow-threshold: 300
  tikv:
    memory-usage-limit: 6G
    readpool.coprocessor.use-unified-pool: true
    readpool.storage.use-unified-pool: false
    storage.block-cache.capacity: 2G
  pd:
    replication.enable-placement-rules: true
    replication.location-labels:
      - host
  tidb_dashboard: {}
  tiflash:
    logger.level: info
  tiflash-learner: {}
  pump: {}
  drainer: {}
  cdc: {}
  kvcdc: {}
  grafana: {}
tidb_servers:
  - host: 192.168.0.148
    ssh_port: 22
    port: 4000
    status_port: 10080
    deploy_dir: /tidb-deploy/tidb-4000
    log_dir: /tidb-deploy/tidb-4000/log
    arch: amd64
    os: linux
tikv_servers:
  - host: 192.168.0.148
    ssh_port: 22
    port: 20160
    status_port: 20180
    deploy_dir: /tidb-deploy/tikv-20160
    data_dir: /tidb-data/tikv-20160
    log_dir: /tidb-deploy/tikv-20160/log
    config:
      server.labels:
        host: logic-host-1
    arch: amd64
    os: linux
tiflash_servers:
  - host: 192.168.0.148
    ssh_port: 22
    tcp_port: 9000
    http_port: 8123
    flash_service_port: 3930
    flash_proxy_port: 20170
    flash_proxy_status_port: 20292
    metrics_port: 8234
    deploy_dir: /tidb-deploy/tiflash-9000
    data_dir: /tidb-data/tiflash-9000
    log_dir: /tidb-deploy/tiflash-9000/log
    arch: amd64
    os: linux
pd_servers:
  - host: 192.168.0.148
    ssh_port: 22
    name: pd-192.168.0.148-2379
    client_port: 2379
    peer_port: 2380
    deploy_dir: /tidb-deploy/pd-2379
    data_dir: /tidb-data/pd-2379
    log_dir: /tidb-deploy/pd-2379/log
    arch: amd64
    os: linux
monitoring_servers:

Resource Usage: (screenshot omitted)

[Attachment: Screenshot/Log/Monitoring]
tidb.log:
[2023/07/18 16:55:52.747 +08:00] [WARN] [session.go:2197] [“compile SQL failed”] [conn=984320192580431043] [error=“[schema:1146]Table ‘supeng.resource_lock’ doesn’t exist”] [SQL=“SELECT COUNT(1) FROM resource_lock where 1!=1”]
[2023/07/18 16:55:52.747 +08:00] [INFO] [conn.go:1184] [“command dispatched failed”] [conn=984320192580431043] [connInfo=“id:984320192580431043, addr:192.168.0.148:59418 status:10, collation:utf8mb4_general_ci, user:root”] [command=Query] [status=“inTxn:0, autocommit:1”] [sql=“SELECT COUNT(1) FROM resource_lock where 1!=1”] [txn_mode=PESSIMISTIC] [timestamp=0] [err=“[schema:1146]Table ‘supeng.resource_lock’ doesn’t exist”]
[2023/07/18 16:55:52.861 +08:00] [INFO] [session.go:3852] [“CRUCIAL OPERATION”] [conn=984320192580431043] [schemaVersion=93] [cur_db=supeng] [sql=“CREATE TABLE resource_lock(\nid BIGINT(64) AUTO_INCREMENT,\nresource VARCHAR(128),\nip VARCHAR(128),\ncreateTime DATETIME,\nupdateTime DATETIME,\nauthorId BIGINT(64),\nisDeleted BOOLEAN DEFAULT 0,\nPRIMARY KEY (id)\n) ENGINE=InnoDB CHARSET=utf8”] [user=root@%]
[2023/07/18 16:55:52.881 +08:00] [INFO] [ddl_worker.go:238] [“[ddl] add DDL jobs”] [“batch count”=1] [jobs=“ID:127, Type:create table, State:queueing, SchemaState:none, SchemaID:88, TableID:126, RowCount:0, ArgLen:2, start time: 2023-07-18 16:55:52.842 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0; “] [table=true]
[2023/07/18 16:55:52.881 +08:00] [INFO] [ddl.go:1056] [”[ddl] start DDL job”] [job=“ID:127, Type:create table, State:queueing, SchemaState:none, SchemaID:88, TableID:126, RowCount:0, ArgLen:2, start time: 2023-07-18 16:55:52.842 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0”] [query=“CREATE TABLE resource_lock(\nid BIGINT(64) AUTO_INCREMENT,\nresource VARCHAR(128),\nip VARCHAR(128),\ncreateTime DATETIME,\nupdateTime DATETIME,\nauthorId BIGINT(64),\nisDeleted BOOLEAN DEFAULT 0,\nPRIMARY KEY (id)\n) ENGINE=InnoDB CHARSET=utf8”]
[2023/07/18 16:55:52.891 +08:00] [INFO] [ddl_worker.go:980] [“[ddl] run DDL job”] [worker=“worker 1, tp general”] [job=“ID:127, Type:create table, State:queueing, SchemaState:none, SchemaID:88, TableID:126, RowCount:0, ArgLen:0, start time: 2023-07-18 16:55:52.842 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0”]
[2023/07/18 16:55:52.915 +08:00] [INFO] [domain.go:240] [“diff load InfoSchema success”] [currentSchemaVersion=93] [neededSchemaVersion=94] [“start time”=1.446636ms] [gotSchemaVersion=94] [phyTblIDs=“[126]”] [actionTypes=“[3]”]
[2023/07/18 16:55:52.919 +08:00] [INFO] [domain.go:833] [“mdl gets lock, update to owner”] [jobID=127] [version=94]
[2023/07/18 16:55:52.957 +08:00] [INFO] [ddl_worker.go:1204] [“[ddl] wait latest schema version changed(get the metadata lock if tidb_enable_metadata_lock is true)”] [ver=94] [“take time”=53.013379ms] [job=“ID:127, Type:create table, State:done, SchemaState:public, SchemaID:88, TableID:126, RowCount:0, ArgLen:2, start time: 2023-07-18 16:55:52.842 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0”]
[2023/07/18 16:55:52.971 +08:00] [INFO] [ddl_worker.go:601] [“[ddl] finish DDL job”] [worker=“worker 1, tp general”] [job=“ID:127, Type:create table, State:synced, SchemaState:public, SchemaID:88, TableID:126, RowCount:0, ArgLen:0, start time: 2023-07-18 16:55:52.842 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0”]
[2023/07/18 16:55:52.979 +08:00] [INFO] [ddl.go:1158] [“[ddl] DDL job is finished”] [jobID=127]
[2023/07/18 16:55:52.979 +08:00] [INFO] [callback.go:128] [“performing DDL change, must reload”]
[2023/07/18 16:55:52.979 +08:00] [INFO] [split_region.go:85] [“split batch regions request”] [“split key count”=1] [“batch count”=1] [“first batch, region ID”=14] [“first split key”=74800000000000007e]
[2023/07/18 16:55:52.981 +08:00] [INFO] [session.go:3852] [“CRUCIAL OPERATION”] [conn=984320192580431043] [schemaVersion=94] [cur_db=supeng] [sql=“Create UNIQUE Index UX_resource_lock_resource ON resource_lock(resource)”] [user=root@%]
[2023/07/18 16:55:52.984 +08:00] [INFO] [split_region.go:187] [“batch split regions complete”] [“batch region ID”=14] [“first at”=74800000000000007e] [“first new region left”=“{Id:1025 StartKey:7480000000000000ff7b00000000000000f8 EndKey:7480000000000000ff7e00000000000000f8 RegionEpoch:{ConfVer:1 Version:65} Peers:[id:1026 store_id:1 ] EncryptionMeta: IsInFlashback:false FlashbackStartTs:0}”] [“new region count”=1]
[2023/07/18 16:55:52.984 +08:00] [INFO] [split_region.go:236] [“split regions complete”] [“region count”=1] [“region IDs”=“[1025]”]
[2023/07/18 16:55:52.988 +08:00] [INFO] [ddl_worker.go:238] [“[ddl] add DDL jobs”] [“batch count”=1] [jobs=“ID:128, Type:add index, State:queueing, SchemaState:none, SchemaID:88, TableID:126, RowCount:0, ArgLen:6, start time: 2023-07-18 16:55:52.941 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0, UniqueWarnings:0; “] [table=true]
[2023/07/18 16:55:52.988 +08:00] [INFO] [ddl.go:1056] [”[ddl] start DDL job”] [job=“ID:128, Type:add index, State:queueing, SchemaState:none, SchemaID:88, TableID:126, RowCount:0, ArgLen:6, start time: 2023-07-18 16:55:52.941 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0, UniqueWarnings:0”] [query=“Create UNIQUE Index UX_resource_lock_resource ON resource_lock(resource)”]
[2023/07/18 16:55:52.998 +08:00] [INFO] [ddl_worker.go:980] [“[ddl] run DDL job”] [worker=“worker 3, tp add index”] [job=“ID:128, Type:add index, State:queueing, SchemaState:none, SchemaID:88, TableID:126, RowCount:0, ArgLen:0, start time: 2023-07-18 16:55:52.941 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0, UniqueWarnings:0”]
[2023/07/18 16:55:52.998 +08:00] [INFO] [index.go:620] [“[ddl] run add index job”] [job=“ID:128, Type:add index, State:running, SchemaState:none, SchemaID:88, TableID:126, RowCount:0, ArgLen:6, start time: 2023-07-18 16:55:52.941 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0, UniqueWarnings:0”] [indexInfo=“{"id":1,"idx_name":{"O":"UX_resource_lock_resource","L":"ux_resource_lock_resource"},"tbl_name":{"O":"","L":""},"idx_cols":[{"name":{"O":"resource","L":"resource"},"offset":1,"length":-1}],"state":0,"backfill_state":0,"comment":"","index_type":1,"is_unique":true,"is_primary":false,"is_invisible":false,"is_global":false,"mv_index":false}”]
[2023/07/18 16:55:53.002 +08:00] [INFO] [backend_mgr.go:74] [“[ddl-ingest] ingest backfill is not available”] [error=“cannot get disk capacity at /tmp/tidb/tmp_ddl-4000: no such file or directory”]
[2023/07/18 16:55:53.002 +08:00] [ERROR] [ddl_worker.go:942] [“[ddl] run DDL job error”] [worker=“worker 3, tp add index”] [error=“cannot get disk capacity at /tmp/tidb/tmp_ddl-4000: no such file or directory”]
[2023/07/18 16:55:53.013 +08:00] [INFO] [ddl_worker.go:825] [“[ddl] run DDL job failed, sleeps a while then retries it.”] [worker=“worker 3, tp add index”] [waitTime=1s] [error=“cannot get disk capacity at /tmp/tidb/tmp_ddl-4000: no such file or directory”]
[2023/07/18 16:55:54.013 +08:00] [INFO] [ddl_worker.go:1184] [“[ddl] schema version doesn’t change”]
[2023/07/18 16:55:54.019 +08:00] [INFO] [ddl_worker.go:980] [“[ddl] run DDL job”] [worker=“worker 2, tp add index”] [job=“ID:128, Type:add index, State:running, SchemaState:none, SchemaID:88, TableID:126, RowCount:0, ArgLen:0, start time: 2023-07-18 16:55:52.941 +0800 CST, Err:[ddl:-1]cannot get disk capacity at /tmp/tidb/tmp_ddl-4000: no such file or directory, ErrCount:1, SnapshotVersion:0, UniqueWarnings:0”]
[2023/07/18 16:55:54.020 +08:00] [INFO] [index.go:620] [“[ddl] run add index job”] [job=“ID:128, Type:add index, State:running, SchemaState:none, SchemaID:88, TableID:126, RowCount:0, ArgLen:6, start time: 2023-07-18 16:55:52.941 +0800 CST, Err:[ddl:-1]cannot get disk capacity at /tmp/tidb/tmp_ddl-4000: no such file or directory, ErrCount:1, SnapshotVersion:0, UniqueWarnings:0”] [indexInfo=“{"id":1,"idx_name":{"O":"UX_resource_lock_resource","L":"ux_resource_lock_resource"},"tbl_name":{"O":"","L":""},"idx_cols":[{"name":{"O":"resource","L":"resource"},"offset":1,"length":-1}],"state":0,"backfill_state":0,"comment":"","index_type":1,"is_unique":true,"is_primary":false,"is_invisible":false,"is_global":false,"mv_index":false}”]

tipd.log:
[2023/07/18 16:55:52.980 +08:00] [INFO] [cluster_worker.go:145] [“alloc ids for region split”] [region-id=1025] [peer-ids=“[1026]”]
[2023/07/18 16:55:52.984 +08:00] [INFO] [region.go:679] [“region Version changed”] [region-id=14] [detail=“StartKey Changed:{7480000000000000FF7B00000000000000F8} → {7480000000000000FF7E00000000000000F8}, EndKey:{748000FFFFFFFFFFFFF900000000000000F8}”] [old-version=64] [new-version=65]
[2023/07/18 16:55:52.984 +08:00] [INFO] [cluster_worker.go:237] [“region batch split, generate new regions”] [region-id=14] [origin="id:1025 start_key:"7480000000000000FF7B00000000000000F8" end_key

| username: tidb菜鸟一只 | Original post link

I've seen a similar error on the forum before. You can manually create the corresponding directory /tmp/tidb/tmp_ddl-4000.
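A minimal sketch of that suggestion, assuming the tidb-server listens on port 4000 (the `-4000` suffix in the path comes straight from the error message); run it on every TiDB host:

```shell
# Recreate the temp directory that the ingest-based add-index backfill expects.
# The exact path is taken from the error message in tidb.log.
mkdir -p /tmp/tidb/tmp_ddl-4000
ls -ld /tmp/tidb/tmp_ddl-4000
```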

| username: 啦啦啦啦啦 | Original post link

Did you upgrade from a lower version? Try manually creating this directory.

| username: TiDBer_EBJSNMUw | Original post link

After creating the directory, another error was reported:
[2023/07/19 15:36:25.517 +08:00] [INFO] [session.go:3852] [“CRUCIAL OPERATION”] [conn=984320192580431773] [schemaVersion=94] [cur_db=supeng] [sql=“Create UNIQUE Index UX_resource_lock_resource ON resource_lock(resource)”] [user=root@%]
[2023/07/19 15:36:25.527 +08:00] [INFO] [ddl_worker.go:238] [“[ddl] add DDL jobs”] [“batch count”=1] [jobs=“ID:133, Type:add index, State:queueing, SchemaState:none, SchemaID:88, TableID:126, RowCount:0, ArgLen:6, start time: 2023-07-19 15:36:25.492 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0, UniqueWarnings:0; “] [table=true]
[2023/07/19 15:36:25.527 +08:00] [INFO] [ddl.go:1056] [”[ddl] start DDL job”] [job=“ID:133, Type:add index, State:queueing, SchemaState:none, SchemaID:88, TableID:126, RowCount:0, ArgLen:6, start time: 2023-07-19 15:36:25.492 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0, UniqueWarnings:0”] [query=“Create UNIQUE Index UX_resource_lock_resource ON resource_lock(resource)”]
[2023/07/19 15:36:25.537 +08:00] [INFO] [ddl_worker.go:980] [“[ddl] run DDL job”] [worker=“worker 2, tp add index”] [job=“ID:133, Type:add index, State:queueing, SchemaState:none, SchemaID:88, TableID:126, RowCount:0, ArgLen:0, start time: 2023-07-19 15:36:25.492 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0, UniqueWarnings:0”]
[2023/07/19 15:36:25.538 +08:00] [INFO] [index.go:620] [“[ddl] run add index job”] [job=“ID:133, Type:add index, State:running, SchemaState:none, SchemaID:88, TableID:126, RowCount:0, ArgLen:6, start time: 2023-07-19 15:36:25.492 +0800 CST, Err:, ErrCount:0, SnapshotVersion:0, UniqueWarnings:0”] [indexInfo=“{"id":1,"idx_name":{"O":"UX_resource_lock_resource","L":"ux_resource_lock_resource"},"tbl_name":{"O":"","L":""},"idx_cols":[{"name":{"O":"resource","L":"resource"},"offset":1,"length":-1}],"state":0,"backfill_state":0,"comment":"","index_type":1,"is_unique":true,"is_primary":false,"is_invisible":false,"is_global":false,"mv_index":false}”]
[2023/07/19 15:36:25.538 +08:00] [INFO] [config.go:109] [“[ddl-ingest] initial memory setting for ingest”] [“local writer memory cache size”=134217728] [“engine memory cache size”=536870912] [“range concurrency”=16]
[2023/07/19 15:36:25.538 +08:00] [INFO] [config.go:124] [“[ddl-ingest] change memory setting for ingest”] [“local writer memory cache size”=67108864] [“engine memory cache size”=268435456] [“range concurrency”=16]
[2023/07/19 15:36:25.538 +08:00] [INFO] [backend_mgr.go:120] [“[ddl-ingest] create local backend for adding index”] [keyspaceName=]
[2023/07/19 15:36:25.539 +08:00] [INFO] [client.go:311] [“[pd] create pd client with endpoints and keyspace”] [pd-address=“[192.168.0.148:2379]”] [keyspace-id=0]
[2023/07/19 15:36:25.540 +08:00] [INFO] [pd_service_discovery.go:543] [“[pd] switch leader”] [new-leader=http://192.168.0.148:2379] [old-leader=]
[2023/07/19 15:36:25.540 +08:00] [INFO] [pd_service_discovery.go:175] [“[pd] init cluster id”] [cluster-id=7242901540266953513]
[2023/07/19 15:36:25.540 +08:00] [INFO] [client.go:386] [“[pd] changing service mode”] [old-mode=UNKNOWN_SVC_MODE] [new-mode=PD_SVC_MODE]
[2023/07/19 15:36:25.540 +08:00] [INFO] [tso_client.go:230] [“[tso] switch dc tso allocator serving address”] [dc-location=global] [new-address=http://192.168.0.148:2379]
[2023/07/19 15:36:25.541 +08:00] [INFO] [tso_dispatcher.go:290] [“[tso] tso dispatcher created”] [dc-location=global]
[2023/07/19 15:36:25.541 +08:00] [INFO] [client.go:428] [“[pd] service mode changed”] [old-mode=PD_SVC_MODE] [new-mode=PD_SVC_MODE]
[2023/07/19 15:36:25.541 +08:00] [ERROR] [backend_mgr.go:96] [“[ddl-ingest] build ingest backend failed”] [“job ID”=133] [error=“[Lightning:Config:ErrInvalidSortedKVDir]invalid sorted-kv-dir ‘/tmp/tidb/tmp_ddl-4000/133’ for local backend, please change the config or delete the path: mkdir /tmp/tidb/tmp_ddl-4000/133: permission denied”]
[2023/07/19 15:36:25.541 +08:00] [ERROR] [ddl_worker.go:942] [“[ddl] run DDL job error”] [worker=“worker 2, tp add index”] [error=“[Lightning:Config:ErrInvalidSortedKVDir]invalid sorted-kv-dir ‘/tmp/tidb/tmp_ddl-4000/133’ for local backend, please change the config or delete the path: mkdir /tmp/tidb/tmp_ddl-4000/133: permission denied”]

| username: 啦啦啦啦啦 | Original post link

Insufficient permissions. Change the directory's ownership so it belongs to the tidb user.
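As a sketch of that fix (assuming the tidb-server runs as a local user named `tidb`; `TIDB_USER` here is a placeholder that defaults to the current user so the commands are safe to try):

```shell
# Make the DDL temp directory writable by the user the tidb-server runs as.
# TIDB_USER is a placeholder; on the hosts in this thread it would be "tidb".
TIDB_USER="${TIDB_USER:-$(id -un)}"
mkdir -p /tmp/tidb/tmp_ddl-4000
chown -R "$TIDB_USER" /tmp/tidb
chmod -R u+rwX /tmp/tidb
ls -ld /tmp/tidb/tmp_ddl-4000
```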

| username: Kongdom | Original post link

Indeed, I have encountered it once myself.

| username: TiDBer_EBJSNMUw | Original post link

Got it, thanks a lot.

| username: zhanggame1 | Original post link

A minor issue; it has come up several times on the forum. If the directory doesn't exist, create it; if it does, fix its permissions.

| username: WalterWj | Original post link

If disk performance on the TiDB server is mediocre, you can disable the fast-DDL feature.
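For reference, a hedged sketch: the switch for the fast (ingest-based) path is the system variable `tidb_ddl_enable_fast_reorg`, available in v6.3 and later (this cluster is v7.1.0). Turning it off makes ADD INDEX fall back to the transactional backfill, which does not use /tmp/tidb/tmp_ddl-4000:

```shell
# Statement that disables the fast-reorg (ingest) backfill; the host and
# credentials below are illustrative for this thread's topology.
SQL="SET GLOBAL tidb_ddl_enable_fast_reorg = OFF;"
echo "$SQL"
# Apply it by piping the statement to your client, e.g.:
#   echo "$SQL" | mysql -h 192.168.0.148 -P 4000 -u root -p
```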

| username: 天下无贼 | Original post link

Grant permissions to that directory, but should that be done on TiKV or on TiDB? Also, the directory does not exist under /tmp on either the TiDB or the TiKV hosts. Should it be created on all TiKV hosts, on all TiDB hosts, or on both?

| username: 天下无贼 | Original post link

/tmp/tidb/tmp_ddl-4000 This directory does not exist.

| username: 啦啦啦啦啦 | Original post link

Create it under all TiDB instances.

| username: 天下无贼 | Original post link

Create /tmp/tidb/tmp_ddl-4000 on the hosts of all TiDB instances, then run chown -R root:tidb /tmp/tidb/.

| username: 天下无贼 | Original post link

Is this correct, boss?

| username: 啦啦啦啦啦 | Original post link

Yes, try again after creating it to see if you can perform DDL operations normally.

| username: 天下无贼 | Original post link

Okay, I’ll give it a try.

| username: 天下无贼 | Original post link

After creating the directory on all TiDB instances as the deployment user, the index could be created. I still don't understand the underlying cause, though.

| username: 啦啦啦啦啦 | Original post link

It's a bug: the directory wasn't created automatically after the upgrade, and this was fixed in later versions. As I recall, the directory was introduced together with the feature that makes online index creation about 10 times faster.
Refer to this article, which explains why temporary files are used:
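If the shared /tmp location itself is a concern, one hedged option (assuming your version supports the tidb-server `temp-dir` setting, which defaults to /tmp/tidb where available) is to point the DDL temp directory at the data disk in the TiDB config:

```toml
# Sketch of a tidb-server config fragment; temp-dir support and the target
# path are assumptions to adapt to your deployment.
temp-dir = "/tidb-data/tidb-tmp"
```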

| username: 天下无贼 | Original post link

Okay, boss.

| username: TiDBer_57GSL9zB | Original post link

I ran chown tidb.tidb -R /tmp/tidb, and only created the directory on the TiKV node.