Adding Index, TiDB Keeps Restarting: panic: runtime error: invalid memory address or nil pointer dereference

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 加索引,TiDB不断重启:panic: runtime error: invalid memory address or nil pointer dereference

| username: 我是咖啡哥

[TiDB Usage Environment] Test
[TiDB Version]
Upgraded from V6.1.0 to V6.5.0, then to V7.1.0
[Reproduction Path]
Adding an index to a 30 million row table.

[root@127.0.0.1][tpcc][08:40:23]> select version();
+--------------------+
| version()          |
+--------------------+
| 5.7.25-TiDB-v7.1.0 |
+--------------------+
1 row in set (0.00 sec)

[root@127.0.0.1][tpcc][08:40:42]> select count(0) from tpcc.customer;
+----------+
| count(0) |
+----------+
| 30000000 |
+----------+
1 row in set (4.00 sec)

[root@127.0.0.1][tpcc][08:40:55]> set global tidb_ddl_reorg_worker_cnt=4;
Query OK, 0 rows affected (0.04 sec)

[root@127.0.0.1][tpcc][08:41:20]> show variables like 'tidb_ddl_%';
+--------------------------------+--------------+
| Variable_name                  | Value        |
+--------------------------------+--------------+
| tidb_ddl_disk_quota            | 107374182400 |
| tidb_ddl_enable_fast_reorg     | ON           |
| tidb_ddl_error_count_limit     | 512          |
| tidb_ddl_flashback_concurrency | 64           |
| tidb_ddl_reorg_batch_size      | 256          |
| tidb_ddl_reorg_priority        | PRIORITY_LOW |
| tidb_ddl_reorg_worker_cnt      | 4            |
+--------------------------------+--------------+
7 rows in set (0.01 sec)

[root@127.0.0.1][tpcc][08:41:39]> alter table tpcc.customer add index idx_01(c_city);
ERROR 2013 (HY000): Lost connection to MySQL server during query
[root@127.0.0.1][tpcc][08:42:58]> 

[Encountered Problem: Symptoms and Impact]

The TiDB server keeps restarting. tidb_stderr.log shows the following, and the same output appears on every restart:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x110 pc=0x4917721]

goroutine 573 [running]:
github.com/pingcap/tidb/session.(*session).TxnInfo(0xc2f043e000)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:545 +0x141
github.com/pingcap/tidb/session.GetStartTSFromSession({0x546bfc0?, 0xc2f043e000?})
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:3822 +0xce
github.com/pingcap/tidb/server.(*Server).GetInternalSessionStartTSList(0xc01ef2e480)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:960 +0x20c
github.com/pingcap/tidb/domain/infosync.(*InfoSyncer).ReportMinStartTS(0xc0002a0b40, {0x5d716a8, 0xc0006a2480})
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/domain/infosync/info.go:802 +0x82
github.com/pingcap/tidb/domain.(*Domain).infoSyncerKeeper(0xc000d66680)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/domain/domain.go:671 +0x42a
github.com/pingcap/tidb/util.(*WaitGroupEnhancedWrapper).Run.func1()
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/util/wait_group_wrapper.go:96 +0x77
created by github.com/pingcap/tidb/util.(*WaitGroupEnhancedWrapper).Run
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/util/wait_group_wrapper.go:91 +0xcc

It has restarted 30 times in half an hour!!
[Attachments: Screenshots/Logs/Monitoring]
tidb0626.log (3.4 MB)

| username: 我是咖啡哥 | Original post link

Could it be a memory issue? Here is the messages log:


Jun 26 09:15:14 host_130 systemd[1]: tidb-4000.service holdoff time over, scheduling restart.
Jun 26 09:15:14 host_130 systemd[1]: Stopped tidb service.
Jun 26 09:15:14 host_130 systemd[1]: Started tidb service.
Jun 26 09:15:33 host_130 kernel: EDAC MC1: 6 CE memory read error on CPU_SrcID#0_MC#1_Chan#1_DIMM#0 (channel:1 slot:0 page:0x1d76294 offset:0xfc0 grain:32 syndrome:0x0 -  OVERFLOW err_code:0x0101:0x0091 socket:0 imc:1 rank:1 bg:3 ba:0 row:0xef62 col:0x138)
Jun 26 09:15:39 host_130 kernel: EDAC MC1: 11 CE memory read error on CPU_SrcID#0_MC#1_Chan#1_DIMM#0 (channel:1 slot:0 page:0x1d76294 offset:0xfc0 grain:32 syndrome:0x0 -  OVERFLOW err_code:0x0101:0x0091 socket:0 imc:1 rank:1 bg:3 ba:0 row:0xef62 col:0x138)
Jun 26 09:15:40 host_130 kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#0_MC#1_Chan#1_DIMM#0 (channel:1 slot:0 page:0x1d76294 offset:0xfc0 grain:32 syndrome:0x0 -  err_code:0x0101:0x0091 socket:0 imc:1 rank:1 bg:3 ba:0 row:0xef62 col:0x138)
Jun 26 09:15:53 host_130 systemd[1]: tidb-4000.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Jun 26 09:15:53 host_130 systemd[1]: Unit tidb-4000.service entered failed state.
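
To weigh the memory theory against the panic, it can help to tally the two kinds of events in this log separately. The following is a minimal sketch, not a diagnostic tool: the function name `summarize` and both regular expressions are assumptions written to match the lines shown above, and the sample lines are copied from the log with the trailing EDAC details cut off. In EDAC messages, CE marks errors that ECC corrected and UE marks uncorrected ones; exit status 2 is what a Go binary returns after an unrecovered panic, which fits the panic in tidb_stderr.log.

```python
import re
from collections import Counter

# Patterns are assumptions fitted to the log lines quoted in this thread.
EDAC_RE = re.compile(r"EDAC MC\d+: (\d+) (CE|UE) .*? on (\S+)")
EXIT_RE = re.compile(
    r"systemd\[\d+\]: (\S+\.service): main process exited, code=\w+, status=(\S+)"
)

# Lines copied from the messages log above (EDAC details after the DIMM name cut).
SAMPLE = """\
Jun 26 09:15:14 host_130 systemd[1]: Started tidb service.
Jun 26 09:15:33 host_130 kernel: EDAC MC1: 6 CE memory read error on CPU_SrcID#0_MC#1_Chan#1_DIMM#0
Jun 26 09:15:39 host_130 kernel: EDAC MC1: 11 CE memory read error on CPU_SrcID#0_MC#1_Chan#1_DIMM#0
Jun 26 09:15:40 host_130 kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#0_MC#1_Chan#1_DIMM#0
Jun 26 09:15:53 host_130 systemd[1]: tidb-4000.service: main process exited, code=exited, status=2/INVALIDARGUMENT
"""


def summarize(log_text):
    """Return (corrected errors per DIMM, uncorrected error total, service exits)."""
    ce_by_dimm = Counter()
    ue_total = 0
    exits = []
    for line in log_text.splitlines():
        m = EDAC_RE.search(line)
        if m:
            count, kind, dimm = int(m.group(1)), m.group(2), m.group(3)
            if kind == "CE":
                ce_by_dimm[dimm] += count
            else:
                ue_total += count
            continue
        m = EXIT_RE.search(line)
        if m:
            exits.append((m.group(1), m.group(2)))
    return ce_by_dimm, ue_total, exits


if __name__ == "__main__":
    ce, ue, exits = summarize(SAMPLE)
    print("corrected errors per DIMM:", dict(ce))
    print("uncorrected errors:", ue)
    print("service exits:", exits)
```

If the uncorrected total stays at zero while the service keeps exiting with status 2, the log by itself does not tie the restarts to the DIMM, although a DIMM that reports corrected errors this often is still worth checking.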

| username: 裤衩儿飞上天 | Original post link

How many TiDB servers? Are the other TiDB servers also restarting?

| username: 裤衩儿飞上天 | Original post link

It feels like a memory issue. If the memory is fine, then it’s a bug. :smiling_imp:

| username: 我是咖啡哥 | Original post link

Just one TiDB server. I set up the whole cluster on a single physical machine.

| username: tidb菜鸟一只 | Original post link

How much memory is allocated to one TiDB instance?

| username: aytrack | Original post link

This is the same bug as "nil point panic in `session.session#TxnInfo()` method" (Issue #43829 in pingcap/tidb on GitHub), and it has not been fixed in version 7.1 yet.

| username: dba-kit | Original post link

I tried it, but couldn’t reproduce the issue. However, my TiDB server has more resources allocated, and the data volume is an order of magnitude smaller.

| username: 我是咖啡哥 | Original post link

Temporary workaround: find the job ID and cancel the job manually.

ADMIN SHOW DDL JOBS;
ADMIN CANCEL DDL JOBS 23490;

Because TiDB keeps restarting, run the cancel statement repeatedly in the window between restarts until it gets through.

Afterwards, adding the same index again still triggered the restarts. However, when I created other indexes first and then created this one, TiDB did not restart.
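
The "run it repeatedly between restarts" step can be scripted. This is a minimal sketch under stated assumptions: `cancel_ddl_job` and `run_sql` are hypothetical names, and `run_sql` stands for any callable that opens a fresh connection with your MySQL client library of choice, executes one statement, and raises an exception when the server is unreachable. The sketch does not inspect the result that the cancel statement returns, so the job state should still be confirmed with ADMIN SHOW DDL JOBS afterwards.

```python
import time


def cancel_ddl_job(run_sql, job_id, attempts=10, delay=2.0, sleep=time.sleep):
    """Retry ADMIN CANCEL DDL JOBS until one attempt reaches the server.

    run_sql is a hypothetical callable: it executes a single SQL statement on a
    new connection and raises if tidb-server is down or mid-restart.
    Returns the number of the attempt that succeeded.
    """
    statement = f"ADMIN CANCEL DDL JOBS {int(job_id)}"
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            run_sql(statement)
            return attempt
        except Exception as exc:  # broad on purpose: driver error types vary
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # wait for the next window between restarts
    raise RuntimeError(
        f"could not cancel DDL job {job_id} after {attempts} attempts"
    ) from last_error


if __name__ == "__main__":
    # Stand-in for a real client: fails twice as if the server were restarting.
    seen = []

    def fake_run_sql(sql):
        seen.append(sql)
        if len(seen) < 3:
            raise ConnectionError("tidb-server is restarting")

    print(cancel_ddl_job(fake_run_sql, 23490, delay=0))
```

Passing the connection logic in as a callable keeps the retry loop independent of any particular driver and lets it be exercised without a running cluster.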

| username: 有猫万事足 | Original post link

There’s a bit of luck involved. :joy:

| username: zhanggame1 | Original post link

Is it a bug?

| username: mayjiang0203 | Original post link

Issue: nil point panic in `session.session#TxnInfo()` method. · Issue #43829 · pingcap/tidb · GitHub
