TiFlash Not Functioning Properly After Upgrading a TiDB Cluster from v5.4.0 to v6.5.0

username: vcdog

[TiDB Usage Environment] Production Environment
[TiDB Version] Upgraded from v5.4.0 to v6.5.0. All components in the TiDB cluster are now at v6.5.0.
[Reproduction Path] After the TiDB cluster was upgraded from v5.4.0 to v6.5.0, the TiFlash component stopped working properly.
tiup cluster display my_cluster_name

Then, following the official documentation, I scaled in the TiFlash component and scaled it out again, after which its status returned to normal.
However, as soon as a few small tables are replicated to TiFlash, the node goes back to the Disconnected state.

At the same time, when I log in to the remote TiFlash server, I find that a large number of core files have been generated in the deployment path, as follows:

Each core file is 1 GB in size, and together they total more than 70 GB.
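
For readers who want to quantify the core dumps before cleaning them up, here is a minimal sketch that counts them and sums their sizes. The `core*` file name pattern and the directory passed in are assumptions: the actual names depend on the host's core pattern setting, and the deployment path is not given in this thread.

```python
from pathlib import Path


def summarize_core_files(deploy_dir, pattern="core*"):
    """Return (count, total_bytes) for files matching `pattern` directly under deploy_dir.

    The default pattern is an assumption; adjust it to the host's core file naming.
    """
    files = [p for p in Path(deploy_dir).glob(pattern) if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes


def format_gib(num_bytes):
    """Render a byte count in GiB with one decimal place."""
    return f"{num_bytes / 2**30:.1f} GiB"
```

Calling `summarize_core_files()` with the TiFlash deployment directory and passing the second value to `format_gib()` gives a figure comparable to the 70G+ reported above.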

The error logs report the following errors:

[Resource Configuration]
username: vcdog

The above is the error log for TiFlash.

username: tidb狂热爱好者

Deleting and rebuilding will solve the problem.

username: vcdog

What is very strange is that we have two production clusters, both upgraded from v5.4.0 to v6.5.0. TiFlash works normally in one of them but not in the other.

username: wzf0072

The errors are all related to the table with physical_table_id=1130. Should we disable the TIFLASH REPLICA for this table first and then enable it again?

username: vcdog

I have already set the TiFlash replica count of all tables to 0 and restarted TiFlash, but the same error still occurs, and a large number of core files are still being generated.
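
As an illustration of what "setting the replica count to 0" involves, the sketch below generates the `ALTER TABLE ... SET TIFLASH REPLICA` statements for a list of tables, plus a query against `information_schema.tiflash_replica` to see which replicas are still configured. The helper names and the example table are hypothetical; the statements would still need to be run through a MySQL-compatible client, which is not shown.

```python
def quote_ident(name):
    """Quote an identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def set_tiflash_replica_statements(tables, replica_count):
    """Build one ALTER TABLE statement per (schema, table) pair.

    replica_count=0 removes the TiFlash replica; a positive value adds it back.
    """
    if replica_count < 0:
        raise ValueError("replica_count must be >= 0")
    return [
        f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} "
        f"SET TIFLASH REPLICA {replica_count};"
        for schema, table in tables
    ]


# Query to review which tables still have a TiFlash replica configured.
REPLICA_STATUS_QUERY = (
    "SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS "
    "FROM information_schema.tiflash_replica;"
)

if __name__ == "__main__":
    # Hypothetical table, for illustration only.
    for stmt in set_tiflash_replica_statements([("test", "t1")], 0):
        print(stmt)
    print(REPLICA_STATUS_QUERY)
```

The same helper with `replica_count=1` produces the statements for adding replicas again later.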

username: tidb菜鸟一只

Let's check which table this ID belongs to.
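
One way to map the ID to a table name is sketched below. It builds two lookup queries, because a physical table ID may refer either to a whole table (`TIDB_TABLE_ID` in `information_schema.tables`) or to a single partition (`TIDB_PARTITION_ID` in `information_schema.partitions`). The function name is hypothetical, and if the table has already been dropped both queries may return nothing.

```python
def table_lookup_queries(physical_table_id):
    """Return SQL strings that look up a physical table ID as a table and as a partition."""
    table_id = int(physical_table_id)  # rejects non-numeric input
    return [
        "SELECT TABLE_SCHEMA, TABLE_NAME "
        "FROM information_schema.tables "
        f"WHERE TIDB_TABLE_ID = {table_id};",
        "SELECT TABLE_SCHEMA, TABLE_NAME, PARTITION_NAME "
        "FROM information_schema.partitions "
        f"WHERE TIDB_PARTITION_ID = {table_id};",
    ]


if __name__ == "__main__":
    # 1130 is the ID mentioned in the error log discussion above.
    for query in table_lookup_queries(1130):
        print(query)
```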

username: vcdog

I think I’ve roughly found the cause of the problem. I’ll verify it again today, and if it is indeed the cause, I’ll post the verification results.

username: tidb狂热爱好者

What is the reason?

username: 裤衩儿飞上天

Is it an issue with AVX2 instructions?
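
The AVX2 question can be checked directly on the TiFlash host. To my knowledge, the TiDB documentation lists AVX2 support as a requirement for TiFlash on Linux AMD64 in recent versions (from v6.3.0), but that should be confirmed against the docs for the version in use. The sketch below parses the text of `/proc/cpuinfo` for a CPU flag; the function names are hypothetical.

```python
def has_cpu_flag(cpuinfo_text, flag="avx2"):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists `flag`."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False


def host_has_avx2(path="/proc/cpuinfo"):
    """Read the local cpuinfo file (Linux only) and check for the avx2 flag."""
    with open(path) as f:
        return has_cpu_flag(f.read(), "avx2")
```

Running `host_has_avx2()` on both the working and the failing cluster's TiFlash servers would show whether the two environments differ in this respect.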

username: vcdog

My hypothesis did not hold up, so the entire replica cluster had to be destroyed and rebuilt:

  1. Stop the TiCDC synchronization task from the primary cluster to the replica cluster.
  2. Destroy the replica cluster.
  3. Back up and export the primary cluster data.
  4. Import the data into the replica cluster.
  5. Re-create the TiFlash replicas on the replica cluster.