Performance with a large number of tables using TiFlash

Application environment:


TiDB version:


Reproduction method:



The docs (Create TiFlash Replicas | PingCAP Docs) say “It is recommended that you do not replicate more than 1,000 tables because this lowers the PD scheduling performance. This limit will be removed in later versions.”

I’m wondering how significant the impact is, particularly as the number of tables grows well beyond that limit (say 10k - 30k tables of all sizes, from 0 KB to 200 GB).

Are we talking 5% - 10% slower, or grinding to a halt? I’m just looking for a rough guess based on other people’s experiences, since I’m sure it will vary based on dataset, hardware, topology, etc.

Is there any info, update, or timeline on removing that limit or lessening its impact?


Resource allocation:

Servers have 12 cores, 64 GB of RAM, and 4 TB NVME SSDs.


I would expect a small performance degradation rather than a grinding halt. There are some steps you can take to deal with this, such as using more powerful hardware for PD or otherwise giving it more resources.

With TiFlash you often don’t need to replicate every table; replicating only the tables your analytical queries actually read helps keep the total number of tables in TiFlash down.
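As a sketch of that approach: TiFlash replicas are enabled per table with `ALTER TABLE ... SET TIFLASH REPLICA`, and replication progress can be checked in `information_schema.tiflash_replica`. The table names below are hypothetical examples.

```sql
-- Replicate only the tables that analytical queries actually need
-- (schema/table names are illustrative):
ALTER TABLE sales.orders SET TIFLASH REPLICA 1;
ALTER TABLE sales.order_items SET TIFLASH REPLICA 1;

-- Check replication status; AVAILABLE becomes 1 and PROGRESS
-- reaches 1 once a replica is fully built:
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica;

-- Drop a replica that is no longer needed, reducing the table count
-- that PD has to schedule for TiFlash:
ALTER TABLE sales.orders SET TIFLASH REPLICA 0;
```

Pruning replicas this way keeps the number of TiFlash-replicated tables closer to the documented recommendation without affecting the row-store side.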

Great, thanks for the insights.