Count-Min Sketch Description Issue in Statistics

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 统计信息Count-Min Sketch描述问题

| username: FutureDB

The Count-Min Sketch section of the statistics documentation says:

  • Modify the two parameters WITH NUM CMSKETCH DEPTH and WITH NUM CMSKETCH WIDTH mentioned in Statistics Collection - Manual Collection. These two parameters affect the number of hash buckets and the probability of collisions. Increasing them appropriately can reduce the probability of collisions, but it also affects the memory usage of the statistics. You can adjust them according to your specific situation. In TiDB, the default value for DEPTH is 5, and the default value for WIDTH is 2048.
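For reference, the textbook Count-Min Sketch analysis (a min-query over d independently hashed rows of w counters each; we have not checked how closely TiDB's implementation follows it) gives the two parameters different jobs: width bounds how large the over-count can be, and depth controls how likely the estimate is to exceed that bound. Here f(x) is the true count, \hat{f}(x) the sketch estimate (never below f(x)), and N the total count of all inserted values.

```latex
\begin{aligned}
&\text{memory} = d \cdot w \ \text{counters}, \qquad \hat{f}(x) \ge f(x),\\
&\Pr\!\left[\hat{f}(x) > f(x) + \varepsilon N\right] \le \delta
\quad\text{when}\quad w = \lceil e/\varepsilon \rceil,\; d = \lceil \ln(1/\delta) \rceil,\\
&\text{i.e.}\quad \varepsilon \approx e/w, \qquad \delta \approx e^{-d}.\\
&\text{TiDB defaults } d = 5,\ w = 2048:\quad d \cdot w = 10240,\quad
\varepsilon \approx 2.718/2048 \approx 1.33 \times 10^{-3},\quad
\delta \approx e^{-5} \approx 6.7 \times 10^{-3}.
\end{aligned}
```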

Question: It says here that increasing DEPTH and WIDTH reduces the probability of hash collisions. Is this statement correct? Normally, increasing WIDTH reduces the probability of hash collisions, but why would increasing DEPTH reduce it? Shouldn’t that increase the probability of hash collisions instead?
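One way to look at the question is a small calculation, assuming the textbook min-query and independent, uniform hashing in each row (assumptions, not verified against TiDB's code). The chance that a given row has a collision for the queried value depends only on width; depth does not change it. Depth instead adds independent rows, and a min-query over-counts only if every row collides, so that probability is the per-row probability raised to the power of depth. The function names and `other_keys` (the number of other distinct values with nonzero count) are hypothetical, for illustration only.

```python
def prob_row_collides(width: int, other_keys: int) -> float:
    """Probability that, within a single row, at least one of `other_keys`
    distinct values hashes to the same bucket as the queried value
    (assumes uniform hashing)."""
    return 1.0 - (1.0 - 1.0 / width) ** other_keys


def prob_overestimate(depth: int, width: int, other_keys: int) -> float:
    """A min-query over-counts only if *every* row has a collision.
    With independently hashed rows this is p_row ** depth."""
    return prob_row_collides(width, other_keys) ** depth


if __name__ == "__main__":
    n_other = 1000  # hypothetical number of other distinct values
    for depth in (1, 3, 5, 7):
        p = prob_overestimate(depth, 2048, n_other)
        print(f"depth={depth} width=2048 -> P(overestimate) = {p:.6f}")
```

Under these assumptions, widening a row lowers the per-row collision rate, while adding rows lowers the chance that all rows are polluted at once. That may be what the documentation means by "reducing the probability of conflicts".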

| username: TiDBer_jYQINSnf

Judging from the structure of the hash table, increasing depth also increases the total capacity. Does that mean the probability of collisions decreases as the total capacity grows? I’ll look into the code when I have time.
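To make the "total capacity" point concrete, here is a minimal textbook Count-Min Sketch, not TiDB's implementation (TiDB's hash functions and query procedure may differ). The SHA-256 row hashing and the `total_error` helper are hypothetical choices for illustration. Because each row's hash depends only on the row index and the value, a depth-5 sketch contains exactly the rows of a depth-1 sketch plus four more. Since the query takes the minimum over rows, extra depth can only keep or lower each estimate, at the cost of depth × width counters.

```python
import hashlib


class CountMinSketch:
    """Textbook Count-Min Sketch: `depth` rows, each with `width` counters."""

    def __init__(self, depth: int, width: int) -> None:
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]  # depth * width counters

    def _index(self, row: int, key: str) -> int:
        # Salt the hash with the row number so each row hashes independently.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def query(self, key: str) -> int:
        # Collisions can only add to a bucket, so every row over-counts or is
        # exact; the minimum across rows is the tightest available estimate.
        return min(self.table[row][self._index(row, key)] for row in range(self.depth))


def total_error(depth: int, width: int, counts: dict[str, int]) -> int:
    """Sum of over-counts across all keys for a sketch of the given shape."""
    sketch = CountMinSketch(depth, width)
    for key, c in counts.items():
        sketch.add(key, c)
    return sum(sketch.query(k) - c for k, c in counts.items())


if __name__ == "__main__":
    counts = {f"v{i}": (i % 7) + 1 for i in range(500)}
    for depth in (1, 2, 5):
        print(f"depth={depth} counters={depth * 64} total_error={total_error(depth, 64, counts)}")
```

Note that width is the number of buckets per row, so it sets how crowded each row is; depth is the number of rows, so it sets how many independent chances the query has to find a less crowded bucket.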

| username: Kongdom

🤔 So this sentence means that width affects the number of buckets?