Some Questions About Generating Data for TiDB Stress Testing

I’m following the HTAP quick start guide. The command:

tiup bench tpch --sf=1 prepare

generates about 6 million rows, which feels a bit small. I’d like 20 to 30 million rows for testing. Is there a way to do this with the current method, or any other simple recommendation? Thanks.
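A quick way to size `--sf`, assuming the "6 million" above refers to the lineitem table: in TPC-H, lineitem grows linearly at roughly 6,000,000 rows per unit of scale factor, so the smallest integer scale factor that reaches a target is a ceiling division. This Python sketch only does that arithmetic. The other TPC-H tables, and total disk usage, grow in proportion too.

```python
import math

# Assumption: the target refers to lineitem, which TPC-H sizes at
# roughly 6,000,000 rows per unit of scale factor.
LINEITEM_ROWS_PER_SF = 6_000_000

def scale_factor_for(target_rows: int) -> int:
    """Smallest integer --sf whose lineitem table reaches target_rows."""
    return math.ceil(target_rows / LINEITEM_ROWS_PER_SF)

for target in (20_000_000, 30_000_000):
    print(f"{target:,} rows -> tiup bench tpch --sf={scale_factor_for(target)} prepare")
```

Under that assumption, 20 to 30 million lineitem rows corresponds to `--sf=4` or `--sf=5`.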

Build it once, export it, and next time you want to use it, import it back in. It’s much faster than building it again.

Check whether Navicat’s data generation feature meets your needs.

Does Navicat’s data generation also support TiDB? I’ll upgrade and give it a try. When I used it with MySQL before, the results were only average.

It’s supported. The generation speed is average, and don’t tick the option to use transactions.

You can write a program in Python; it’s very simple.

Here’s a fairly simple one I wrote a while ago. It supports a few configurable variables and parallel insertion. Feel free to use it as a reference.
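The script mentioned above isn’t included in this thread, so here is a stand-in: a minimal sketch of parallel generation and insertion, not the poster’s code. It assumes a hypothetical table with columns (id, name, age) and splits the id range across worker threads. `insert_rows` is a placeholder you supply. In real use, each thread should open its own connection (for example via `threading.local`), because DB-API connections generally shouldn’t be shared across threads.

```python
import random
import string
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

# Hypothetical row shape: (id, name, age).
Row = Tuple[int, str, int]

def split_ranges(total: int, workers: int) -> List[Tuple[int, int]]:
    """Split ids [0, total) into `workers` contiguous, non-overlapping ranges."""
    step, extra = divmod(total, workers)
    ranges, start = [], 0
    for i in range(workers):
        end = start + step + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, end))
        start = end
    return ranges

def make_rows(start: int, end: int, seed: int) -> List[Row]:
    """Generate rows for ids [start, end); a fixed seed makes reruns reproducible."""
    rng = random.Random(seed)  # per-chunk RNG, so threads never share RNG state
    return [(i, "".join(rng.choices(string.ascii_lowercase, k=8)), rng.randint(18, 80))
            for i in range(start, end)]

def load_parallel(total: int, workers: int,
                  insert_rows: Callable[[Sequence[Row]], None],
                  chunk: int = 10_000) -> None:
    """Run `workers` threads; each generates its id range chunk by chunk and
    passes every chunk to insert_rows, which is called concurrently."""
    def work(start: int, end: int) -> None:
        for s in range(start, end, chunk):
            insert_rows(make_rows(s, min(s + chunk, end), seed=s))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(work, s, e) for s, e in split_ranges(total, workers)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
```

Threads suit this because most of the time is spent waiting on the database. If row generation itself becomes the bottleneck, the same split could be run with processes instead.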

That’s too slow. You need to batch the insert values, putting around ten thousand rows into a single INSERT statement.

:joy: True. Right now it’s only used to generate test data. I’ll change it later.
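The batching suggested above can be sketched in Python as follows, assuming a PyMySQL- or mysqlclient-style cursor (both use `%s` placeholders). The function groups rows into multi-row INSERT statements of up to 10,000 rows each. The table and column names in the test are hypothetical, and they are interpolated into the SQL unescaped, so only pass trusted identifiers.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

def batched_inserts(table: str, columns: Sequence[str],
                    rows: Iterable[Sequence[object]],
                    batch_size: int = 10_000) -> Iterator[Tuple[str, List[object]]]:
    """Yield (sql, params) pairs; each statement inserts up to batch_size rows.

    Uses %s placeholders, so each pair can be passed to cursor.execute(sql, params).
    """
    col_list = ", ".join(columns)
    one_row = "(" + ", ".join(["%s"] * len(columns)) + ")"

    def build(batch: List[Sequence[object]]) -> Tuple[str, List[object]]:
        values = ", ".join([one_row] * len(batch))
        sql = f"INSERT INTO {table} ({col_list}) VALUES {values}"
        params = [v for row in batch for v in row]  # flatten row by row
        return sql, params

    batch: List[Sequence[object]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield build(batch)
            batch = []
    if batch:  # final, partially filled batch
        yield build(batch)
```

Each pair would then be run as `cur.execute(sql, params)`, committing after each batch or every few batches. Wide rows may need a smaller `batch_size` to keep each statement under the server’s `max_allowed_packet`.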

You can first run four different scripts against the same data set, then export the data as SQL, clear the tables, and re-import it. Then change the number of threads, run several different scripts again, and repeat the cycle. In practice, though, whether or not you clear the data doesn’t seem to matter much, because the resulting performance metrics aren’t significantly different.

Data generated by off-the-shelf tools is fairly simplistic. If you want it to be more realistic, write your own script.
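As one illustration of "more realistic", here is a sketch under assumptions: a hypothetical orders table where a few customers place most of the orders (Zipf-like weights) and order amounts have a long tail (log-normal). The column names, date range, and distribution parameters are illustrative only, not taken from any real workload.

```python
import random
from datetime import datetime, timedelta
from typing import Iterator, Tuple

def skewed_orders(n: int, customers: int = 1_000,
                  seed: int = 42) -> Iterator[Tuple[int, int, datetime, float]]:
    """Yield hypothetical (order_id, customer_id, created_at, amount) rows."""
    rng = random.Random(seed)
    ids = range(1, customers + 1)
    # Zipf-like weights: customer k is picked with probability proportional to 1/k,
    # so a handful of "hot" customers own a large share of the orders.
    weights = [1 / k for k in ids]
    customer_ids = rng.choices(ids, weights=weights, k=n)
    start = datetime(2023, 1, 1)
    for order_id, customer_id in enumerate(customer_ids, start=1):
        # Timestamps spread uniformly over one year.
        created_at = start + timedelta(seconds=rng.randint(0, 365 * 24 * 3600 - 1))
        # Log-normal amounts: many small orders plus a long tail of large ones.
        amount = round(rng.lognormvariate(3.5, 1.0), 2)
        yield order_id, customer_id, created_at, amount
```

Skew like this matters for a stress test because it creates hot keys and uneven index access, which uniformly random data never exercises. The rows can be written with multi-row INSERT batches, as suggested earlier in the thread.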

The insert part of sysbench can handle this.

Use sysbench to generate data.

Is this step creating data?

With tiup bench, you can use the TPC-C workload with multiple warehouses and threads and then run prepare, or have it export CSV instead. Sysbench is also an option.

Sysbench data generation

