Some Questions About Generating Data for TiDB Stress Testing

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 想进行压测TIDB,造数据的一点疑问

| username: TiDBer_JlY1JCJ5

I am following the HTAP quick start guide. The command:

tiup bench tpch --sf=1 prepare

generates 6 million rows, which feels a bit small. I want to generate 20 to 30 million rows for testing. Is there a way to do this with the current method, or is there another simple approach you would recommend? Thanks.

| username: 小龙虾爱大龙虾 | Original post link

Build it once, export it, and next time you want to use it, import it back in. It’s much faster than building it again.

| username: forever | Original post link

See if the data generation function of Navicat can meet your needs.

| username: TiDBer_JlY1JCJ5 | Original post link

Navicat’s data generation also supports TiDB, right? I’ll upgrade and give it a try. The results were average when I used it with MySQL before.

| username: zhanggame1 | Original post link

Yes, it is supported. The generation speed is average. Do not check the option to use transactions.

| username: Jolyne | Original post link

You can write a program in Python. It's very simple.

| username: forever | Original post link

Here’s a relatively simple one I wrote some time ago. It supports some variables and parallel insertion. You can refer to it.

| username: zhanggame1 | Original post link

That is too slow. You need to batch the inserts, grouping about ten thousand rows of values into a single SQL statement.
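
A minimal sketch of that batching idea in Python, not the script discussed above. The table name, column names, and the `fake_rows` generator are hypothetical, and the `%s` placeholder style assumes a MySQL-compatible driver such as PyMySQL; the sketch only builds the statements and does not open a connection.

```python
import random
import string
from itertools import islice


def fake_rows(n, seed=0):
    """Yield n hypothetical (id, name, amount) rows for illustration."""
    rng = random.Random(seed)
    for i in range(1, n + 1):
        name = "".join(rng.choices(string.ascii_lowercase, k=8))
        yield (i, name, rng.randint(1, 1_000_000))


def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_insert(table, columns, n_rows):
    """Return one multi-row INSERT with a %s placeholder per value."""
    row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    head = f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
    return head + ", ".join([row] * n_rows)


def statements(table, columns, rows, batch_size=10_000):
    """Yield (sql, flat_params) pairs, one pair per batch of rows."""
    for chunk in batched(rows, batch_size):
        sql = build_insert(table, columns, len(chunk))
        params = [value for r in chunk for value in r]
        yield sql, params


if __name__ == "__main__":
    # 25 rows in batches of 10 gives three statements (10, 10, 5 rows).
    for sql, params in statements("t", ["id", "name", "amount"], fake_rows(25), 10):
        print(len(params) // 3, "rows:", sql[:60], "...")
```

Each pair could then be passed to a driver's `cursor.execute(sql, params)`, followed by a commit per batch. The batch size of ten thousand comes from the suggestion above; it may need to be lowered for wide rows or for server-side packet and transaction size limits.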

| username: Jolyne | Original post link

:joy: That’s true, currently it’s just being used for test data. I’ll modify it later.

| username: 随缘天空 | Original post link

You can first execute four different scripts with the same batch of data, then export the data as SQL, clear it, and re-import it. Modify the number of threads, then execute several different scripts, and repeat the cycle. However, it seems that whether or not you clear the data doesn’t have much impact, as the resulting performance metrics don’t differ significantly.

| username: oceanzhang | Original post link

Data generated by any tool is relatively simple. To make it more realistic, write your own script.

| username: dba远航 | Original post link

The insert part of sysbench can do this.

| username: andone | Original post link

Use sysbench to generate data.
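
For the sysbench route, the sketch below assembles a `prepare` command line in Python. The host, port 4000 (TiDB's default SQL port), user, database name, table count, and table size are all assumptions for illustration. The flags follow the `oltp_*` scripts bundled with sysbench 1.0, so check them against the installed version.

```python
import shlex


def total_rows(tables, table_size):
    """Rows created by prepare: `tables` tables of `table_size` rows each."""
    return tables * table_size


def sysbench_prepare_argv(host, port, user, db, tables, table_size, threads):
    """Build the argument list for a sysbench prepare run (no password flag)."""
    return [
        "sysbench",
        "oltp_read_write",          # any bundled oltp_* script shares prepare
        "--db-driver=mysql",        # TiDB speaks the MySQL protocol
        f"--mysql-host={host}",
        f"--mysql-port={port}",
        f"--mysql-user={user}",
        f"--mysql-db={db}",
        f"--tables={tables}",
        f"--table-size={table_size}",
        f"--threads={threads}",
        "prepare",
    ]


if __name__ == "__main__":
    # Hypothetical target: 10 tables x 2.5 million rows = 25 million rows.
    argv = sysbench_prepare_argv("127.0.0.1", 4000, "root", "sbtest",
                                 tables=10, table_size=2_500_000, threads=8)
    print(total_rows(10, 2_500_000), "rows")
    print(shlex.join(argv))
```

The printed command can be pasted into a shell, or the list can be handed to `subprocess.run(argv, check=True)`. The target database is assumed to exist already.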

| username: Kongdom | Original post link

Is this step creating data?

| username: tidb菜鸟一只 | Original post link

You can run tiup bench with more warehouses and threads and then run prepare, or have it export the data as CSV. Sysbench is also an option.
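
For the TPC-H command from the original question, the knob is the scale factor. The largest TPC-H table, `lineitem`, holds roughly 6 million rows per unit of scale factor, which matches the 6 million rows seen with `--sf=1`. A rough sketch of the arithmetic, assuming the 20 to 30 million target refers to that table (the helper names are hypothetical):

```python
import math

# Approximate lineitem row count per unit of TPC-H scale factor.
LINEITEM_ROWS_PER_SF = 6_000_000


def scale_factor_for(target_rows):
    """Smallest integer scale factor whose lineitem table reaches target_rows."""
    return math.ceil(target_rows / LINEITEM_ROWS_PER_SF)


def prepare_command(sf):
    """The prepare command from the question with a different scale factor."""
    return f"tiup bench tpch --sf={sf} prepare"


if __name__ == "__main__":
    for target in (20_000_000, 30_000_000):
        sf = scale_factor_for(target)
        print(target, "rows ->", prepare_command(sf))
```

Under that assumption, a scale factor of 4 or 5 lands in the requested range. The other TPC-H tables grow with the scale factor as well, so the total data volume is larger than the `lineitem` count alone.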

| username: TiDBer_gxUpi9Ct | Original post link

Sysbench data generation
