Support for Encryption When Dumping Backup Files

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: dumping在备份文件时候支持加密

| username: dba-kit

BR's backup encryption feature supports encrypting the backup files written to cloud storage with a user-supplied key during backup. I hope Dumpling will support the same capability.
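For reference, the BR feature being described is driven by two flags on the backup command. The sketch below is illustrative only; the PD address, S3 path, and hex key are placeholders, not real values:

```
# Sketch of a BR full backup with data encryption enabled.
# PD address, bucket path, and the hex-encoded key are placeholders.
br backup full \
    --pd "10.0.0.1:2379" \
    --storage "s3://backup-bucket/full-2024" \
    --crypter.method aes128-ctr \
    --crypter.key 0123456789abcdef0123456789abcdef
```

The request in this thread is for an equivalent pair of flags on the dumpling command line.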

| username: Eason | Original post link

Dumpling is not planned to be supported going forward… We currently hope to consolidate these tools and make the product more focused. The specific plan is:

  1. For data coming in, when not via a replication pipeline, it will go through the kernel using the IMPORT INTO syntax, supporting local/S3 storage, eventually deprecating the external tool Lightning.
  2. For data going out, when not via a replication pipeline, it will likewise go through the kernel with a planned export syntax, supporting local/S3 storage, eventually deprecating the tool Dumpling.
  3. The synchronization tools for data in and out are DM (inbound) and CDC (outbound).

Other tools are not in the plan. Also, for the scenario you mention: what problems do you run into when using BR for encrypted backups, and why consider Dumpling instead?

| username: DBAER | Original post link

Marking this. It should be a logical backup.

| username: dba-kit | Original post link

What does “mark” refer to here? Actually, as a user, I feel that maintaining independent binary tools is more flexible, for the following reasons:

  1. Currently, both dumpling and lightning are standalone binary files, so I can run export/import tasks on machines outside the TiDB cluster. This matters especially for lightning, which consumes a lot of resources during import; if in the future it can only be triggered inside TiDB via SQL, it is bound to affect cluster stability. Even though background tasks can now be restricted to certain tidb-server nodes, the operational cost of doing so is relatively high, and it is not as flexible as standalone binaries, which provide better isolation.
  2. Since TiDB is positioned to be MySQL-compatible, I often use lightning/dumpling tools to replace myloader/mydumper tools for importing/exporting MySQL. These tools can actually enhance PingCAP’s reputation and attract some MySQL users.

It would be a pity to abandon the standalone tool mode for the sake of internal integration.
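As an illustration of the drop-in usage described above, dumpling accepts mydumper-style connection flags and can point at any MySQL-compatible endpoint. This is a sketch; the host, credentials, and output path are placeholders:

```
# Sketch: exporting from a plain MySQL server with dumpling, used as a
# replacement for mydumper. Host, credentials, and paths are placeholders.
dumpling \
    -h 127.0.0.1 -P 3306 -u root -p "${MYSQL_PASSWORD}" \
    -o /data/export \
    --filetype sql \
    --threads 8
```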

| username: dba-kit | Original post link

This involves two different dimensions of backup. Currently, we use both modes. BR is for physical backup, which we use for disaster recovery; dumpling is for logical backup, mainly for archiving purposes. When archiving historical data to OSS, we also don’t want to store it in plaintext.
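Since dumpling has no built-in encryption flag today, one stopgap for the archiving scenario above (an illustration, not an official feature) is to encrypt each exported file before uploading it to OSS, e.g. with openssl:

```shell
# Illustrative workaround only: dumpling itself cannot encrypt, so we
# encrypt its output files after the dump finishes. Paths and the key
# handling below are placeholders for demonstration.
set -eu
dir=$(mktemp -d)
cd "$dir"
# Stand-in for a file dumpling would have produced:
printf 'CREATE TABLE t (id INT PRIMARY KEY);\n' > demo.t-schema.sql
key=$(openssl rand -hex 32)            # 256-bit key, hex-encoded
for f in *.sql; do
  # AES-256-CBC with a PBKDF2-derived key, one .enc file per dump file
  openssl enc -aes-256-cbc -pbkdf2 -k "$key" -in "$f" -out "$f.enc"
done
# Verify the round trip before deleting plaintext and uploading:
openssl enc -d -aes-256-cbc -pbkdf2 -k "$key" \
  -in demo.t-schema.sql.enc -out roundtrip.sql
cmp -s demo.t-schema.sql roundtrip.sql && echo "round-trip OK"
```

In a real pipeline the key would come from a KMS rather than being generated inline, and the plaintext files would be removed only after the verification step succeeds.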

| username: forever | Original post link

Abandoning dumpling and lightning doesn’t seem very wise. DM and CDC are relatively heavy, and logical backups are still more convenient for import and export.

| username: zhaokede | Original post link

Dumpling is a logical export tool; its output files are SQL or CSV (written locally or to cloud storage), and it does not support encryption.

| username: Eason | Original post link

  1. Choosing an independent node to deploy TiDB versus Lightning: why is Lightning more cost-effective and flexible? Can you elaborate? When only a few Lightning instances are deployed, the import cost, stability, and performance of Lightning and TiDB nodes should be similar. However, once the data volume grows and multiple Lightning instances must run in parallel, Lightning's lack of scheduling management puts a relatively low ceiling on stability and performance, whereas TiDB can solve this directly through internal parallel scheduling. Background resource groups are one isolation scheme and dedicated TiDB nodes are another, so there is no need to worry too much about resource isolation.

| username: Eason | Original post link

After deploying TiDB, you can directly import and export using the following methods. Do you still feel that this doesn’t meet your needs?

Import: `IMPORT INTO` from local/S3
Export: `EXPORT` into local/S3
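For concreteness, the import half of this plan has already shipped as a SQL statement; the sketch below uses a placeholder table and S3 URI, while the export counterpart is, as stated, still only planned:

```sql
-- Sketch of IMPORT INTO (available in recent TiDB versions);
-- the table name and S3 URI are placeholders.
IMPORT INTO t
FROM 's3://my-bucket/export/*.csv'
WITH thread = 8;
```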

| username: ShawnYan | Original post link

Returning to the topic’s question, what types of encryption and compression will import/export support?

| username: dba-kit | Original post link

The machine costs are indeed similar; what I am mainly referring to is operational complexity. When the data volume is small, using the IMPORT method is quite convenient. However, when large volumes of data (terabytes or even tens of terabytes) need to be imported, the import will inevitably use a lot of CPU and memory, making it very unsuitable to colocate with a tidb-server that handles business traffic, since it can easily affect normal business requests. Under that premise, a separate machine is definitely needed for large imports, which raises a few issues:

  1. If Lightning is used as a regular binary, I just need to copy the binary file over. Even if this machine has a high load, there is no need to worry because it is a normal phenomenon. Additionally, this machine can be used for other purposes and does not need to be part of the TiDB cluster.
  2. If machine isolation is achieved by scaling out tidb-server nodes, the expansion operation costs more. Moreover, during the import period the CPU and memory load of the IMPORT nodes will be very high, which is bound to trigger the cluster's default alerting rules; with today's fine-grained management, the cost of explaining away such alerts also rises.
  3. If it is built into TiDB as a function, it essentially turns a tool that could be suitable for the entire MySQL ecosystem into a TiDB-specific tool, which does not align well with PingCAP’s open-source philosophy.

Therefore, unless it significantly increases development difficulty, it is recommended to maintain the current state—providing both IMPORT INTO and independent deployment of Lightning.
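For comparison, the standalone deployment argued for above needs nothing beyond the binary plus a small config file. This is a minimal sketch with placeholder hosts, ports, and paths:

```
# Minimal tidb-lightning config sketch; all hosts and paths are placeholders.
[tikv-importer]
backend = "local"                  # physical import mode
sorted-kv-dir = "/data/sorted-kv"  # scratch space on the standalone machine

[mydumper]
data-source-dir = "/data/export"   # dumpling output directory

[tidb]
host = "tidb.internal"
port = 4000
user = "root"
status-port = 10080
pd-addr = "pd.internal:2379"
```

The machine running this only needs network access to the cluster; it does not have to be part of the cluster topology itself, which is the isolation property the post describes.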

| username: dba-kit | Original post link

Actually, this is indeed the key point: is encryption being considered for the future import/export functionality?

| username: 人如其名 | Original post link

Is there any plan to support cross-database import directly? Currently, using dumpling and then lightning requires staging the data on disk first, which not only takes a long time but also consumes a lot of disk space.