Does Flink still run on HDFS when integrated with TiDB?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb与flink结合时,flink还基于hdfs运行吗?

| username: TiDBer_8rWAgqMU

I use TiCDC to send data to Kafka, and Flink consumes it in real time. I previously used HDFS + Hive + HBase but have now switched to TiDB directly, while keeping Flink for real-time processing. The problem is that my Flink jobs were previously based on HDFS, which TiDB has now replaced. How should I handle Flink in this case?

This was my previous Flink configuration:
[Screenshot of the Flink configuration; image not preserved]

Now that TiDB has replaced HDFS+Hive+HBase, which state.backend should I use for Flink?

| username: xfworld | Original post link

Any S3-compatible object store, or another distributed file system, will work.

| username: 特雷西-迈克-格雷迪 | Original post link

Flink does not need to run on HDFS; it only needs some local disk storage.

| username: TiDBer_8rWAgqMU | Original post link

Local disks are not distributed, so data will be lost when a machine goes down.

| username: TiDBer_8rWAgqMU | Original post link

Could you recommend a distributed file system? HDFS is quite heavy. Does TiDB provide a distributed file system of its own? I see that many of your company's customer solutions use Flink. What do they use to store checkpoints?

| username: xfworld | Original post link

As mentioned, any S3-compatible service can be used, such as Alibaba Cloud OSS, Amazon S3, or Tencent Cloud COS.
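
For illustration, here is a rough sketch of what such a setup might look like in `flink-conf.yaml`. It assumes Flink 1.15-era option names and an S3 filesystem plugin (for example `flink-s3-fs-hadoop`) placed in Flink's `plugins/` directory. The bucket name, endpoint, and credentials are placeholders, not values from this thread, and the option names should be checked against the documentation for the Flink version in use.

```yaml
# Hypothetical values throughout: replace the bucket, endpoint, and keys with your own.

# RocksDB keeps working state on each TaskManager's local disk.
state.backend: rocksdb
# Upload only the RocksDB files that changed since the last checkpoint.
state.backend.incremental: true

# Durable locations for checkpoints and savepoints in the object store.
state.checkpoints.dir: s3://my-flink-bucket/checkpoints
state.savepoints.dir: s3://my-flink-bucket/savepoints
execution.checkpointing.interval: 60s

# Typically only needed for non-AWS, S3-compatible services.
s3.endpoint: https://s3.example.com
s3.path.style.access: true
s3.access-key: YOUR_ACCESS_KEY
s3.secret-key: YOUR_SECRET_KEY
```

With a layout like this, local disk holds only the working state, while checkpoints live in the object store, so a job can be restored from the last completed checkpoint after a machine fails.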

| username: TiDBer_8rWAgqMU | Original post link

Sure, I’ll look it up online.
Thank you very much.