FlinkCDC connecting to TiDB results in garbled text

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: FlinkCDC连接tidb乱码

| username: juecong

【TiDB Version】
6.1.0

【Problem Encountered】
```java
new TiKVSnapshotEventDeserializationSchema<String>() {
    @Override
    public void deserialize(Kvrpcpb.KvPair record, Collector<String> out)
            throws Exception {
        System.out.println("==============");
        System.out.println(record.getValue().toStringUtf8());
        int serializedSize = record.getSerializedSize();
        out.collect(RowKey.decode(record.getValue().toByteArray()).toString());
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return BasicTypeInfo.STRING_TYPE_INFO;
    }
})
```

The printed statement is garbled: e !"#$&'( " @ A ] c o { � � � � � [{�������1s{��0000004c0f0446aab996f270ef2371ef�7Nantong Engineering Co., Ltd. A[“Henan Provincial Department of Transportation”] Provincial Credit Rating Enterprise Honor Credit Rating Highway and Waterway Transportation Category Henan Province�A Highway Construction Enterprise Credit Evaluation Nantong Engineering Co., Ltd. won the 2020 Credit Rating Highway and Waterway Transportation Category Af48405721ba019df0526adefbfcf94872021-04-08 P�2020�a�aHenan Provincial Department of Transportation ���Yang f���

【Reproduction Path】
Connect to TiDB with Flink CDC.

【Problem Phenomenon and Impact】
I need to get the data as correctly decoded text or as JSON.

【Attachment】

| username: xfworld | Original post link

The original example doesn't do any transcoding; try it as-is.
Also, what character set does the database cluster use?

| username: juecong | Original post link

Yes, I copied it directly from the tutorial too. Done that way, the output looks like this:

[screenshot not preserved]

The database encoding:

[screenshot not preserved]

| username: 消息终结者 | Original post link

I ran into the same problem and spent the whole day on it without solving it. I'm planning to try the SQL approach instead :rofl:

| username: juecong | Original post link

The SQL approach works, but after reading data for a while it fails with a java.io.EOFException, so I switched to the API approach to debug the cause.

| username: zzzzzz | Original post link

You could try a few more encoding conversions inside the deserializer to rule out a charset problem.
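
One quick way to do that before trying conversions is to check whether the raw bytes are well-formed UTF-8 at all. Below is a JDK-only sketch; the class name Utf8Check is hypothetical. If record.getValue().toByteArray() fails this check and the cluster uses a UTF-8 character set, a plain charset mismatch is unlikely and a binary encoding is the more probable cause. The � characters in the printed output above are consistent with the U+FFFD replacement characters that toStringUtf8() substitutes for malformed sequences. Passing the check does not prove the bytes are text, though: short binary values can happen to be valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: checks whether raw bytes are well-formed UTF-8 text. */
final class Utf8Check {
    private Utf8Check() {}

    static boolean isValidUtf8(byte[] bytes) {
        // Unlike new String(bytes, UTF_8), which silently substitutes U+FFFD,
        // this decoder throws on the first malformed or unmappable sequence.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Inside deserialize(...), logging Utf8Check.isValidUtf8(record.getValue().toByteArray()) for a few records is usually enough to tell which case you are in.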

| username: Peiqi | Original post link

I also read the data through Flink CDC's API. It doesn't look garbled so much as encoded by TiKV. I'm running into this issue too and don't yet know how to restore the data read from TiKV to its original form.

| username: Tank001 | Original post link

I'm going with SQL :smiley:

| username: 天下第一帅 | Original post link

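The frustrating part is that record.getValue() and record.getKey() can't simply be turned into text with toString()!

Use the source code of RowDataTiKVSnapshotEventDeserializationSchema as a reference and adapt it:

```java
import static org.tikv.common.codec.TableCodec.decodeObjects;
import org.tikv.common.key.RowKey;

// Decode the row value using the row handle from the key and the table schema (tableInfo)
Object[] tikvValues =
        decodeObjects(
                record.getValue().toByteArray(),
                RowKey.decode(record.getKey().toByteArray()).getHandle(),
                tableInfo);
```
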
I hope the official team can fix this issue soon! The usability is quite poor.
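
For the getKey() half, here is a hedged illustration of why toString() doesn't help there either. According to TiDB's documented key layout, a row's key is the prefix t, the table ID, the separator _r, and the row ID (handle), with both integers stored as 8-byte big-endian values whose sign bit is flipped, so most of the key is non-printable bytes. The JDK-only sketch below decodes that layout. RecordKey is a hypothetical class, not the client-java RowKey API, and it only handles integer handles; anything else (for example keys of clustered tables with a non-integer primary key) is rejected.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical, stdlib-only decoder for a TiDB record key with an integer handle:
 * 't' + tableID (8 bytes) + "_r" + handle (8 bytes) = 19 bytes.
 */
final class RecordKey {
    private static final int LENGTH = 19;

    final long tableId;
    final long handle;

    private RecordKey(long tableId, long handle) {
        this.tableId = tableId;
        this.handle = handle;
    }

    static RecordKey decode(byte[] key) {
        if (key.length != LENGTH || key[0] != 't' || key[9] != '_' || key[10] != 'r') {
            throw new IllegalArgumentException("not an integer-handle record key");
        }
        ByteBuffer buf = ByteBuffer.wrap(key); // ByteBuffer is big-endian by default
        return new RecordKey(fromComparable(buf.getLong(1)), fromComparable(buf.getLong(11)));
    }

    /** Builds a key in the same layout; handy for testing the decoder. */
    static byte[] encode(long tableId, long handle) {
        return ByteBuffer.allocate(LENGTH)
                .put((byte) 't').putLong(toComparable(tableId))
                .put((byte) '_').put((byte) 'r').putLong(toComparable(handle))
                .array();
    }

    // Flipping the sign bit makes unsigned byte order match signed numeric order.
    private static long toComparable(long v) {
        return v ^ Long.MIN_VALUE;
    }

    private static long fromComparable(long raw) {
        return raw ^ Long.MIN_VALUE;
    }
}
```

In a real job you would still call RowKey.decode(...).getHandle() as in the snippet above; the sketch only makes the byte layout visible.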

| username: Jellybean | Original post link

May I ask, did the method you posted at the end solve the garbled text issue in the title?

| username: 天下第一帅 | Original post link

Solved, no problem.

| username: 天下第一帅 | Original post link

The record is just a Kvrpcpb.KvPair, i.e. a raw TiKV key-value pair. The official documentation doesn't explain how to parse it, so you have to follow RowDataTiKVSnapshotEventDeserializationSchema to decode it.

| username: TiDBer_UhEaT67M | Original post link

How exactly should I do this? Is there an example I can refer to? :grinning:

| username: 消息终结者 | Original post link

Handle it like this:
```java
Map<String, String> map = new HashMap<>();
map.put("tikv.grpc.timeout_in_ms", "30000");
map.put("tikv.grpc.keepalive_time", "30000");

// Build a TiKV session from the PD address and load the table schema
TiConfiguration tiConfiguration = TDBSourceOptions.getTiConfiguration("localhost:2379", map);
TiSession session = TiSession.create(tiConfiguration);
TiTableInfo tableInfo = session.getCatalog().getTable("database_name", "table_name");

// valueByteArray = record.getValue().toByteArray(), keyByteArray = record.getKey().toByteArray()
Object[] objArray = TableCodec.decodeObjects(valueByteArray, RowKey.decode(keyByteArray).getHandle(), tableInfo);
```
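
Since the original goal was text or JSON, one option once the values are decoded is to pair each value with its column name and serialize the row. Below is a JDK-only sketch; RowJson is a hypothetical helper, and it assumes you have already collected the column names in the same order as the decoded values (for example from tableInfo's column list; verify the ordering and the TiTableInfo accessors for your client version). Types it doesn't recognize fall back to String.valueOf, which may not suit every column type (byte arrays, for instance).

```java
import java.util.List;

/** Hypothetical helper: renders one decoded row as a flat JSON object string. */
final class RowJson {
    private RowJson() {}

    static String toJson(List<String> columnNames, Object[] values) {
        if (columnNames.size() != values.length) {
            throw new IllegalArgumentException("column/value count mismatch");
        }
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            appendString(sb, columnNames.get(i));
            sb.append(':');
            Object v = values[i];
            if (v == null) {
                sb.append("null");
            } else if (v instanceof Boolean || isFiniteNumber(v)) {
                sb.append(v); // booleans and finite numbers are written unquoted
            } else {
                appendString(sb, String.valueOf(v)); // everything else becomes a JSON string
            }
        }
        return sb.append('}').toString();
    }

    private static boolean isFiniteNumber(Object v) {
        if (v instanceof Double || v instanceof Float) {
            return Double.isFinite(((Number) v).doubleValue()); // NaN/Infinity are not valid JSON
        }
        return v instanceof Number;
    }

    private static void appendString(StringBuilder sb, String s) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        sb.append(c);
                    }
            }
        }
        sb.append('"');
    }
}
```

Hand-rolled escaping keeps the example dependency-free; in a real Flink job a JSON library such as Jackson is usually the safer choice.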

| username: swino | Original post link

Learned something new, thanks.

| username: TiDb_01 | Original post link

Hello, could you please provide a more complete example? I tried it here but it didn’t work.

| username: 哈喽沃德 | Original post link

Thanks to the experts here, I've learned a lot.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.