TiSpark Security

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tispark 安全 (TiSpark security)

| username: 小鱼吃大鱼

If TiSpark only needs the PD address and port to obtain database information, how is security handled? Any server with Spark installed that knows the PD address and port can access the database data. How can data security be ensured?

| username: shiyuhang0

You can refer to the authentication mechanism at https://github.com/pingcap/tispark/blob/master/docs/authorization_userguide.md
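
For reference, here is a minimal sketch of what enabling that authorization might look like in `spark-defaults.conf`. It assumes the keys `spark.sql.auth.enable`, `spark.sql.tidb.addr`, `spark.sql.tidb.port`, `spark.sql.tidb.user` and `spark.sql.tidb.password` as described in the guide; check the exact names against the guide for the TiSpark version you run. The host, user and password are placeholders, and `parse_conf` and `check_tispark_auth` are hypothetical helpers that only inspect the file. Because this is a client-side setting, the check cannot stop a client that simply leaves authorization off, which is exactly the concern raised in the next reply.

```python
"""Hypothetical checker for a TiSpark spark-defaults.conf fragment.

The key names are assumptions based on TiSpark's authorization guide;
verify them against the documentation for your TiSpark version.
"""

# Example config with authorization enabled. Host, user, and password
# are placeholders, not real values.
SAMPLE_CONF = """
spark.sql.auth.enable     true
spark.sql.tidb.addr       tidb.internal.example
spark.sql.tidb.port       4000
spark.sql.tidb.user       spark_reader
spark.sql.tidb.password   change_me
"""

# Keys assumed to be required once authorization is turned on.
AUTH_KEYS = ("spark.sql.tidb.addr", "spark.sql.tidb.port",
             "spark.sql.tidb.user", "spark.sql.tidb.password")


def parse_conf(text: str) -> dict[str, str]:
    """Parse whitespace-separated `key value` lines; skip blank and '#' lines."""
    conf: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def check_tispark_auth(conf: dict[str, str]) -> list[str]:
    """Return warnings; an empty list means the auth settings look complete."""
    if conf.get("spark.sql.auth.enable", "false").lower() != "true":
        return ["authorization disabled: any Spark node that can reach PD can read data"]
    return [f"missing {key}" for key in AUTH_KEYS if not conf.get(key)]


if __name__ == "__main__":
    print(check_tispark_auth(parse_conf(SAMPLE_CONF)))  # prints []
```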

| username: 小鱼吃大鱼

I still don’t see how that solves it. This authentication is enabled in the client’s configuration file, so if I don’t configure it, there is no authentication at all. A malicious user could install TiSpark on any server, leave authentication off, configure only the PD address and port, and read the database content. How can data security be ensured in that case?

| username: xuexiaogang

By that logic, any machine with network access could install a client and log into the database. Isn’t that also a problem?
Database connection information is supposed to be confidential and known only to the administrator.

| username: 小鱼吃大鱼

That is different. A database client can only connect after it has been granted privileges; even if it knows the database address, it cannot retrieve data without them. TiSpark, however, requires no authorization and offers no access control: as long as the PD port is known, data can be read without a username or password. There is no concept of an administrator.

| username: ealam_小羽

Shouldn’t access to both the machine and the port be restricted by a whitelist or some form of authorization?
If the machine’s external ports can be reached freely, the machine itself is already a security risk.
Similar to this: 阿里云登录 - 欢迎登录阿里云,安全稳定的云计算服务平台 (Alibaba Cloud login: "Welcome to Alibaba Cloud, a secure and stable cloud computing platform")

| username: shiyuhang0

With authentication enabled, in principle you do not need to configure the PD address at all; without it, the issue you describe does occur. Authentication was introduced in version 2.5.0, and for backward compatibility TiSpark still has to support running with it disabled.

| username: 小鱼吃大鱼

This is not about external access but internal access. The problem is that any server on the internal network that can reach the PD address and knows the port can have TiSpark deployed on it and read the database data without any permissions. That is a security risk, isn’t it? The only workaround I can see is a port whitelist that blocks connections from other machines.
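
The whitelist idea can be sketched as a source-address check. This is only an illustration of the allowlist logic using Python’s standard `ipaddress` module; in practice the rule would be enforced by a host firewall or network ACL in front of the cluster, not by application code. The subnets are placeholders, and the ports are assumptions based on TiDB defaults (PD client port 2379, TiKV port 20160). Since TiSpark reads data from TiKV directly rather than through TiDB, restricting only the PD port is probably not enough, so the sketch protects both.

```python
import ipaddress

# Hypothetical allowlist: only these subnets may reach the cluster ports.
ALLOWED_NETWORKS = [ipaddress.ip_network(n) for n in ("10.0.1.0/24", "10.0.2.15/32")]

# Assumed default ports; adjust to match your deployment.
PROTECTED_PORTS = {2379: "PD client", 20160: "TiKV"}


def is_allowed(src_ip: str, dst_port: int) -> bool:
    """Return True if a connection from src_ip to dst_port should be accepted."""
    if dst_port not in PROTECTED_PORTS:
        return True  # this sketch only filters the cluster ports listed above
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


if __name__ == "__main__":
    print(is_allowed("10.0.1.42", 2379))   # True: inside 10.0.1.0/24
    print(is_allowed("10.0.3.7", 2379))    # False: internal, but not allowlisted
    print(is_allowed("10.0.3.7", 20160))   # False: TiKV is protected too
```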