[AICon 2024 Review, with Materials] How Does TiDB Utilize Copilot to Optimize Database Operations, Enhance User Experience, and Improve Internal Efficiency?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 【AICon 2024 回顾,附资料】TiDB 如何利用 Copilot 优化数据库操作,提升用户体验与内部效率?

| username: TiDB社区小助手

Last Saturday, Li Li, head of PingCAP AI Lab, was invited to participate in the AICon 2024 Beijing event organized by InfoQ. He shared how TiDB uses Copilot to optimize database operations, enhancing user experience and internal efficiency! Hope this helps everyone! :partying_face:

About Li Li: Head of PingCAP AI Lab, with research areas covering recommendation systems and reinforcement learning. He helped develop a reinforcement-learning-based Go algorithm that defeated the then world Go champion Park Junghwan. At PingCAP, he is responsible for building the Auto-Diagnosis system and advancing the self-driving cloud database, with a continued focus on AI application innovation, driving its implementation in production and helping enterprises transform with AI.

:star2: PPT Download:

AICon2024-Database Copilot in the Database Field - PingCAP Li Li.pdf (42.7 MB)

In the face of growing data volumes, complex business logic, and the pursuit of higher performance and reliability, database systems face numerous challenges. Effectively handling large-scale data while ensuring data security and privacy is a pressing issue. As artificial intelligence technology continues to evolve, the application of LLMs has become a hot topic in the database field. LLM technology can not only optimize the performance and reliability of database systems but also provide more intelligent solutions for database queries and process optimization. Below, Li Li discusses these challenges and shares the application prospects and solutions of LLM technology in the database field.

Challenges and Application Cases in the Database Field

InfoQ: What are the most pressing challenges in the database field currently? How do you think these challenges affect the performance and reliability of database systems?

Li Li: One of the most pressing challenges in the current database field is how to handle and analyze the growing data volume while maintaining efficient performance and reliability. This challenge is mainly reflected in the following aspects:

Firstly, the continuous growth of data scale is a significant challenge. With the rapid development of fields such as IoT, social media, and enterprise applications, data volume is growing exponentially. This not only requires databases to effectively store and manage massive data but also to optimize storage structures, indexing mechanisms, and query processing to maintain efficient performance.

Secondly, demand for real-time data processing keeps increasing. Modern business scenarios, such as real-time analytics and online transaction processing, require database systems to guarantee extremely low latency while handling large amounts of data. This places higher demands on database design and optimization.

Thirdly, data security and privacy protection is also a major challenge. With data breaches occurring frequently, protecting data through encryption, access control, and other measures to prevent unauthorized access or leakage has become an important aspect of database system design.

Additionally, we face challenges in handling diverse data types and complex data relationships. Modern databases not only need to handle structured data but also effectively manage semi-structured and unstructured data. At the same time, the relationships between data have become more complex, posing new requirements for database models and query languages.

Finally, high availability and disaster recovery capabilities are increasingly important to enterprises. Any data loss or service interruption can lead to significant business losses, so ensuring high availability and rapid recovery of databases is crucial.

These challenges directly affect the performance and reliability of database systems. For example, without effective indexing and query optimization techniques, queries over large-scale data become slow, severely impacting user experience. Similarly, if security measures are inadequate, data may be leaked or corrupted, undermining the overall reliability of the system.

InfoQ: What are the application cases of LLM technology in the database field? Can you share some specific examples and how these cases use LLM technology to solve existing database system challenges?

Li Li: LLM technology has a wide range of applications in the database field, significantly contributing to both user experience improvement and internal operation optimization.

Firstly, in terms of user perception, LLM technology can greatly simplify user interaction with databases. For example, document-based ChatBots like TiDB Bot can support user queries on Slack or Cloud platforms. These ChatBots can understand user query intentions and provide suggestions on database configuration, log management, slow query optimization, etc. This not only improves user operational convenience but also helps users manage databases more effectively.

Additionally, LLM technology can help users generate SQL queries directly through natural language (NL2SQL). This means that even if users are not familiar with SQL syntax, they can obtain data by describing their query needs. Furthermore, we can extend this technology to convert raw data into business insights (NL2Insight), providing deeper data analysis and business insights beyond just generating SQL.
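
To make the NL2SQL idea concrete, here is a minimal sketch. It is illustrative only, not PingCAP's implementation: `call_llm` is a hypothetical stand-in for a model endpoint, and the schema text and prompt wording are assumptions.

```python
# Minimal NL2SQL sketch. `call_llm` is a hypothetical placeholder for a
# real LLM API call; schema text and prompt wording are illustrative.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # swap in a real model endpoint

SCHEMA_DOC = """
Table orders(id INT, user_id INT, amount DECIMAL, created_at DATETIME)
  -- one row per purchase; amount is in USD
Table users(id INT, region VARCHAR)
"""

def nl2sql(question: str) -> str:
    """Turn a natural-language question into a single SQL statement."""
    prompt = (
        "You translate questions into SQL for the schema below.\n"
        f"{SCHEMA_DOC}\n"
        "Return only one SQL statement, no explanation.\n"
        f"Question: {question}"
    )
    return call_llm(prompt).strip()

# Example: nl2sql("total order amount per region in the last 30 days")
```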

In terms of diagnosis and fault recovery, LLM technology also shows great potential. By integrating into ChatBot-based systems, LLM can use logs, slow queries, performance metrics, and other information to provide deeper domain judgment and business problem analysis. This helps reduce Mean Time to Repair (MTTR), enabling even non-professional users to quickly diagnose and resolve issues.

In internal usage not directly perceived by users, LLM technology also plays an important role. For example, in automated testing, LLM can be used to generate test cases for database systems, improving test coverage and efficiency. In code review, LLM can help analyze code quality and style consistency, enhancing development efficiency. Additionally, LLM can automate the generation of performance analysis reports, fault reports, etc., helping technical teams quickly obtain key information and manage enterprise internal knowledge bases, improving information sharing and retrieval efficiency.

InfoQ: What are Flow and Agent applications you mentioned? Can you explain these technical directions in detail?

Li Li: In LLM applications, we can distinguish three technical levels: Wrapper, Flow, and Agent. Each level represents different complexities and application scenarios of interacting with LLM.

  1. LLM Wrapper:

This is the most basic application level, involving a single interaction with LLM. At this level, the user’s request is directly sent to the model, and the model returns a response. The capability ceiling of this method is directly limited by the model’s reasoning ability. It is suitable for the early stages of business when enterprises are looking for product-market fit (PMF) and can quickly develop and iterate.

  2. Flow (DAG):

At the Flow level, business logic is constructed through Directed Acyclic Graphs (DAG) to achieve multiple interactions with LLM. Each interaction focuses on solving a specific problem, such as intent judgment, content rewriting, providing answers, or criticism. This method effectively overcomes the limitations of single interactions and supports building more complex applications. It is suitable for scenarios where there is a clear understanding of how to use LLM to solve business problems and requires handling more complex logic and improving accuracy.
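
To illustrate the Flow idea, here is a minimal sketch under stated assumptions: `call_llm` is a hypothetical helper, and the four nodes (intent judgment, rewriting, answering, criticism) mirror the examples above, wired as a small fixed DAG.

```python
# Flow (DAG) sketch: several focused LLM interactions chained with
# fixed edges. `call_llm` and the step prompts are illustrative.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real model call

def flow(question: str) -> str:
    # Node 1: intent judgment
    intent = call_llm(f"Classify the intent (config/slow-query/other): {question}")
    # Node 2: content rewriting, conditioned on the intent
    focused = call_llm(f"Rewrite this {intent} question precisely: {question}")
    # Node 3: provide an answer
    draft = call_llm(f"Answer the question: {focused}")
    # Node 4: criticism, feeding back into one revision
    issues = call_llm(f"List errors in this answer to '{focused}':\n{draft}")
    if "none" in issues.lower():
        return draft
    return call_llm(f"Answer '{focused}' again, avoiding these issues: {issues}")
```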

  3. Agent (Loop):

The Agent level is built on Loop+Feedback. Here, the LLM can autonomously decide on and execute the necessary steps based on human input, self-evaluate for anomalies after completion, and adjust accordingly. In this way, the LLM can significantly improve the accuracy of results and solve more complex problems. The logic of building an Agent is fundamentally different from that of traditional applications; it is akin to building a team or company, where each Agent is a capable worker. With many Agents complementing one another, a reasonably sound decision is ultimately reached.
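
A rough sketch of the Loop+Feedback pattern, with hypothetical `call_llm` and `run_tool` helpers: the model chooses a step, executes it through a tool, self-evaluates, and either stops or continues.

```python
# Agent (Loop) sketch: decide -> act -> observe -> self-evaluate,
# repeated until done. All helpers here are hypothetical placeholders.

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def run_tool(action: str) -> str:
    raise NotImplementedError  # e.g. run a query, fetch logs or metrics

def agent(task: str, max_steps: int = 30) -> str:
    history: list[tuple[str, str]] = []
    for _ in range(max_steps):  # Agents commonly loop 20-30 times
        action = call_llm(f"Task: {task}\nHistory: {history}\nNext action?")
        observation = run_tool(action)
        history.append((action, observation))
        verdict = call_llm(f"History: {history}\nIs the task DONE or anomalous?")
        if verdict.strip().upper().startswith("DONE"):
            return call_llm(f"Summarize the outcome of: {history}")
    return "stopped after max_steps without a confident result"
```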

These technical directions have no absolute pros and cons; the key is to choose the most suitable technical level for current business needs. As the business develops and needs change, it may be necessary to migrate from one level to another to adapt to more complex scenarios and improve overall system performance.

The differences between the three levels can be summarized in a table like this:

| Level | Interaction pattern | Typical interactions per task | Suitable scenarios |
| --- | --- | --- | --- |
| Wrapper | Single request and response | 1 | Early-stage products seeking PMF; fast iteration |
| Flow | Multiple focused calls wired as a DAG | A few (roughly 1-5) | Well-understood business logic needing higher accuracy |
| Agent | Loop + feedback with autonomous steps | 20-30 | Complex problems requiring self-evaluation and adjustment |

Copilot Technology Practice in Databases

InfoQ: What are the typical steps when using LLM to solve practical problems? What are the main challenges and difficulties you encounter in these steps? How do you address these challenges?

Li Li: Using LLM to solve practical problems typically involves several key steps, each with its unique challenges.

Typical business implementation steps include:

  1. Business insight and demand judgment: This is the first step of the project, requiring a deep understanding of business needs and pain points.
  2. Experiment and feasibility analysis: In this step, we conduct preliminary experiments to test the applicability and effectiveness of LLM.
  3. Application type iteration: Depending on the complexity of the scenario, we may start with Wrapper and gradually iterate to Flow and Agent.
  4. Feedback design and collection: Design effective feedback mechanisms to collect user feedback, which is crucial for model optimization.
  5. Continuous optimization design and implementation: Continuously optimize models and business processes based on collected feedback.

The main challenges we encounter in these steps include:

  • Business understanding: Deeply understanding business needs often requires close collaboration with business parties, which may involve personally participating in business processes.
  • Model capability: There may be a gap between the theoretical capability of the model and its actual application effect.
  • Tool maturity: The tools available in the market may not be mature enough, sometimes requiring open-source contributions or self-development.
  • LLM stability: Including the stability of responses and processes, which needs to be addressed through carefully designed prompts and process control mechanisms.
  • LLM response format and quality: Ensuring that LLM responses meet business needs and are of high quality requires continuous feedback and optimization.

Response strategies:

  • Deep collaboration: Work closely with business parties to ensure a thorough understanding of business needs.
  • Continuous experimentation: Continuously adjust and optimize LLM applications through ongoing experiments and feasibility analysis.
  • Feedback mechanisms: Design effective feedback mechanisms, such as ChatBot’s likes and similarity evaluations, SQL correctness evaluations, etc., to collect user feedback and optimize models.
  • Enhance tools and processes: Develop or improve tools, optimize business processes, and improve the application effect and stability of LLM.

InfoQ: What is Copilot, and how does it differ from other AI application methods?

Li Li: Copilot is a very specific interaction method in AI applications, seeking a balance between flexibility and ease of use, aiming to reduce users’ cognitive burden while providing effective support.

Core features and applications of Copilot:

Copilot can be seen as a user’s “digital assistant” or “coach,” embedded in the user’s workflow to provide proactive suggestions and support. The key to this method is that it can capture a large amount of contextual information, enabling AI to provide more accurate and useful suggestions. For example, GitHub Copilot provides code suggestions in a programming environment, while Database Copilot may assist in database management or query optimization.

Compared to other AI application methods, the main difference of Copilot lies in its level of integration and interaction method. For example, compared to traditional chatbots, Copilot is more deeply integrated into users’ specific tasks and workflows, rather than just providing general conversational support.

Comparison with other AI application methods:

  1. General Chat: This method offers the highest degree of freedom, allowing users to interact with the system freely in natural language, but it can fall short on tasks that require specific guidance or operations. ChatGPT is a typical example.
  2. Specialized Chat: This method provides more specialized support by limiting the scope of discussion topics. It is more instructive than general chat but sacrifices some flexibility. TiDB Bot, which discusses only TiDB-related issues, is an example.
  3. AI-enabled Features: This method offers the highest ease of use, typically exposing specific AI functions through buttons or switches. Its results are the most predictable but the least flexible. Notion AI's continuation and translation features are examples.

InfoQ: What is Database Copilot?

Li Li: Database Copilot is an AI application embedded in users’ workflows to provide real-time guidance and suggestions, improving users’ work efficiency and decision quality. This application mode is similar to a co-pilot in flight, providing assistance and ensuring the correctness and safety of operations.

Core features and working methods of Database Copilot:

  • Integration with workflows: Database Copilot is deeply integrated into users’ database operations and query processes. It analyzes users’ inputs and behaviors, as well as the database’s responses and status, providing real-time feedback and suggestions.
  • Proactive suggestions: Unlike traditional tools, Database Copilot adopts a proactive approach, recommending optimal operation steps or query optimization suggestions based on the current context and historical operation patterns.
  • Context awareness: It can understand and analyze users’ operational context in the database, enabling it to provide more accurate and useful suggestions.
  • Learning and adaptation: Database Copilot continuously learns users’ operational habits and preferences, optimizing its suggestion algorithms to provide more personalized support.

Application scenarios:

  • Query optimization: For complex SQL queries, Copilot can provide performance optimization suggestions, helping users rewrite queries to improve execution efficiency.
  • Error diagnosis: When users encounter query errors or performance bottlenecks, Copilot can provide diagnostic information and repair suggestions.
  • Learning assistance: For users unfamiliar with database operations, Copilot can serve as a real-time learning tool, providing operational guidance and best practices.

Challenges and countermeasures:

  • User trust: Building user trust in Copilot’s suggestions is a challenge. To address this, we ensure that all suggestions are based on best practices and precise data analysis, while providing sufficient explanations and documentation support.
  • Balancing automation and control: Excessive automation may make users feel out of control. We address this by providing adjustable automation levels and detailed user control options.
  • Continuous learning: To maintain Copilot’s effectiveness, we continuously collect user feedback and operational data for training and optimizing models.

InfoQ: How do you ensure the business logic correctness of generated SQL statements when handling complex business logic and rules? How do you verify that Copilot-generated SQL statements meet business requirements?

Li Li: Ensuring the business logic correctness of generated SQL statements is a multi-step process involving schema enhancement, continuous optimization, and feedback collection. I can elaborate on this process in several parts.

  1. Schema Enhancement:

When importing data, we provide detailed descriptions of the database schema, including column descriptions, table descriptions, table relationships, database descriptions, and main entities. This helps the model better understand the data structure and business context.
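
As a concrete illustration (the format is an assumption, not PingCAP's actual pipeline), the enhanced schema might be captured as structured metadata that is later rendered into prompts:

```python
# Illustrative schema-enhancement metadata: human-readable descriptions
# attached to the database, tables, columns, relationships, and entities.

SCHEMA_METADATA = {
    "database": "retail",
    "description": "Orders and customers for an online store",
    "tables": {
        "orders": {
            "description": "One row per purchase",
            "columns": {
                "id": "Primary key",
                "user_id": "Foreign key -> users.id",
                "amount": "Order total in USD",
            },
        },
        "users": {
            "description": "Registered customers",
            "columns": {"id": "Primary key", "region": "Sales region code"},
        },
    },
    "relationships": ["orders.user_id -> users.id (many-to-one)"],
    "main_entities": ["order", "customer"],
}
```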

  2. Corpus Construction:

We introduce business-related corpora during data import, including schema information, domain knowledge, and specific SQL cases. These corpora help the model learn the language and logic of specific business domains.

  3. Prompt (Self-Q&A):

We optimize the query generation process using self-Q&A methods, including task rewriting, entity extraction, sub-question generation and answering, and query merging. This step is crucial for ensuring the logical correctness of generated SQL statements.
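
A rough sketch of this self-Q&A flow, assuming a hypothetical `call_llm` helper; the decomposition steps mirror the ones named above (task rewriting, entity extraction, sub-question answering, query merging):

```python
# Self-Q&A sketch: rewrite the task, extract entities, answer
# sub-questions, then merge. `call_llm` is a hypothetical placeholder.

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def self_qa_to_sql(task: str, schema_doc: str) -> str:
    rewritten = call_llm(
        f"Rewrite this task precisely for the schema:\n{schema_doc}\nTask: {task}"
    )
    entities = call_llm(f"List the tables and columns needed for: {rewritten}")
    sub_questions = call_llm(f"Break into sub-questions: {rewritten}").splitlines()
    fragments = [
        call_llm(f"Write a SQL fragment for '{q}' using {entities}")
        for q in sub_questions if q.strip()
    ]
    return call_llm(
        "Merge these fragments into one correct SQL statement:\n"
        + "\n".join(fragments)
    )
```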

  4. Self-Fix:

We optimize the generated SQL to ensure its executability. If a query execution fails, the system feeds the error information back to LLM and attempts to generate a new query until a correctly executable result is obtained.
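
The self-fix loop can be sketched as follows. SQLite is used here purely as a stand-in executor (TiDB speaks the MySQL protocol), and `call_llm` is hypothetical:

```python
import sqlite3

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical model call

def self_fix(sql: str, conn: sqlite3.Connection, max_attempts: int = 3) -> str:
    """Check executability; on failure, feed the error back and retry."""
    for _ in range(max_attempts):
        try:
            conn.execute("EXPLAIN " + sql)  # cheap executability check
            return sql
        except sqlite3.Error as err:
            sql = call_llm(
                f"This SQL failed with: {err}\nSQL: {sql}\n"
                "Return only a corrected statement."
            )
    raise RuntimeError("could not produce an executable query")
```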

  5. Continuous Optimization:

A continuously learning and adapting Agent is responsible for optimizing SQL query generation and execution.

The Agent’s working mechanism includes the following aspects (a simplified sketch follows the list):

  • Automated monitoring and feedback loop:

    • The Agent continuously monitors the effectiveness of database operations, including query execution time, resource consumption, and other key performance indicators. It also collects user feedback on query results, such as likes, dislikes, and modifications.
    • Based on this data, the Agent can automatically identify which queries need optimization and which have achieved good performance.
  • Dynamic learning and adjustment:

    • The Agent uses machine learning algorithms to analyze the collected data and learn how to improve the structure and logic of SQL statements. This includes selecting more effective indexes, adjusting query structures, and optimizing joins and filter conditions.
    • It can also dynamically adjust query strategies based on the real-time state of the database to adapt to changes in data volume, database load, and other external conditions.
  • Generating and testing new query schemes:

    • After identifying queries that need optimization, the Agent automatically generates a series of improved query schemes. These schemes are executed in a safe test environment to evaluate their performance and accuracy.
    • By comparing the execution results of different schemes, the Agent can select the optimal query scheme and recommend it to users or automatically apply it to the production environment.
  • Continuous iteration and optimization:

    • This process is ongoing. The Agent continuously iterates and optimizes its learning models and query generation algorithms to adapt to new business needs and technological changes.
    • It also regularly cleans and updates its corpus, removing outdated or inefficient data to ensure the quality and relevance of learning resources.
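
Here is a simplified sketch of the monitor-regenerate-test loop described above; the threshold and helper functions are assumptions, not the actual system:

```python
# Monitor -> regenerate -> benchmark -> recommend, as described above.
# All helpers and the latency threshold are illustrative placeholders.

SLOW_MS = 500  # assumed threshold for "needs optimization"

def fetch_query_metrics() -> list[dict]:
    raise NotImplementedError  # e.g. execution time, resource use, feedback

def generate_candidates(sql: str) -> list[str]:
    raise NotImplementedError  # LLM proposes rewritten queries

def benchmark_in_test_env(sql: str) -> float:
    raise NotImplementedError  # run safely outside production, return ms

def optimization_pass() -> dict[str, str]:
    recommendations = {}
    for metric in fetch_query_metrics():
        if metric["latency_ms"] <= SLOW_MS:
            continue  # already performing well
        best_sql, best_ms = metric["sql"], metric["latency_ms"]
        for candidate in generate_candidates(metric["sql"]):
            ms = benchmark_in_test_env(candidate)
            if ms < best_ms:
                best_sql, best_ms = candidate, ms
        if best_sql != metric["sql"]:
            recommendations[metric["sql"]] = best_sql  # suggest or auto-apply
    return recommendations
```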

Efficient Operation and Data Security of AI Agents

InfoQ: When deploying LLM Agents, how do you balance service overhead and real-time performance to ensure efficient system operation and response speed?

Li Li: This is indeed a significant challenge when deploying LLM Agents. Since LLM Agents typically require multiple interactions with the LLM, this not only increases operational costs but can also slow response times. The root causes are threefold: the many interactions with the LLM (commonly 20-30 per task), the LLM’s inherent processing speed, and the high cost of LLM calls.

We have adopted several strategies to balance these needs and ensure efficient system operation.

  • Optimizing interaction frequency and processing speed (see the routing sketch after this list):
    • We use different levels of LLM applications for different types of tasks. For tasks requiring quick responses, we may use Wrapper or Flow modes, which involve fewer interactions (usually 1-5 times) and can be completed in a shorter time.
    • For tasks that can tolerate longer processing times, such as background analysis and fault diagnosis, we use Agents mode. Although this mode requires more interactions (20-30 times), it can handle more complex logic and loops, providing deeper analysis.
  • Enhancing LLM’s operational efficiency:
    • We adjust the hardware configuration of the service as needed, such as adding more GPU resources to shorten processing time.
  • Cost control and business adaptation:
    • We closely monitor service costs and performance, ensuring that costs are minimized without sacrificing user experience.
    • We choose the most suitable LLM application mode based on the characteristics and needs of different businesses. For example, in some non-real-time application scenarios, longer response times are acceptable, allowing the use of more complex Agents mode to improve analysis accuracy and depth.
  • Hybrid use of different modes:
    • In some cases, we embed Agents as a tool within Flow to solve specific, error-prone problems. This hybrid mode can solve complex problems while ensuring efficiency.
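
A toy routing sketch for the first strategy: tasks are sent to Wrapper, Flow, or Agent mode based on how much latency they can tolerate and whether they need loops. The enum, threshold values, and signature are assumptions.

```python
# Illustrative task routing across Wrapper / Flow / Agent modes.
# Thresholds are assumed, not measured values from the talk.

from enum import Enum

class Mode(Enum):
    WRAPPER = "wrapper"  # single interaction, fastest
    FLOW = "flow"        # a few interactions (roughly 1-5)
    AGENT = "agent"      # 20-30 interactions, supports loops

def choose_mode(latency_budget_s: float, needs_loops: bool) -> Mode:
    if needs_loops or latency_budget_s > 60:
        return Mode.AGENT    # background analysis, fault diagnosis
    if latency_budget_s > 5:
        return Mode.FLOW     # multi-step but still interactive
    return Mode.WRAPPER      # quick-response tasks
```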

InfoQ: When LLM Agents need to access relevant business or customer data to improve service capabilities, how do you protect the security and privacy of this data? What measures do you take to ensure data confidentiality and integrity?

Li Li: Protecting customer data security and privacy is our top priority. We take multi-layered measures to ensure data confidentiality and integrity, covering both system-level and model-level security strategies.

  • System-level measures:
    • Data access control: We use TiDB and TiDB Cloud to manage data, which have strong data access control features. Unless explicitly authorized by the user, business data cannot be accessed. This ensures the legality and security of data access.
    • Data anonymization: When handling business data, we anonymize the data. This means that before data is used by LLM Agents, all sensitive information is removed or replaced to ensure that even if the data is leaked, it cannot be maliciously exploited.
  • Model-level measures:
    • Security strategies for internal and third-party models: For internal models, we can strictly control the data processing and storage environment. For interactions involving third-party models, we take the following measures:
      • Data minimization and anonymization: We minimize the amount of data processed and anonymize all sensitive data. This includes using data masking or pseudonymization techniques to ensure that sensitive information is not exposed to third-party models (a simple masking sketch follows this list).
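
A minimal masking sketch for the anonymization step, assuming simple regex-based redaction before any text leaves the system; real pipelines would be far more thorough:

```python
import re

# Redact common sensitive patterns before a prompt is sent to a
# third-party model. Patterns and labels here are illustrative.

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3,4}[-.\s]?\d{4}\b"),
}

def mask(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

# mask("Contact alice@example.com or 555-123-4567")
# -> "Contact <email> or <phone>"
```
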
| username: Kongdom | Original post link

:call_me_hand: Awesome, awesome~

| username: YuchongXU | Original post link

Data security and ease of use

| username: 希希希望啊 | Original post link

Awesome!

| username: Kuber | Original post link

Is it possible to submit an SQL query and receive optimization suggestions, among other application scenarios? Or to proactively push issues found in the current database environment along with recommendations?

| username: YY-ha | Original post link

Got it! We will also ask the teacher to take a look~