[Hackathon 2022 Idea Pool] Looking for Creative Partners~~

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 【Hackathon 2022 创意库】寻找创意的小伙伴看过来~~

| username: TiDB社区小助手

Looking for inspiration? You’ve come to the right place!

Hackathon 2022 Creative Library is here, featuring contributions from community members. We also welcome more creative ideas from everyone in the comments section.

  • Creative Contributions: Simply reply with your ideas in the comments! If your idea is included in this post, you can earn 100 community points per idea (up to 300 points per person). However, there is no limit to the number of ideas you can submit!
  • Finding Inspiration: Below are sources of inspiration for your ideas. Participants in the Hackathon can also find creative inspiration here.

Inspiration Hub

TiDB Hackathon 2022 Informal Talks - Creative Brainstorming Session, featuring Dongxu and senior architects brainstorming online, generating numerous creative ideas.

Application Group

Idea Video Start Time Description
Develop using TiDB Cloud API 05:51 Resources: TiDB Cloud Developer Tier (free clusters on TiDB Cloud)
Improve TiDB usability 08:12 Easy scaffolding, one-click deployment of a simple TiDB front-end and back-end application TiDB application market, establishing a commercial ecosystem of TiDB application templates
TiDB CI/CD components TiDB CI/CD components for automated testing, with examples: Creating TiDB on GitHub Action: GitHub - Icemap/tidb-action: GitHub Action to setup a TiDB database by Docker Creating TiDB image in Java unit tests: Add TiDB module by Icemap · Pull Request #5511 · testcontainers/testcontainers-java · GitHub
OSS Insight Market Place 23:08 Abstract frequently used components into a marketplace for free selection
Provide graphical interface through SQL 24:42 BI tool? How is it different from Metabase/SuperSet/Tableau?
TiDB → TiFlow → Kafka → Doris/ClickHouse data pipeline 24:50 Create a data pipeline to write data to other databases for AP analysis [optional] Customizable Kafka Consumer to consume data to any database [optional] Use Flink to complete customizable Source/Sink for general data transfer
Multi-tenant with ProxySQL and TiDB 37:34 Idea: Use ProxySQL, add a prefix to all usernames/table names at the proxy layer, then use Placement Rules for strategy writing TiDB and ProxySQL integration
SSO for TiDB login authentication 42:00 Idea: Write an OAuth service, construct Token ↔ temporary username/password mapping, only this service knows the Root account, manage temporary users, such as automatic renewal, password change reminders, dynamic passwords, etc. [optional] Combine with common ORM/Driver for OAuth login configuration, automatically complete username/password retrieval, login, etc. [optional] Combine with Multi-tenant with ProxySQL and TiDB to achieve SSO + multi-tenant
Data analysis 51:26 TiDB is an HTAP database, you can use it for some analysis work Use the SDK contributed by Dongxu to TiDB Cloud, create a TiDB Cloud Developer Tier (free), then import data and analyze
Web3 dataset 57:11 Since blockchain is a distributed database and is public, you can use tools like Ethereum ETL to import data into TiDB for real-time analysis ethereum-etl: Ethereum ETL, can convert Ethereum data to CSV format with one click, real-time data access to TiDB Refer to NFTScan, OSSInsight, OssInsight - NFT Insight WorkShop
Basic Linux tools 61:01 Use inotify tool to monitor file changes, use a mechanism similar to TiCDC to import data into TiDB, adapt to S3
Vercel integration 64:04 Implement Vercel Template for OSSInsight Use Vercel Serverless API to transfer MySQL data to TiDB [optional] Use the Terraform provided by TiDB at the end of September to manage the cluster lifecycle
OSS Insight multi-language implementation 69:20 Implement OSS Insight using various languages/frameworks
Continue TiClick 66:21 TiClick can be created this year through TiDB Cloud’s OpenAPI

TiDB Product Group

Idea Video Start Time Description
TiDB supports ClickHouse/Doris as engine for read-only external tables 0:24:45 From Zhang Xingye 0:41:10 From Huang Dongxu Use ClickHouse/Doris as TiDB’s read-only external table engine. Let TiDB be a common user query entry, TiDB can connect to multiple different databases/engines underneath. For example, CREATE TABLE xxx (xxx) engine=Doris or CREATE TABLE xxx (xxx) engine=ClickHouse. After creating a read-only external table, users can access these external tables through TiDB, and TiDB will read the corresponding database/engine data through the internal engine driver. The difficulty of this idea may lie in how to make the user experience smoother.
Implement multi-tenant isolation based on Placement Rules 0:37:25 A “poor man’s” multi-tenant isolation solution, adding a TCP proxy layer before TiDB, all user requests go through this proxy, and this proxy adds a user prefix to the table names based on the username, then forwards to TiDB. Then use Placement Rule to isolate data from different tenants.
Support importing data from AWS S3 using SQL syntax 0:48:55 For example: LOAD S3 ...
TiDB Cloud SDK 0:51:26 Allow users to automatically create dev-tier using SDK.
Improve TiDB’s caching capability 0:51:51 Currently, TiDB Cache Table can only cache 64 MB, which is too small.
Row-based permission management 1:03:09 Allow a user to access only certain rows of a table.
Improve PD TSO bottleneck (single point) issue 1:19:30 Make TSO more scalable when deploying in multiple data centers.
Improve fault recovery speed in multi-data center deployment 1:20:25 In the cloud, when deploying TiDB clusters across multiple data centers, can another data center’s TiDB immediately provide service when one data center goes down?

Easter Egg Group

The Easter Egg Group cannot modify TiDB/TiKV source code, only use plugins for hacking

Idea Video Start Time Description
TiDB UDF entry search 75:09 The current TiDB Plugin API is very weak, only for auditing. Can it support a plugin system without modifying the TiDB kernel? SHOW PLUGINS ADMIN PLUGIN Hint: The current Plugin API can be enhanced in a special way without modifying the kernel code.
TiKV UDF entry search 77:46 TiKV has an undocumented mechanism called Coprocessor Plugin, although undocumented and rarely used, it is usable.

Creative Contributors:

  • Huang Dongxu, PingCAP Co-founder and CTO
  • Yao Wei, PingCAP Global Community Ecosystem Leader
  • Zhang Xingye, Multi-point System Architect
  • Cheng Chen, PingCAP Product Manager

Project Examples

Some project examples for reference, hoping to give you some direction and inspiration:

  1. OSSInsight is an insight tool built on billions of GitHub events data. As long as you can write SQL, you can build a powerful and cool data insight tool based on Docusaurus and Apache ECharts.

  2. TiDB & Snowflake Demo is an e-commerce system built on TiDB and Snowflake, using TiDB’s powerful real-time HTAP capabilities and Snowflake’s offline analysis capabilities to handle large amounts of data in the system.

  3. Ti-Click was one of the top 20 projects in TiDB Hackathon 2021. The project quickly builds Example Apps based on TiDB through an online IDE, helping developers quickly learn TiDB.

  4. Bookshop is an online bookstore application built on TiDB. You can use it to learn how to import table structures and data, and how to write SQL based on this data. This article will introduce how to import the application’s table structure and data, as well as the definition of its data table structure.

  5. For TiDB beginners, we provide a cloud-native development environment usage guide based on Gitpod. You can start a remote TiDB development environment directly from your browser or desktop IDE to quickly experience TiDB’s capabilities. We have written a new TiDB Developer Documentation to help application developers get started with TiDB in the shortest time.

Project Ideas from the Community

Application Group:

  1. Best practices for Oracle migration tools and migration solutions

  2. A migration verification tool for migrating from MySQL to TiDB, allowing developers to modify incompatible table structures, stored procedures, and functions in advance.

  3. Lightweight TiDB

Product Group:

TiDB:

  1. Intelligent operation and maintenance platform, including common operation and maintenance operations, intelligent operation and maintenance to detect various problems in advance (SQL performance issues, hardware and software issues, usage issues, deployment issues, parameter issues, etc.), with certain self-healing capabilities, strong alerting, and remote handling capabilities

  2. Optimizer supports INDEX SKIP SCAN: https://dev.mysql.com/doc/refman/8.0/en/range-optimization.html#range-access-skip-scan

  3. A small tool that provides suggestions for SQL adjustment and optimization, for example: taking over a database with SQL performance issues, but not being familiar with the overall business logic, especially for DBAs. If SQL execution is slow, this tool can provide feasible suggestions and general optimization directions based on the overall SQL operation, allowing quick identification of problems and SQL optimization.

  4. Define the running level of the entire cluster (a set of parameters and configurations), define the usage level of the tidb-server node, and automatically or manually specify the level to run.

  5. Support for custom SQL functions and stored procedures

  6. Statistics on query statements (e.g., how many SQLs were executed on a certain day, the types, total number, and proportion of each type of SQL, index usage, etc.)

  7. Richer point-in-time recovery solutions, as the use cases for recover and flashback are limited

  8. Improve CDC’s avro format

  9. Support for migrating data from SqlServer to TiDB

Dashboard:

  1. Enrich the Dashboard’s features, such as managing DM tasks and TiCDC changefeeds.

  2. Can’t see default parameters and their meanings, always have to refer to the official website. If I have a Dashboard for cluster status viewing, it should ideally not require logging into the cluster to query via command line, otherwise, I have to look back and forth.

  3. Optimize the display of monitoring, if common problem monitoring could be listed together, it would be better, without switching back and forth. For example, in cases of slow reads and writes.

  4. There are many system and cluster parameters, can they be categorized on the dashboard? For example, if I need to optimize read/write hotspots, related parameters can be displayed, and even modified directly on the dashboard.

  5. Hope the Dashboard’s features are richer, allowing daily inspections to be completed directly on the Dashboard.

  6. More detailed descriptions and explanations for monitoring panels and SQL optimization. There are many monitoring items, many of which are not used in daily operations, and when problems arise, it’s unclear which monitoring to check. This can be optimized. SQL optimization documentation is also not very helpful for tuning, and execution plans are sometimes hard to understand. More detailed explanations would be better.

  7. Simple interactive interface and advanced charts to assist DBAs in inspection logs and business report management, reducing the chances of platform downtime.

Tikv:

  1. Currently, TiKV disk size configuration does not exceed 2 TB, it would be better to support SSDs over 8T, saving deployment costs for enterprises.

TiUP:

  1. Create a TUI for TiUP, it’s not elegant to list and copy IP and port before using various ctl commands.

  2. TiUP has a requirement: TiUP generates a key pair for mutual trust with the target machine during deployment, but in some scenarios, the target machine’s authorized_keys is refreshed regularly or modification is not allowed, meaning only specified keys can be used to log in. Can TiUP always use a specified key without creating a new one?

  3. Store TiUP configuration data in PD, any node installing TiUP and accessing PD can manage the cluster, avoiding single point data loss issues.

  4. TiUP version rollback operations

TiSpark:

  1. Can TiSpark’s calculation results be written back to TiFlash? This would include offline calculations in TiDB’s architecture, allowing us to abandon the complex Hadoop setup.

Direct Links to Competition Information

| username: BraveChen | Original post link

Following this topic.

| username: 数据小黑 | Original post link

TiDB or TiFlash should have the capability of external tables by now. TiFlash + external tables + object storage + Parquet + Spark can take OLAP to the next level.

| username: STRRL | Original post link

This document is private, I want to refer to it but can’t see it:

| username: TiDB社区小助手 | Original post link

Here you go: ProxySQL 集成指南 | PingCAP 文档中心 Updated the link.