The Story of TiDB and Electricity Theft Analysis

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb和窃电分析的故事

| username: tidb狂热爱好者

Stories about data statistics and TiDB

When I was a child, my dad did something very interesting.
At that time, we lived in an apartment building. Each household had its own meter, and the building had a main meter. The households took turns collecting the electricity bill. For as long as I can remember, the main meter reading never matched the sum of the household meters; it was always off by 20 or 30 yuan. Either the household on collection duty covered the shortfall or it was split evenly among all households. A persistent gap like that meant someone in the building was stealing electricity.
Later, it was discovered that the household in unit 302 was stealing electricity. They had two college students and were short on money, so they used their electrical knowledge to steal some electricity.
This simple addition and subtraction caught the thief. Why did we suspect them? They used only 1 kWh of electricity per month.
It was really absurd—absurd to the extreme.
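The addition and subtraction in this story can be sketched in a few lines of Python. This is only an illustration: the readings, the 0.5 yuan/kWh price, and the "below 10% of the median" rule are assumptions made up for the example; only unit 302's 1 kWh comes from the story.

```python
import statistics


def reconcile(main_kwh, household_kwh, price_per_kwh):
    """Return (unaccounted kWh, unaccounted cost) for one billing period."""
    unaccounted = main_kwh - sum(household_kwh.values())
    return unaccounted, unaccounted * price_per_kwh


def suspiciously_low(household_kwh, ratio=0.1):
    """Flag units whose reading is below `ratio` times the median reading."""
    median = statistics.median(household_kwh.values())
    return [unit for unit, kwh in household_kwh.items() if kwh < ratio * median]


# Hypothetical readings in kWh; only unit 302's value is taken from the story.
readings = {"101": 120, "102": 95, "201": 110, "202": 130, "301": 105, "302": 1}

# Assumed main meter reading and price, chosen so the gap lands near 20-30 yuan.
gap_kwh, gap_yuan = reconcile(main_kwh=611, household_kwh=readings, price_per_kwh=0.5)
print(gap_kwh, gap_yuan)           # 50 25.0
print(suspiciously_low(readings))  # ['302']
```

The gap alone only says that electricity is missing; the per-household check is what points at a specific unit.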
When I was studying machine learning on Alibaba Cloud, there was an article about electricity theft analysis.

Using TiDB to predict stock returns and analyze stock cycles

User Electricity Theft Detection

  1. Enter the Designer page.
     1. Log in to the PAI console.
     2. In the left navigation bar, click Workspace List, and on the workspace list page, click the name of the workspace to be operated on to enter the corresponding workspace.
     3. In the workspace page’s left navigation bar, select Model Development and Training > Visual Modeling (Designer) to enter the Designer page.
  2. Build the workflow.
     1. On the Designer page, click the Preset Templates tab.
     2. In the template list’s User Electricity Theft Detection area, click Create.
     3. In the New Workflow dialog box, configure parameters (you can use all default parameters). Among them, Workflow Data Storage is configured as the OSS Bucket path, used to store temporary data and models generated during workflow operation.
     4. Click OK. You need to wait about ten seconds for the workflow to be successfully created.
     5. In the workflow list, double-click the User Electricity Theft Detection workflow to enter the workflow.
     6. The system automatically builds the workflow based on the preset template, as shown below.

|Area|Description|
|---|---|
|①|Statistical Analysis: Use the **Correlation Matrix** component to observe the impact of each feature on electricity theft. Use the **Data View** to see the data distribution relationship between each feature column and the target column. In this workflow, the **feature columns** are power_usage_decline_level, line_loss_rate, and warning_num, and the **target column** is is_theft.|
|②|Split the dataset into training and prediction datasets in an 8:2 ratio.|
|③|Use the Logistic Regression Binary Classification component to perform regression modeling on the training dataset. The training feature columns in this workflow are power_usage_decline_level, line_loss_rate, and warning_num, and the target column is is_theft.|
|④|Use the Prediction component to predict the model’s effect on the prediction dataset, and use the Binary Classification Evaluation component to evaluate the model’s prediction accuracy.|
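For readers who want to see what the split, training, and prediction areas do outside the visual designer, here is a minimal standard-library sketch. It is not PAI's implementation: the rows are synthetic, the "thieves score higher on every feature" pattern is an assumption made up for the example, and the training loop is plain batch gradient descent. Only the column names and the 8:2 ratio come from the workflow description.

```python
import math
import random


def make_synthetic_rows(n, seed=0):
    """Generate made-up (features, is_theft) rows; NOT the PAI template dataset."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_theft = 1 if rng.random() < 0.3 else 0
        shift = 1.0 if is_theft else 0.0  # assumed toy pattern, not a real finding
        features = [
            rng.gauss(shift, 1.0),  # power_usage_decline_level (standardized)
            rng.gauss(shift, 1.0),  # line_loss_rate (standardized)
            rng.gauss(shift, 1.0),  # warning_num (standardized)
        ]
        rows.append((features, is_theft))
    return rows


def split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and split it 8:2 into training and prediction sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Predicted probability that a row is electricity theft."""
    return sigmoid(bias + sum(w * xj for w, xj in zip(weights, x)))


def train_logistic_regression(train, epochs=300, lr=0.1):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_features = len(train[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in train:
            error = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / len(train) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(train)
    return weights, bias


rows = make_synthetic_rows(1000)
train, test = split_80_20(rows)
weights, bias = train_logistic_regression(train)
correct = sum((predict_proba(weights, bias, x) >= 0.5) == bool(y) for x, y in test)
accuracy = correct / len(test)
print(len(train), len(test), round(accuracy, 3))
```

On real meter data the features would need scaling and the threshold of 0.5 would likely need tuning, since theft cases are usually rare.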
3. Run the workflow and view the output results.

  1. Click the run button at the top of the canvas.
  2. After the workflow runs, right-click the Correlation Matrix on the canvas, and in the shortcut menu, click Visual Analysis.
  3. In the Correlation Matrix dialog box, view the impact of each feature on electricity theft.

None of the features power_usage_decline_level, line_loss_rate, and warning_num shows a strong correlation with whether the user is an electricity thief (is_theft) on its own, which indicates that no single feature determines whether a user is stealing electricity.
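As a rough stand-in for what a correlation matrix reports, the sketch below computes Pearson coefficients between every pair of columns. Whether the PAI component uses exactly this coefficient is an assumption, and the column values are made up for illustration; only the column names come from the workflow.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map every (column, column) pair to its Pearson coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values for illustration only; not the PAI template dataset.
columns = {
    "power_usage_decline_level": [1, 3, 2, 5, 4, 6, 2, 3],
    "line_loss_rate": [0.2, 0.5, 0.1, 0.4, 0.6, 0.3, 0.7, 0.2],
    "warning_num": [0, 2, 1, 0, 3, 1, 2, 0],
    "is_theft": [0, 0, 0, 1, 1, 1, 0, 0],
}

matrix = correlation_matrix(columns)
for feature in ("power_usage_decline_level", "line_loss_rate", "warning_num"):
    print(feature, round(matrix[feature]["is_theft"], 2))
```

A coefficient near 0 only rules out a linear relationship with that one feature, which is why a model that combines several features can still work.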

  1. Right-click the Binary Classification Evaluation on the canvas, and in the shortcut menu, click Visual Analysis.
  2. In the Evaluation Report dialog box, click the Evaluation Chart tab to view the model evaluation metrics.

The closer the AUC value is to 1, the better the model separates electricity thieves from normal users. In this article, the AUC value is above 0.8, indicating that the user electricity theft model discriminates well.
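One way to read AUC is as the probability that a randomly chosen thief gets a higher score than a randomly chosen normal user. The sketch below computes it directly from that definition; the labels and scores are made up for illustration, and the pairwise loop is chosen for clarity rather than speed.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = theft) and model scores for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(round(auc(labels, scores), 3))  # 0.917
```

Here 11 of the 12 thief/normal pairs are ranked correctly, so the AUC is 11/12. A value of 0.5 would mean the scores are no better than random ordering.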

I will write about how to use TiDB for electricity theft analysis later. After all, we are also an AI database, haha.

When Tang Cheng sent me to work in the power sector, I was puzzled as to why electricity meter statistics had to be collected at per-second granularity. That produced 40 TB of data a day. PostgreSQL alone was not enough; Hive was also needed to process the data. In other words, the same data was stored in Oracle, PG, and Bigtable. The data center occupied an entire building.
Later, I realized that power companies are major users of machine learning.
Electricity cannot be stored; it must be produced and used in equal amounts, which requires predicting peaks and troughs. This is a major source of jobs for people like us.

Enough said, I need to pay my 5000-6000 yuan electricity bill.


| username: Jack-li | Original post link

Hahaha, interesting.

| username: Kongdom | Original post link

:yum: Interesting~

| username: zhaokede | Original post link

Isn’t this calculated by the difference between the incoming and outgoing current of the electric meter?

| username: TiDBer_JUi6UvZm | Original post link

Why are you so productive?

| username: jiayou64 | Original post link

Next step, gas :smiley:

| username: 呢莫不爱吃鱼 | Original post link

:+1: :+1: :+1:

| username: Daniel-W | Original post link

Impressive

| username: 数据库真NB | Original post link

This is done with a data-driven approach, analyzing data correlations. Not bad.

| username: TiDBer_QYr0vohO | Original post link

:+1:

| username: kelvin | Original post link

:+1: