Batch-deleting tens of millions of rows from MySQL with Python

Scenario

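A table in our production MySQL database stores daily statistics results. It receives more than 10 million rows per day, far more than we ever expected a statistics table to hold. Our ops engineers came to us because the table was taking up 200 GB of disk. We checked with the business operations team, who said only the most recent 3 days need to be kept, so all older data had to be deleted. But how do you delete that much data?
Because this is a production database that also holds many other tables, deleting all of this table's old data in one go was out of the question: it could affect the other tables. Even deleting just one day's data at a time caused severe lag, so the only option left was to write a Python script that deletes in batches.
The approach: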

  1. Delete only one day's data at a time;
  2. Within a day, delete 50,000 rows per statement;
  3. Once a day's data is gone, move on to the next day.
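The three steps above can be sketched as a small, self-contained script. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs without a server), and the names delete_day_in_batches, delete_days, demo and the stats table are hypothetical. Stock SQLite builds do not support DELETE ... LIMIT, so each batch is chosen through a rowid subquery; in MySQL the statement would simply be a DELETE with WHERE date = ... LIMIT 50000, as in the author's script. This sketch also stops as soon as a batch deletes fewer rows than the batch size, instead of issuing a separate SELECT probe.

```python
import sqlite3

BATCH_SIZE = 50000  # rows per DELETE statement, as in step 2


def delete_day_in_batches(conn, table, day, batch_size=BATCH_SIZE):
    """Step 2: delete one day's rows, at most batch_size rows per statement.

    Returns the total number of rows deleted for that day.
    """
    total = 0
    while True:
        # Stock SQLite has no DELETE ... LIMIT, so pick one batch of rowids
        # in a subquery. The table name cannot be a bound parameter.
        cur = conn.execute(
            "DELETE FROM %s WHERE rowid IN "
            "(SELECT rowid FROM %s WHERE date = ? LIMIT ?)" % (table, table),
            (day, batch_size),
        )
        conn.commit()  # commit every batch to keep each transaction small
        total += cur.rowcount
        # A batch shorter than batch_size means this day is now empty.
        if cur.rowcount < batch_size:
            return total


def delete_days(conn, table, days, batch_size=BATCH_SIZE):
    """Steps 1 and 3: handle one day at a time, then move to the next."""
    return {day: delete_day_in_batches(conn, table, day, batch_size) for day in days}


def demo():
    """Build a tiny in-memory table and purge two of its three days."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stats (date TEXT, value INTEGER)")
    rows = ([("2019-08-17", i) for i in range(12)]
            + [("2019-08-18", i) for i in range(5)]
            + [("2019-08-31", i) for i in range(3)])
    conn.executemany("INSERT INTO stats VALUES (?, ?)", rows)
    conn.commit()
    deleted = delete_days(conn, "stats", ["2019-08-17", "2019-08-18"], batch_size=5)
    remaining = conn.execute("SELECT COUNT(*) FROM stats").fetchone()[0]
    conn.close()
    return deleted, remaining


if __name__ == "__main__":
    print(demo())  # ({'2019-08-17': 12, '2019-08-18': 5}, 3)
```

Checking rowcount saves one probe query per batch; the trade-off is a single extra, empty DELETE whenever a day's row count is an exact multiple of the batch size.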

Python code

# -*-coding:utf-8 -*-

import sys

# Our in-house, internally packaged Python modules
sys.path.append('/var/lib/hadoop-hdfs/scripts/python_module2')
import keguang.commons as commons
import keguang.timedef as timedef
import keguang.sql.mysqlclient as mysql

def run(starttime, endtime, regx):
    tb_name = 'statistic_ad_image_final_count'
    days = timedef.getDays(starttime, endtime, regx)
    # Delete the data one day at a time
    for day in days:
        print('%s: deletion started' % day)
        # Probe: does at least one row for this day remain?
        check_sql = "select 1 from %s where date = '%s' limit 1" % (tb_name, day)
        # Delete at most 50,000 of this day's rows per statement
        delete_sql = "delete from %s where date = '%s' limit 50000" % (tb_name, day)
        mclient = getConn()
        try:
            print(check_sql)
            result = mclient.query(check_sql)
            # Keep deleting while rows for this day remain;
            # an empty result (no rows) ends the loop
            while result:
                print(delete_sql)
                mclient.execute(delete_sql)
                print(check_sql)
                result = mclient.query(check_sql)
        finally:
            # Always release the connection, even if a statement fails
            mclient.close()
        print('%s: deletion finished' % day)

# Return a MySQL connection
def getConn():
    return mysql.MysqlClient(host = '0.0.0.0', user = 'test', passwd = 'test', db= 'statistic')

if __name__ == '__main__':
    regx = '%Y-%m-%d'
    # yesday is computed but never used; the date range below is hard-coded
    yesday = timedef.getYes(regx, -1)
    starttime = '2019-08-17'
    endtime = '2019-08-30'
    run(starttime, endtime, regx)
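The script hard-codes starttime and endtime, while the stated goal is to keep only the most recent 3 days. As a hedged sketch, the days to purge can instead be derived from the current date with the standard library. days_between is a hypothetical stand-in for the internal timedef.getDays, assuming that function returns every date string from start to end inclusive; days_to_purge is also hypothetical and assumes "the last 3 days" means today plus the two days before it.

```python
from datetime import date, timedelta


def days_between(start, end, fmt="%Y-%m-%d"):
    """Hypothetical stand-in for timedef.getDays: each day from start to end, inclusive."""
    days = []
    current = start
    while current <= end:
        days.append(current.strftime(fmt))
        current += timedelta(days=1)
    return days


def days_to_purge(oldest, today, keep_days=3, fmt="%Y-%m-%d"):
    """Days older than the keep_days-day window that ends at `today` (inclusive).

    Assumption: with keep_days=3, today and the two previous days are kept.
    """
    last_to_delete = today - timedelta(days=keep_days)
    return days_between(oldest, last_to_delete, fmt)


if __name__ == "__main__":
    purge = days_to_purge(date(2019, 8, 17), date(2019, 9, 2))
    print(purge[0], purge[-1], len(purge))  # 2019-08-17 2019-08-30 14
```

Run on 2019-09-02 with 2019-08-17 as the oldest day, days_to_purge yields the 14 days from 2019-08-17 through 2019-08-30; its first and last entries could replace the hard-coded starttime and endtime.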

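The loop checks whether any rows remain for the current day: if so, it deletes another 50,000 of that day's rows; otherwise, it moves on to the next day. The whole job took about half an hour, but the data was finally deleted.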

Author: 柯廣的網絡日志 · Python批量刪除mysql中千萬級大量數據 (Batch-deleting tens of millions of rows from MySQL with Python)


WeChat official account: Java大數據與數據倉庫 (Java Big Data and Data Warehouse)