pt-osc 3.0.13 mysql5.7.24

0、mysql ddl操作的流程思路(来自percona)

一、pt-osc的缺点:
1.原表不能存在触发器,因为pt-osc需要通过触发器将原表copy数据阶段产生的数据应用到新表去。
2.表必须具有主键和唯一键。
3.原表不能是其他外键的父表,需要添加—alter-foreign-keys-method参数即可。
4.字段属性为NOT NULL时,必须有DEFAULT属性,否则会报错。
5.可能会导致主从数据延迟
6.如果运行过程中报错了,无法从上一个位置继续进行,需要从头开始
7.不支持MySQL5.7的虚拟列功能
二、pt-osc工作过程创建一个和要执行 alter 操作的表一样的新的空表结构(是alter之前的结构)在新表执行alter table 语句(速度应该很快)在原表中创建触发器3个触发器分别对应insert,update,delete操作以一定大小chunk从原表拷贝数据到临时表,拷贝过程中通过原表上的触发器在原表进行的写操作都会更新到新建的临时表Rename原表到old表中,在把临时表Rename为原表如果有参考该表的外键,根据alter-foreign-keys-method参数的值,检测外键相关的表,做相应设置的处理默认最后将旧原表删除
通过源码查看一下pt-osc的操作步骤
[root@hpaybbcx2:/root]#egrep ‘Step’ /usr/local/bin/pt-online-schema-change
# Step 1: Create the new table.
# Step 2: Alter the new, empty table. This should be very quick,
# Step 3: Create the triggers to capture changes on the original table and
# Step 4: Copy rows.
# Step 5: Rename tables: orig -> old, new -> orig
# Step 6: Update foreign key constraints if there are child tables.
# Step 7: Drop the old table.

三、常用选项说明
–host=xxx –user=xxx –password=xxx
连接实例信息,缩写-h xxx -u xxx -p xxx,密码可以使用参数–ask-pass 手动输入。
–alter
结构变更语句,不需要 ALTER TABLE关键字。与原始ddl一样可以指定多个更改,用逗号分隔。
绝大部分情况下表上需要有主键或唯一索引,因为工具在运行当中为了保证新表也是最新的,需要旧表上创建 DELETE和UPDATE 触发器,同步到新表的时候有主键会更快。个别情况是,当alter操作就是在c1列上建立主键时,DELETE触发器将基于c1列。
子句不支持 rename 去给表重命名。
alter命令不支持给索引重命名,需要先drop再add,在pt-osc也一样。(mysql 5.7 支持 RENAME INDEX old_index_name TO new_index_name);但给字段重命名,千万不要drop-add,整列数据会丢失,使用change col1 col1_new type constraint(保持类型和约束一致,否则相当于修改 column type,不能online)
子句如果是add column并且定义了not null,那么必须指定default值,否则会失败。
如果要删除外键(名 fk_foo),使用工具的时候外键名要加下划线,比如–alter “DROP FOREIGN KEY _fk_foo”
D=db_name,t=table_name <<<<<<注意格式
指定要ddl的数据库名和表名
–max-load
默认为Threads_running=25。每个chunk拷贝完后,会检查 SHOW GLOBAL STATUS 的内容,检查指标是否超过了指定的阈值。如果超过,则先暂停。这里可以用逗号分隔,指定多个条件,每个条件格式: status指标=MAX_VALUE或者status指标:MAX_VALUE。如果不指定MAX_VALUE,那么工具会这只其为当前值的120%。
因为拷贝行有可能会给部分行上锁,Threads_running 是判断当前数据库负载的绝佳指标。
–max-lag
默认1s。每个chunk拷贝完成后,会查看所有复制Slave的延迟情况(Seconds_Behind_Master)。要是延迟大于该值,则暂停复制数据,直到所有从的滞后小于这个值。–check-interval配合使用,指定出现从库滞后超过 max-lag,则该工具将睡眠多长时间,默认1s,再检查。如–max-lag=5 –check-interval=2。
熟悉percona-toolkit的人都知道–recursion-method可以用来指定从库dsn记录。另外,如果从库被停止,将会永远等待,直到从开始同步,并且延迟小于该值。
–chunk-time
默认0.5s,即拷贝数据行的时候,为了尽量保证0.5s内拷完一个chunk,动态调整chunk-size的大小,以适应服务器性能的变化。
也可以通过另外一个选项–chunk-size禁止动态调整,即每次固定拷贝 1k 行,如果指定则默认1000行,且比 chunk-time 优先生效
–set-vars
使用pt-osc进行ddl要开一个session去操作,set-vars可以在执行alter之前设定这些变量,比如默认会设置–set-vars “wait_timeout=10000,innodb_lock_wait_timeout=1,lock_wait_timeout=60”
因为使用pt-osc之后ddl的速度会变慢,所以预计2.5h只能还不能改完,记得加大wait_timeout。
–dry-run
创建和修改新表,但不会创建触发器、复制数据、和替换原表。并不真正执行,可以看到生成的执行语句,了解其执行     步骤与细节,和–print配合最佳
–execute
确定修改表,则指定该参数。真正执行alter。–dry-run与–execute必须指定一个,二者相互排斥
–chunk-size
默认为1000 row

三、使用疑惑(限制)
3.1原表上不能有触发器存在
这个在mysql 5.7.7+版本得以解决,并且在这个pt版本中也有参数来做支持了

3.2通过触发器写数据到临时新表,会不会出现数据不一致或异常(可能会)
在pt-osc一张表时,有应用发生delete操作,通过trigger触发删除操作,此时old表老数据还没有通过chunk复制到新表中,trigger动作结束后,chunk copy的老数据出现在新表中,导致数据不一致

3.3.为什么外键那么特殊
假设 tb1 是要修改的表,tb2 有外键依赖于 tb1,_tb1_new 是 alter tb1 产生的新临时表。
这里的外键不是看tb1上是否存在外键,而是作为子表的 tb2。主要问题在 rename tb1 时,tb1“不存在”导致tb2的外键认为参考失败,不允许rename。
pt-osc提供–alter-foreign-keys-method选项来决定怎么处理这种情况:
rebuild_constraints,优先采用这种方式
它先通过 alter table tb2 drop fk1,add _fk1 重建外键参考,指向新表
再 rename tb1 tb1_old, _tb1_new tb1 ,交换表名,不影响客户端
删除旧表 tb1_old
但如果字表tb2太大,以致alter操作可能耗时过长,有可能会强制选择 drop_swap。
涉及的主要方法在 pt-online-schema-change 文件的 determine_alter_fk_method, rebuild_constraints, swap_tables三个函数中。
drop_swap,
禁用t2表外键约束检查 FOREIGN_KEY_CHECKS=0
然后 drop tb1 原表
再 rename _tb1_new tb1
这种方式速度更快,也不会阻塞请求。但有风险,
第一,drop表的瞬间到rename过程,原表t1是不存在的,遇到请求会报错;
第二,如果因为bug或某种原因,旧表已删,新表rename失败,那就太晚了,但这种情况很少见。
This method forces –no-swap-tables and –no-drop-old-table.
表间存在外键参考关系,建议不通过表定义强制约束

3.4.使用之前需要对磁盘容量进行评估
使用OSC会使增加一倍的空间,包括索引
而且在 Row Based Replication 下,还会写一份binlog。不要想当然使用–set-vars去设置 sql_log_bin=0,因为在这个session级别,alter语句也要在从库上执行,除非你对从库另有打算。

3.5.使用pt-osc原生5.6 online ddl相比,如何选择
online ddl在必须copy table时成本较高,不宜采用
pt-osc工具在存在触发器时,不适用
修改索引、外键、列名时,优先采用online ddl,并指定 ALGORITHM=INPLACE
其它情况使用pt-osc,虽然存在copy data
pt-osc比online ddl要慢一倍左右,因为它是根据负载调整的
无论哪种方式都选择的业务低峰期执行
特殊情况需要利用主从特性,先alter从库,主备切换,再改原主库

四、操作流程分析1、开启模拟业务

./sysbench --mysql-user=mzhang --mysql-db=ptosc --mysql-password=mzhang --mysql-host=10.148.180.103 --mysql-port=3307 --test=tests/db/oltp.lua --oltp_tables_count=5 --oltp-table-size=1000 --rand-init=on prepare

./sysbench --mysql-user=mzhang --mysql-db=ptosc --mysql-password=mzhang --mysql-host=10.148.180.103 --mysql-port=3307 --test=tests/db/oltp.lua --oltp_tables_count=5 --oltp-table-size=100000 --num-threads=8 --oltp-read-only=off --report-interval=10 --rand-type=uniform --max-time=12000 --max-requests=0 --percentile=99 run

2、通过general log来看pt-osc流程

pt-online-schema-change --host=10.148.180.103 --port=3307 --user=mzhang --password=mzhang \ 
--alter "add column c5 varchar(8) not null default '' " \
D=ptosc,t=sbtest1 --execute --recursion-method=none
2019-08-06T07:34:06.885522Z 81 Connect mzhang@10.148.180.103 on ptosc using TCP/IP
2019-08-06T07:34:06.885749Z 81 Query set autocommit=1
2019-08-06T07:34:06.885989Z 81 Query SHOW VARIABLES LIKE 'innodb\_lock_wait_timeout'
2019-08-06T07:34:06.887454Z 81 Query SET SESSION innodb_lock_wait_timeout=1
2019-08-06T07:34:06.887606Z 81 Query SHOW VARIABLES LIKE 'lock\_wait_timeout'
2019-08-06T07:34:06.888503Z 81 Query SET SESSION lock_wait_timeout=60
2019-08-06T07:34:06.888652Z 81 Query SHOW VARIABLES LIKE 'wait\_timeout'
2019-08-06T07:34:06.889558Z 81 Query SET SESSION wait_timeout=10000
2019-08-06T07:34:06.889705Z 81 Query SELECT @@SQL_MODE
2019-08-06T07:34:06.889836Z 81 Query SET @@SQL_QUOTE_SHOW_CREATE = 1/*!40101, @@SQL_MODE='NO_AUTO_VALUE_ON_ZERO,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'*/
2019-08-06T07:34:06.889968Z 81 Query SELECT @@server_id /*!50038 , @@hostname*/

2019-08-06T07:34:06.890471Z 82 Connect mzhang@10.148.180.103 on ptosc using TCP/IP
2019-08-06T07:34:06.890581Z 82 Query set autocommit=1
2019-08-06T07:34:06.890749Z 82 Query SHOW VARIABLES LIKE 'innodb\_lock_wait_timeout'
2019-08-06T07:34:06.892091Z 82 Query SET SESSION innodb_lock_wait_timeout=1
2019-08-06T07:34:06.892216Z 82 Query SHOW VARIABLES LIKE 'lock\_wait_timeout'
2019-08-06T07:34:06.893003Z 82 Query SET SESSION lock_wait_timeout=60
2019-08-06T07:34:06.893128Z 82 Query SHOW VARIABLES LIKE 'wait\_timeout'
2019-08-06T07:34:06.893933Z 82 Query SET SESSION wait_timeout=10000
2019-08-06T07:34:06.894017Z 82 Query SELECT @@SQL_MODE
2019-08-06T07:34:06.894098Z 82 Query SET @@SQL_QUOTE_SHOW_CREATE = 1/*!40101, @@SQL_MODE='NO_AUTO_VALUE_ON_ZERO,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'*/
2019-08-06T07:34:06.894195Z 82 Query SELECT @@server_id /*!50038 , @@hostname*/

2019-08-06T07:34:06.894360Z 81 Query SHOW VARIABLES LIKE 'wsrep_on'
2019-08-06T07:34:06.895293Z 81 Query SHOW VARIABLES LIKE 'version%'
2019-08-06T07:34:06.896313Z 81 Query SHOW ENGINES
2019-08-06T07:34:06.896626Z 81 Query SHOW VARIABLES LIKE 'innodb_version'
2019-08-06T07:34:06.897846Z 81 Query SHOW VARIABLES LIKE 'innodb_stats_persistent'
2019-08-06T07:34:06.898858Z 81 Query SHOW GLOBAL STATUS LIKE 'Threads_running'
2019-08-06T07:34:06.899581Z 81 Query SHOW GLOBAL STATUS LIKE 'Threads_running'
2019-08-06T07:34:06.900290Z 81 Query SELECT CONCAT(@@hostname, @@port)
2019-08-06T07:34:06.900556Z 81 Query SHOW TABLES FROM `ptosc` LIKE 'sbtest1'
2019-08-06T07:34:06.900740Z 81 Query SELECT VERSION()
2019-08-06T07:34:06.900853Z 81 Query SHOW TRIGGERS FROM `ptosc` LIKE 'sbtest1' <<<查看操作表上是否已存在触发器
2019-08-06T07:34:06.901211Z 81 Query /*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */
2019-08-06T07:34:06.901287Z 81 Query USE `ptosc`
2019-08-06T07:34:06.901423Z 81 Query SHOW CREATE TABLE `ptosc`.`sbtest1`
2019-08-06T07:34:06.901574Z 81 Query /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */
2019-08-06T07:34:06.901907Z 81 Query EXPLAIN SELECT * FROM `ptosc`.`sbtest1` WHERE 1=1
2019-08-06T07:34:06.902262Z 81 Query SELECT table_schema, table_name FROM information_schema.key_column_usage WHERE referenced_table_schema='ptosc' AND referenced_table_name='sbtest1'
2019-08-06T07:34:06.921415Z 81 Query SHOW VARIABLES LIKE 'wsrep_on'
2019-08-06T07:34:06.922466Z 81 Query /*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */
2019-08-06T07:34:06.922531Z 81 Query USE `ptosc`
2019-08-06T07:34:06.922608Z 81 Query SHOW CREATE TABLE `ptosc`.`sbtest1`
2019-08-06T07:34:06.922733Z 81 Query /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */
2019-08-06T07:34:06.922906Z 81 Query CREATE TABLE `ptosc`.`_sbtest1_new` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`k` int(10) unsigned NOT NULL DEFAULT '0',
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=99995 DEFAULT CHARSET=utf8 MAX_ROWS=1000000 <<<<基于老表的定义创建一个新表与其结构一样
2019-08-06T07:34:06.956386Z 81 Query ALTER TABLE `ptosc`.`_sbtest1_new` add column c5 varchar(8) not null default '' <<<<在新表上添加新字段
2019-08-06T07:34:07.002789Z 81 Query /*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */
2019-08-06T07:34:07.002903Z 81 Query USE `ptosc`
2019-08-06T07:34:07.003017Z 81 Query SHOW CREATE TABLE `ptosc`.`_sbtest1_new`
2019-08-06T07:34:07.003216Z 81 Query /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */
2019-08-06T07:34:07.003833Z 81 Query SELECT TRIGGER_SCHEMA, TRIGGER_NAME, DEFINER, ACTION_STATEMENT, SQL_MODE, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, EVENT_MANIPULATION, ACTION_TIMING FROM INFORMATION_SCHEMA.TRIGGERS WHERE EVENT_MANIPULATION = 'DELETE' AND ACTION_TIMING = 'AFTER' AND TRIGGER_SCHEMA = 'ptosc' AND EVENT_OBJECT_TABLE = 'sbtest1'
2019-08-06T07:34:07.004407Z 81 Query SELECT TRIGGER_SCHEMA, TRIGGER_NAME, DEFINER, ACTION_STATEMENT, SQL_MODE, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, EVENT_MANIPULATION, ACTION_TIMING FROM INFORMATION_SCHEMA.TRIGGERS WHERE EVENT_MANIPULATION = 'UPDATE' AND ACTION_TIMING = 'AFTER' AND TRIGGER_SCHEMA = 'ptosc' AND EVENT_OBJECT_TABLE = 'sbtest1'
2019-08-06T07:34:07.004843Z 81 Query SELECT TRIGGER_SCHEMA, TRIGGER_NAME, DEFINER, ACTION_STATEMENT, SQL_MODE, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, EVENT_MANIPULATION, ACTION_TIMING FROM INFORMATION_SCHEMA.TRIGGERS WHERE EVENT_MANIPULATION = 'INSERT' AND ACTION_TIMING = 'AFTER' AND TRIGGER_SCHEMA = 'ptosc' AND EVENT_OBJECT_TABLE = 'sbtest1'
2019-08-06T07:34:07.005313Z 81 Query SELECT TRIGGER_SCHEMA, TRIGGER_NAME, DEFINER, ACTION_STATEMENT, SQL_MODE, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, EVENT_MANIPULATION, ACTION_TIMING FROM INFORMATION_SCHEMA.TRIGGERS WHERE EVENT_MANIPULATION = 'DELETE' AND ACTION_TIMING = 'BEFORE' AND TRIGGER_SCHEMA = 'ptosc' AND EVENT_OBJECT_TABLE = 'sbtest1'
2019-08-06T07:34:07.005749Z 81 Query SELECT TRIGGER_SCHEMA, TRIGGER_NAME, DEFINER, ACTION_STATEMENT, SQL_MODE, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, EVENT_MANIPULATION, ACTION_TIMING FROM INFORMATION_SCHEMA.TRIGGERS WHERE EVENT_MANIPULATION = 'UPDATE' AND ACTION_TIMING = 'BEFORE' AND TRIGGER_SCHEMA = 'ptosc' AND EVENT_OBJECT_TABLE = 'sbtest1'
2019-08-06T07:34:07.006234Z 81 Query SELECT TRIGGER_SCHEMA, TRIGGER_NAME, DEFINER, ACTION_STATEMENT, SQL_MODE, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, EVENT_MANIPULATION, ACTION_TIMING FROM INFORMATION_SCHEMA.TRIGGERS WHERE EVENT_MANIPULATION = 'INSERT' AND ACTION_TIMING = 'BEFORE' AND TRIGGER_SCHEMA = 'ptosc' AND EVENT_OBJECT_TABLE = 'sbtest1' <<<<再次仔细检查是否相关DELETE,UPDATE,INSERT AFTER|BEFOR触发器信息

2019-08-06T07:34:07.006630Z 81 Query CREATE TRIGGER `pt_osc_ptosc_sbtest1_del` AFTER DELETE ON `ptosc`.`sbtest1` FOR EACH ROW DELETE IGNORE FROM `ptosc`.`_sbtest1_new` WHERE `ptosc`.`_sbtest1_new`.`id` <=> OLD.`id`
2019-08-06T07:34:07.033552Z 81 Query CREATE TRIGGER `pt_osc_ptosc_sbtest1_upd` AFTER UPDATE ON `ptosc`.`sbtest1` FOR EACH ROW BEGIN DELETE IGNORE FROM `ptosc`.`_sbtest1_new` WHERE !(OLD.`id` <=> NEW.`id`) AND `ptosc`.`_sbtest1_new`.`id` <=> OLD.`id`;REPLACE INTO `ptosc`.`_sbtest1_new` (`id`, `k`, `c`, `pad`) VALUES (NEW.`id`, NEW.`k`, NEW.`c`, NEW.`pad`);END
2019-08-06T07:34:07.034996Z 81 Query CREATE TRIGGER `pt_osc_ptosc_sbtest1_ins` AFTER INSERT ON `ptosc`.`sbtest1` FOR EACH ROW REPLACE INTO `ptosc`.`_sbtest1_new` (`id`, `k`, `c`, `pad`) VALUES (NEW.`id`, NEW.`k`, NEW.`c`, NEW.`pad`)<<<<<创建触发器,这里注意update时,pt做了两件事情分别在新表里删除操作记录,然后在replace into一条记录

2019-08-06T07:34:07.036635Z 81 Query EXPLAIN SELECT * FROM `ptosc`.`sbtest1` WHERE 1=1
2019-08-06T07:34:07.037567Z 81 Query SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `ptosc`.`sbtest1` FORCE INDEX(`PRIMARY`) ORDER BY `id` LIMIT 1 /*first lower boundary*/
2019-08-06T07:34:07.037877Z 81 Query SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `ptosc`.`sbtest1` FORCE INDEX (`PRIMARY`) WHERE `id` IS NOT NULL ORDER BY `id` LIMIT 1 /*key_len*/
2019-08-06T07:34:07.098227Z 81 Query EXPLAIN SELECT /*!40001 SQL_NO_CACHE */ * FROM `ptosc`.`sbtest1` FORCE INDEX (`PRIMARY`) WHERE `id` >= '1' /*key_len*/

这个过程是个循环过程
2019-08-06T07:34:07.112361Z 81 Query EXPLAIN SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `ptosc`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1')) ORDER BY `id` LIMIT 999, 2 /*next chunk boundary*/ <<<按照约定chunk来查询数据,减少后续锁持有范围
2019-08-06T07:34:07.112558Z 81 Query SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `ptosc`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1')) ORDER BY `id` LIMIT 999, 2 /*next chunk boundary*/
2019-08-06T07:34:07.113002Z 81 Query EXPLAIN SELECT `id`, `k`, `c`, `pad` FROM `ptosc`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1')) AND ((`id` <= '1000')) LOCK IN SHARE MODE /*explain pt-online-schema-change 133584 copy nibble*/
2019-08-06T07:34:07.113349Z 81 Query INSERT LOW_PRIORITY IGNORE INTO `ptosc`.`_sbtest1_new` (`id`, `k`, `c`, `pad`) SELECT `id`, `k`, `c`, `pad` FROM `ptosc`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1')) AND ((`id` <= '1000')) LOCK IN SHARE MODE /*pt-online-schema-change 133584 copy nibble*/ <<<<向新表中批量插入数据
2019-08-06T07:34:07.190021Z 81 Query SHOW WARNINGS
2019-08-06T07:34:07.190352Z 81 Query SHOW GLOBAL STATUS LIKE 'Threads_running' <<检查一次批量后,mysql thread running负载情况,满足条件,继续下批量循环

2019-08-06T07:34:07.192028Z 81 Query EXPLAIN SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `ptosc`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1001')) ORDER BY `id` LIMIT 6526, 2 /*next chunk boundary*/
2019-08-06T07:34:07.192349Z 81 Query SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `ptosc`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1001')) ORDER BY `id` LIMIT 6526, 2 /*next chunk boundary*/
2019-08-06T07:34:07.194379Z 81 Query EXPLAIN SELECT `id`, `k`, `c`, `pad` FROM `ptosc`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1001')) AND ((`id` <= '20450')) LOCK IN SHARE MODE /*explain pt-online-schema-change 133584 copy nibble*/
2019-08-06T07:34:07.194697Z 81 Query INSERT LOW_PRIORITY IGNORE INTO `ptosc`.`_sbtest1_new` (`id`, `k`, `c`, `pad`) SELECT `id`, `k`, `c`, `pad` FROM `ptosc`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1001')) AND ((`id` <= '20450')) LOCK IN SHARE MODE /*pt-online-schema-change 133584 copy nibble*/
2019-08-06T07:34:07.416409Z 81 Query SHOW WARNINGS
2019-08-06T07:34:07.416753Z 81 Query SHOW GLOBAL STATUS LIKE 'Threads_running'
.....
2019-08-06T07:34:07.771898Z 81 Query SHOW VARIABLES LIKE 'version%'
2019-08-06T07:34:07.773669Z 81 Query SHOW ENGINES
2019-08-06T07:34:07.773975Z 81 Query SHOW VARIABLES LIKE 'innodb_version'
2019-08-06T07:34:07.775721Z 81 Query ANALYZE TABLE `ptosc`.`_sbtest1_new` /* pt-online-schema-change */ <<<收集新表统计信息
2019-08-06T07:34:07.779636Z 81 Query RENAME TABLE `ptosc`.`sbtest1` TO `ptosc`.`_sbtest1_old`, `ptosc`.`_sbtest1_new` TO `ptosc`.`sbtest1` <<<rename 表,交换表名
2019-08-06T07:34:07.783906Z 81 Query DROP TABLE IF EXISTS `ptosc`.`_sbtest1_old` <<<默认删除操作后的老表
2019-08-06T07:34:07.807794Z 81 Query DROP TRIGGER IF EXISTS `ptosc`.`pt_osc_ptosc_sbtest1_del`
2019-08-06T07:34:07.808163Z 81 Query DROP TRIGGER IF EXISTS `ptosc`.`pt_osc_ptosc_sbtest1_upd`
2019-08-06T07:34:07.808467Z 81 Query DROP TRIGGER IF EXISTS `ptosc`.`pt_osc_ptosc_sbtest1_ins` <<<<删除pt-osc工具创建的trigger
2019-08-06T07:34:07.808877Z 81 Query SHOW TABLES FROM `ptosc` LIKE '\_sbtest1\_new' <<<查看表结构信息
2019-08-06T07:34:07.809315Z 82 Quit
2019-08-06T07:34:07.809400Z 81 Quit

3、表常用修改操作

如果要验证查看将
–execute 替换成–print –dry-run

添加删除索引

pt-online-schema-change --host=10.148.180.103 --port=3307 --user=mzhang --password=mzhang \ 
--alter "DROP KEY idx_01, ADD KEY idx_02(c5) " \
D=ptosc,t=sbtest1 --execute --recursion-method=none

添加列字段

pt-online-schema-change --host=10.148.180.103 --port=3307 --user=mzhang --password=mzhang \ 
--alter "add column c5 varchar(8) not null default '' " \
D=ptosc,t=sbtest1 --execute --recursion-method=none

修改列类型

pt-online-schema-change --host=10.148.180.103 --port=3307 --user=mzhang --password=mzhang \ 
--alter "CHANGE c5 varchar(60) NOT NULL DEFAULT '' " \
D=ptosc,t=sbtest1 --execute --recursion-method=none

修改删除主键
使用pt-osc去修改删除主键,务必同时添加原主键为 UNIQUE KEY,否则触发delete操作时很有可能导致性能问题
DELETE 触发器:新表上_sbtest1_new的主键字段在旧表sbtest1上不存在,无法根据主键条件触发删除新表_sbtest1_new数据,所以pt-osc修改了trigger

非沙删除主键:CREATE TRIGGER `pt_osc_ptosc_sbtest1_del` AFTER DELETE ON `ptosc`.`sbtest1` FOR EACH ROW DELETE IGNORE FROM `ptosc`.`_sbtest1_new` WHERE `ptosc`.`_sbtest1_new`.`id` <=> OLD.`id`
CREATE TRIGGER `pt_osc_ptosc_sbtest1_del` AFTER DELETE ON `ptosc`.`sbtest1` FOR EACH ROW DELETE IGNORE FROM `ptosc`.`_sbtest1_new` WHERE `ptosc`.`_sbtest1_new`.`id` <=> OLD.`id` AND `ptosc`.`_sbtest1_new`.`k` <=> OLD.`k`
pt-online-schema-change --host=10.148.180.103 --port=3307 --user=mzhang --password=mzhang \ 
--alter "DROP PRIMARY KEY,add column pk int auto_increment primary key,add unique key uk_id_k(id,k)" \
D=ptosc,t=sbtest1 --execute --recursion-method=none

mysql5.7 1032 1062自动化修复

mysql5.7 1032 1062自动修复

随着mysql MS批量式线上部署,带来的问题也较多,今天我们就介绍一下最棘手的一个问题:复制中断,我们该怎么办?

遇到最多的是1032和1062错误,至于错误的原理,这里不想过多分析:记住一条即可,mysql是基于逻辑的复制。很多网上教程使用slave_skip_errors进行跳过(这是掩耳盗铃);在我们的环境里是禁止的。手工修复,这里也不废话:结合业务和报错位置利用mysqlbinlog工具找出报错位置进行解决即可,速度太慢;结合历史的操作中,整个恢复过程估计需要30min-1h,还是需要熟悉业务。为了解决这个问题,利用python写了一个复制报错的1032和1062自动化修复脚本

脚本见https://github.com/trsenzhang/AOMmysql/项目中的replCheckStatus.py

感觉还可以请:star一下,哈哈

满足如下条件可自动修复1032和1062导致复制中断的错误:
1.数据库的binlog是基于ROW格式
2.binlog版本为v4
3.数据库的版本为5.7
4.脚本运行在slave端
5.在进行大批量的数据修复时,拉去master端binlog时,会导致MS之间网络带宽消耗
6.实际上生产中,建议自己的测试环境中测试后,

mysql> select * from sbtest1 where id>=99800;
+-------+-----+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k | c | pad |
+-------+-----+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 99811 | 100 | 90165491267-07275869234-19011205074-26444610383-72122436809-46694151734-03204627361-95242202339-88068233961-87971936484 | 89276814852-51067567943-14894208093-22203139505-76590959776 |
| 99831 | 100 | 33781043000-25534118395-26582187843-30125017791-34879940111-81915141800-78796130797-48796519170-69301562901-89603068188 | 21473874594-31729191357-49416492581-96928763763-88555030957 |
| 99850 | 100 | 52997681569-07256577082-70696020254-58728313014-64278657880-11737495601-96272462709-19907083505-55161413902-04166827371 | 03943355714-15233939093-77191977933-80871069856-13998664867 |
| 99860 | 100 | 22149089534-99257768432-42374479937-27983548918-34777085571-60122511474-68156039839-93575679803-25858461064-38693071750 | 61706698622-03408243288-81616510318-77675951439-83647865812 |
| 99866 | 100 | 97130276471-62038331816-81543705125-03598493850-89399270787-00769389132-67423665424-86329756265-55055790846-35049166232 | 29710875742-96213160947-25350860664-06705248824-91892213385 |
| 99887 | 100 | 69817170669-41710523781-33204590013-59847767425-06017308425-48878655484-03102622366-86054674368-67448987765-56237448865 | 77033115449-02879844350-86983597796-12759679132-70415094822 |
| 99897 | 100 | 37254441279-25082190545-71567616264-11611592521-17733316489-36596964903-46764438796-48309081109-47254822344-70662900915 | 45935391601-84007613456-49474460762-07492663463-61332150729 |
| 99949 | 100 | | 34580886030-82638848912-17448500901-05181865304-45488435594 |
| 99968 | 100 | | 00859334580-40505270334-09446901828-56582294486-93512954382 |
+-------+-----+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+

mysql> delete from sbtest1 where id>=99800;
Query OK, 9 rows affected (0.02 sec)

mysql> select * from sbtest1 where id>=99800;
Empty set (0.00 sec)

1.模拟一条数据删除操作

update trsen.sbtest1 set c='111111111111' where id=99968;
delete from trsen.sbtest1 where id=99949;

利用脚本修复后结果如下:

slave端数据
mysql> select * from sbtest1 where id>=99800;
+-------+-----+--------------+-------------------------------------------------------------+
| id | k | c | pad |
+-------+-----+--------------+-------------------------------------------------------------+
| 99968 | 100 | 111111111111 | 00859334580-40505270334-09446901828-56582294486-93512954382 |
+-------+-----+--------------+-------------------------------------------------------------+

master端数据:

因为99811-99897区间数据还没有进行操作,主从复制不会出现sql_thread异常
mysql> select * from sbtest1 where id>=99800;
+-------+-----+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k | c | pad |
+-------+-----+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 99811 | 100 | 90165491267-07275869234-19011205074-26444610383-72122436809-46694151734-03204627361-95242202339-88068233961-87971936484 | 89276814852-51067567943-14894208093-22203139505-76590959776 |
| 99831 | 100 | 33781043000-25534118395-26582187843-30125017791-34879940111-81915141800-78796130797-48796519170-69301562901-89603068188 | 21473874594-31729191357-49416492581-96928763763-88555030957 |
| 99850 | 100 | 52997681569-07256577082-70696020254-58728313014-64278657880-11737495601-96272462709-19907083505-55161413902-04166827371 | 03943355714-15233939093-77191977933-80871069856-13998664867 |
| 99860 | 100 | 22149089534-99257768432-42374479937-27983548918-34777085571-60122511474-68156039839-93575679803-25858461064-38693071750 | 61706698622-03408243288-81616510318-77675951439-83647865812 |
| 99866 | 100 | 97130276471-62038331816-81543705125-03598493850-89399270787-00769389132-67423665424-86329756265-55055790846-35049166232 | 29710875742-96213160947-25350860664-06705248824-91892213385 |
| 99887 | 100 | 69817170669-41710523781-33204590013-59847767425-06017308425-48878655484-03102622366-86054674368-67448987765-56237448865 | 77033115449-02879844350-86983597796-12759679132-70415094822 |
| 99897 | 100 | 37254441279-25082190545-71567616264-11611592521-17733316489-36596964903-46764438796-48309081109-47254822344-70662900915 | 45935391601-84007613456-49474460762-07492663463-61332150729 |
| 99968 | 100 | 111111111111 | 00859334580-40505270334-09446901828-56582294486-93512954382 |
+-------+-----+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+

2.模拟批量删除和更新

update trsen.sbtest1 set c='222222222' where id>=99811 and id<=99850;
delete from trsen.sbtest1 where id >=99860 and id <=99968

slave端数据:

mysql> select * from sbtest1 where id>=99800;
+-------+-----+-----------+-------------------------------------------------------------+
| id | k | c | pad |
+-------+-----+-----------+-------------------------------------------------------------+
| 99811 | 100 | 222222222 | 89276814852-51067567943-14894208093-22203139505-76590959776 |
| 99831 | 100 | 222222222 | 21473874594-31729191357-49416492581-96928763763-88555030957 |
| 99850 | 100 | 222222222 | 03943355714-15233939093-77191977933-80871069856-13998664867 |
+-------+-----+-----------+-------------------------------------------------------------+
3 rows in set (0.00 sec)

master端数据

mysql> select * from sbtest1 where id>=99800;
+-------+-----+-----------+-------------------------------------------------------------+
| id | k | c | pad |
+-------+-----+-----------+-------------------------------------------------------------+
| 99811 | 100 | 222222222 | 89276814852-51067567943-14894208093-22203139505-76590959776 |
| 99831 | 100 | 222222222 | 21473874594-31729191357-49416492581-96928763763-88555030957 |
| 99850 | 100 | 222222222 | 03943355714-15233939093-77191977933-80871069856-13998664867 |
+-------+-----+-----------+-------------------------------------------------------------+
3 rows in set (0.00 sec)

测试中日志信息如下:

[root@dzst151 log]# more auto_check_repl_status_repair-2019-07-02.log 
2019-07-02 16:00:17.522898 : INFO : The platform check pass.
2019-07-02 16:00:17.522898 : INFO : get_rpl_mode r -> ON
2019-07-02 16:00:17.522898 : INFO : rpl_mode 1 
2019-07-02 16:00:17.522898 : INFO : 1032
2019-07-02 16:00:17.522898 : INFO : multi dml have SMTM_END_F,is ok.
2019-07-02 16:00:17.522898 : INFO : end_log_pos :2646
2019-07-02 16:00:17.522898 : INFO : row_recode : ['---line---', '### DELETE `trsen`.`sbtest1`', '### WHERE', '### @1=99949', '### @2=100', "### @3=''", "### @4='34580886030-82638848912-17448500901-05181865304-45488435594'", '
---line---']
2019-07-02 16:00:17.522898 : INFO : sql_list : ['DELETE `trsen`.`sbtest1`', 'WHERE', 'id=99949', 'and k=100', "and c=''", "and pad='34580886030-82638848912-17448500901-05181865304-45488435594'", '---line---']
2019-07-02 16:00:17.522898 : INFO : will execute sql: SELECT 1 from `trsen`.`sbtest1` WHERE id=99949 and k=100 and c='' and pad='34580886030-82638848912-17448500901-05181865304-45488435594';
2019-07-02 16:00:17.522898 : INFO : slave execute sql: error:1032 --run SQL: INSERT INTO `trsen`.`sbtest1` VALUES( 99949 , 100 , '' , '34580886030-82638848912-17448500901-05181865304-45488435594');
2019-07-02 16:00:17.522898 : INFO : slave repaired.
2019-07-02 16:01:10.580024 : INFO : The platform check pass.
2019-07-02 16:01:10.580024 : INFO : get_rpl_mode r -> ON
2019-07-02 16:01:10.580024 : INFO : rpl_mode 1 
2019-07-02 16:01:10.580024 : INFO : 1032
2019-07-02 16:01:10.580024 : INFO : multi dml have SMTM_END_F,is ok.
2019-07-02 16:01:10.580024 : INFO : end_log_pos :3138
2019-07-02 16:01:10.580024 : INFO : row_recode : ['---line---', '### UPDATE `trsen`.`sbtest1`', '### WHERE', '### @1=99968', '### @2=100', "### @3=''", "### @4='00859334580-40505270334-09446901828-56582294486-93512954382'", '
---line---']
2019-07-02 16:01:10.580024 : INFO : sql_list : ['UPDATE `trsen`.`sbtest1`', 'WHERE', 'id=99968', 'and k=100', "and c=''", "and pad='00859334580-40505270334-09446901828-56582294486-93512954382'", '---line---']
2019-07-02 16:01:10.580024 : INFO : will execute sql: SELECT 1 from `trsen`.`sbtest1` WHERE id=99968 and k=100 and c='' and pad='00859334580-40505270334-09446901828-56582294486-93512954382';
2019-07-02 16:01:10.580024 : INFO : slave execute sql: error:1032 --run SQL: INSERT INTO `trsen`.`sbtest1` VALUES( 99968 , 100 , '' , '00859334580-40505270334-09446901828-56582294486-93512954382');
2019-07-02 16:01:10.580024 : INFO : slave repaired.
2019-07-02 16:05:30.649844 : INFO : The platform check pass.
2019-07-02 16:05:30.649844 : INFO : get_rpl_mode r -> ON
2019-07-02 16:05:30.649844 : INFO : rpl_mode 1 
2019-07-02 16:05:30.649844 : INFO : 1032
2019-07-02 16:05:30.649844 : INFO : multi dml have SMTM_END_F,is ok.
2019-07-02 16:05:30.649844 : INFO : end_log_pos :4303
2019-07-02 16:05:30.649844 : INFO : row_recode : ['---line---', '### UPDATE `trsen`.`sbtest1`', '### WHERE', '### @1=99811', '### @2=100', "### @3='90165491267-07275869234-19011205074-26444610383-72122436809-46694151734-0320462
7361-95242202339-88068233961-87971936484'", "### @4='89276814852-51067567943-14894208093-22203139505-76590959776'", '---line---', '### UPDATE `trsen`.`sbtest1`', '### WHERE', '### @1=99831', '### @2=100', "### @3='33781043000
-25534118395-26582187843-30125017791-34879940111-81915141800-78796130797-48796519170-69301562901-89603068188'", "### @4='21473874594-31729191357-49416492581-96928763763-88555030957'", '---line---', '### UPDATE `trsen`.`sbtest1`', '
### WHERE', '### @1=99850', '### @2=100', "### @3='52997681569-07256577082-70696020254-58728313014-64278657880-11737495601-96272462709-19907083505-55161413902-04166827371'", "### @4='03943355714-15233939093-77191977933-808710
69856-13998664867'", '---line---']
2019-07-02 16:05:30.649844 : INFO : sql_list : ['UPDATE `trsen`.`sbtest1`', 'WHERE', 'id=99811', 'and k=100', "and c='90165491267-07275869234-19011205074-26444610383-72122436809-46694151734-03204627361-95242202339-88068233961-8797193
6484'", "and pad='89276814852-51067567943-14894208093-22203139505-76590959776'", '---line---', 'UPDATE `trsen`.`sbtest1`', 'WHERE', 'id=99831', 'and k=100', "and c='33781043000-25534118395-26582187843-30125017791-34879940111-81915141
800-78796130797-48796519170-69301562901-89603068188'", "and pad='21473874594-31729191357-49416492581-96928763763-88555030957'", '---line---', 'UPDATE `trsen`.`sbtest1`', 'WHERE', 'id=99850', 'and k=100', "and c='52997681569-072565770
82-70696020254-58728313014-64278657880-11737495601-96272462709-19907083505-55161413902-04166827371'", "and pad='03943355714-15233939093-77191977933-80871069856-13998664867'", '---line---']
2019-07-02 16:05:30.649844 : INFO : will execute sql: SELECT 1 from `trsen`.`sbtest1` WHERE id=99811 and k=100 and c='90165491267-07275869234-19011205074-26444610383-72122436809-46694151734-03204627361-95242202339-88068233961-879719
36484' and pad='89276814852-51067567943-14894208093-22203139505-76590959776';
2019-07-02 16:05:30.649844 : INFO : slave execute sql: error:1032 --run SQL: INSERT INTO `trsen`.`sbtest1` VALUES( 99811 , 100 , '90165491267-07275869234-19011205074-26444610383-72122436809-46694151734-03204627361-95242202339-88068
233961-87971936484' , '89276814852-51067567943-14894208093-22203139505-76590959776');
2019-07-02 16:05:30.649844 : INFO : will execute sql: SELECT 1 from `trsen`.`sbtest1` WHERE id=99831 and k=100 and c='33781043000-25534118395-26582187843-30125017791-34879940111-81915141800-78796130797-48796519170-69301562901-896030
68188' and pad='21473874594-31729191357-49416492581-96928763763-88555030957';
2019-07-02 16:05:30.649844 : INFO : slave execute sql: error:1032 --run SQL: INSERT INTO `trsen`.`sbtest1` VALUES( 99831 , 100 , '33781043000-25534118395-26582187843-30125017791-34879940111-81915141800-78796130797-48796519170-69301
562901-89603068188' , '21473874594-31729191357-49416492581-96928763763-88555030957');
2019-07-02 16:05:30.649844 : INFO : will execute sql: SELECT 1 from `trsen`.`sbtest1` WHERE id=99850 and k=100 and c='52997681569-07256577082-70696020254-58728313014-64278657880-11737495601-96272462709-19907083505-55161413902-041668
27371' and pad='03943355714-15233939093-77191977933-80871069856-13998664867';
2019-07-02 16:05:30.649844 : INFO : slave execute sql: error:1032 --run SQL: INSERT INTO `trsen`.`sbtest1` VALUES( 99850 , 100 , '52997681569-07256577082-70696020254-58728313014-64278657880-11737495601-96272462709-19907083505-55161
413902-04166827371' , '03943355714-15233939093-77191977933-80871069856-13998664867');
2019-07-02 16:05:30.649844 : INFO : get_rpl_mode r -> ON
2019-07-02 16:05:30.649844 : INFO : rpl_mode 1 
2019-07-02 16:05:30.649844 : INFO : 1032
2019-07-02 16:05:30.649844 : INFO : multi dml have SMTM_END_F,is ok.
2019-07-02 16:05:30.649844 : INFO : end_log_pos :5489
2019-07-02 16:05:30.649844 : INFO : row_recode : ['---line---', '### DELETE `trsen`.`sbtest1`', '### WHERE', '### @1=99860', '### @2=100', "### @3='22149089534-99257768432-42374479937-27983548918-34777085571-60122511474-6815603
9839-93575679803-25858461064-38693071750'", "### @4='61706698622-03408243288-81616510318-77675951439-83647865812'", '---line---', '### DELETE `trsen`.`sbtest1`', '### WHERE', '### @1=99866', '### @2=100', "### @3='97130276471
-62038331816-81543705125-03598493850-89399270787-00769389132-67423665424-86329756265-55055790846-35049166232'", "### @4='29710875742-96213160947-25350860664-06705248824-91892213385'", '---line---', '### DELETE `trsen`.`sbtest1`', '
### WHERE', '### @1=99887', '### @2=100', "### @3='69817170669-41710523781-33204590013-59847767425-06017308425-48878655484-03102622366-86054674368-67448987765-56237448865'", "### @4='77033115449-02879844350-86983597796-127596
79132-70415094822'", '---line---', '### DELETE `trsen`.`sbtest1`', '### WHERE', '### @1=99897', '### @2=100', "### @3='37254441279-25082190545-71567616264-11611592521-17733316489-36596964903-46764438796-48309081109-47254822344-
70662900915'", "### @4='45935391601-84007613456-49474460762-07492663463-61332150729'", '---line---', '### DELETE `trsen`.`sbtest1`', '### WHERE', '### @1=99968', '### @2=100', "### @3='111111111111'", "### @4='00859334580-4
0505270334-09446901828-56582294486-93512954382'", '---line---']
2019-07-02 16:05:30.649844 : INFO : sql_list : ['DELETE `trsen`.`sbtest1`', 'WHERE', 'id=99860', 'and k=100', "and c='22149089534-99257768432-42374479937-27983548918-34777085571-60122511474-68156039839-93575679803-25858461064-3869307
1750'", "and pad='61706698622-03408243288-81616510318-77675951439-83647865812'", '---line---', 'DELETE `trsen`.`sbtest1`', 'WHERE', 'id=99866', 'and k=100', "and c='97130276471-62038331816-81543705125-03598493850-89399270787-00769389
132-67423665424-86329756265-55055790846-35049166232'", "and pad='29710875742-96213160947-25350860664-06705248824-91892213385'", '---line---', 'DELETE `trsen`.`sbtest1`', 'WHERE', 'id=99887', 'and k=100', "and c='69817170669-417105237
81-33204590013-59847767425-06017308425-48878655484-03102622366-86054674368-67448987765-56237448865'", "and pad='77033115449-02879844350-86983597796-12759679132-70415094822'", '---line---', 'DELETE `trsen`.`sbtest1`', 'WHERE', 'id=998
97', 'and k=100', "and c='37254441279-25082190545-71567616264-11611592521-17733316489-36596964903-46764438796-48309081109-47254822344-70662900915'", "and pad='45935391601-84007613456-49474460762-07492663463-61332150729'", '---line---
', 'DELETE `trsen`.`sbtest1`', 'WHERE', 'id=99968', 'and k=100', "and c='111111111111'", "and pad='00859334580-40505270334-09446901828-56582294486-93512954382'", '---line---']
2019-07-02 16:05:30.649844 : INFO : will execute sql: SELECT 1 from `trsen`.`sbtest1` WHERE id=99860 and k=100 and c='22149089534-99257768432-42374479937-27983548918-34777085571-60122511474-68156039839-93575679803-25858461064-386930
71750' and pad='61706698622-03408243288-81616510318-77675951439-83647865812';
2019-07-02 16:05:30.649844 : INFO : slave execute sql: error:1032 --run SQL: INSERT INTO `trsen`.`sbtest1` VALUES( 99860 , 100 , '22149089534-99257768432-42374479937-27983548918-34777085571-60122511474-68156039839-93575679803-25858
461064-38693071750' , '61706698622-03408243288-81616510318-77675951439-83647865812');
2019-07-02 16:05:30.649844 : INFO : will execute sql: SELECT 1 from `trsen`.`sbtest1` WHERE id=99866 and k=100 and c='97130276471-62038331816-81543705125-03598493850-89399270787-00769389132-67423665424-86329756265-55055790846-350491
66232' and pad='29710875742-96213160947-25350860664-06705248824-91892213385';
2019-07-02 16:05:30.649844 : INFO : slave execute sql: error:1032 --run SQL: INSERT INTO `trsen`.`sbtest1` VALUES( 99866 , 100 , '97130276471-62038331816-81543705125-03598493850-89399270787-00769389132-67423665424-86329756265-55055
790846-35049166232' , '29710875742-96213160947-25350860664-06705248824-91892213385');
2019-07-02 16:05:30.649844 : INFO : will execute sql: SELECT 1 from `trsen`.`sbtest1` WHERE id=99887 and k=100 and c='69817170669-41710523781-33204590013-59847767425-06017308425-48878655484-03102622366-86054674368-67448987765-562374
48865' and pad='77033115449-02879844350-86983597796-12759679132-70415094822';
2019-07-02 16:05:30.649844 : INFO : slave execute sql: error:1032 --run SQL: INSERT INTO `trsen`.`sbtest1` VALUES( 99887 , 100 , '69817170669-41710523781-33204590013-59847767425-06017308425-48878655484-03102622366-86054674368-67448
987765-56237448865' , '77033115449-02879844350-86983597796-12759679132-70415094822');
2019-07-02 16:05:30.649844 : INFO : will execute sql: SELECT 1 from `trsen`.`sbtest1` WHERE id=99897 and k=100 and c='37254441279-25082190545-71567616264-11611592521-17733316489-36596964903-46764438796-48309081109-47254822344-706629
00915' and pad='45935391601-84007613456-49474460762-07492663463-61332150729';
2019-07-02 16:05:30.649844 : INFO : slave execute sql: error:1032 --run SQL: INSERT INTO `trsen`.`sbtest1` VALUES( 99897 , 100 , '37254441279-25082190545-71567616264-11611592521-17733316489-36596964903-46764438796-48309081109-47254
822344-70662900915' , '45935391601-84007613456-49474460762-07492663463-61332150729');
2019-07-02 16:05:30.649844 : INFO : will execute sql: SELECT 1 from `trsen`.`sbtest1` WHERE id=99968 and k=100 and c='111111111111' and pad='00859334580-40505270334-09446901828-56582294486-93512954382';
2019-07-02 16:05:30.649844 : INFO : slave repaired.

mysql5.7下mysqldump原理解析

mysql5.7下mysqldump原理解析

本次解析没有从底层源码做深度解析;代码功底有点差,其次耗时太长;基于当前的时间成本考虑,不划算。我们从他们的备份逻辑上来看;
此次针对两种不同的场景进行分析说明

1.非一致性备份

/usr/local/mysql/bin/mysqldump -uroot -p -S /tmp/mysql3306.sock trsen > /data/mysql/mysql3306/backup/trsen_`date +%Y%d%m`_0.sql

流程图:

备份原理解析
1.获取当前GTID信息
2.在需要备份的DB,查看表
3.对表加read lock
4.利用show create table xxx;进行备份(循环至整个DB下表备份结束)
5.释放表上read lock

流程图可以看出:
1.备份数据不一致
2.备份锁表时间时长和备份内容成正比
3.适合非事务引擎的一致性备份

这样的备份脚本,测试中不会凸显问题,一旦在生产中,就会凸显很大的问题

2.一致性备份

/usr/local/mysql/bin/mysqldump -uroot -proot -S /tmp/mysql3306.sock --master-data=2 --single-transaction trsen > /data/mysql/mysql3306/backup/trsen_`date +%Y%d%m`_2.sql

流程图:

FLUSH TABLES:Closes all open tables, forces all tables in use to be closed, and flushes the query cache and prepared statement cache.
FLUSH TABLES also removes all query results from the query cache, like the RESET QUERY CACHE statement.
说白了:mysqldump对当前需要备份的mysql数据库进行简单评估,是否有大事物之,是否有DDL操作;同时减少第二次FTWRL锁表时间;没有则我们就行备份

备份原理解析
1.第一次flush是不需要锁表的,第一次将所有数据刷新到磁盘,检查能否进行加锁一致性备份
2.FTWRL针对全库加global read lock
3.设置事务隔离级别RR,为了下面进行快照上一致性读而做(RR mysql默认的隔离级别)
4.开启一个一致性快照的事务
5.获取GTID
6.释放FTWRL加的 global read lock
7.做一个还原点
8.查看库里所有表的信息
9.利用show create table xxx;进行备份(循环至整个DB下表备份结束),每个表备份结束时回退到还原点
10.释放掉还原点

流程图可以看出:
1.锁表时间短
2.一致性备份(innodb)
3.备份是否能有效进行,取决于第一步下发的flush tables
4.对于非事务引擎,不会进行一致性备份

mysql多线程逻辑备份工具mydumper

mydumper可以进行多线程的备份恢复,从而解决mysqldump单线程的瓶颈,更高效充分利用磁盘io及cpu资源来加快备份恢复

工具下载地址:https://launchpad.net/mydumper/

1.安装依赖包

yum install openssl-devel cmake gcc-c++ glib2-devel pcre-devel zlib-devel mysql-devel

可能有的os里没有glibc-devel.i686这个包,cmake没有问题,但是make时会报出
/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or directory

2.进行mydumper目录进行cmake编译,如果是如下信息则说明已可用

[root@trsen:/usr/local/mydumper]# cmake .
-- Using mysql-config: /usr/local/mysql/bin/mysql_config
-- Found MySQL: /usr/local/mysql/include, /usr/local/mysql/lib/libmysqlclient.so;/usr/lib64/libpthread.so;/usr/lib64/libm.so;/usr/lib64/librt.so;/usr/lib64/libdl.so
-- checking for module 'libpcre'
-- found libpcre, version 7.8
-- Found PCRE: /usr/include 
CMake Warning at docs/CMakeLists.txt:9 (message):
Unable to find Sphinx documentation generator
-- ------------------------------------------------
-- MYSQL_CONFIG = /usr/local/mysql/bin/mysql_config
-- CMAKE_INSTALL_PREFIX = /usr/local
-- BUILD_DOCS = ON
-- WITH_BINLOG = OFF
-- RUN_CPPCHECK = OFF
-- Change a values with: cmake -D<Variable>=<Value>
-- ------------------------------------------------
-- 
-- Configuring done
-- Generating done
-- Build files have been written to: /usr/local/mydumper

3.make 安装

[root@trsen:/usr/local/mydumper]#make
Scanning dependencies of target mydumper
[ 25%] Building C object CMakeFiles/mydumper.dir/mydumper.c.o
[ 50%] Building C object CMakeFiles/mydumper.dir/server_detect.c.o
[ 75%] Building C object CMakeFiles/mydumper.dir/g_unix_signal.c.o
Linking C executable mydumper
[ 75%] Built target mydumper
Scanning dependencies of target myloader
[100%] Building C object CMakeFiles/myloader.dir/myloader.c.o
Linking C executable myloader
[100%] Built target myloader

会在make的当前目录下生成两个命令

[root@trsen:/usr/local/mydumper]#ls -lat my*
-rwxr-xr-x 1 root root 54058 Jun 12 11:45 myloader
-rwxr-xr-x 1 root root 178215 Jun 12 11:45 mydumper

4.实际测试

[root@trsen:~]#/usr/local/mydumper/mydumper --host=localhost --user=trsen --password=trsen --port=3306 --database=trsendb --outputdir=/data2 --logfile=/data2/mydumper.log --threads=8 --daemon

默认是8个并发线程

mysql> show processlist;
+----+-------------+-----------+------+---------+--------+----------------------------------+-----------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------------+-----------+------+---------+--------+----------------------------------+-----------------------------------------------------------------------------------------+
| 45 | system user | | NULL | Connect | 8861 | Waiting for master to send event | NULL |
| 46 | system user | | NULL | Connect | 419930 | System lock | NULL |
| 60 | trsen | localhost | NULL | Query | 85 | Sending data | SELECT /*!40001 SQL_NO_CACHE */ * FROM `trsendb`.`data_bair` |
| 61 | trsen | localhost | NULL | Query | 8 | Writing to net | SELECT /*!40001 SQL_NO_CACHE */ * FROM `trsendb`.`data_mob` |
| 62 | trsen | localhost | NULL | Query | 100 | Sending data | SELECT /*!40001 SQL_NO_CACHE */ * FROM `trsendb`.`app_encle` |
| 63 | trsen | localhost | NULL | Query | 14 | Writing to net | SELECT /*!40001 SQL_NO_CACHE */ * FROM `trsendb`.`data_motatus` |
| 64 | trsen | localhost | NULL | Query | 70 | Sending data | SELECT /*!40001 SQL_NO_CACHE */ * FROM `trsendb`.`data_extrt_red` |
| 65 | trsen | localhost | NULL | Query | 25 | Writing to net | SELECT /*!40001 SQL_NO_CACHE */ * FROM `trsendb`.`data_j_sre` |
| 66 | trsen | localhost | NULL | Query | 69 | Sending data | SELECT /*!40001 SQL_NO_CACHE */ * FROM `trsendb`.`data_ect_rerd_b` |
| 67 | trsen | localhost | NULL | Query | 100 | Writing to net | SELECT /*!40001 SQL_NO_CACHE */ * FROM `trsendb`.`data_bai_iion_des` |
| 68 | trsen | localhost | NULL | Query | 0 | init | show processlist |
+----+-------------+-----------+------+---------+--------+----------------------------------+-----------------------------------------------------------------------------------------+

测试过程中,因为是虚拟机,我的mysql所在磁盘io几乎达到饱和状态;因为只有一个cpu,user也很高

avg-cpu: %user %nice %system %iowait %steal %idle
78.57 0.00 21.43 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 293.88 4.08 45.92 1044.90 2448.98 69.88 0.12 2.37 1.61 8.06
sdc 0.00 830.61 361.22 121.43 66865.31 7346.94 153.76 7.77 15.75 2.11 101.94
dm-0 0.00 0.00 374.49 1240.82 70163.27 9795.92 49.50 10.52 6.40 0.63 101.94
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

5.实际测试问题

1.经过核实,有跑报表的业务sql,所以避开这个业务高峰期。而且备份的时间窗口需要把握住,不然跑到业务高峰期就蛋疼了

[root@hpaybbcx2:/usr/local/mydumper]#./mydumper --user=root --password=ItRYF75q --socket=/mysql/3306/socket/mysql.socket --database=intrepaydb --outputdir=/u02/mysqlbackup/intrepaydb --long-query-guard=120
** (mydumper:161024): CRITICAL **: There are queries in PROCESSLIST running longer than 120s, aborting dump,
use --long-query-guard to change the guard value, kill queries (--kill-long-queries) or use 
different server for dump

2.备份的时候,要考虑数据库服务器的压力及资源使用情况