触发了MySQL一个bug导致进程不断crash重启

2/13/2017来源:SQL技巧人气:2008

1. 过程

测试机卡死进不去,手动重启了一下。 重启后发现MySQL有张损坏,select特定的条件会报错:

error_code :2013 Lost connection to MySQL server during query

2013断开连接。难道执行超时了?查看mysql errlog

show variables like ‘%log_error%’

2. 查看MySQL errlog

捕获到的errlog:

2017-02-10T05:34:50.248538Z 0 [Note] Starting crash recovery... 2017-02-10T05:34:50.248667Z 0 [Note] Crash recovery finished. 2017-02-10T05:34:50.249289Z 0 [Note] InnoDB: Buffer pool(s) load completed at 170210 13:34:50 2017-02-10T05:34:50.665378Z 0 [Warning] Failed to set up SSL because of the following SSL library error: SSL context is not usable without certificate and PRivate key 2017-02-10T05:34:50.665416Z 0 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306 2017-02-10T05:34:50.665443Z 0 [Note] - '0.0.0.0' resolves to '0.0.0.0'; 2017-02-10T05:34:50.665490Z 0 [Note] Server socket created on ip: '0.0.0.0'. 2017-02-10T05:34:50.676767Z 0 [Note] Event Scheduler: Loaded 0 events 2017-02-10T05:34:50.676955Z 0 [Note] /usr/local/mysql/bin/mysqld: ready for connections. Version: '5.7.12-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution 2017-02-10T05:37:03.086835Z 16 [Note] InnoDB: Uncompressed page, stored checksum in field1 64592992, calculated checksums for field1: crc32 542883854/1405161966, innodb 1991279791, none 3735928559, stored checksum in field2 64592992, calculated checksums for field2: crc32 542883854/1405161966, innodb 782200846, none 3735928559, page LSN 1 1279666288, low 4 bytes of LSN at page end 1279666288, page number (if stored to page already) 2550, space id (if created with >= MySQL-4.1.1 and stored already) 719 InnoDB: Page may be an index page where index id is 1112 2017-02-10T05:37:03.086868Z 16 [Note] InnoDB: Index 1112 is `PRIMARY` in table `db1`.`tb1` 2017-02-10T05:37:03.086879Z 16 [Note] InnoDB: It is also possible that your Operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery. 2017-02-10T05:37:03.086952Z 16 [ERROR] InnoDB: Database page corruption on disk or a failed file read of page [page id: space=719, page number=2550]. You may have to recover from a backup. 2017-02-10T05:37:03.086964Z 16 [Note] InnoDB: Page dump in ascii and hex (16384 bytes): len 16384; hex 03d99c60000009f6000009f5000009f7000000014c46287045bf0000000000000000000002cf00163b348056000000003a9400020000005400000000000000000000000000000000045800000000000000000000000000000000000000000100020028696e66696d756d0005000b000073757072656d756d100c06120d09000d0c090906000000001000b3800335ec000000d69ebca400000242027c800001fd8000026fe9ad8fe699a8323030303 InnoDB: End of page dump 2017-02-10T05:37:03.602755Z 16 [Note] InnoDB: Uncompressed page, stored checksum in field1 64592992, calculated checksums for field1: crc32 542883854/1405161966, innodb 1991279791, none 3735928559, stored checksum in field2 64592992, calculated checksums for field2: crc32 542883854/1405161966, innodb 782200846, none 3735928559, page LSN 1 1279666288, low 4 bytes of LSN at page end 1279666288, page number (if stored to page already) 2550, space id (if created with >= MySQL-4.1.1 and stored already) 719 InnoDB: Page may be an index page where index id is 1112 2017-02-10T05:37:03.602788Z 16 [Note] InnoDB: Index 1112 is `PRIMARY` in table `db1`.`tb1` 2017-02-10T05:37:03.602798Z 16 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery. 2017-02-10T05:37:03.602821Z 16 [ERROR] [FATAL] InnoDB: Unable to read page [page id: space=719, page number=2550] into the buffer pool after 100 attempts. The most probable cause of this error may be that the table has been corrupted. Or, the table was compressed with with an algorithm that is not supported by this instance. If it is not a decompress failure, you can try to fix this problem by using innodb_force_recovery. Please see http://dev.mysql.com/doc/refman/5.7/en/ for more details. Aborting... 2017-02-10 13:37:03 0x7fa31c55a700 InnoDB: Assertion failure in thread 140338531772160 in file ut0ut.cc line 920 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html InnoDB: about forcing recovery. 05:37:03 UTC - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. Attempting to collect some information that could help diagnose the problem. As this is a crash and something is definitely wrong, the information collection process might fail. key_buffer_size=16777216 read_buffer_size=262144 max_used_connections=7 max_threads=600 thread_count=7 connection_count=7 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 485185 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x7fa2d00008c0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 7fa31c559e80 thread_stack 0x40000 /usr/local/mysql/bin/mysqld(my_print_stacktrace+0x2c)[0xea97ac] /usr/local/mysql/bin/mysqld(handle_fatal_signal+0x451)[0x7a1861] /lib64/libpthread.so.0(+0xf100)[0x7fa35bf27100] /lib64/libc.so.6(gsignal+0x37)[0x7fa35ab265f7] /lib64/libc.so.6(abort+0x148)[0x7fa35ab27ce8] /usr/local/mysql/bin/mysqld[0x772407] /usr/local/mysql/bin/mysqld(_ZN2ib5fatalD1Ev+0x4d)[0x106ca5d] /usr/local/mysql/bin/mysqld(_Z16buf_page_get_genRK9page_id_tRK11page_size_tmP11buf_block_tmPKcmP5mtr_tb+0x745)[0x10ab895] /usr/local/mysql/bin/mysqld(_Z18btr_validate_indexP12dict_index_tPK5trx_tb+0x11d8)[0x1081428] /usr/local/mysql/bin/mysqld(_ZN11ha_innobase5checkEP3THDP15st_ha_check_opt+0x184)[0xee8524] /usr/local/mysql/bin/mysqld(_ZN7handler8ha_checkEP3THDP15st_ha_check_opt+0x67)[0x7eef07] /usr/local/mysql/bin/mysqld[0xde6853] /usr/local/mysql/bin/mysqld(_ZN19Sql_cmd_check_table7executeEP3THD+0xc1)[0xde7321] /usr/local/mysql/bin/mysqld(_Z21mysql_execute_commandP3THDb+0x5b0)[0xc88660] /usr/local/mysql/bin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x385)[0xc8e865] /usr/local/mysql/bin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0x894)[0xc8f164] /usr/local/mysql/bin/mysqld(_Z10do_commandP3THD+0x177)[0xc909f7] /usr/local/mysql/bin/mysqld(handle_connection+0x278)[0xd48e08] /usr/local/mysql/bin/mysqld(pfs_spawn_thread+0x1b1)[0x1206431] /lib64/libpthread.so.0(+0x7dc5)[0x7fa35bf1fdc5] /lib64/libc.so.6(clone+0x6d)[0x7fa35abe7ced] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (7fa2d0003e50): is an invalid pointer Connection ID (thread ID): 16 Status: NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 2017-02-10T05:37:03.714250Z mysqld_safe Number of processes running now: 0 2017-02-10T05:37:03.715526Z mysqld_safe mysqld restarted 2017-02-10T05:37:03.873043Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details). 2017-02-10T05:37:03.873145Z 0 [Warning] Insecure configuration for --secure-file-priv: Current value does not restrict location of generated files. Consider setting it to a valid, non-empty path. 2017-02-10T05:37:03.873184Z 0 [Note] /usr/local/mysql/bin/mysqld (mysqld 5.7.12-log) starting as process 30725 ... 2017-02-10T05:37:03.878035Z 0 [Note] InnoDB: PUNCH HOLE support available 2017-02-10T05:37:03.878069Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2017-02-10T05:37:03.878076Z 0 [Note] InnoDB: Uses event mutexes 2017-02-10T05:37:03.878081Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier 2017-02-10T05:37:03.878085Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.3 2017-02-10T05:37:03.878107Z 0 [Note] InnoDB: Using linux native AIO 2017-02-10T05:37:03.878530Z 0 [Note] InnoDB: Number of pools: 1 2017-02-10T05:37:03.878651Z 0 [Note] InnoDB: Using CPU crc32 instructions 2017-02-10T05:37:03.885901Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M 2017-02-10T05:37:03.892177Z 0 [Note] InnoDB: Completed initialization of buffer pool 2017-02-10T05:37:03.893771Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2017-02-10T05:37:03.905229Z 0 [Note] InnoDB: Highest supported file format is Barracuda. 2017-02-10T05:37:03.905932Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 5810685335 2017-02-10T05:37:03.905952Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 5810685344 2017-02-10T05:37:03.906077Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 5810685344 2017-02-10T05:37:03.906089Z 0 [Note] InnoDB: Database was not shutdown normally! 2017-02-10T05:37:03.906094Z 0 [Note] InnoDB: Starting crash recovery. 2017-02-10T05:37:03.915478Z 0 [Note] InnoDB: Last MySQL binlog file position 0 607062, file name mysql-bin.000137 2017-02-10T05:37:04.036610Z 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" 2017-02-10T05:37:04.036643Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2017-02-10T05:37:04.036730Z 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2017-02-10T05:37:04.061727Z 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB. 2017-02-10T05:37:04.062797Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active. 2017-02-10T05:37:04.062818Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active. 2017-02-10T05:37:04.063411Z 0 [Note] InnoDB: Waiting for purge to start 2017-02-10T05:37:04.113627Z 0 [Note] InnoDB: 5.7.12 started; log sequence number 5810685344 2017-02-10T05:37:04.113997Z 0 [Note] InnoDB: Loading buffer pool(s) from /usr/local/mysql-5.7.12/data/ib_buffer_pool 2017-02-10T05:37:04.114336Z 0 [Note] Plugin 'FEDERATED' is disabled. 2017-02-10T05:37:04.115004Z 0 [Note] Recovering after a crash using /usr/local/mysql/mysql-bin-log/mysql-bin 2017-02-10T05:37:04.115032Z 0 [Note] Starting crash recovery... 2017-02-10T05:37:04.115059Z 0 [Note] Crash recovery finished. 2017-02-10T05:37:04.115588Z 0 [Note] InnoDB: Buffer pool(s) load completed at 170210 13:37:04 2017-02-10T05:37:04.529569Z 0 [Warning] Failed to set up SSL because of the following SSL library error: SSL context is not usable without certificate and private key 2017-02-10T05:37:04.529644Z 0 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306 2017-02-10T05:37:04.529685Z 0 [Note] - '0.0.0.0' resolves to '0.0.0.0'; 2017-02-10T05:37:04.529755Z 0 [Note] Server socket created on IP: '0.0.0.0'. 2017-02-10T05:37:04.541230Z 0 [Note] Event Scheduler: Loaded 0 events 2017-02-10T05:37:04.541419Z 0 [Note] /usr/local/mysql/bin/mysqld: ready for connections. Version: '5.7.12-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 Source distribution

3. errlog分析

通过log可以看出,像是触发了MySQL的bug导致crash了,ps -ef | grep mysql看到重启时间确实是重启了,然后连接就断开了,所以才返回2013 Lost connection的错误。 原因是因为系统被强制重启导致MySQL相关核心数据未及时回写。

4. 尝试解决

根据log提供的http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html网址的方法修复,设置

[mysqld] innodb_force_recovery = 4

重启后还是未能修复,估计是彻底损坏了,还好是测试环境,看来系统的突然崩溃对MySQL影响挺大的,数据一定要及时做好备份,还是是测试环境。

5. 给官方提bug

至于为什么数据损坏导致MySQL必然的被重启,这个也相当不合理,已经提给官方MySQL一个bug:https://bugs.mysql.com/bug.php?id=84937 就算有再大的损坏也不能触发MySQL重启啊。