MySQL's InnoDB storage engine flushes data in many different situations. In this post we start from InnoDB and work our way down, looking at exactly which buffers the data passes through on its way from memory to the actual storage medium.
Let's look at the buffers involved with the help of the following figure:
From the figure above we can see that InnoDB data on its way to disk must pass through:
- InnoDB buffer pool and redo log buffer. These are buffers managed by the InnoDB application itself.
- Page cache / buffer cache (can be bypassed with O_DIRECT). This is a buffer at the VFS layer.
- Inode cache / directory buffer. Also a buffer at the VFS layer; it must be flushed with O_SYNC or fsync().
- Storage controller write-back buffer. (Can be bypassed by setting controller parameters.)
- Disk on-board buffer. (Can be bypassed by setting disk controller parameters.)
Here we use the term "buffer" for a write-side staging area and "cache" for a read-side one. As the names suggest, a buffer exists to absorb, or "slow down", bursts of write IO headed for an underlying storage device that is much slower than memory. A cache temporarily keeps in memory data that has been read from disk, so that subsequent accesses to that data do not have to touch the slow storage device again.
InnoDB Layer
These buffers are located in host memory. Their purpose is to let the application layer manage its own data, so that slow reads and writes do not hurt InnoDB's response times.
The InnoDB layer contains two buffers: the redo log buffer and the InnoDB buffer pool. The redo log buffer temporarily holds redo log records before they are written out to the redo log files. The InnoDB buffer pool holds InnoDB data read from the storage device and also buffers writes to that data, in the form of dirty pages. If the host loses power or MySQL goes down abnormally, whatever the buffer pool has not flushed to disk in time can only be recovered by rolling forward from a checkpoint using the redo log; and if the redo log buffer itself could not be flushed to disk, the redo records in it are lost, so even transactions the user had committed are gone, because they were never recorded on non-volatile disk media.
The parameter that controls when the redo log buffer is flushed is innodb_flush_log_at_trx_commit, and the parameter that controls how the redo log buffer and InnoDB buffer pool are flushed is innodb_flush_method. Both have already been analyzed in great detail elsewhere; here we look at them mainly from the buffering point of view.
innodb_flush_log_at_trx_commit:
innodb_flush_log_at_trx_commit, which controls the redo log buffer, currently supports three values: 0, 1 and 2. With 1 (the default), the redo log buffer is written to the log file and flushed to disk at every transaction commit; with 0, it is written and flushed only about once per second; with 2, it is written to the log file at every commit but flushed to disk only about once per second.
In addition, note a change in 5.6 to the behavior of innodb_flush_log_at_trx_commit = 2: before 5.6.6, the redo log buffer is flushed to disk every second; from 5.6.6 on, it is flushed every innodb_flush_log_at_timeout seconds.
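To make the three policies concrete, here is a minimal C sketch of the system calls each value roughly maps to. This is not InnoDB's actual code: the file name ib_logfile_demo, the variable names and the overall structure are invented purely for illustration.

    /* Schematic sketch of the three innodb_flush_log_at_trx_commit
     * policies. NOT InnoDB source code. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static int flush_policy = 1;   /* 0, 1 or 2 */
    static int log_fd = -1;

    /* Called at each transaction commit. */
    static void on_commit(const char *redo_record)
    {
        if (flush_policy == 0)
            return;                /* record stays in the log buffer */
        /* 1 and 2: write the record into the OS page cache ... */
        if (write(log_fd, redo_record, strlen(redo_record)) < 0)
            perror("write");
        if (flush_policy == 1)
            fsync(log_fd);         /* ... and force it to disk right now */
    }

    /* Called about once per second by a background thread
     * (every innodb_flush_log_at_timeout seconds since 5.6.6). */
    static void background_flush(const char *buffered_records)
    {
        if (flush_policy == 0 &&
            write(log_fd, buffered_records, strlen(buffered_records)) < 0)
            perror("write");
        fsync(log_fd);             /* policies 0 and 2 rely on this */
    }

    int main(void)
    {
        log_fd = open("ib_logfile_demo", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (log_fd < 0) { perror("open"); return 1; }
        on_commit("trx 1 committed\n");
        background_flush("");
        close(log_fd);
        return 0;
    }

The sketch also shows the trade-off: 0 can lose up to a second of committed transactions on any crash; 2 survives a mysqld crash (the records are already in the OS page cache) but not a host power failure; only 1 is fully durable.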
innodb_flush_method:
innodb_flush_method, which controls how the InnoDB buffer pool (and the log files) are flushed, currently supports four values:
- fdatasync
- O_DSYNC
- O_DIRECT
- O_DIRECT_NO_FSYNC
Several points are worth noting here:
- innodb_flush_method specifies not only how the "data files" are flushed, but also how the "log files" are flushed.
- None of these values can be configured on Windows. Is MySQL snubbing Mr. Gates? Not at all: a note in the documentation explains that on Windows InnoDB always uses async_unbuffered and this cannot be changed, which is why it does not appear in the list of values.
- Only the first three values are allowed in 5.6.6 and earlier; O_DIRECT_NO_FSYNC was added starting from 5.6.7. It opens the files with O_DIRECT but does not call fsync() afterwards to synchronize the data. On relatively new Linux kernels with certain file systems, O_DIRECT alone is enough to guarantee data safety, and no extra fsync() is needed to push the metadata to non-volatile disk media. XFS, for example, must not use this value. Why does O_DIRECT, which already bypasses the page cache, still need a follow-up fsync()? We devote the next section to that question; the sketch after this list shows the difference at the system-call level.
- Some might point to what looks like a small bug in the reference documentation: in versions prior to 5.6.6 the default value is fdatasync, yet fdatasync does not appear among the Valid Values and cannot be specified explicitly.
In fact this is deliberate. The difference between fdatasync() and fsync() is the same as the difference between O_DSYNC and O_SYNC: fdatasync() and O_DSYNC synchronize only the data, while fsync() and O_SYNC synchronize both the data and the metadata. But when innodb_flush_method is fdatasync, MySQL actually opens and flushes both the "data files" and the "log files" with fsync() (note: not fdatasync()). This is a historical accident, so 5.6 deliberately removed fdatasync from the settable values to avoid the misunderstanding. If you want fsync() behavior, simply leave innodb_flush_method unset.
- With every value other than O_DIRECT_NO_FSYNC, InnoDB uses fsync() to flush the "data files".
- With O_DIRECT and O_DIRECT_NO_FSYNC, the data files are opened with O_DIRECT (on Solaris, directio() is used instead; and if the InnoDB data files sit on their own device, you can mount the whole file system with forcedirectio so that everything on it is opened with directio). We say "InnoDB" rather than "MySQL" here because MyISAM does not use directio().
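At the system-call level, the difference between O_DIRECT and O_DIRECT_NO_FSYNC comes down to one fsync(). Below is a minimal Linux sketch; it is not InnoDB code, and the file name ibdata_demo is invented.

    /* Sketch of what O_DIRECT vs O_DIRECT_NO_FSYNC amount to. */
    #define _GNU_SOURCE            /* exposes O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("ibdata_demo", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* O_DIRECT requires aligned buffers; more on that below. */
        void *page;
        if (posix_memalign(&page, 4096, 4096) != 0) { close(fd); return 1; }
        memset(page, 0, 4096);

        if (write(fd, page, 4096) < 0)   /* bypasses the page cache */
            perror("write");

        /* innodb_flush_method = O_DIRECT additionally calls fsync()
         * so the file's metadata also reaches non-volatile media: */
        fsync(fd);
        /* O_DIRECT_NO_FSYNC skips this call and trusts the file system. */

        free(page);
        close(fd);
        return 0;
    }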
Virtual File System (VFS) Layer
These buffers are located in host memory. Their purpose is to buffer data at the operating system layer, so that slow block device reads and writes do not hurt IO response times.
A closer look at the O_DIRECT / O_SYNC flags
The preceding discussion of the redo log buffer and InnoDB buffer pool touched repeatedly on data flushing and data safety. In this section we look specifically at what the O_DIRECT / O_SYNC flags mean.
In the figure, we can see that this layer consists mainly of the page cache, buffer cache, inode cache and directory cache. The page cache and buffer cache buffer file data and block data respectively; the inode cache buffers inodes, and the directory cache buffers directory structure data.
Depending on the file system and operating system, a file write generally consists of two parts: the write of the data itself, and the write of the file attributes (metadata; here file attributes include the directory entry, the inode, and so on).
With that understood, the meaning of each flag is relatively simple to state:
- The difference between O_DSYNC and fdatasync() is when the flush happens: with O_DSYNC, the corresponding page cache and buffer cache entries are flushed as part of every IO submission, whereas with fdatasync() you first write some data and then, at the moment of the call, flush the file's entire page cache and buffer cache. The same distinction holds between O_SYNC and fsync(); the sketch after this list makes the two routes concrete.
- The main difference between the page cache and the buffer cache is that one holds actual file data while the other is oriented toward block devices. If you use open() in the VFS to access files on a file system made with mkfs, you go through both the page cache and the buffer cache; if instead you operate on a Linux block device directly, for example with dd, only the buffer cache is used.
- The difference between O_DSYNC and O_SYNC: O_DSYNC tells the kernel that a write to the file is complete (write() returns success) only once the data has been written to the disk. O_SYNC is stricter than O_DSYNC: not only must the data be on disk, the corresponding file attributes (the file's inode, changes to the relevant directories, and so on) must also have been updated before the write counts as complete. O_SYNC therefore does strictly more work than O_DSYNC.
- The open() man page also lists an O_ASYNC flag. It is mainly used with terminals, pseudoterminals, sockets, and pipes/FIFOs for signal-driven IO: the kernel sends a signal (SIGIO) when the device becomes readable or writable, and the application catches that signal to perform the IO.
- O_SYNC and O_DIRECT writes are both synchronous, i.e. write() returns only once the write has succeeded.
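Here is a minimal C sketch of the two routes to data-only durability described above, assuming Linux; the demo file names are invented.

    /* Two routes to data-only durability. Demo file names invented. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Route 1: the flag. Every write() returns only after the
         * data (not necessarily the metadata) has reached the disk. */
        int fd1 = open("demo_odsync", O_WRONLY | O_CREAT | O_DSYNC, 0644);
        if (fd1 < 0) { perror("open"); return 1; }
        if (write(fd1, "x", 1) < 0) perror("write");

        /* Route 2: the call. Plain write()s land in the page cache;
         * we flush the file's dirty data at a moment we choose. */
        int fd2 = open("demo_fdatasync", O_WRONLY | O_CREAT, 0644);
        if (fd2 < 0) { perror("open"); return 1; }
        if (write(fd2, "x", 1) < 0) perror("write");
        if (write(fd2, "y", 1) < 0) perror("write");
        fdatasync(fd2);  /* data only; fsync(fd2) would also flush the
                            metadata, exactly what O_SYNC adds to O_DSYNC */

        close(fd1);
        close(fd2);
        return 0;
    }

Route 1 pays the flush cost on every write; route 2 lets several writes accumulate and pays it once, which is the pattern InnoDB follows with its redo log buffer.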
Looking back with all this in place, the InnoDB flush settings discussed above become easier to understand. Why does O_DIRECT IO, which already bypasses the page cache / buffer cache, still need fsync()? Precisely in order to flush the inode cache and directory cache, i.e. the metadata, down to the storage device.
And because of improvements in the kernel and file systems, some file systems can now guarantee that using O_DIRECT without a follow-up fsync() for the metadata causes no data safety problem, which is why InnoDB also offers the O_DIRECT_NO_FSYNC option.
Of course, O_DIRECT affects both reads and writes. For reads in particular it guarantees that the data really comes from the storage device rather than from the cache, avoiding inconsistency between cached data and the device (for example, if you update the underlying data through a DRBD block device, the cache of a non-distributed file system will disagree with the data on the device). But since our focus here is on write buffering, we will not discuss that further.
O_DIRECT advantages and disadvantages
Most recommendations for innodb_flush_method suggest O_DIRECT; the Percona Server branch even provides ALL_O_DIRECT, which opens the log files with O_DIRECT as well.
Advantages:
- Saves operating system memory: O_DIRECT bypasses the page cache / buffer cache, so InnoDB's reads and writes take up less operating system memory, leaving more memory for the InnoDB buffer pool.
- Saves CPU. Data moves between memory and the storage device mainly by polling, by interrupts, or by DMA. O_DIRECT prompts the operating system to use DMA where possible when talking to the storage device, saving CPU.
Disadvantages:
- Byte alignment. O_DIRECT requires write buffers to be aligned (the exact alignment depends on the kernel and file system in use), which means writes need an extra alignment step. The alignment size can be read from /sys/block/sda/queue/logical_block_size and is usually 512 bytes; the sketch after this list shows what happens when it is violated.
- IO cannot be merged. Because O_DIRECT bypasses the page cache / buffer cache and writes straight to the storage device, repeated writes to the same piece of data cannot be coalesced in memory; the write-merging function of the page cache / buffer cache is lost.
- Lower sequential read/write efficiency. When a file is opened with O_DIRECT, every read/write skips the cache and goes straight to the storage device. With no cache to absorb them, sequential file reads and writes made up of small IO requests are comparatively inefficient under O_DIRECT.
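The following C sketch demonstrates the alignment requirement on Linux; the file name is invented, and the 512-byte alignment is an assumption that should be checked against your device's logical_block_size.

    /* Sketch: O_DIRECT's buffer alignment requirement. */
    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("demo_direct", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        char unaligned[513];
        memset(unaligned, 'a', sizeof unaligned);
        /* An odd buffer address usually makes the write fail with
         * EINVAL under O_DIRECT: */
        if (write(fd, unaligned + 1, 512) < 0)
            printf("unaligned write: %s\n", strerror(errno));

        /* Align both the buffer address and the IO size to the logical
         * block size (512 here; check
         * /sys/block/<dev>/queue/logical_block_size for the real value): */
        void *aligned;
        if (posix_memalign(&aligned, 512, 512) == 0) {
            memset(aligned, 'a', 512);
            if (write(fd, aligned, 512) < 0)
                perror("aligned write");
            free(aligned);
        }
        close(fd);
        return 0;
    }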
In short, setting innodb_flush_method to O_DIRECT is not 100% applicable to all applications and scenarios.
Storage Controller Layer
This buffer lives in the on-board cache of the storage controller. Its purpose is to buffer data at the storage controller layer, so that slow block device reads and writes do not hurt IO response times. When data is pushed down to the storage layer by fsync() and the like, it first arrives at the storage controller. The most common storage controller is a RAID card, and most RAID cards nowadays come with 1 GB or more of cache.
This buffer is usually volatile memory; an on-board battery or capacitor guarantees that the data in this "volatile" memory can still be synchronized down to the underlying disk media after the machine loses power.
A few things about storage controllers deserve attention:
- write back / write through:
With respect to this buffer, storage controllers generally offer two modes, write back and write through. In write-back mode, a write request submitted by the operating system returns success as soon as it has been written into the buffer; in write-through mode, a submitted write request returns success only after it has truly reached the underlying disk media.
- Battery / capacitor differences:
To guarantee that, when the machine loses power, the data in the "volatile" buffer can be written out to the underlying disk media in time, storage controllers carry a battery or a capacitor. Batteries have the well-known capacity-fade problem: every so often, the on-board battery must go through a controlled charge/discharge cycle to preserve its capacity. While the battery is charging and discharging, a controller set to write back automatically switches to write through. The charge/discharge ("Learn Cycle") period is typically 90 days; on LSI cards you can check it with MegaCli:
# MegaCli -AdpBbuCmd -GetBbuProperties -aAll
BBU Properties for Adapter: 0
Auto Learn Period: 90 Days
Next Learn time: Tue Oct 14 05:38:43 2014
Learn Delay Interval: 0 Hours
Auto-Learn Mode: Enabled
So if you notice that IO response times suddenly degrade at regular intervals, this may well be the cause.
You can check whether a charge/discharge cycle has occurred by dumping the event log with MegaCli -AdpEventLog -GetEvents -f mr_AdpEventLog.txt -aALL and looking for "Event Description: Battery started charging".
Because batteries have this problem, newer RAID cards are fitted with capacitors instead to guarantee that the "volatile" buffer reaches the underlying disk media in time, and so have no charge/discharge issue.
- read / write ratio:
HP's Smart Array controllers let you split the cache between reads and writes (Accelerator Ratio):
hpacucli ctrl all show config detail | grep 'Accelerator Ratio'
Accelerator Ratio: 25% Read / 75% Write
so you can tune the ratio of read cache to write buffer to fit the application's actual workload.
- Enabling Direct IO:
To let upper layers use Direct IO and bypass the RAID card's cache, you need to set the RAID card to Direct mode:
/opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp -Direct -Immediate -Lall -aAll
- LSI flash raid:
We spoke above of a "volatile" buffer. What if we had a non-volatile buffer instead, with a capacity of several hundred GB, between the storage controller and the underlying device? Wouldn't that be even faster? As a veteran RAID card vendor, LSI now offers exactly such a storage controller; applications that run in write-back mode and depend heavily on the controller's buffer may want to consider this kind of card.
- write barriers:
Whether a RAID card's cache is protected by a battery or capacitor is invisible to Linux, so to keep its file system journal consistent, Linux enables write barriers by default; these keep forcing the "volatile" buffer to be flushed, which can greatly reduce IO performance.
So, if you are confident that the battery or capacitor guarantees the "volatile" buffer will make it to the underlying disk device, you can add -o nobarrier when mounting the disk.
Disk Controller Layer
This buffer lives in the on-board cache of the disk itself. The storage device's firmware reorders pending writes according to its own rules before actually committing them to the media, chiefly so that on a mechanical disk each head movement can service as many write operations as possible (i.e. to keep writes sequential).
Transfers to the disk at this layer are generally performed by the DMA controller; direct memory access saves CPU resources.
For mechanical hard drives, ordinary disk devices carry no battery or capacitor, so there is no guarantee that data sitting in the disk cache will reach the media in time when power is lost. We therefore strongly recommend turning the disk cache off.
The disk cache can be switched off from the storage controller layer. For example, with MegaCli the command is as follows:
MegaCli -LDSetProp -DisDskCache -Lall -aALL
Summary
From InnoDB all the way down to the final medium, we have passed through buffer after buffer. Their purpose is the same everywhere: to paper over the speed mismatch between memory and disk, or rather the fact that disks are simply too slow.
Moreover, it is mostly the application itself that knows which data deserves buffering or caching, and how; the VFS, the storage controller and the disk can only use delayed writes (merging duplicate IO, turning random writes into sequential ones) to ease the problems caused by the slow response of the underlying storage device.
That is why database-type applications manage the buffering themselves and then try to avoid the buffering in the operating system and the lower-level devices.
In fact, with today's SSDs and PCIe flash cards, the speed gap between memory and disk has narrowed greatly. Whether these buffers are still necessary, and how hardware and software should change in response, is a major challenge facing hardware and software engineers.