Monday, December 18, 2023

Understanding Data

Understanding Data 

Data is a distinct piece of information that is gathered and translated for some purpose. Data can come in the form of text, observations, figures, images, numbers, graphs, or symbols. For example, data might include individual prices, weights, addresses, ages, names, temperatures, dates, or distances.


Types of Data

  • Structured 

  • Unstructured

  • Semi-Structured

Structured Data -  Data that is organized in a defined manner or schema, typically found in relational databases. 

  • Characteristics: • Easily queryable • Organized in rows and columns • Has a consistent structure 

  • Examples: • Database tables • CSV files with consistent columns • Excel spreadsheets


Unstructured Data - Definition: Data that doesn't have a predefined structure or schema. 

  • Characteristics: • Not easily queryable without preprocessing • May come in various formats 

  • Examples: • Text files without a fixed format • Videos and audio files • Images • Emails and word processing documents


Semi -Structured Data - Data that is not as organized as structured data but has some level of structure in the form of tags, hierarchies, or other patterns. 

  • Characteristics: • Elements might be tagged or categorized in some way • More flexible than structured data but not as chaotic as unstructured data 

  • Examples: • XML and JSON files • Email headers (which have a mix of structured fields like date, subject, etc., and unstructured data in the body) • Log files with varied formats


Properties of Data 

  • Volume 

  • Velocity 

  • Variety


Volume - Refers to the amount or size of data that organizations are dealing with at any given time. 

  • Characteristics: • May range from gigabytes to petabytes or even more • Challenges in storing, processing, and analyzing high volumes of data 

  • Examples: • A popular social media platform processing terabytes of data daily from user posts, images, and videos. • Retailers collecting years' worth of transaction data, amounting to several petabytes.


Velocity - Refers to the speed at which new data is generated, collected, and processed. 

  • Characteristics: • High velocity requires real-time or near-real-time processing capabilities • Rapid ingestion and processing can be critical for certain applications 

  • Examples: • Sensor data from IoT devices streaming readings every millisecond. • High-frequency trading systems where milliseconds can make a difference in decision-making.


Variety - Refers to the different types, structures, and sources of data. 

  • Characteristics: • Data can be structured, semi-structured, or unstructured • Data can come from multiple sources and in various formats 

  • Examples: • A business analyzing data from relational databases (structured), emails (unstructured), and JSON logs (semi-structured). • Healthcare systems collecting data from electronic medical records, wearable health devices, and patient feedback forms.


Managing Data - Data Warehouse, Data Lake & Data Lakehouse


Data Warehouse : A centralized repository optimized for analysis where data from different sources is stored in a structured format. 

  • Characteristics:  Designed for complex queries and analysis • Data is cleaned, transformed, and loaded (ETL process) • Typically uses a star or snowflake schema • Optimized for read-heavy operations 

  • Examples: • Amazon Redshift • Google BigQuery • Microsoft Azure SQL Data Warehouse


Data Lake : A storage repository that holds vast amounts of raw data in its native format, including structured, semi-structured, and unstructured data. 

  • Characteristics: Can store large volumes of raw data without predefined schema • Data is loaded as-is, no need for preprocessing • Supports batch, real-time, and stream processing • Can be queried for data transformation or exploration purposes 

  • Examples: Amazon Simple Storage Service (S3) when used as a data lake • Azure Data Lake Storage • Hadoop Distributed File System (HDFS)


Data Lakehouse : A hybrid data architecture that combines the best features of data lakes and data warehouses, aiming to provide the performance, reliability, and capabilities of a data warehouse while maintaining the flexibility, scale, and low -cost storage of data lakes. 

  • Characteristics: • Supports both structured and unstructured data. • Allows for schema-on-write and schema-on-read. • Provides capabilities for both detailed analytics and machine learning tasks. • Typically built on top of cloud or distributed architectures. • Benefits from technologies like Delta Lake, which bring ACID transactions to big data. 

  • Examples: • AWS Lake Formation (with S3 and Redshift Spectrum) • Delta Lake: An open-source storage layer that brings ACID transactions to Apache Spark and big data workloads. • Databricks Lakehouse Platform: A unified platform that combines the capabilities of data lakes and data warehouses. • Azure Synapse Analytics: Microsoft's analytics service that brings together big data and data warehousing

Monday, November 06, 2023

MySQL Best Practices - Tips

MySQL is a widely used relational database management system, and following best practices is essential to ensure performance, reliability, and security. Here are some tips and best practices for working with MySQL

Use normalized tables :

  • First Normal Form (1NF): In the first normal form, each column must contain only one value and no table should store repeating groups of related data.

  • Each column of your table should be single-valued.
  • The values stored in each column must be of the same type.
  • Each column in a table should have a unique name.
  • You can store the data in the table in any order.

  • Second Normal Form (2NF): In the second normal form, first the database must be in the first normal form, it should not store duplicate rows in the same table. And if there are duplicate values in the row, they should be stored in their own separate tables and linked to the table using foreign keys. The ideal way to a database in second normal form is to create one to many relationship tables.

  • The table should be in first normal form (1NF).
  • There should not be any partial dependency.

  • Third Normal Form (3NF): In the third normal form, the database is already in the third form, if it is in the second normal form and every non-key column is mutually independent. Identify any columns in the table that are interdependent and break those columns into their own separate tables.

  • The table should be in Second Normal Form (2NF)
  • There should not be any transitive dependency for non-prime attributes.

  • Boyce Codd Normal Form (BCNF): It is the highest form of the third normal form which deals with different types of anomalies that are not handled by the 3NF.

  • The table should be in 3NF
  • For dependency like a-> B, A should be a super key which means A cannot be a non-prime attribute if B is a prime attribute.

Refer : https://www.simplilearn.com/tutorials/sql-tutorial/what-is-normalization-in-sql

Always use proper datatype

  • One of the most important MySQL best practices is to use datatypes based on the nature of data. Using irrelevant datatypes may consume more space or lead to errors.
  •  For example: Using varchar (20) instead of DATETIME datatype for storing date time values will lead to errors in date time-related calculations. Also, it is possible that invalid data will be stored.
  • Use CHAR (1) over VARCHAR (1)  - VARCHAR (1) takes extra bytes to store information, so if you string a single character, it better to use CHAR (1).
  • Use the CHAR datatype to store only fixed length data - For example: If the length of the data is less than 1000, using char (1000) instead of varchar (1000) will consume more space.
  • Avoid using regional date formats - When using DATETIME or DATE datatype, always use the YYYY-MM-DD date format or ISO date format suitable for your SQL Engine. Regional formats like DD-MM-YYYY or MM-DD-YYYY will not be stored properly.
  • Use the smallest data types possible - avoid large char (255) text fields when a varchar or smaller char is enough. If you use the right data type, more records will fit in memory or index key block. This leads to fewer reads and faster performance.
  • Use ENUM rather than VARCHAR

 Best Practices for InnoDB Tables

  • Specify a primary key for every table using the most frequently queried column or columns, or an auto-increment value if there is no obvious primary key.
  • Use joins wherever data is pulled from multiple tables based on identical ID values from those tables. For fast join performance, define foreign keys on the join columns, and declare those columns with the same data type in each table. Adding foreign keys ensures that referenced columns are indexed, which can improve performance. Foreign keys also propagate deletes and updates to all affected tables, and prevent insertion of data in a child table if the corresponding IDs are not present in the parent table.
  • Turn off autocommit. Committing hundreds of times a second puts a cap on performance (limited by the write speed of your storage device).
  • Group sets of related DML operations into transactions by bracketing them with START TRANSACTION and COMMIT statements. While you don't want to commit too often, you also don't want to issue huge batches of INSERT, UPDATE, or DELETE statements that run for hours without committing.
  • Do not use LOCK TABLES statements. InnoDB can handle multiple sessions all reading and writing to the same table at once without sacrificing reliability or high performance. To get exclusive write access to a set of rows, use the SELECT ... FOR UPDATE syntax to lock just the rows you intend to update.
  • Enable the innodb_file_per_table variable or use general tablespaces to put the data and indexes for tables into separate files instead of the system tablespace. The innodb_file_per_table variable is enabled by default.
  •  Evaluate whether your data and access patterns benefit from the InnoDB table or page compression features. You can compress InnoDB tables without sacrificing read/write capability.
  • Run the server with the --sql_mode=NO_ENGINE_SUBSTITUTION option to prevent tables from being created with storage engines that you do not want to use.

Optimizing SELECT Statements

The main considerations for optimizing queries are:

  • To make a slow SELECT ... WHERE query faster, the first thing to check is whether you can add an index. Set up indexes on columns used in the WHERE clause, to speed up evaluation, filtering, and the final retrieval of results. To avoid wasted disk space, construct a small set of indexes that speed up many related queries used in your application.
  •  Indexes are especially important for queries that reference different tables, using features such as joins and foreign keys. You can use the EXPLAIN statement to determine which indexes are used for a SELECT. Optimizing Queries with EXPLAIN”.
  • If a performance issue is not easily solved by one of the basic guidelines, investigate the internal details of the specific query by reading the EXPLAIN plan and adjusting your indexes, WHERE clauses, join clauses, and so on. (When you reach a certain level of expertise, reading the EXPLAIN plan might be your first step for every query.)
  • Isolate and tune any part of the query, such as a function call, that takes excessive time. Depending on how the query is structured, a function could be called once for every row in the result set, or even once for every row in the table, greatly magnifying any inefficiency.
  • Minimize the number of full table scans in your queries, particularly for big tables.
  • Keep table statistics up to date by using the ANALYZE TABLE statement periodically, so the optimizer has the information needed to construct an efficient execution plan.
  • Avoid transforming the query in ways that make it hard to understand, especially if the optimizer does some of the same transformations automatically.
  • Deal with locking issues, where the speed of your query might be affected by other sessions accessing the tables at the same time.

Here are some types of optimizations MySQL knows how to do: -

  • Reordering joins
  • Converting OUTER JOINs to INNER JOINs
  • Applying algebraic equivalence rules
  • COUNT(), MIN(), and MAX() optimizations
  • Evaluating and reducing constant expressions
  • Use Covering indexes
  • Subquery optimization
  • Early termination
  • Equality propagation
  • IN() list comparisons
  • Use SELECT * only if needed
  • Use ORDER BY Clause only if needed

Few More :

  • Partition Large Tables: For tables with a large number of rows, consider partitioning to improve query performance and management.
  • Use Connection Pooling: Connection pooling helps manage and reuse database connections, reducing the overhead of opening and closing connections for each query.
  • Cache Frequently Used Queries: Implement query caching mechanisms like MySQL Query Cache or use external caching systems like Memcached or Redis to reduce query load on the database server.
  • Regularly Review and Tune: Regularly review and tune your MySQL setup based on the changing needs of your applications.

Tuesday, October 03, 2023

Standard Methods - Network protocols

Network protocols are standard methods of transferring data between two computers in a network.

 

 

1. HTTP (HyperText Transfer Protocol)
HTTP is a protocol for fetching resources such as HTML documents. 
It is the foundation of any data exchange on the Web and it is a client-server protocol.

2. HTTP/3
HTTP/3 is the next major revision of the HTTP. It runs on QUIC, 
a new transport protocol designed for mobile-heavy internet usage. 
It relies on UDP instead of TCP, which enables faster web page responsiveness. 
VR applications demand more bandwidth to render intricate details of a virtual scene and 
will likely benefit from migrating to HTTP/3 powered by QUIC.

3. HTTPS (HyperText Transfer Protocol Secure)
HTTPS extends HTTP and uses encryption for secure communications.

4. WebSocket
WebSocket is a protocol that provides full-duplex communications over TCP. 
Clients establish WebSockets to receive real-time updates from the back-end services. 
Unlike REST, which always “pulls” data, WebSocket enables data to be “pushed”. Applications, 
like online gaming, stock trading, and messaging apps leverage WebSocket for real-time 
communication.

5. TCP (Transmission Control Protocol)
TCP is is designed to send packets across the internet and ensure the successful delivery of 
data and messages over networks. Many application-layer protocols build on top of TCP.

6. UDP (User Datagram Protocol)
UDP sends packets directly to a target computer, without establishing a connection first. 
UDP is commonly used in time-sensitive communications where occasionally dropping packets 
is better than waiting. Voice and video traffic are often sent using this protocol.

7. SMTP (Simple Mail Transfer Protocol)
SMTP is a standard protocol to transfer electronic mail from one user to another.

8. FTP (File Transfer Protocol)
FTP is used to transfer computer files between client and server. 
It has separate connections for the control channel and data channel.

 

Friday, September 22, 2023

Install MySQL 8 on Different Operating Systems

  • INSTALL MYSQL 8 ON CENTOS/REHAT

  ### INSTALLATION STEPS STEP 1. DOWNLOAD MYSQL 8 REPOSITORY PACKAGE wget https://repo.mysql.com/mysql80-community-release-el7-3.noarch.rpm STEP 2. INSTALL MYSQL REPO LOCALLY sudo yum localinstall mysql80-community-release-el7-3.noarch.rpm STEP 3. Import Public Key for MySQL 8 sudo rpm --import https://repo.mysql.com/RPM-GPG-KEY-mysql-2022 STEP 4. INSTALL MYSQL SERVER sudo yum install mysql-community-server STEP 5. ENABLE MYSQL SERVICE TO AUTO-START ON REBOOT sudo systemctl enable mysqld.service STEP 6. START MYSQL SERVICE sudo systemctl start mysqld.service STEP 7. CHECK STATUS OF MYSQL SERVICE systemctl status mysqld ### VERIFICATION pidof mysqld netstat -ntlp | grep 3306 sudo lsof -u mysql

Saturday, August 12, 2023

Lock Wait Timeout Exceeded - Error 1205

 One of the most popular InnoDB’s errors is InnoDB lock wait timeout exceeded, for example:

SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction

The above simply means the transaction has reached the innodb_lock_wait_timeout while waiting to obtain an exclusive lock which defaults to 50 seconds. The common causes are:

  1. The offensive transaction is not fast enough to commit or rollback the transaction within innodb_lock_wait_timeout duration.
  2. The offensive transaction is waiting for row lock to be released by another transaction.

The Effects of a InnoDB Lock Wait Timeout

InnoDB lock wait timeout can cause two major implications:

  • The failed statement is not being rolled back by default.
  • Even if innodb_rollback_on_timeout is enabled, when a statement fails in a transaction, ROLLBACK is still a more expensive operation than COMMIT.

Let’s play around with a simple example to better understand the effect. Consider the following two tables in database mydb:

mysql> CREATE SCHEMA mydb;
mysql> USE mydb;

The first table (table1):

mysql> CREATE TABLE table1 ( id INT PRIMARY KEY AUTO_INCREMENT, data VARCHAR(50));
mysql> INSERT INTO table1 SET data = 'data #1';

The second table (table2):

mysql> CREATE TABLE table2 LIKE table1;
mysql> INSERT INTO table2 SET data = 'data #2';

We executed our transactions in two different sessions in the following order:

Ordering

Transaction #1 (T1)

Transaction #2 (T2)

1

SELECT * FROM table1;

(OK)

SELECT * FROM table1;

(OK)

2

UPDATE table1 SET data = ‘T1 is updating the row’ WHERE id = 1;  

(OK)

 

3

 

UPDATE table2 SET data = ‘T2 is updating the row’ WHERE id = 1; 

(OK)

4

 

UPDATE table1 SET data = ‘T2 is updating the row’ WHERE id = 1; 

(Hangs for a while and eventually returns an error “Lock wait timeout exceeded; try restarting transaction”)

5

COMMIT;

(OK)

 

6

 

COMMIT;

(OK)

However, the end result after step #6 might be surprising if we did not retry the timed out statement at step #4:
mysql> SELECT * FROM table1 WHERE id = 1;
+----+-----------------------------------+
| id | data                              |
+----+-----------------------------------+
| 1  | T1 is updating the row            |
+----+-----------------------------------+



mysql> SELECT * FROM table2 WHERE id = 1;
+----+-----------------------------------+
| id | data                              |
+----+-----------------------------------+
| 1  | T2 is updating the row            |
+----+-----------------------------------+

After T2 was successfully committed, one would expect to get the same output “T2 is updating the row” for both table1 and table2 but the results show that only table2 was updated. One might think that if any error encounters within a transaction, all statements in the transaction would automatically get rolled back, or if a transaction is successfully committed, the whole statements were executed atomically. This is true for deadlock, but not for InnoDB lock wait timeout.

Unless you set innodb_rollback_on_timeout=1 (default is 0 – disabled), automatic rollback is not going to happen for InnoDB lock wait timeout error. This means, by following the default setting, MySQL is not going to fail and rollback the whole transaction, nor retrying again the timed out statement and just process the next statements until it reaches COMMIT or ROLLBACK. This explains why transaction T2 was partially committed!

The InnoDB documentation clearly says “InnoDB rolls back only the last statement on a transaction timeout by default”. In this case, we do not get the transaction atomicity offered by InnoDB. The atomicity in ACID compliant is either we get all or nothing of the transaction, which means partial transaction is merely unacceptable.

Dealing With a InnoDB Lock Wait Timeout

So, if you are expecting a transaction to auto-rollback when encounters an InnoDB lock wait error, similarly as what would happen in deadlock, set the following option in MySQL configuration file:

innodb_rollback_on_timeout=1

A MySQL restart is required. When deploying a MySQL-based cluster, ClusterControl will always set innodb_rollback_on_timeout=1 on every node. Without this option, your application has to retry the failed statement, or perform ROLLBACK explicitly to maintain the transaction atomicity.

To verify if the configuration is loaded correctly:

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_rollback_on_timeout';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| innodb_rollback_on_timeout | ON    |
+----------------------------+-------+

To check whether the new configuration works, we can track the com_rollback counter when this error happens:

mysql> SHOW GLOBAL STATUS LIKE 'com_rollback';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Com_rollback  | 1     |
+---------------+-------+

Tracking the Blocking Transaction

There are several places that we can look to track the blocking transaction or statements. Let’s start by looking into InnoDB engine status under TRANSACTIONS section:

mysql> SHOW ENGINE INNODB STATUSG
mysql> SHOW ENGINE INNODB STATUSG
------------
TRANSACTIONS
------------

...

---TRANSACTION 3100, ACTIVE 2 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 50, OS thread handle 139887555282688, query id 360 localhost ::1 root updating
update table1 set data = 'T2 is updating the row' where id = 1

------- TRX HAS BEEN WAITING 2 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 6 page no 4 n bits 72 index PRIMARY of table `mydb`.`table1` trx id 3100 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;;
 1: len 6; hex 000000000c19; asc       ;;
 2: len 7; hex 020000011b0151; asc       Q;;
 3: len 22; hex 5431206973207570646174696e672074686520726f77; asc T1 is updating the row;;
------------------

---TRANSACTION 3097, ACTIVE 46 sec
2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1
MySQL thread id 48, OS thread handle 139887556167424, query id 358 localhost ::1 root
Trx read view will not see trx with id >= 3097, sees < 3097

From the above information, we can get an overview of the transactions that are currently active in the server. Transaction 3097 is currently locking a row that needs to be accessed by transaction 3100. However, the above output does not tell us the actual query text that could help us figuring out which part of the query/statement/transaction that we need to investigate further. By using the blocker MySQL thread ID 48, let's see what we can gather from MySQL processlist:

mysql> SHOW FULL PROCESSLIST;
+----+-----------------+-----------------+--------------------+---------+------+------------------------+-----------------------+
| Id | User            | Host            | db                 | Command | Time | State                  | Info                  |
+----+-----------------+-----------------+--------------------+---------+------+------------------------+-----------------------+
| 4  | event_scheduler | localhost       |              | Daemon  | 5146 | Waiting on empty queue |                 |
| 10 | root            | localhost:56042 | performance_schema | Query   | 0    | starting               | show full processlist |
| 48 | root            | localhost:56118 | mydb               | Sleep   | 145  |                        |                 |
| 50 | root            | localhost:56122 | mydb               | Sleep   | 113  |                        |                 |
+----+-----------------+-----------------+--------------------+---------+------+------------------------+-----------------------+

Thread ID 48 shows the command as 'Sleep'. Still, this does not help us much to know which statements that block the other transaction. This is because the statement in this transaction has been executed and this open transaction is basically doing nothing at the moment. We need to dive further down to see what is going on with this thread.

For MySQL 8.0, the InnoDB lock wait instrumentation is available under data_lock_waits table inside performance_schema database (or innodb_lock_waits table inside sys database). If a lock wait event is happening, we should see something like this:

mysql> SELECT * FROM performance_schema.data_lock_waitsG
***************************[ 1. row ]***************************
ENGINE                           | INNODB
REQUESTING_ENGINE_LOCK_ID        | 139887595270456:6:4:2:139887487554680
REQUESTING_ENGINE_TRANSACTION_ID | 3100
REQUESTING_THREAD_ID             | 89
REQUESTING_EVENT_ID              | 8
REQUESTING_OBJECT_INSTANCE_BEGIN | 139887487554680
BLOCKING_ENGINE_LOCK_ID          | 139887595269584:6:4:2:139887487548648
BLOCKING_ENGINE_TRANSACTION_ID   | 3097
BLOCKING_THREAD_ID               | 87
BLOCKING_EVENT_ID                | 9
BLOCKING_OBJECT_INSTANCE_BEGIN   | 139887487548648

Note that in MySQL 5.6 and 5.7, the similar information is stored inside innodb_lock_waits table under information_schema database. Pay attention to the BLOCKING_THREAD_ID value. We can use the this information to look for all statements being executed by this thread in events_statements_history table:

mysql> SELECT * FROM performance_schema.events_statements_history WHERE `THREAD_ID` = 87;
0 rows in set

It looks like the thread information is no longer there. We can verify by checking the minimum and maximum value of the thread_id column in events_statements_history table with the following query:

mysql> SELECT min(`THREAD_ID`), max(`THREAD_ID`) FROM performance_schema.events_statements_history;
+------------------+------------------+
| min(`THREAD_ID`) | max(`THREAD_ID`) |
+------------------+------------------+
| 98               | 129              |
+------------------+------------------+

The thread that we were looking for (87) has been truncated from the table. We can confirm this by looking at the size of event_statements_history table:

mysql> SELECT @@performance_schema_events_statements_history_size;
+-----------------------------------------------------+
| @@performance_schema_events_statements_history_size |
+-----------------------------------------------------+
| 10                                                  |
+-----------------------------------------------------+

The above means the events_statements_history can only store the last 10 threads. Fortunately, performance_schema has another table to store more rows called events_statements_history_long, which stores similar information but for all threads and it can contain way more rows:

mysql> SELECT @@performance_schema_events_statements_history_long_size;
+----------------------------------------------------------+
| @@performance_schema_events_statements_history_long_size |
+----------------------------------------------------------+
| 10000                                                    |
+----------------------------------------------------------+

However, you will get an empty result if you try to query the events_statements_history_long table for the first time. This is expected because by default, this instrumentation is disabled in MySQL as we can see in the following setup_consumers table:

mysql> SELECT * FROM performance_schema.setup_consumers;
+----------------------------------+---------+
| NAME                             | ENABLED |
+----------------------------------+---------+
| events_stages_current            | NO      |
| events_stages_history            | NO      |
| events_stages_history_long       | NO      |
| events_statements_current        | YES     |
| events_statements_history        | YES     |
| events_statements_history_long   | NO      |
| events_transactions_current      | YES     |
| events_transactions_history      | YES     |
| events_transactions_history_long | NO      |
| events_waits_current             | NO      |
| events_waits_history             | NO      |
| events_waits_history_long        | NO      |
| global_instrumentation           | YES     |
| thread_instrumentation           | YES     |
| statements_digest                | YES     |
+----------------------------------+---------+

To activate table events_statements_history_long, we need to update the setup_consumers table as below:

mysql> UPDATE performance_schema.setup_consumers SET enabled = 'YES' WHERE name = 'events_statements_history_long';

Verify if there are rows in the events_statements_history_long table now:

mysql> SELECT count(`THREAD_ID`) FROM performance_schema.events_statements_history_long;
+--------------------+
| count(`THREAD_ID`) |
+--------------------+
| 4                  |
+--------------------+

Cool. Now we can wait until the InnoDB lock wait event raises again and when it is happening, you should see the following row in the data_lock_waits table:

mysql> SELECT * FROM performance_schema.data_lock_waitsG
***************************[ 1. row ]***************************
ENGINE                           | INNODB
REQUESTING_ENGINE_LOCK_ID        | 139887595270456:6:4:2:139887487555024
REQUESTING_ENGINE_TRANSACTION_ID | 3083
REQUESTING_THREAD_ID             | 60
REQUESTING_EVENT_ID              | 9
REQUESTING_OBJECT_INSTANCE_BEGIN | 139887487555024
BLOCKING_ENGINE_LOCK_ID          | 139887595269584:6:4:2:139887487548648
BLOCKING_ENGINE_TRANSACTION_ID   | 3081
BLOCKING_THREAD_ID               | 57
BLOCKING_EVENT_ID                | 8
BLOCKING_OBJECT_INSTANCE_BEGIN   | 139887487548648

Again, we use the BLOCKING_THREAD_ID value to filter all statements that have been executed by this thread against events_statements_history_long table: 

mysql> SELECT `THREAD_ID`,`EVENT_ID`,`EVENT_NAME`, `CURRENT_SCHEMA`,`SQL_TEXT` FROM events_statements_history_long 
WHERE `THREAD_ID` = 57
ORDER BY `EVENT_ID`;
+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+
| THREAD_ID | EVENT_ID | EVENT_NAME            | CURRENT_SCHEMA | SQL_TEXT                                                       |
+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+
| 57        | 1        | statement/sql/select  |          | select connection_id()                                         |
| 57        | 2        | statement/sql/select  |          | SELECT @@VERSION                                               |
| 57        | 3        | statement/sql/select  |          | SELECT @@VERSION_COMMENT                                       |
| 57        | 4        | statement/com/Init DB |          |                                                          |
| 57        | 5        | statement/sql/begin   | mydb           | begin                                                          |
| 57        | 7        | statement/sql/select  | mydb           | select 'T1 is in the house'                                    |
| 57        | 8        | statement/sql/select  | mydb           | select * from table1                                           |
| 57        | 9        | statement/sql/select  | mydb           | select 'some more select'                                      |
| 57        | 10       | statement/sql/update  | mydb           | update table1 set data = 'T1 is updating the row' where id = 1 |
+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+

Finally, we found the culprit. We can tell by looking at the sequence of events of thread 57 where the above transaction (T1) still has not finished yet (no COMMIT or ROLLBACK), and we can see the very last statement has obtained an exclusive lock to the row for update operation which needed by the other transaction (T2) and just hanging there. That explains why we see 'Sleep' in the MySQL processlist output.

As we can see, the above SELECT statement requires you to get the thread_id value beforehand. To simplify this query, we can use IN clause and a subquery to join both tables. The following query produces an identical result like the above:

mysql> SELECT `THREAD_ID`,`EVENT_ID`,`EVENT_NAME`, `CURRENT_SCHEMA`,`SQL_TEXT` from events_statements_history_long WHERE `THREAD_ID` IN (SELECT `BLOCKING_THREAD_ID` FROM data_lock_waits) ORDER BY `EVENT_ID`;
+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+
| THREAD_ID | EVENT_ID | EVENT_NAME            | CURRENT_SCHEMA | SQL_TEXT                                                       |
+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+
| 57        | 1        | statement/sql/select  |          | select connection_id()                                         |
| 57        | 2        | statement/sql/select  |          | SELECT @@VERSION                                               |
| 57        | 3        | statement/sql/select  |          | SELECT @@VERSION_COMMENT                                       |
| 57        | 4        | statement/com/Init DB |          |                                                          |
| 57        | 5        | statement/sql/begin   | mydb           | begin                                                          |
| 57        | 7        | statement/sql/select  | mydb           | select 'T1 is in the house'                                    |
| 57        | 8        | statement/sql/select  | mydb           | select * from table1                                           |
| 57        | 9        | statement/sql/select  | mydb           | select 'some more select'                                      |
| 57        | 10       | statement/sql/update  | mydb           | update table1 set data = 'T1 is updating the row' where id = 1 |
+-----------+----------+-----------------------+----------------+----------------------------------------------------------------+

However, it is not practical for us to execute the above query whenever InnoDB lock wait event occurs. Apart from the error from the application, how would you know that the lock wait event is happening? We can automate this query execution with the following simple Bash script, called track_lockwait.sh:

$ cat track_lockwait.sh
#!/bin/bash
## track_lockwait.sh
## Print out the blocking statements that causing InnoDB lock wait

INTERVAL=5
DIR=/root/lockwait/

[ -d $dir ] || mkdir -p $dir

while true; do
  check_query=$(mysql -A -Bse 'SELECT THREAD_ID,EVENT_ID,EVENT_NAME,CURRENT_SCHEMA,SQL_TEXT FROM events_statements_history_long WHERE THREAD_ID IN (SELECT BLOCKING_THREAD_ID FROM data_lock_waits) ORDER BY EVENT_ID')

  # if $check_query is not empty
  if [[ ! -z $check_query ]]; then
    timestamp=$(date +%s)
    echo $check_query > $DIR/innodb_lockwait_report_${timestamp}
  fi

  sleep $INTERVAL
done

Apply executable permission and daemonize the script in the background:

$ chmod 755 track_lockwait.sh
$ nohup ./track_lockwait.sh &

Now, we just need to wait for the reports to be generated under the /root/lockwait directory. Depending on the database workload and row access patterns, you might probably see a lot of files under this directory. Monitor the directory closely otherwise it would be flooded with too many report files.

If you are using ClusterControl, you can enable the Transaction Log feature under Performance -> Transaction Log where ClusterControl will provide a report on deadlocks and long-running transactions which will ease up your life in finding the culprit.

Conclusion

In summary, if we face a “Lock Wait Timeout Exceeded” error in MySQL, we need to first understand the effects that such an error can have to our infrastructure, then track the offensive transaction and act on it either with shell scripts like track_lockwait.sh, or database management software like ClusterControl.

If you decide to go with shell scripts, just bear in mind that they may save you money but will cost you time, as you’d need to know a thing or two about how they work, apply permissions, and possibly make them run in the background, and if you do get lost in the shell jungle, we can help.

Whatever you decide to implement, make sure to follow us on Twitter or subscribe to our RSS feed to get more tips on improving the performance of both your software and the databases backing it, such as this post covering 6 common failure scenarios in MySQL.

source : https://severalnines.com/blog/

Basics of Kubernetes

 Kubernetes, often abbreviated as K8s , is an open-source platform designed to automate the deployment, scaling, and management of container...