Monday, June 10, 2024

Optimize Query Performance in MySQL Databases

Optimize Query Performance in MySQL Databases

In this we learn step by step to Optimize Query Performance in MySQL Databases and cover below :

  • MySQL query execution processes

  • Identify slow queries in MySQL using several query profiling techniques

  • Optimize your queries for faster response times

  • Understand MySQL optimizes and executes your queries

  • Control and modify the default query execution plan of your query

  • EXPLAIN and EXPLAIN ANALYZE keywords provide information about how MySQL databases execute queries

MySQL query execution processes



  1. The MySQL client sends your query to the MySQL server using the MySQL Client/Server Protocol.


    The query is parsed, preprocessed, and finally optimized into a query execution plan by the MySQL query optimizer. The optimizer may ask the Storage engine for statistics about the tables referenced in your query before execution.


    The Query Execution Engine executes the plan by making calls to the Storage engine through special handler interfaces.


    The MySQL server sends the results to the MySQL client.

Identify slow queries in MySQL using several query profiling technique.

Slow queries can be annoying, but we can identify and fix them. The amount of time it takes MySQL to execute your query is known as its response time. It is the most critical metric used in measuring your query’s speed. In other words, the performance of an individual query or transaction is directly proportional to its response time.

There are several reasons why a query may be slow. These reasons can range from your current hardware configuration, to the permissions you may have set, improper index usage, bad schema design, or even the intention of your queries. Whenever you send a query to the MySQL server, you send a series of instructions that you want it to perform.

Some instructions are simple — e.g., a query that searches using a single parameter — and some instructions can be complex — e.g., queries that involve complex joins and subqueries. In whatever form the MySQL server receives your queries, they will consume time.
The slow_query_log

We can identify slow queries by inspecting the slow query logs. This built-in feature enables you to log queries that exceed a time limit you set using the long_query_time system variable. The default time limit is 10 seconds, i.e., MySQL will log any query that runs longer than 10 seconds. This slow query log, like the INFORMATION_SCHEMA.PROFILING, is not enabled by default. To use it, you must first enable it by setting the slow_query_log global variable to 'ON':

mysql> SET GLOBAL slow_query_log = 'ON';
Query OK, 0 rows affected (0.08 sec)

To confirm, you could always view a global variable’s value this way:

mysql> SHOW GLOBAL VARIABLES LIKE 'slow_query_log';
+----------------+-------+
| Variable_name | Value |
+----------------+-------+
| slow_query_log | ON |
+----------------+-------+

You can change the time limit in seconds by typing the following command:

mysql> SET GLOBAL long_query_time = 60;

Query OK, 0 rows affected (0.08 sec)

You can also change the default location for the slow query log file, usually found at var/lib/mysql/hostname-slow.log, to any destination of your choice:

mysql> SET GLOBAL slow_query_log_file = '/somepath/filename.log';

Query OK, 0 rows affected (0.08 sec)

Once you’ve run any query that exceeds the time limit you configured, it will be logged in the slow query log by the MySQL server. You can always inspect the file to see those slow queries.

The INFORMATION_SCHEMA.PROFILING table

The INFORMATION_SCHEMA.PROFILING table stores profiling information about queries you run in a current interactive session. It’s disabled by default, but you can enable query profiling for your current session by setting the profiling session variable as shown below. Note that the profiling information is lost when a session ends:

mysql> SET SESSION profiling = 1;

Query OK, 0 rows affected, 1 warning (0.00 sec)

After that, you can select the database you want to use:

mysql> USE sakila;

Database changedCode language: PHP (php)

Next, you run your query (Make sure you do not use the EXPLAIN statement)

mysql> SELECT * FROM customer;

Run the SHOW PROFILES query

mysql> SHOW PROFILES;

+----------+------------+------------------------+

| Query_ID | Duration   | Query                  |

+----------+------------+------------------------+

|        1 | 0.02364600 | SELECT DATABASE()      |

|        2 | 0.04395425 | show databases         |

|        3 | 0.00854575 | show tables            |

|        4 | 0.00213000 | SELECT * FROM customer |

+----------+------------+------------------------+

4 rows in set, 1 warning (0.00 sec)

The SHOW PROFILES query fetches all the list of queries in the current session. It has three columns:

  1. The Query_ID – a unique numerical identifier for a query

  2. The Duration is the time taken to execute a query

  3. The Query column shows the query that the MySQL server executed

To see more information about a particular query, we can run the following:

The EXPLAIN statement

Another tool MySQL provides for query analysis is the EXPLAIN statement. This statement shows us how the MySQL server will execute a plan. It’s quite easy to use. Anytime you want to see how MySQL will execute your query before running it, just prepend the EXPLAIN statement to it:

mysql> EXPLAIN SELECT * FROM customer\G

*************************** 1. row ***************************

          id: 1

  select_type: SIMPLE

        table: customer

  partitions: NULL

        type: ALL

possible_keys: NULL

          key: NULL

      key_len: NULL

          ref: NULL

        rows: 599

    filtered: 100.00

        Extra: NULL

1 row in set, 1 warning (0.00 sec)

Ending the SQL statement with “\G” instead of “;” causes MySQL to return the query result in vertical format, which makes it easier to read tables with many long columns.

The above example tells us that the server only needs to scan 599 rows to get the result.

The EXPLAIN ANALYZE statement

The profiling techniques we have discussed won’t be complete without mentioning EXPLAIN ANALYZE.

It works similarly to the EXPLAIN statement. The key difference is that the former is more verbose. You get more statistics about the query execution.

Another difference is that EXPLAIN doesn’t execute queries, but EXPLAIN ANALYZE does. That’s because to fetch the missing statistics, it needs to communicate with the Storage engine. We’ll explain how this communication works in a later section. Meanwhile, let’s demonstrate this profiling technique. To profile a query with the EXPLAIN ANALYZE statement, you just need to prepend the query with the text like this:

mysql> EXPLAIN ANALYZE SELECT first_name, last_name, city, country

FROM customer

INNER JOIN address USING(address_id)

INNER JOIN city USING(city_id) INNER JOIN country USING(country_id);


+---------------------------------------------------------------------------------------------------------------------->

| EXPLAIN                                                                                                              >

+---------------------------------------------------------------------------------------------------------------------->

| -> Nested loop inner join  (cost=690.10 rows=599) (actual time=0.073..8.259 rows=599 loops=1)

    -> Nested loop inner join  (cost=480.45 rows=599) (actual time=0.064..5.635 rows=599 loops=1)

        -> Nested loop inner join  (cost=270.80 rows=599) (actual time=0.057..3.083 rows=599 loops=1)

            -> Table scan on customer  (cost=61.15 rows=599) (actual time=0.043..0.478 rows=599 loops=1)

            -> Single-row index lookup on address using PRIMARY (address_id=customer.address_id)  (cost=0.25 rows=1) (actual time=0.002..0.002 rows=1 loops=599)

        -> Single-row index lookup on city using PRIMARY (city_id=address.city_id)  (cost=0.25 rows=1) (actual time=0.002..0.002 rows=1 loops=599)

    -> Single-row index lookup on country using PRIMARY (country_id=city.country_id)  (cost=0.25 rows=1) (actual time=0.002..0.002 rows=1 loops=599)

|

+---------------------------------------------------------------------------------------------------------------------->

(END)

As you can see, the results show how the MySQL server executed the plan. It introduces a few more statistics:

  • The actual time to fetch the first row in a table (in milliseconds)

  • The actual time to fetch all rows in a table (in milliseconds)

  • The estimated costs of the query

  • The actual number of rows read

  • The actual number of loops made

The estimated cost of a query’s execution plan is the cumulative sum of the costs of all operations within the plan, such as I/O or CPU operations.

Inspecting the last_query_cost session variable

Another quick tool to find out the estimated cost of a query is by inspecting the last_query_cost session variable:

mysql> SHOW STATUS LIKE 'last_query_cost';

+-----------------+------------+

| Variable_name   | Value      |

+-----------------+------------+

| Last_query_cost | 520.262389 |

+-----------------+------------+

1 row in set (0.00 sec)


Optimize slow queries

Once we identify and analyze slow queries, the next thing would be to fix the slow-performing queries. Instinctively, the first thing we would do is to check the query itself. The following techniques demonstrate how we can optimize slow queries.

Only retrieve the data you need

While the server scans all the rows your query asks for, it locks any resources available in the session, and the client cannot interrupt the process without you forcefully quitting the session. Also, the client cannot selectively access data until the server has scanned all the rows.

The query you’re sending may be asking for too much data, which causes the MySQL server to waste time scanning many rows of data. But we can fix it by doing the following:

Fetch the number of rows you need only (pagination)

We can fix our query using a common database technique known as pagination. We can use the LIMIT and OFFSET clauses to specify the number of rows we need and from which row it should start fetching data. 


Fetch only the columns you need

Our queries in the last example are still not fully optimized. If you observe them, you will notice we fetched our data in all the table columns using SELECT * (select all). A better way to have accessed our data would have been by naming the columns we need in our query. 


Fetch the columns you need when joining tables

Just like the previous example. You can make your joins faster by only asking for the columns you need in the result set. Let’s say you want to fetch only the first and last names of all the customers in our earlier join example. We would write our query in this manner:

SELECT first_name, last_name

FROM customer

INNER JOIN address USING(address_id)

INNER JOIN city USING(city_id)

INNER JOIN country USING(country_id);

Doing so will prevent the MySQL server from scanning all the columns of the merged tables.

Use caching strategies to prevent fetching the same data repeatedly

If your application uses the same piece of data frequently, instead of running the query every time the app needs the data, you can cache the data the first time the query runs and simply reuse the data subsequently.

The only exception is if the data changes frequently or when you need your queried information to be up-to-date every time. There used to be an in-built Cache layer in MySQL called “The Query Cache,” but it was deprecated for scalability reasons. Luckily, some tools can help cache data, such as Memcached and Redis.

Avoid LIKE expressions with leading wildcards

Wildcards are used in conjunction with LIKE clauses. They are characters used to help search for data that match complex criteria.

The position of wildcard characters in the LIKE clause can cause unexpected performance behavior. Using leading wildcards has been known to be the culprit most times, and this is because MySQL is not able to utilize indexes efficiently when you have a leading wildcard:

SELECT * FROM customer WHERE first_name LIKE '%AB%'; // Leading wildcard causes poor performance

UNION vs. UNION ALL

The MySQL query optimizer doesn’t apply as many optimizations to UNION queries as other queries, but you can specify clauses such as WHERE, LIMIT, ORDER BY, etc., to help the optimizer. But an even better option would be to use the UNION ALL clause if you need to eliminate duplicate rows in your query result.

That’s because omitting ALL adds the DISTINCT option to the temporary table, which results in the MySQL server making expensive scans on all the rows to determine uniqueness. The key takeaway is to avoid using DISTINCT and UNION unless necessary.

Optimize JOIN queries

You should use INNER JOIN instead of OUTER JOIN where possible. You can also optimize your JOIN queries by ensuring that indexes are present on the columns specified in the ON and USING clauses.

GROUP BY and ORDER BY

 GROUP BY and ORDER BY expressions should always refer to columns from a single table so that the server can use indexes for such operations.

Optimize COUNT() queries

The COUNT() function is a special aggregate function used in two ways. First, when you specify a column name or expression in the COUNT() function, it counts the number of times that expression has value (i.e. a non-NULL expression).

The second way to use COUNT() is for counting the number of rows in a query result using the COUNT(*) expression. It ignores the columns in the table and counts rows instead; therefore, it’s best to use this version when you only need to count the number of rows in the query result.

Help the MySQL optimizer

There are a few cases where the optimizer may not choose the best execution path. The reasons could be that MySQL does not have sufficient information about the current data and must make “educated” guesses about the data.

In such cases, there are some methods available to help MySQL:

  • Using the EXPLAIN statement to view information about how MySQL processed your query. I showed you how to do that earlier.

  • Using FORCE INDEX, USE INDEX, and IGNORE INDEX (Index hints) for the scanned table to tell MySQL that the table scans are expensive compared to using the given index. Index hints give you  total control over how the optimizer chooses indexes for a query execution plan. It’s best to use this as a last resort after trying other optimization techniques discussed. The syntax for using index hints looks like this:

SELECT * FROM table1, table2 FORCE INDEX (index_of_column)

WHERE table1.col_name=table2.col_name;

  • Applying Optimizer Hints at the different scope levels: global, query block, table-level, or index-level to control the optimizer’s strategies based on certain criteria. Optimizer hints are usually applied per statement and can be used together with Index hints. They look like comments but are recognized by the query parser. The syntax for adding optimizer hints looks like this:

SELECT /*+ NO_RANGE_OPTIMIZATION(t3 PRIMARY, f2_idx) */ f1

  FROM t3 WHERE f1 > 30 AND f1 < 33;

SELECT /*+ BKA(t1) NO_BKA(t2) */ * FROM t1 INNER JOIN t2 WHERE ...;

SELECT /*+ NO_ICP(t1, t2) */ * FROM t1 INNER JOIN t2 WHERE ...;

SELECT /*+ SEMIJOIN(FIRSTMATCH, LOOSESCAN) */ * FROM t1 ...;

EXPLAIN SELECT /*+ NO_ICP(t1) */ * FROM t1 WHERE ...;

  • Using the ANALYZE TABLE table_name to update the key distributions of the scanned table.

  • Tuning global or thread-specific system variables. For example, you can start the mysqld with the --max-seeks-for-key=1000 option or use SET max_seeks_for_key=1000 to tell the optimizer to assume that no key scan causes more than 1,000 key seeks.

  • Re-writing your query: Sometimes, your query might need refactoring to improve its response time. We discussed some scenarios we could do in the previous sections.

  • Adding Indexes to your tables: For tables that do not need frequent batch inserts or updates, you might want to use an index. Indexes also speed up JOIN operations and help MySQL retrieve data faster, thereby improving your query response time. For smaller tables, adding indexes may not be optimal. Also, you should not index everything, as it increases disk space usage and makes updating or inserting records slow.

  • Re-designing your schema: To access tables with the best possible speed, you must consider schema optimization. A poorly designed database schema forces MySQL to take a longer and utilize more system resources when processing your queries.

Index and indexing

  • Reason for Slow Query and Way to Query improvement :

    • Excessive Full Table Scans

    • Wrong Index - join or filter condition is missing or references the wrong table

    • No Index Used

    • Not a Left Prefix of Index

    • Data Types Not Matching

    • Functional Dependencies

  • Improving the Index Use

    • Add a Covering Index

    • Rewriting Complex Index Conditions (use CTE & Window Functions)

    • Splitting a Query Into Parts

    • SKIP LOCKED

    • Avoid Many OR or IN Conditions

No comments:

Basics of Kubernetes

 Kubernetes, often abbreviated as K8s , is an open-source platform designed to automate the deployment, scaling, and management of container...