Tuesday, June 30, 2009

MYSQL: Query Execution Basics

1. The client sends the SQL statement to the server.
-> The protocol is halfduplex, which means that at any given time the MySQL server can be either sending or receiving messages, but not both. It also means there is no way to cut a message short.
The client sends a query to the server as a single packet of data. This is why the max_packet_size configuration variable is important if you have large queries. Once the client sends the query, it doesn’t have the ball anymore; it can only wait for results. The response from the server usually consists of many packets of data.When the server responds, the client has to receive the entire result set. It cannot simply fetch a few rows and then ask the server not to bother sending the rest. If the client needs only the first few rows that are returned, it either has to wait for all of the server’s packets to arrive and then discard the ones it doesn’t need, or disconnect ungracefully. Neither is a good idea, which is why appropriate LIMIT clauses are so important.Here’s another way to think about this: when a client fetches rows from the server, it thinks it’s pulling them. But the truth is, the MySQL server is pushing the rows as it generates them. The client is only receiving the pushed rows; there is no way for it to tell the server to stop sending rows. The client is “drinking from the fire hose,” so to speak.



2. The server checks the query cache. If there’s a hit, it returns the stored result from the cache; otherwise, it passes the SQL statement to the next step.
-> Before even parsing a query, MySQL checks for it in the query cache, if the cache is enabled. This operation is a case sensitive hash lookup. If the query differs from a similar query in the cache by even a single byte, it won’t match, and the query processing will go to the next stage.
If MySQL does find a match in the query cache, it must check privileges before
returning the cached query. This is possible without parsing the query, because
MySQL stores table information with the cached query. If the privileges are OK,
MySQL retrieves the stored result from the query cache and sends it to the client,
bypassing every other stage in query execution. The query is never parsed, optimized,
or executed.

3. The server parses, preprocesses, and optimizes the SQL into a query execution
plan.

-> MySQL’s parser breaks the query into tokens and builds a “parse tree”
from them. The parser uses MySQL’s SQL grammar to interpret and validate the
query. For instance, it ensures that the tokens in the query are valid and in the proper order, and it checks for mistakes such as quoted strings that aren’t terminated. The preprocessor then checks the resulting parse tree for additional semantics that the parser can’t resolve. For example, it checks that tables and columns exist, and it resolves names and aliases to ensure that column references aren’t ambiguous.Next, the preprocessor checks privileges. This is normally very fast unless your server has large numbers of privileges.
-> The parse tree is now valid and ready for the optimizer to turn it into a query execution plan. A query can often be executed many different ways and produce the same result. The optimizer’s job is to find the best option. MySQL uses a cost-based optimizer, which means it tries to predict the cost of various execution plans and choose the least expensive. The unit of cost is a single random four-kilobyte data page read.

4. The query execution engine executes the plan by making calls to the storage engine API.
-> The parsing and optimizing stage outputs a query execution plan, which MySQL’s
query execution engine uses to process the query. The plan is a data structure; it is
not executable byte-code, which is how many other databases execute queries. In contrast to the optimization stage, the execution stage is usually not all that complex: MySQL simply follows the instructions given in the query execution plan.
Many of the operations in the plan invoke methods implemented by the storage
engine interface, also known as the handler API. Each table in the query is represented by an instance of a handler. If a table appears three times in the query, for example, the server creates three handler instances. Though we glossed over this before, MySQL actually creates the handler instances early in the optimization stage. The optimizer uses them to get information about the tables, such as their column names and index statistics.

5. The server sends the result to the client.
-> The final step in executing a query is to reply to the client. Even queries that don’t return a result set still reply to the client connection with information about the query, such as how many rows it affected.

Sunday, June 14, 2009

How do I Use the Linux Top Command?

The Unix top command is designed to help users determine which processes are running and which applications are using more memory or processing power than they should be.

The top command is very easy to use but you should know the things in details. The output of to is :

top output:

top - 22:09:08 up 14 min, 1 user, load average: 0.21, 0.23, 0.30
Tasks: 81 total, 1 running, 80 sleeping, 0 stopped, 0 zombie
Cpu(s): 9.5%us, 31.2%sy, 0.0%ni, 27.0%id, 7.6%wa, 1.0%hi, 23.7%si, 0.0%st
Mem: 255592k total, 167568k used, 88024k free, 25068k buffers
Swap: 524280k total, 0k used, 524280k free, 85724k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3166 apache 15 0 29444 6112 1524 S 6.6 2.4 0:00.79 httpd
3161 apache 15 0 29444 6112 1524 S 5.9 2.4 0:00.79 httpd
3164 apache 15 0 29444 6112 1524 S 5.9 2.4 0:00.75 httpd
3169 apache 15 0 29444 6112 1524 S 5.9 2.4 0:00.74 httpd
3163 apache 15 0 29444 6112 1524 S 5.6 2.4 0:00.76 httpd
3165 apache 15 0 29444 6112 1524 S 5.6 2.4 0:00.77 httpd
3167 apache 15 0 29444 6112 1524 S 5.3 2.4 0:00.73 httpd
3162 apache 15 0 29444 6112 1524 S 5.0 2.4 0:00.77 httpd
3407 root 16 0 2188 1012 816 R 1.7 0.4 0:00.51 top
240 root 15 0 0 0 0 S 0.3 0.0 0:00.08 pdflush
501 root 10 -5 0 0 0 S 0.3 0.0 0:01.20 kjournald
2794 root 18 0 12720 1268 560 S 0.3 0.5 0:00.73 pcscd
1 root 15 0 2060 636 544 S 0.0 0.2 0:03.81 init
2 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
5 root 10 -5 0 0 0 S 0.0 0.0 0:00.07 events/0


The first line in top:


top - 22:09:08 up 14 min, 1 user, load average: 0.21, 0.23, 0.30

“22:09:08″ is the current time; “up 14 min” shows how long the system has been up for; “1 user” how many users are logged in; “load average: 0.21, 0.23, 0.30″ the load average of the system (1minute, 5 minutes, 15 minutes).

Load average is an extensive topic and to understand its inner workings can be daunting. The simplest of definitions states that load average is the cpu utilization over a period of time. A load average of 1 means your cpu is being fully utilized and processes are not having to wait to use a CPU. A load average above 1 indicates that processes need to wait and your system will be less responsive. If your load average is consistently above 3 and your system is running slow you may want to upgrade to more CPU’s or a faster CPU.

The second line in top:

Tasks: 82 total, 1 running, 81 sleeping, 0 stopped, 0 zombie

Shows the number of processes and their current state.

The third lin in top:

Cpu(s): 9.5%us, 31.2%sy, 0.0%ni, 27.0%id, 7.6%wa, 1.0%hi, 23.7%si, 0.0%st

Shows CPU utilization details. “9.5%us” user processes are using 9.5%; “31.2%sy” system processes are using 31.2%; “27.0%id” percentage of available cpu; “7.6%wa” time CPU is waiting for IO.

When first analyzing the Cpu(s) line in top look at the %id to see how much cpu is available. If %id is low then focus on %us, %sy, and %wa to determine what is using the CPU.

The fourth and fifth lines in top:

Mem: 255592k total, 167568k used, 88024k free, 25068k buffers
Swap: 524280k total, 0k used, 524280k free, 85724k cached


Describes the memory usage. These numbers can be misleading. “255592k total” is total memory in the system; “167568K used” is the part of the RAM that currently contains information; “88024k free” is the part of RAM that contains no information; “25068K buffers and 85724k cached” is the buffered and cached data for IO.

So what is the actual amount of free RAM available for programs to use ?

The answer is: free + (buffers + cached)

88024k + (25068k + 85724k) = 198816k

How much RAM is being used by progams ?

The answer is: used - (buffers + cached)

167568k - (25068k + 85724k) = 56776k

The processes information:

Top will display the process using the most CPU usage in descending order. Lets describe each column that represents a process.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3166 apache 15 0 29444 6112 1524 S 6.6 2.4 0:00.79 httpd


PID - process ID of the process

USER - User who is running the process

PR - The priority of the process

NI - Nice value of the process (higher value indicates lower priority)

VIRT - The total amount of virtual memory used

RES - Resident task size

SHR - Amount of shared memory used

S - State of the task. Values are S (sleeping), D (uninterruptible sleep), R (running), Z (zombies), or T (stopped or traced)

%CPU - Percentage of CPU used

%MEM - Percentage of Memory used

TIME+ - Total CPU time used

COMMAND - Command issued
Interacting with TOP

Now that we are able to understand the output from TOP lets learn how to change the way the output is displayed.

Just press the following key while running top and the output will be sorted in real time.

M - Sort by memory usage

P - Sort by CPU usage

T - Sort by cumulative time

z - Color display

k - Kill a process

q - quit

If we want to kill the process with PID 3161, then press “k” and a prompt will ask you for the PID number, and enter 3161.
Command Line Parameters with TOP

You can control what top displays by issuing parameters when you run top.

- d - Controls the delay between refreshes

- p - Specify the process by PID that you want to monitor

-n - Update the display this number of times and then exit

If we want to only monitor the http process with a PID of 3166

$ top -p 3166

If we want to change the delay between refreshes to 5 seconds

$ top -d 5

Basics of Kubernetes

 Kubernetes, often abbreviated as K8s , is an open-source platform designed to automate the deployment, scaling, and management of container...