随遇而安_MySQL 状态变量 Aborted_connects 与 Aborted_clients 浅析_Nemo社区_LinkNemo

该文章投稿至Nemo社区数据库板块复制链接

MySQL 状态变量 Aborted_connects 与 Aborted_clients 浅析

发布于 2018/06/29 00:28 1,561浏览 0回复 13,558字

原文出处：潇湘隐者

关于MySQL的状态变量Aborted_clients & Aborted_connects分别代表的意义，以及哪些情况或因素会导致这些状态变量变化呢？下文通过实验测试来验证一下，首先我们来看看状态变量的描述：

Aborted Connect

Aborted Connect表示尝试连接到MySQL服务器失败的次数。这个状态变量可以结合host_cache表和其错误日志一起来分析问题。引起这个状态变量激增的原因如下：

客户端没有权限但是尝试访问MySQL数据库。
客户端输入的密码有误。
A connection packet does not contain the right information.
超过连接时间限制，主要是这个系统变量connect_timeout控制（mysql默认是10s，基本上，除非网络环境极端不好，一般不会超时。）

官方解释如下：

If a client is unable even to connect, the server increments the Aborted_connects status variable. Unsuccessful connection attempts can occur for the following reasons:

A client attempts to access a database but has no privileges for it.
A client uses an incorrect password.
A connection packet does not contain the right information.
It takes more than connect_timeout seconds to obtain a connect packet. See Section 5.1.7, “Server System Variables”.

Aborted Clients：

Aborted Clients表示由于客户端没有正确关闭连接而中止的连接数。官方解释如下：

The number of connections that were aborted because the client died without closing the connection properly. See Section B.5.2.10, “Communication Errors and Aborted Connections”

当Aborted Clients增大的时候意味着有客户端成功建立连接，但是由于某些原因断开连接或者被终止了，这种情况一般发生在网络不稳定的环境中。主要的可能性有：

客户端程序在退出之前未调用mysql_close（）正确关闭MySQL连接。
客户端休眠的时间超过了系统变量wait_timeout和interactive_timeout的值，导致连接被MySQL进程终止
客户端程序在数据传输过程中突然结束

官方文档B.5.2.10 Communication Errors and Aborted Connections的介绍如下：

If a client successfully connects but later disconnects improperly or is terminated, the server increments the Aborted_clients status variable, and logs an Aborted connection message to the error log. The cause can be any of the following:

The client program did not call mysql_close() before exiting.
The client had been sleeping more than wait_timeout or interactive_timeout seconds without issuing any requests to the server. See Section 5.1.7, “Server System Variables”.
The client program ended abruptly in the middle of a data transfer.

Other reasons for problems with aborted connections or aborted clients:

The max_allowed_packet variable value is too small or queries require more memory than you have allocated for mysqld. See Section B.5.2.9, “Packet Too Large”.
Use of Ethernet protocol with Linux, both half and full duplex. Some Linux Ethernet drivers have this bug. You should test for this bug by transferring a huge file using FTP between the client and server machines. If a transfer goes in burst-pause-burst-pause mode, you are experiencing a Linux duplex syndrome. Switch the duplex mode for both your network card and hub/switch to either full duplex or to half duplex and test the results to determine the best setting.
A problem with the thread library that causes interrupts on reads.
Badly configured TCP/IP.
Faulty Ethernets, hubs, switches, cables, and so forth. This can be diagnosed properly only by replacing hardware.

如上介绍所示，有很多因素引起这些状态变量的值变化，那么我们来一个个分析、演示一下吧。首先，我们来测试一下导致Aborted Connect状态变量增加的可能因素

1、客户端没有权限但是尝试访问MySQL数据库。

其实这里所说的没有权限，个人理解是：客户端使用没有授权的账号访问数据库。打个比方，你尝试用账号kkk访问MySQL数据库，其实你也不知道数据库是否存在这个用户，实际上不存在这个用户。

实验对比测试前，先将状态变量清零。

mysql> flush status;
Query OK, 0 rows affected (0.01 sec)
mysql> show status like 'Abort%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Aborted_clients  | 0     |
| Aborted_connects | 0     |
+------------------+-------+
2 rows in set (0.01 sec)
 
mysql> 
mysql> select host,user from mysql.user;
+-------------------------------+-----------+
| host                          | user      |
+-------------------------------+-----------+
| %                             | mydba     |
| %                             | root      |
| %                             | test      |
| 127.0.0.1                     | root      |
| 192.168.%                     | mydbadmin |
| 192.168.103.18,192.168.103,22 | LimitIP   |
| ::1                           | root      |
| db-server.localdomain         | root      |
| localhost                     | backuser  |
| localhost                     | root      |
+-------------------------------+-----------+

在本机的SecureCRT的另外一个窗口，使用不存在的账号kkk访问MySQL后，你会发现状态变量Aborted_connects变为1了。

[root@DB-Server ~]# mysql -u kkk -p
Enter password:
ERROR 1045 (28000): Access denied for user 'kkk'@'localhost' (using password: YES)

也有可能，这个账号本身存在，但是只允许特定IP地址才能访问，实际环境中，可能是有人在进行尝试暴力破解。可能性非常多。我们来测试一下限制IP访问的情况

mysql> grant all on MyDB.* to mydbadmin@'10.20.%' identified by '123456';
Query OK, 0 rows affected (0.01 sec)
 
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
 
mysql>  show status like 'Abort%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Aborted_clients  | 0     |
| Aborted_connects | 0     |
+------------------+-------+
2 rows in set (0.00 sec)

如上所示，创建一个mydbadmin的行号，只允许10.20段的IP访问，然后我们从192.168段的IP访问MySQL数据库

# mysql -h 10.20.57.24 -u mydbadmin -p
Enter password:
ERROR 1045 (28000): Access denied for user 'mydbadmin'@'192.168.7.208' (using password: YES)

此时，状态变量Aborted_connects就变为1了。

2、客户端输入的密码有误或者根本就是尝试各个密码。(A client uses an incorrect password)

如下所示，使用test账号访问MySQL数据，但是输入了一个错误密码

[root@DB-Server ~]# mysql -u test -p
Enter password:
ERROR 1045 (28000): Access denied for user 'test'@'localhost' (using password: YES)
[root@DB-Server ~]#

你检查状态变量Aborted_connects就会发现状态变量Aborted_connects变为2了。

mysql>  show status like 'Abort%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Aborted_clients  | 0     |
| Aborted_connects | 2     |
+------------------+-------+
2 rows in set (0.00 sec)

3： A connection packet does not contain the right information.

这个比较容易构造，可以对MySQL的端口进行端口测试（ping 端口），因为psping的包不包含正确的信息（right information），测试之前，先将状态变量清空。

mysql> flush status;
 Query OK, 0 rows affected (0.00 sec)
mysql> show status like 'abort%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Aborted_clients  | 0     |
| Aborted_connects | 0     |
+------------------+-------+
2 rows in set (0.00 sec)

在客户端对MySQL服务所在的主机进行端口连通性验证（psping）

如上所示，psping测试后，Aborted_connects变成了5，如果继续进行psping测试，那么这个状态变量就会继续增长。

另外，如果超过max_connect_error的限制后，某一个客户端持续访问MySQL，这个是否会引起状态变量Aborted_connects变化呢，实验测试的答案是不会。有兴趣的可以验证一下，很奇怪，网上有不少文章都说如果连接数满了，也会导致Aborted_connects状态变量增加，实际上这个是不会引起状态变量Aborted_connects变化的。

4、超过连接时间限制，主要是这个参数connect_timeout控制（mysql默认是10s，基本上，除非网络环境极端不好，一般不会超时。）

首先在一台MySQL数据库服务器上执行下面命令，我们用Linux下的netem与tc命令模拟构造出复杂环境下的网络传输延时案例，延时11秒。

# tc qdisc add dev eth0 root netem delay 11000ms

在另外一台MySQL服务器ping这台MySLQ服务器，如下所示，你会看到网络时延为11秒

# ping 10.20.57.24
PING 10.20.57.24 (10.20.57.24) 56(84) bytes of data.
64 bytes from 10.20.57.24: icmp_seq=1 ttl=61 time=11001 ms
64 bytes from 10.20.57.24: icmp_seq=2 ttl=61 time=11001 ms
64 bytes from 10.20.57.24: icmp_seq=3 ttl=61 time=11001 ms
64 bytes from 10.20.57.24: icmp_seq=4 ttl=61 time=11001 ms
64 bytes from 10.20.57.24: icmp_seq=5 ttl=61 time=11001 ms

此时访问MySQL数据库，由于网络时延为11秒，超出了系统变量connect_timeout的10秒，就会出现下面错误，此时状态变量Aborted_connects的值变化！

# mysql -h 10.20.57.24 -u test -p
Enter password:
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet', system error: 0

那么如何区分状态变量Aborted Connect是那个引起的呢？单从状态变量本身是无法区分的，但是可以结合performance_schema.host_cache来稍微做判别、甄别。

COUNT_NAMEINFO_PERMANENT_ERRORS IP到主机名称DNS解析期间的永久性错误数。
COUNT_AUTHENTICATION_ERRORS 验证失败导致的错误数量
SUM_CONNECT_ERRORS：被视为“ 阻塞 ”的连接错误的数量（根据max_connect_errors系统变量进行评估）。只有协议握手错误才会被计数，只有通过验证（HOST_VALIDATED = YES）的主机才会被计数

1、客户端没有权限但是尝试访问MySQL数据库。

每次都会引起COUNT_AUTHENTICATION_ERRORS增1 ，第一次会引起COUNT_NAMEINFO_PERMANENT_ERRORS也增1

2、客户端输入的密码有误

每次都会引起COUNT_AUTHENTICATION_ERRORS增1 ，第一次会引起COUNT_NAMEINFO_PERMANENT_ERRORS也增1

其实对于与1和2，两者无法判别，最简单有效的将系统变量log_warnings设置为2，然后分析、查看错误日志信息：

mysql> set global log_warnings=2;
Query OK, 0 rows affected (0.00 sec)
mysql>

那么此时1和2都会记录到错误日志里面去，然后你就可以通过分析错误日志，结合状态变量Aborted Connect来分析，如下测试案例所示：

2018-06-20 22:44:16 18026 [Warning] IP address '192.168.xxx.xxx' could not be resolved: Name or service not known
2018-06-20 22:44:16 18026 [Warning] Access denied for user 'kkkk'@'192.168.xxx.xxx' (using password: YES)
2018-06-20 22:45:18 18026 [Warning] Access denied for user 'test'@'192.168.xxx.xxx' (using password: YES)

3、 A connection packet does not contain the right information

每次引起COUNT_HANDSHAKE_ERRORS增1，

每次引起SUM_CONNECT_ERRORS增1

 
 
 
PsPing v2.10 - PsPing - ping, latency, bandwidth measurement utility
 
Copyright (C) 2012-2016 Mark Russinovich
 
Sysinternals - www.sysinternals.com
 
 
 
TCP connect to 10.20.57.24:3306:
 
5 iterations (warmup 1) ping test:
 
Connecting to 10.20.57.24:3306 (warmup): from 192.168.103.34:55327: 1.93ms
 
Connecting to 10.20.57.24:3306: from 192.168.103.34:55328: 10.08ms
 
Connecting to 10.20.57.24:3306: from 192.168.103.34:55329: 3.35ms
 
Connecting to 10.20.57.24:3306: from 192.168.103.34:55330: 3.71ms
 
Connecting to 10.20.57.24:3306: from 192.168.103.34:55331: 2.32ms
 
 
 
TCP connect statistics for 10.20.57.24:3306:
 
  Sent = 4, Received = 4, Lost = 0 (0% loss),
 
  Minimum = 2.32ms, Maximum = 10.08ms, Average = 4.87ms

4、超过连接时间限制

如果是超时引起，那么就会出现下面状况：

每次引起SUM_CONNECT_ERRORS增1，
每次引起COUNT_HANDSHAKE_ERRORS增1
第一次会引起COUNT_NAMEINFO_PERMANENT_ERRORS增1

注意： 3与4不会写入错误日志，3与4的区别可以通过COUNT_NAMEINFO_PERMANENT_ERRORS的值来区别。

下面我们来实验测试一下状态变量Aborted Clients的变化因素，

1、客户端程序在退出之前未调用mysql_close（）正确关闭MySQL连接。

在实验前，使用flush status清理一下状态变量

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> show status like 'Abort%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Aborted_clients  | 0     |
| Aborted_connects | 0     |
+------------------+-------+
2 rows in set (0.00 sec)

mysql>

写一个简单的Python测试脚本python_mysql.py，如下所示，将关闭数据库连接的地方dbcon.close注释掉，

import mysql.connector

try:
 
 dbcon=mysql.connector.connect(
 host='127.0.0.1',
 user='root' ,
 passwd='xxxxxxx',
 database='information_schema'
 )

 cursor= dbcon.cursor()
 sql_tex='select count(*) from MyDB.test'
 cursor.execute(sql_tex)
 dtlist= cursor.fetchall()
 print dtlist
except mysql.connector.Error as e:

  print('operation the sql fail!{0}'.format(e))
  
finally:

  cursor.close;
 # dbcon.close;

然后执行一下脚本，检查状态变量Aborted_clients，然后发现状态变量Aborted_clients的值增1了。

2、客户端休眠的时间超过了系统变量wait_timeout和interactive_timeout的值，导致连接被MySQL进程终止

mysql> show global variables like 'interactive_timeout';
+---------------------+-------+
| Variable_name       | Value |
+---------------------+-------+
| interactive_timeout | 28800 |
+---------------------+-------+
1 row in set (0.00 sec)

mysql> show global variables like 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 28800 |
+---------------+-------+
1 row in set (0.00 sec)

mysql>

将全局系统变量interactive_timeout 和wait_timeout设置为4秒

mysql> set global interactive_timeout=4;
Query OK, 0 rows affected (0.00 sec)

mysql> set global wait_timeout=4;
Query OK, 0 rows affected (0.00 sec)

mysql> show status like 'Abort%';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Aborted_clients  | 0     |
| Aborted_connects | 0     |
+------------------+-------+
2 rows in set (0.00 sec)

然后在客户端连接到MySQL数据库，不做任何操作，过来4秒后，你去操作就会出现错误“ERROR 2013 (HY000): Lost connection to MySQL server during query”

# mysql -h 10.20.57.24 -u test -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 43
Server version: 5.6.20-enterprise-commercial-advanced-log MySQL Enterprise Server - Advanced Edition (Commercial)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select current_user();
ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql>

在MySQL服务器你就会看到状态变量Aborted_clients变为1了。

mysql> show status like 'Abort%';

+------------------+-------+

| Variable_name    | Value |

+------------------+-------+

| Aborted_clients  | 1     |

| Aborted_connects | 0     |

+------------------+-------+

2 rows in set (0.00 sec

还有其他一些原因(客户端异常中断或查询超出了max_allowed_packet值)由于不方便构造，在此略过。另外，其实我们还可以通过tcpdump抓包工具来追踪分析。下面举个例子（这里

简单介绍一下tcpdump,后续文章再做展开分析）

在MySQL服务器使用tcpdump抓包

[root@DB-Server ~]# tcpdump -i eth0 port 3306 -s 1500 -w tcpdump.log

然后在另外一台MySQL服务器，使用不存在的账号或错误的密码访问MySQL数据库

# mysql -h 10.20.57.24 -u kkk -p

Enter password:

ERROR 1045 (28000): Access denied for user 'kkk'@'192.168.7.208' (using password: YES)

# mysql -h 10.20.57.24 -u test -p

Enter password:

ERROR 1045 (28000): Access denied for user 'test'@'192.168.7.208' (using password: YES)

[root@GETLNX28 ~]#

执行完命令后，你可以使用CTRL + C结束抓包分析，然后查看分析。如下截图所示：

[root@DB-Server ~]# tcpdump -i eth0 port 3306 -s 1500 -w tcpdump.log

tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 1500 bytes

28 packets captured

28 packets received by filter

0 packets dropped by kernel

[root@DB-Server ~]# strings tcpdump.log

参考资料：

https://dev.mysql.com/doc/refman/8.0/en/communication-errors.html
http://www.olivierdoucet.info/blog/2012/05/08/customer-case-finding-cause-of-max_user_connections/

点了个评