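Question:
A while ago, a friend was asked about SQL optimization in an interview and was told that EXISTS is more efficient than IN in SQL queries. Is that really the case?
Preparation
Create the users table
/* users table */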
drop table if exists users;
create table users(
id int primary key auto_increment,
name varchar(20)
);
insert into users(name) values ('A');
insert into users(name) values ('B');
insert into users(name) values ('C');
insert into users(name) values ('D');
insert into users(name) values ('E');
insert into users(name) values ('F');
insert into users(name) values ('G');
insert into users(name) values ('H');
insert into users(name) values ('I');
insert into users(name) values ('J');
Create the orders table
/* orders table */
drop table if exists orders;
create table orders(
id int primary key auto_increment,/* order id */
order_no varchar(20) not null,/* order number */
title varchar(20) not null,/* order title */
goods_num int not null,/* quantity of goods */
money decimal(7,4) not null,/* order amount */
user_id int not null /* id of the user who owns the order */
)engine=myisam default charset=utf8 ;
Create a stored procedure that generates orders
delimiter $
drop procedure if exists batch_orders $
create procedure batch_orders(in max int)
begin
declare start int default 0;
declare i int default 0;
set autocommit = 0;
while i < max do
set i = i + 1;
insert into orders(order_no,title,goods_num,money,user_id)
values (concat('NCS-',floor(1 + rand()*1000000000000 )),concat('订单title-',i),i%50,(100.0000+(i%50)),i%10);
end while;
commit;
end $
delimiter ;
call batch_orders(10000000); # generate 10 million rows
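Before running the comparison, it can help to check how the generated rows are distributed. The procedure sets user_id to i%10, so the values run from 0 to 9, while the users ids are presumably 1 to 10 (assuming the default auto_increment settings). The following is a minimal sketch of two sanity checks; the aliases order_count and orphan_orders are illustrative names only.

```sql
-- Number of orders per user_id; i % 10 yields the values 0 through 9
select user_id, count(*) as order_count
from orders
group by user_id
order by user_id;

-- Orders whose user_id has no matching row in users
-- (assuming users.id runs from 1 to 10, these are the user_id = 0 rows)
select count(*) as orphan_orders
from orders o
left join users u on u.id = o.user_id
where u.id is null;
```

Under these assumptions, one tenth of the orders match no user, and the user with id 10 owns no orders. This is why the queries that follow count 9000000 orders rather than 10000000, and 9 users rather than 10.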
Simulation
Scenario 1: subquery table (users) < main query table (orders)
mysql> select count(1) from orders where user_id in (select id from users) ;
+----------+
| count(1) |
+----------+
| 9000000 |
+----------+
1 row in set (9.47 sec)
mysql> select count(1) from orders where exists (select id from users where orders.user_id = users.id);
+----------+
| count(1) |
+----------+
| 9000000 |
+----------+
1 row in set (12.18 sec)
Scenario 2: subquery table (orders) > main query table (users)
mysql> select count(1) from users where id in (select user_id from orders);
+----------+
| count(1) |
+----------+
| 9 |
+----------+
1 row in set (4.13 sec)
mysql> select count(1) from users where exists (select 1 from orders where users.id=orders.user_id);
+----------+
| count(1) |
+----------+
| 9 |
+----------+
1 row in set (1.35 sec)
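Analysis:
IN execution order: the subquery inside IN is executed first and serves as the outer loop, while the main query serves as the inner loop.
EXISTS: the main query serves as the outer loop and the subquery as the inner loop (each row produced by the main query is passed into the subquery as its condition).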
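The loop order described here is a simplified model. The plan the server actually picks depends on the MySQL version and its optimizer, so it is worth checking with EXPLAIN. A minimal sketch, reusing the four queries from the two scenarios:

```sql
-- Scenario 1: users in the subquery, orders in the main query
explain select count(1) from orders where user_id in (select id from users);
explain select count(1) from orders where exists (select id from users where orders.user_id = users.id);

-- Scenario 2: orders in the subquery, users in the main query
explain select count(1) from users where id in (select user_id from orders);
explain select count(1) from users where exists (select 1 from orders where users.id = orders.user_id);
```

In the output, a select_type of DEPENDENT SUBQUERY indicates that the subquery is evaluated once per row of the outer query. The order of the rows in the plan shows which table the server reads first.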
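Conclusion
Whether EXISTS outperforms IN depends on the situation:
if the IN subquery is smaller than the main query, IN is faster than EXISTS (9.47 sec vs. 12.18 sec above);
if the IN subquery is larger than the main query, EXISTS is faster than IN (1.35 sec vs. 4.13 sec above).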
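These timings were measured on an orders table with no index on user_id, so they may not carry over to other schemas. The following is a sketch for repeating scenario 2 with an index in place; the index name idx_orders_user_id is a hypothetical choice, and no timings are claimed here.

```sql
-- idx_orders_user_id is a hypothetical index name chosen for this sketch
alter table orders add index idx_orders_user_id (user_id);

-- Re-run scenario 2 and compare the reported times with the earlier run
select count(1) from users where id in (select user_id from orders);
select count(1) from users where exists (select 1 from orders where users.id = orders.user_id);
```

Building the index on 10 million rows takes a while. Once the comparison is done, it can be removed with alter table orders drop index idx_orders_user_id; to restore the original setup.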