两道分组函数的典型题

首页 / 学习区 / 正文

这两道题都基于hr.employees表,表的结构如下:

image-20210331112209612

题一:注意count的坑

1、编写一个查询显示员工总数以及其中在2004年、2005年、2006年和2007年聘用的员工数。创建合适的列标题。(注:分别使用横表、纵表)

横表

横表的目标查询结果如下:

img

这道题我思考了半个多小时。一开始,我的想法是:

select count(employee_id) as "total", 
count(extract(year from hire_date)=2004) as "2004", 
count(extract(year from hire_date)=2005) as "2005", 
count(extract(year from hire_date)=2006) as "2006", 
count(extract(year from hire_date)=2007) as "2007"
from hr.employees;

但是结果是这样的:

image-20210331111143721

于是,我想着先把count去掉,看看每一列的内容究竟是什么。当然,这样的话就得暂时先使用group by了:

select count(employee_id) as "total", 
extract(year from hire_date)=2004 as "2004", 
extract(year from hire_date)=2005 as "2005", 
extract(year from hire_date)=2006 as "2006", 
extract(year from hire_date)=2007 as "2007"
from hr.employees
group by "2004", "2005", "2006", "2007";

显示的结果如下:

image-20210331111435652

因此,我便产生了在count中增加判断结果为true的想法,也就是:

select count(employee_id) as "total", 
count((extract(year from hire_date)=2004)=true) as "2004", 
count((extract(year from hire_date)=2005)=true) as "2005", 
count((extract(year from hire_date)=2006)=true) as "2006", 
count((extract(year from hire_date)=2007)=true) as "2007"
from hr.employees;

但是,结果依然不对!结果所显示的依旧是:

image-20210331111143721

这让我感到十分疑惑。于是,我去查询了count()函数的用法才发现,count()函数不会统计null值,但是会统计 bool 为false的值。这是问题的核心。因此,我需要去手动地把false转为null,因此我决定去使用case...when语法,果然奏效。

最终的sql语句为:

select count(extract(year from hire_date)) as "total",
count(case when extract(year from hire_date)=2004 then 1 else null end) as "2004",
count(case when extract(year from hire_date)=2005 then 1 else null end) as "2005",
count(case when extract(year from hire_date)=2006 then 1 else null end) as "2006",
count(case when extract(year from hire_date)=2007 then 1 else null end) as "2007"
from hr.employees;

结果为:

img

纵表

纵表的目标查询结果为:

image-20210331112816306

这里,我觉得难点是对“总数”这一行的实现。如果没有这里的“总数”的话,我可以这样来实现:

select extract(year from hire_date) as "年份", count(employee_id) as "雇佣员工数"
from hr.employees 
where extract(year from hire_date) in (2004, 2005, 2006, 2007)
group by "年份";

这样写的话“年份”这一列的数据类型就是 double precision。而题目已经提示我说“年份”这一列的数据类型为 text,因此在琢磨一番后,我采用了如下的写法:

select '2004' as "年份", count(*) as "雇佣员工数" from hr.employees where extract(year from hire_date)=2004
UNION ALL
select '2005' as "年份", count(*) as "雇佣员工数" from hr.employees where extract(year from hire_date)=2005
UNION ALL
select '2006' as "年份", count(*) as "雇佣员工数" from hr.employees where extract(year from hire_date)=2006
UNION ALL
select '2007' as "年份", count(*) as "雇佣员工数" from hr.employees where extract(year from hire_date)=2007
UNION ALL
select '总数' as "年份", count(extract(year from hire_date)) as "雇佣员工数" from hr.employees;

我这样处理的思路是:逐行判断,一行一个 where 来约束条件。尤其是 select '总数' as "年份",这让我在年份这列中新建了一个自定义的行。

这样就完美达到了结果:

img

题二:交叉表

2、编写一个矩阵查询显示department_id为20、50、80和90的职务(job_id)、基于department_id的职务平均薪金、以及该职务的总薪金,并为每个列给出合适的标题。

目标查询结果为:

image-20210331113757316

一开始我也有点被吓到了,后来转念一想,这和上面的case...when实际上是一个套路!只不过,这里要注意一下when后的then

我的写法为:

select job_id as "Job", 
round(avg(case department_id when 20 then salary else null end), 2) as "Dept 20",
round(avg(case department_id when 50 then salary else null end), 2) as "Dept 50",
round(avg(case department_id when 80 then salary else null end), 2) as "Dept 80",
round(avg(case department_id when 90 then salary else null end), 2) as "Dept 90",
sum(salary) as "Total"
from hr.employees 
group by job_id;

结果为:

img

完美达到了预期。

评论区
头像
文章目录