SQL 5주차

내일배움캠프_QAQC 1기/SQL

SQL 5주차

이지응:) 2025. 1. 17. 15:03

5주차 목표

데이터에서 예상하지 못한 값이 나왔을 때 (이상한 값, 값이 없음 등), 분석에 적절하에 처리한다
SQL 로 엑셀에서 자주 사용하는 형태로 데이터를 만든다
업무에 활용할 수 있는 다양한 SQL 심화 문법을 익힌다

Subquery, Join 복습

Subquery

Query 결과를 Query 에 다시 활용하는 것
기본 형식

select column1, special_column
from
    ( /* subquery */
    select column1, column2 special_column
    from table1
    ) a

JOIN

두 개 이상의 테이블을 결합하여 사용하는 것
JOIN 의 형태에 따라 Left join, Inner join 등이 있음
기본 형식

-- LEFT JOIN
select 조회 할 컬럼
from 테이블1 a left join 테이블2 b on a.공통컬럼명=b.공통컬럼명

-- INNER JOIN
select 조회 할 컬럼
from 테이블1 a inner join 테이블2 b on a.공통컬럼명=b.공통컬럼명

조회한 데이터에 값이 없는 경우

1. 테이블에 잘못된 값이 들어간 경우

2. JOIN을 했을 때 값이 없는 경우

해결 방법

1. 없는 갓을 제외해주기

Mysql 에서는 사용할 수 없는 값일 때 해당 값을 연산에서 제외해줍니다. → 0으로 간주
즉 평균 rating 을 구하는 쿼리를 아래와 같이 작성했을 때 실제 연산에 사용되는 데이터는 다음과 같습니다.

select restaurant_name,
       avg(rating) average_of_rating,
       avg(if(rating<>'Not given', rating, null)) average_of_rating2
from food_orders
group by 1

EX) 값 제외

JOIN을 사용했을 때 없는 값인 경우 NULL로 표시

select a.order_id,
       a.customer_id,
       a.restaurant_name,
       a.price,
       b.name,
       b.age,
       b.gender
from food_orders a left join customers b on a.customer_id=b.customer_id
where b.customer_id is not null

WHERE 절에서 NULL인 값 다 제외

2. 다른 값을 대신 사용하기

사용할 수 없는 값 대신 다른 값을 대체해서 사용하는 방법이 있습니다.
데이터 분석 시에는 평균값 혹은 중앙값 등 대표값을 이용하여 대체해주기도 합니다.
다른 값으로 변경하고 싶을 때, 다음 두 개의 문법을 이용할 수 있습니다.
- 다른 값이 있을 때 조건문 이용하기 : if(rating>=1, rating, 대체값)
- null 값일 때 : coalesce(age, 대체값)

EX) null 을 다른 값으로 대체한 쿼리문

customer 테이블에 없는 데이터 중에 age 만 20으로 채움

select a.order_id,
       a.customer_id,
       a.restaurant_name,
       a.price,
       b.name,
       b.age,
       coalesce(b.age, 20) "null 제거",
       b.gender
from food_orders a left join customers b on a.customer_id=b.customer_id
where b.age is null

WHERE 절에서 NULL 값을 가지고 있는 데이터를 다 뽑음

상식적이지 않은 데이터

주문 고객의 나이
- 보통 음식을 주문한 고객은 20세 이상인 성인인 경우가 많음
- 하지만 데이터를 보면 2세와 같이 상식적이지 않은 값들이 있는 경우가 있음
결제 일자
- 결제의 경우, 비교적 최근인 일자가 있어야 함
- 하지만, 데이터를 보면 1970년대와 같이 상식적이지 않은 값들이 있는 경우가 있음

해결 방법

조건문으로 값의 범위를 지정하기

조건문으로 가장 큰 값, 가장 작은 값의 범위를 지정해 줄 수 있음

select customer_id, name, email, gender, age,
       case when age<15 then 15
            when age>80 then 80
            else age end "범위를 지정해준 age"
from customers

SQL로 Pivot Table 만들기

Pivot table

2개 이상의 기준으로 데이터를 집계할 때, 보기 쉽게 배열하여 보여주는 것

Pivot table 의 기본 구조

EX) 음식점별 시간별 주문건수 Pivot Table 뷰 만들기 (15~20시 사이, 20시 주문건수 기준 내림차순)

select restaurant_name,
       max(if(hh='15', cnt_order, 0)) "15",
       max(if(hh='16', cnt_order, 0)) "16",
       max(if(hh='17', cnt_order, 0)) "17",
       max(if(hh='18', cnt_order, 0)) "18",
       max(if(hh='19', cnt_order, 0)) "19",
       max(if(hh='20', cnt_order, 0)) "20"
from 
(
select a.restaurant_name,
       substring(b.time, 1, 2) hh,
       count(1) cnt_order
from food_orders a inner join payments b on a.order_id=b.order_id
where substring(b.time, 1, 2) between 15 and 20
group by 1, 2
) a
group by 1
order by 7 desc

EX) 성별, 연령별 주문건수 Pivot Table 뷰 만들기 (나이는 10~59세 사이, 연령 순으로 내림차순)

select age,
       max(if(gender='male', order_count, 0)) male,
       max(if(gender='female', order_count, 0)) female
from 
(
select b.gender,
       case when age between 10 and 19 then 10
            when age between 20 and 29 then 20
            when age between 30 and 39 then 30
            when age between 40 and 49 then 40
            when age between 50 and 59 then 50 end age,
       count(1) order_count
from food_orders a inner join customers b on a.customer_id=b.customer_id
where b.age between 10 and 59
group by 1, 2
) t
group by 1
order by age

시간 단축 문법 : Window Function - RANK, SUM

Window Function

각 행의 관계를 정의하기 위한 함수로 그룹 내의 연산을 쉽게 만들어줌

기본 구조

window_function(argument) over (partition by 그룹 기준 컬럼 order by 정렬 기준)

window_function : 기능 명을 사용해줍니다. (sum, avg 와 같이 기능명이 있습니다)
argument : 함수에 따라 작성하거나 생략합니다.
partition by : 그룹을 나누기 위한 기준입니다. group by 절과 유사하다고 생각해주시면 됩니다.
order by : window function 을 적용할 때 정렬 할 컬럼 기준을 적어줍니다.

Rank

특정 기준으로 순위를 매겨주는 기능

EX) 음식 타입별로 주문 건수가 가장 많은 상점 3개씩 조회하기

select cuisine_type,
       restaurant_name,
       order_count,
       rn "순위"
from
(
select cuisine_type,
       restaurant_name,
       rank() over (partition by cuisine_type order by order_count desc) rn,
       order_count
from
(
select cuisine_type, restaurant_name, count(1) order_count
from food_orders
group by 1, 2
) a
) b
where rn<=3
order by 1, 4

Sum

누적합이 필요하거나 카테고리별 합계컬럼와 원본 컬럼을 함께 이용할 때 유용하게 사용

EX) 각 음식점의 주문건이 해당 음식 타입에서 차지하는 비율을 구하고, 주문건이 낮은 순으로 정렬했을 때 누적 합 구하기

select cuisine_type,
       restaurant_name,
       cnt_order,
       sum(cnt_order) over (partition by cuisine_type) sum_cuisine,
       sum(cnt_order) over (partition by cuisine_type order by cnt_order) cum_cuisine
from
(
select cuisine_type, 
	restaurant_name, 
	count(1) cnt_order
from food_orders
group by 1, 2
) a
order by cuisine_type , cnt_order

여기서 order_count가 동일한 값을 일괄로 더해 cum_sum으로 출력되는 문제가 발생함
order by 절에 cnt_order 외에 추가적인 순서를 부여할 수 있는 restaurant_name을 포함시켜야 함

select cuisine_type,
       restaurant_name,
       cnt_order,
       sum(cnt_order) over (partition by cuisine_type) sum_cuisine,
       sum(cnt_order) over (partition by cuisine_type order by cnt_order) cum_cuisine
from
(
select cuisine_type, 
	restaurant_name, 
	count(1) cnt_order
from food_orders
group by 1, 2
) a
order by cuisine_type , cnt_order

날짜 포맷

yyyy-mm-dd 형식의 컬럼을 date type 으로 변경하기

select date(date) date_type,
       date
from payments

괄호 안은 칼럼명

date type 을 date_format 을 이용하여 년, 월, 일, 주 로 조회

select date(date) date_type,
       date_format(date(date), '%Y') "년",
       date_format(date(date), '%m') "월",
       date_format(date(date), '%d') "일",
       date_format(date(date), '%w') "요일"
from payments

일요일은 0, 월요일은 1 등 차례대로 숫자 부여됨

년도, 월을 포함하여 데이터 가공하기

select date_format(date(date), '%Y') y,
       date_format(date(date), '%m') m,
       order_id
from food_orders a inner join payments b on a.order_id=b.order_id

3월 조건으로 지정하고, 년도별로 정렬하기

select date_format(date(date), '%Y') "년",
       date_format(date(date), '%m') "월",
       date_format(date(data), 'Y%m') "년월",
       count(1) "주문건수"
from food_orders a inner join payments b on a.order_id=b.order_id
where date_format(date(date), '%m')='03'
group by 1, 2
order by 1

숙제

음식 타입별, 연령별 주문건수 pivot view 만들기

select cuisine_type,
       max(if(age=10, order_count, 0)) "10대",
       max(if(age=20, order_count, 0)) "20대",
       max(if(age=30, order_count, 0)) "30대",
       max(if(age=40, order_count, 0)) "40대",
       max(if(age=50, order_count, 0)) "50대"
from 
(
select a.cuisine_type,
       case when age between 10 and 19 then 10
            when age between 20 and 29 then 20
            when age between 30 and 39 then 30
            when age between 40 and 49 then 40
            when age between 50 and 59 then 50 end age,
       count(1) order_count
from food_orders a inner join customers b on a.customer_id=b.customer_id
where age between 10 and 59
group by 1, 2
) t
group by 1

저작자표시 비영리 변경금지 (새창열림)

'내일배움캠프_QAQC 1기 > SQL' 카테고리의 다른 글

SQL 4주차 (0)	2025.01.16
SQL 3주차 (0)	2025.01.16
SQL 2주차 (0)	2025.01.16
SQL_1주차 (0)	2025.01.15

현재글SQL 5주차

이지은님의 블로그

Today :
Yesterday :

이지은님의 블로그

SQL 5주차

Subquery, Join 복습

조회한 데이터에 값이 없는 경우

상식적이지 않은 데이터

SQL로 Pivot Table 만들기

시간 단축 문법 : Window Function - RANK, SUM

날짜 포맷

숙제

'내일배움캠프_QAQC 1기 > SQL' 카테고리의 다른 글

'내일배움캠프_QAQC 1기/SQL'의 다른글

티스토리툴바

« 2026/06 »
일	월	화	수	목	금	토
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30

SQL 5주차

Subquery, Join 복습

조회한 데이터에 값이 없는 경우

상식적이지 않은 데이터

SQL로 Pivot Table 만들기

시간 단축 문법 : Window Function - RANK, SUM

날짜 포맷

숙제

'내일배움캠프_QAQC 1기 > SQL' 카테고리의 다른 글

'내일배움캠프_QAQC 1기/SQL'의 다른글

관련글

티스토리툴바