Pandas Date Time Index #Handling date / time data

Pandas provides a datetime type (datetime64) for handling dates and times, which lets you work with many date/time values at once.

How to use Date Time Index

import pandas as pd
df = pd.DataFrame({'날짜': ['2021-01-10 07:10:00',
                            '2021-02-15 08:20:30',
                            '2021-03-20 09:30:00',
                            '2021-04-25 10:40:30',
                            '2021-05-27 11:50:00',
                            '2021-06-21 12:00:30',
                            '2021-07-01 13:10:00',
                            '2021-08-16 14:50:30']})

# Import pandas, then create the DataFrame.

At first glance the '날짜' ("date") column of the DataFrame above looks like date data, but its dtype is actually object: the values are plain strings.

To convert the column's object data to datetime, use the pd.to_datetime function.

df['날짜'] = pd.to_datetime(df['날짜'], format='%Y-%m-%d %H:%M:%S', errors='raise')

You can confirm that the values have been converted and that the column's dtype is now datetime64.
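To check the conversion yourself, you can compare the column dtype before and after pd.to_datetime. This is a minimal sketch on a two-row copy of the DataFrame above. It uses is_string_dtype and is_datetime64_any_dtype from pandas.api.types so the check does not depend on the exact string or datetime resolution your pandas version uses.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_string_dtype

# Two rows taken from the DataFrame above (still plain strings)
df = pd.DataFrame({'날짜': ['2021-01-10 07:10:00', '2021-02-15 08:20:30']})
print(df.dtypes)  # object dtype in pandas 1.x/2.x
before_is_string = is_string_dtype(df['날짜'])

# Convert with an explicit format; errors='raise' fails loudly on a value that does not match
df['날짜'] = pd.to_datetime(df['날짜'], format='%Y-%m-%d %H:%M:%S', errors='raise')
print(df.dtypes)  # now a datetime64 dtype
after_is_datetime = is_datetime64_any_dtype(df['날짜'])

print(df['날짜'].iloc[1])  # a Timestamp, no longer a string
```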

Attributes and methods available on a datetime column through the .dt accessor

df['날짜_date']       = df['날짜'].dt.date         # YYYY-MM-DD (datetime.date objects)
df['날짜_year']       = df['날짜'].dt.year         # year (4-digit number)
df['날짜_month']      = df['날짜'].dt.month        # month (number)
df['날짜_month_name'] = df['날짜'].dt.month_name() # month (name, e.g. 'January')
df['날짜_day']        = df['날짜'].dt.day          # day (number)
df['날짜_time']       = df['날짜'].dt.time         # HH:MM:SS (datetime.time objects)
df['날짜_hour']       = df['날짜'].dt.hour         # hour (number)
df['날짜_minute']     = df['날짜'].dt.minute       # minute (number)
df['날짜_second']     = df['날짜'].dt.second       # second (number)
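As a quick check of what these accessors return, here is a minimal sketch on a single timestamp from the DataFrame above. Note that dt.date and dt.time give Python datetime.date / datetime.time objects rather than strings, and that month_name() returns English names by default (it also accepts an optional locale argument).

```python
import datetime
import pandas as pd

s = pd.to_datetime(pd.Series(['2021-01-10 07:10:00']), format='%Y-%m-%d %H:%M:%S')

# Pull each component out of the first (and only) element
parts = {
    'date': s.dt.date.iloc[0],              # a datetime.date object
    'year': s.dt.year.iloc[0],
    'month': s.dt.month.iloc[0],
    'month_name': s.dt.month_name().iloc[0],
    'day': s.dt.day.iloc[0],
    'time': s.dt.time.iloc[0],              # a datetime.time object
    'hour': s.dt.hour.iloc[0],
    'minute': s.dt.minute.iloc[0],
    'second': s.dt.second.iloc[0],
}
for name, value in parts.items():
    print(name, value)
```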

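df['날짜_quarter']       = df['날짜'].dt.quarter             # quarter (number)
df['날짜_weekday']       = df['날짜'].dt.weekday             # day of week as a number (0 = Monday, 1 = Tuesday, ..., 6 = Sunday) (= dayofweek)
df['날짜_weekofyear']    = df['날짜'].dt.isocalendar().week  # ISO week of the year (number); dt.weekofyear / dt.week were removed in pandas 2.0
df['날짜_dayofyear']     = df['날짜'].dt.dayofyear           # day of the year (number)
df['날짜_days_in_month'] = df['날짜'].dt.days_in_month       # number of days in the month (= daysinmonth)
df['날짜_weekday_name']  = df['날짜'].dt.day_name()          # weekday name (string); dt.weekday_name was removed in pandas 1.0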
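Because dt.weekofyear and dt.weekday_name no longer exist in current pandas, here is a small sketch of their replacements on two of the dates above: dt.isocalendar().week for the ISO week number and dt.day_name() for the weekday name. The expected values were worked out by hand from the calendar (2021-01-10 is a Sunday, 2021-08-16 a Monday).

```python
import pandas as pd

s = pd.to_datetime(pd.Series(['2021-01-10 07:10:00', '2021-08-16 14:50:30']))

summary = pd.DataFrame({
    'quarter': s.dt.quarter,
    'weekday': s.dt.weekday,                # 0 = Monday ... 6 = Sunday
    'day_name': s.dt.day_name(),            # replaces dt.weekday_name
    'iso_week': s.dt.isocalendar().week,    # replaces dt.weekofyear / dt.week
    'dayofyear': s.dt.dayofyear,
    'days_in_month': s.dt.days_in_month,
})
print(summary)
```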

df['날짜_is_leap_year']     = df['날짜'].dt.is_leap_year     # whether the year is a leap year
df['날짜_is_month_start']   = df['날짜'].dt.is_month_start   # whether it is the first day of the month
df['날짜_is_month_end']     = df['날짜'].dt.is_month_end     # whether it is the last day of the month
df['날짜_is_quarter_start'] = df['날짜'].dt.is_quarter_start # whether it is the first day of the quarter
df['날짜_is_quarter_end']   = df['날짜'].dt.is_quarter_end   # whether it is the last day of the quarter
df['날짜_is_year_start']    = df['날짜'].dt.is_year_start    # whether it is the first day of the year
df['날짜_is_year_end']      = df['날짜'].dt.is_year_end      # whether it is the last day of the year
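The boolean flags are easiest to see on boundary dates. This is a minimal sketch on three hand-picked dates that are illustrative only (they are not from the DataFrame above): 2020-02-29 falls in a leap year and is a month end, 2021-01-01 starts a month, quarter and year, and 2021-12-31 ends all three.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2020-02-29', '2021-01-01', '2021-12-31']))

# One column per flag, one row per date
flags = pd.DataFrame({
    'date': s,
    'is_leap_year': s.dt.is_leap_year,
    'is_month_start': s.dt.is_month_start,
    'is_month_end': s.dt.is_month_end,
    'is_quarter_start': s.dt.is_quarter_start,
    'is_quarter_end': s.dt.is_quarter_end,
    'is_year_start': s.dt.is_year_start,
    'is_year_end': s.dt.is_year_end,
})
print(flags)
```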

Commonly used directives for the format parameter:

%Y: Year with century, ex) 2019, 2020
%m: Month as a zero-padded decimal number, ex) 01~12
%d: Day of the month as a zero-padded decimal number, ex) 01~31
%H: Hour (24-hour clock) as a zero-padded decimal number, ex) 00~23
%M: Minute as a zero-padded decimal number, ex) 00~59
%S: Second as a zero-padded decimal number, ex) 00~59
ex) 2019-09-01 19:30:00 =(Directives)=> %Y-%m-%d %H:%M:%S
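When the strings use a different layout, only the format string changes. A short sketch, assuming a hypothetical day/month/year input such as '01/09/2019 19:30'. It also shows dt.strftime, which uses the same directives to turn timestamps back into text, and errors='coerce', which turns strings that do not match the format into NaT instead of raising an error.

```python
import pandas as pd

# Hypothetical input in day/month/year order, plus one bad value
raw = pd.Series(['01/09/2019 19:30', '15/10/2020 07:05', 'not a date'])

# %d/%m/%Y %H:%M means "day/month/year hour:minute"; unparseable values become NaT
parsed = pd.to_datetime(raw, format='%d/%m/%Y %H:%M', errors='coerce')

# The same directives work in the other direction with dt.strftime
as_text = parsed.dt.strftime('%Y-%m-%d %H:%M:%S')
print(pd.DataFrame({'raw': raw, 'parsed': parsed, 'as_text': as_text}))
```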

#Addendum

Belatedly, I found detailed material on handling time-series data on '데이터 사이언스 스쿨' (Data Science School).

It explains many more useful and convenient features, so I'll follow along with it.

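With pd.date_range you don't have to type every date/time by hand: give it a start date and an end date, or a start date and a number of periods, and it generates a DatetimeIndex covering that range.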

pd.date_range("2018-4-1", "2018-4-30")

pd.date_range("2018-4-1", periods=30)
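Both calls above describe the same 30 days of April 2018. A small check using nothing beyond pandas itself: the two resulting DatetimeIndex objects are equal, and the default freq is one calendar day ('D').

```python
import pandas as pd

by_end = pd.date_range("2018-4-1", "2018-4-30")      # start date + end date
by_periods = pd.date_range("2018-4-1", periods=30)   # start date + number of periods

print(by_end[:3])
print(len(by_end), by_end.freqstr)
```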

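The freq argument lets you generate only particular dates. Commonly used freq values:

- S: second (written 's' from pandas 2.2)
- T: minute (written 'min' from pandas 2.2)
- H: hour (written 'h' from pandas 2.2)
- D: day
- B: business day (weekdays only, weekends excluded)
- W: weekly, on Sunday
- W-MON: weekly, on Monday
- M: last day of each month (written 'ME' from pandas 2.2)
- MS: first day of each month
- BM: last business day of each month (written 'BME' from pandas 2.2)
- BMS: first business day of each month
- WOM-2THU: second Thursday of each month
- Q-JAN: last day of the first month of each quarter, i.e. quarters ending in Jan, Apr, Jul, Oct (written 'QE-JAN' from pandas 2.2)
- Q-DEC: last day of the last month of each quarter, i.e. Mar, Jun, Sep, Dec (written 'QE-DEC' from pandas 2.2)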

#Reference - https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects

Usage: pd.date_range("2018-4-1", "2018-4-30", freq='B')
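To see a few of the freq values from the list in action, here is a sketch over early 2018. It sticks to aliases whose spelling did not change in pandas 2.2 ('B', 'W-MON', 'MS', 'WOM-2THU'), and the expected dates were checked by hand against the calendar (2018-04-01 is a Sunday).

```python
import pandas as pd

business_days = pd.date_range("2018-4-1", "2018-4-30", freq='B')       # weekdays only
mondays = pd.date_range("2018-4-1", "2018-4-30", freq='W-MON')         # every Monday
month_starts = pd.date_range("2018-1-1", "2018-12-31", freq='MS')      # 1st of each month
second_thursdays = pd.date_range("2018-1-1", "2018-3-31", freq='WOM-2THU')

print(len(business_days), business_days[0].date(), business_days[-1].date())
print(mondays.strftime('%Y-%m-%d').tolist())
print(len(month_starts))
print(second_thursdays.strftime('%Y-%m-%d').tolist())
```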

resample operation

The resample operation lets you change the time interval of the data (resampling). When the interval gets shorter, the amount of data grows, so this is called up-sampling; when the interval gets longer, the amount of data shrinks, so it is called down-sampling.

import numpy as np
import pandas as pd
ts = pd.Series(np.random.randn(100), index=pd.date_range(
    "2018-1-1", periods=100, freq="D"))  # 100 days of random values
ts.tail(20)

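With down-sampling, the original data points are grouped together, so, just as with groupby, you have to apply a group operation to compute a representative value for each group.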

ts.resample('W').mean()
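Because np.random.randn changes the output on every run, here is a deterministic sketch with the values 0 to 13 over two weeks starting on Monday 2018-01-01. Down-sampling with resample('W').mean() averages each Monday-to-Sunday block and labels it by its Sunday. Up-sampling the weekly result back to daily with resample('D').ffill() fills the new rows by carrying the previous value forward; ffill is just one way to fill the gaps (asfreq() would leave them as NaN).

```python
import numpy as np
import pandas as pd

# 14 daily values 0.0, 1.0, ..., 13.0 starting on Monday 2018-01-01
ts = pd.Series(np.arange(14, dtype=float),
               index=pd.date_range("2018-1-1", periods=14, freq="D"))

# Down-sampling: daily -> weekly ('W' = weeks ending on Sunday), one mean per week
weekly = ts.resample('W').mean()
print(weekly)

# Up-sampling: weekly -> daily; new rows are filled forward from the previous value
daily_again = weekly.resample('D').ffill()
print(daily_again)
```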

#Reference - 4.8 Handling time-series data, '데이터 사이언스 스쿨' (Data Science School): datascienceschool.net/01%20python/04.08%20%EC%8B%9C%EA%B3%84%EC%97%B4%20%EC%9E%90%EB%A3%8C%20%EB%8B%A4%EB%A3%A8%EA%B8%B0.html

#Reference - datetime: Basic date and time types, strftime() and strptime() behavior (Python documentation): docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
