Filtering Data by Date in Pandas

DataFrame

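The pandas DataFrame is a very handy data-processing tool, and anyone doing data science should take the time to learn pandas well.
Next time I should write a proper introduction to setting up a pandas environment.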

Filtering by time

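After spending a lot of time learning to analyze data, the first thing we need once the data is loaded is to pick out the part we actually want to process from a large dataset.
Often that means selecting the rows that fall within a specific period based on a time column, which is where time filtering comes in.
This example uses Python's datetime module, so let's go straight to the code.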

import datetime

import pandas as pd

# Build a one-column DataFrame of datetime values
df = pd.DataFrame([datetime.datetime(2020, 8, 12, 0, 0),
                   datetime.datetime(2020, 8, 24, 0, 0),
                   datetime.datetime(2020, 8, 5, 0, 0),
                   datetime.datetime(2020, 8, 19, 0, 0),
                   datetime.datetime(2020, 8, 24, 0, 0),
                   datetime.datetime(2020, 8, 19, 0, 0),
                   datetime.datetime(2020, 8, 23, 0, 0),
                   datetime.datetime(2020, 8, 19, 0, 0),
                   datetime.datetime(2020, 8, 19, 0, 0),
                   datetime.datetime(2020, 8, 19, 0, 0),
                   datetime.datetime(2020, 8, 19, 0, 0),
                   datetime.datetime(2020, 8, 19, 0, 0),
                   datetime.datetime(2020, 8, 19, 0, 0),
                   datetime.datetime(2020, 8, 19, 0, 0),
                   datetime.datetime(2020, 8, 19, 0, 0),
                   datetime.datetime(2020, 8, 19, 0, 0),
                   datetime.datetime(2020, 8, 19, 0, 0),
                   datetime.datetime(2020, 8, 19, 0, 0),
                   datetime.datetime(2020, 8, 19, 0, 0),
                   datetime.datetime(2020, 8, 19, 0, 0)],
                  columns=['date'])
print(df)
# Keep only the rows dated after 2020-08-19
print(df[df['date'] > datetime.datetime(2020, 8, 19, 0, 0, 0)])

Output

         date
0  2020-08-12
1  2020-08-24
2  2020-08-05
3  2020-08-19
4  2020-08-24
5  2020-08-19
6  2020-08-23
7  2020-08-19
8  2020-08-19
9  2020-08-19
10 2020-08-19
11 2020-08-19
12 2020-08-19
13 2020-08-19
14 2020-08-19
15 2020-08-19
16 2020-08-19
17 2020-08-19
18 2020-08-19
19 2020-08-19
        date
1 2020-08-24
4 2020-08-24
6 2020-08-23
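
The example above builds the column from datetime objects directly. In practice dates often arrive as strings, for instance after reading a CSV file, and a string column would be compared alphabetically rather than chronologically. The following is a minimal sketch of converting such a column first; the `raw` frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: dates stored as plain strings
raw = pd.DataFrame({'date': ['2020-08-12', '2020-08-24', '2020-08-05', '2020-08-19']})

# Convert the string column to datetime64 so comparisons are chronological
raw['date'] = pd.to_datetime(raw['date'])

# Keep only the rows strictly after 2020-08-19
after = raw[raw['date'] > pd.Timestamp('2020-08-19')]
print(after)
```

Here `pd.Timestamp` plays the same role as `datetime.datetime` in the example above; either one can be compared against a datetime column.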

Filtering with loc

Here is a second approach.

# Boolean mask: dates after 2020-08-01, up to and including 2020-08-23
mask = (df['date'] > '2020-08-01') & (df['date'] <= '2020-08-23')
filtered_df = df.loc[mask]
print(filtered_df)
         date
0  2020-08-12
2  2020-08-05
3  2020-08-19
5  2020-08-19
6  2020-08-23
7  2020-08-19
8  2020-08-19
9  2020-08-19
10 2020-08-19
11 2020-08-19
12 2020-08-19
13 2020-08-19
14 2020-08-19
15 2020-08-19
16 2020-08-19
17 2020-08-19
18 2020-08-19
19 2020-08-19
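
A range check like the mask above can also be written with `Series.between`, which may read more clearly when both bounds are meant to be included. This is only a sketch on a small made-up frame (`sample` is not part of the original example), and note that `between` includes both endpoints by default, whereas the mask above excludes the start date.

```python
import pandas as pd

# Hypothetical sample data for illustration
dates = pd.to_datetime(['2020-08-12', '2020-08-24', '2020-08-05', '2020-08-23'])
sample = pd.DataFrame({'date': dates})

# between() keeps rows where start <= date <= end (both endpoints included)
in_range = sample[sample['date'].between('2020-08-05', '2020-08-23')]
print(in_range)
```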

Another way to write it

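# Slicing loc by date labels requires a DatetimeIndex, so index by 'date' and sort first
indexed_df = df.set_index('date').sort_index()
# Unlike the mask above, a label slice includes both endpoints
filtered_df = indexed_df.loc['2020-08-01':'2020-08-23']
print(filtered_df)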
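
Slicing `loc` with date strings selects by index label, so it relies on the dates being the index (a DatetimeIndex) rather than an ordinary column. The sketch below shows this on a small made-up frame; the `sample` frame and its `value` column are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data with the dates used as the index
sample = pd.DataFrame(
    {'value': [1, 2, 3, 4]},
    index=pd.to_datetime(['2020-08-24', '2020-08-01', '2020-08-23', '2020-08-12']),
)

# Sort so the slice is taken over a monotonic DatetimeIndex
sample = sample.sort_index()

# A label slice includes both the start and the end date
sliced = sample.loc['2020-08-01':'2020-08-23']
print(sliced)
```

Because both endpoints are included, the row dated 2020-08-01 is kept here, while a mask written with `>` on the start date would drop it.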