Pandas Time Series
Date and Time Data
Date and time data comes in several varieties:
- Time stamps: reference particular moments in time
- Time intervals and periods: reference a length of time between a particular beginning and end point
- Time deltas or durations: reference an exact length of time
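As a small sketch of these three kinds of data, the following uses the pandas objects introduced later in this deck. The specific dates and variable names are illustrative assumptions only.

```python
import pandas as pd

# A time stamp: one particular moment in time
stamp = pd.Timestamp('2020-11-13 09:30')

# A period: a span of time with a defined beginning and end (here, one month)
period = pd.Period('2020-11', freq='M')

# A time delta: an exact length of time, not tied to any calendar date
delta = pd.Timedelta(days=1, hours=12)

print(stamp)                # 2020-11-13 09:30:00
print(period.start_time)    # 2020-11-01 00:00:00
print(stamp + delta)        # 2020-11-14 21:30:00

# The time stamp falls inside the period
print(period.start_time <= stamp <= period.end_time)   # True
```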
Dates and Times in Python
Python has a built-in `datetime` module for working with date and time data:

    >>> from datetime import datetime
    >>> date = datetime(year=2020, month=11, day=13)
    >>> date
    datetime.datetime(2020, 11, 13, 0, 0)
- The `datetime` object has many useful methods. Example: print the day of the week:

    >>> date.strftime('%A')
    'Friday'
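A few more of these methods, as a minimal sketch using only the standard library. The variable names `parsed` and `one_week_later` are illustrative and do not appear in the original slides.

```python
from datetime import datetime, timedelta

date = datetime(year=2020, month=11, day=13)

# Parse a string into a datetime using an explicit format
parsed = datetime.strptime('2020-11-13', '%Y-%m-%d')

# Adding a timedelta produces a new datetime
one_week_later = date + timedelta(weeks=1)

print(parsed == date)               # True
print(one_week_later.isoformat())   # 2020-11-20T00:00:00
print(date.weekday())               # 4 (Monday is 0, so 4 is Friday)
```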
Dates and Times in NumPy
NumPy has native time series data types, `datetime64` and `timedelta64`:

    >>> import numpy as np
    >>> date = np.array('2020-11-13', dtype=np.datetime64)
    >>> date
    array('2020-11-13', dtype='datetime64[D]')
Vectorized operations are available for arrays of type `datetime64`:

    >>> date + np.arange(12)
    array(['2020-11-13', '2020-11-14', '2020-11-15', '2020-11-16',
           '2020-11-17', '2020-11-18', '2020-11-19', '2020-11-20',
           '2020-11-21', '2020-11-22', '2020-11-23', '2020-11-24'],
          dtype='datetime64[D]')
The NumPy `datetime64` and `timedelta64` data types are built on a fundamental time unit, which imposes a trade-off between time resolution and maximum time span:

    >>> np.datetime64('2020-11-13')
    numpy.datetime64('2020-11-13')
    >>> np.datetime64('2020-11-13 12:00')
    numpy.datetime64('2020-11-13T12:00')
    >>> np.datetime64('2020-11-13 12:00:30.50', 'ns')
    numpy.datetime64('2020-11-13T12:00:30.500000000')
NumPy Time Resolution Codes
Code | Meaning | Relative Time Span | Absolute Time Span |
---|---|---|---|
Y | Year | ± 9.2e18 years | [9.2e18 BC, 9.2e18 AD] |
M | Month | ± 7.6e17 years | [7.6e17 BC, 7.6e17 AD] |
W | Week | ± 1.7e17 years | [1.7e17 BC, 1.7e17 AD] |
D | Day | ± 2.5e16 years | [2.5e16 BC, 2.5e16 AD] |
h | Hour | ± 1.0e15 years | [1.0e15 BC, 1.0e15 AD] |
m | Minute | ± 1.7e13 years | [1.7e13 BC, 1.7e13 AD] |
s | Second | ± 2.9e11 years | [2.9e11 BC, 2.9e11 AD] |
ms | Millisecond | ± 2.9e8 years | [2.9e8 BC, 2.9e8 AD] |
us | Microsecond | ± 2.9e5 years | [290301 BC, 294241 AD] |
ns | Nanosecond | ± 292 years | [1678 AD, 2262 AD] |
ps | Picosecond | ± 106 days | [1969 AD, 1970 AD] |
fs | Femtosecond | ± 2.6 hours | [1969 AD, 1970 AD] |
as | Attosecond | ± 9.2 seconds | [1969 AD, 1970 AD] |
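The relative spans in the table follow from storing each value as a signed 64-bit count of time units. As a rough check, assuming a year of 365.25 days (an approximation chosen here, not taken from the slides), the nanosecond and microsecond spans can be estimated like this:

```python
import numpy as np

# datetime64 stores a signed 64-bit integer count of units around 1970-01-01,
# so about 2**63 units are available on either side of that epoch.
max_count = np.iinfo(np.int64).max

# Approximate number of seconds in a year (assumes 365.25 days per year)
seconds_per_year = 365.25 * 24 * 3600

# Nanosecond resolution: 1e-9 seconds per unit
span_ns_years = max_count * 1e-9 / seconds_per_year

# Microsecond resolution: 1e-6 seconds per unit
span_us_years = max_count * 1e-6 / seconds_per_year

print(round(span_ns_years))    # 292
print(f'{span_us_years:.1e}')  # 2.9e+05
```

Each step to a coarser unit multiplies the span by the same factor that it reduces the resolution.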
Dates and Times in Pandas
Pandas provides a `Timestamp` object which combines the ease of use of `datetime` with the efficient storage and vectorized interface of `numpy.datetime64`.

Ease of parsing example:

    >>> import pandas as pd
    >>> date = pd.to_datetime('13th of November, 2020')
    >>> date
    Timestamp('2020-11-13 00:00:00')
    >>> date.strftime('%A')
    'Friday'
Vectorization example:

    >>> date + pd.to_timedelta(np.arange(12), 'D')
    DatetimeIndex(['2020-11-13', '2020-11-14', '2020-11-15', '2020-11-16',
                   '2020-11-17', '2020-11-18', '2020-11-19', '2020-11-20',
                   '2020-11-21', '2020-11-22', '2020-11-23', '2020-11-24'],
                  dtype='datetime64[ns]', freq=None)
Pandas Time Series: Indexing by Time
A `Series` object can be indexed by time:

    >>> index = pd.DatetimeIndex(['2020-11-01', '2020-12-01',
    ...                           '2021-01-01', '2021-02-01'])
    >>> data = pd.Series([0, 1, 2, 3], index=index)
    >>> data
    2020-11-01    0
    2020-12-01    1
    2021-01-01    2
    2021-02-01    3
    dtype: int64
Indexing a time index:

    >>> data['2020-11-01':'2021-01-01']
    2020-11-01    0
    2020-12-01    1
    2021-01-01    2
    dtype: int64
    >>> data['2020']
    2020-11-01    0
    2020-12-01    1
    dtype: int64
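The same string-based selection also works at month granularity and with open-ended slices. This is a minimal sketch that rebuilds the series from the slide; it uses `.loc` here as a stylistic choice, and the names `nov` and `from_dec` are illustrative.

```python
import pandas as pd

index = pd.DatetimeIndex(['2020-11-01', '2020-12-01',
                          '2021-01-01', '2021-02-01'])
data = pd.Series([0, 1, 2, 3], index=index)

# Select every entry in a given month by passing a year-month string
nov = data.loc['2020-11']

# Select everything from a given date onward with an open-ended slice
from_dec = data.loc['2020-12-01':]

print(nov.tolist())        # [0]
print(from_dec.tolist())   # [1, 2, 3]
```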
Pandas Time Series Data Structures
- Time stamps: the `Timestamp` and `DatetimeIndex` types
- Time periods: the `Period` and `PeriodIndex` types
- Time deltas or durations: the `Timedelta` and `TimedeltaIndex` types
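`Timestamp` objects are commonly constructed with the `to_datetime` function. Passing a list of dates returns a `DatetimeIndex`:

    >>> dates = pd.to_datetime([datetime(2020, 11, 10), '11th of November, 2020',
    ...                         '2020-Nov-12', '11-13-2020', '20201114'])
    >>> dates
    DatetimeIndex(['2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13',
                   '2020-11-14'],
                  dtype='datetime64[ns]', freq=None)

In pandas 2.0 and later, parsing a list of differently formatted strings like this one may require passing `format='mixed'`.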
A `DatetimeIndex` can be converted to a `PeriodIndex` with the `to_period` function:

    >>> dates.to_period('D')
    PeriodIndex(['2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13',
                 '2020-11-14'],
                dtype='period[D]', freq='D')
A `TimedeltaIndex` is created when one date is subtracted from another:

    >>> dates - dates[0]
    TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'],
                   dtype='timedelta64[ns]', freq=None)
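A short sketch of working with the resulting time deltas. For brevity it builds the five dates with `date_range` rather than `to_datetime`, and the names `elapsed` and `shifted` are illustrative.

```python
import pandas as pd

dates = pd.date_range('2020-11-10', periods=5, freq='D')

# Subtracting a Timestamp from a DatetimeIndex gives a TimedeltaIndex
elapsed = dates - dates[0]

# Whole days elapsed since the first date
print(elapsed.days.tolist())   # [0, 1, 2, 3, 4]

# Adding a single Timedelta shifts every time stamp by the same amount
shifted = dates + pd.Timedelta(hours=6)
print(shifted[0])              # 2020-11-10 06:00:00
```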
Regular Sequences
- Pandas has functions for creating regular date sequences:
  - `date_range`
  - `period_range`
  - `timedelta_range`

`date_range` examples:

    >>> pd.date_range('2020-11-10', '2020-11-14')
    DatetimeIndex(['2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13',
                   '2020-11-14'],
                  dtype='datetime64[ns]', freq='D')
    >>> pd.date_range('2020-11-10', periods=7)
    DatetimeIndex(['2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13',
                   '2020-11-14', '2020-11-15', '2020-11-16'],
                  dtype='datetime64[ns]', freq='D')
`period_range` example:

    >>> pd.period_range('2020-11-10', periods=7, freq='M')
    PeriodIndex(['2020-11', '2020-12', '2021-01', '2021-02', '2021-03',
                 '2021-04', '2021-05'],
                dtype='period[M]', freq='M')

`timedelta_range` example:

    >>> pd.timedelta_range(0, periods=10, freq='H')
    TimedeltaIndex(['00:00:00', '01:00:00', '02:00:00', '03:00:00', '04:00:00',
                    '05:00:00', '06:00:00', '07:00:00', '08:00:00', '09:00:00'],
                   dtype='timedelta64[ns]', freq='H')
Frequencies and Offsets
Code | Description | Code | Description |
---|---|---|---|
D | Calendar day | B | Business day |
W | Weekly | | |
M | Month end | BM | Business month end |
Q | Quarter end | BQ | Business quarter end |
A | Year end | BA | Business year end |
H | Hours | BH | Business hours |
T | Minutes | | |
S | Seconds | | |
L | Milliseconds | | |
U | Microseconds | | |
N | Nanoseconds | | |
Frequencies and Offsets (Continued)
Adding an `S` suffix changes an end-of-period code to a start-of-period code:

Code | Description | Code | Description |
---|---|---|---|
MS | Month start | BMS | Business month start |
QS | Quarter start | BQS | Business quarter start |
AS | Year start | BAS | Business year start |

The month used to mark a quarterly or annual code can be changed by adding a three-letter month suffix: `Q-JAN`, `BQ-FEB`, `QS-MAR`, `BQS-APR`, etc. for quarterly codes, and `A-JAN`, `BA-FEB`, `AS-MAR`, `BAS-APR`, etc. for annual codes.
The split-point of a weekly frequency can be modified by adding a three-letter weekday suffix: `W-SUN`, `W-MON`, etc.
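To make the suffixes concrete, here is a minimal sketch that passes a few of these codes to `date_range`. The start dates and variable names are arbitrary choices for illustration.

```python
import pandas as pd

def as_strings(idx):
    """Format each time stamp in the index as YYYY-MM-DD."""
    return [d.strftime('%Y-%m-%d') for d in idx]

# Weekly dates anchored on Monday instead of the default Sunday
mondays = pd.date_range('2020-11-01', periods=3, freq='W-MON')
print(as_strings(mondays))
# ['2020-11-02', '2020-11-09', '2020-11-16']

# Month starts: the S suffix turns month end (M) into month start (MS)
month_starts = pd.date_range('2020-11-13', periods=3, freq='MS')
print(as_strings(month_starts))
# ['2020-12-01', '2021-01-01', '2021-02-01']

# Quarter starts with the quarterly cycle anchored on March
quarter_starts = pd.date_range('2020-01-01', periods=4, freq='QS-MAR')
print(as_strings(quarter_starts))
# ['2020-03-01', '2020-06-01', '2020-09-01', '2020-12-01']
```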
Frequencies and Offsets Examples
Frequency of 2 hours and 30 minutes:

    >>> pd.timedelta_range(0, periods=9, freq='2H30T')
    TimedeltaIndex(['00:00:00', '02:30:00', '05:00:00', '07:30:00', '10:00:00',
                    '12:30:00', '15:00:00', '17:30:00', '20:00:00'],
                   dtype='timedelta64[ns]', freq='150T')

Business day offsets:

    >>> from pandas.tseries.offsets import BDay
    >>> pd.date_range('2020-11-13', periods=5, freq=BDay())
    DatetimeIndex(['2020-11-13', '2020-11-16', '2020-11-17', '2020-11-18',
                   '2020-11-19'],
                  dtype='datetime64[ns]', freq='B')
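Offsets can also be used directly in arithmetic, and a frequency can be given as a `Timedelta` object rather than a string code. The sketch below assumes the second form is useful because pandas 2.2 and later spell some string codes differently (for example `h` and `min` in place of `H` and `T`); the variable names are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import BDay

friday = pd.Timestamp('2020-11-13')

# Adding one business day to a Friday skips the weekend
next_bday = friday + BDay(1)
print(next_bday.day_name(), next_bday.date())   # Monday 2020-11-16

# A 2 hour 30 minute step expressed as a Timedelta instead of a string code
step = pd.Timedelta(hours=2, minutes=30)
steps = pd.timedelta_range(start='0 days', periods=3, freq=step)
print(steps[2])                                  # 0 days 05:00:00
```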
Resampling, Shifting, and Windowing
- Time series data can be resampled at a higher or lower frequency
  - `resample`: data aggregation
  - `asfreq`: data selection
- Shifting data in time
  - `shift`: shifts the data
  - `tshift`: shifts the index (deprecated in later pandas releases in favor of `shift` with a `freq` argument)
- Rolling statistics are similar to the `groupby` operation, applied to windows of time series data
  - `rolling`
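The operations listed above can be sketched on a small, made-up daily series. The data, the 5-day bin width, and the 3-observation window are all assumptions chosen to keep the results easy to check by hand. The index shift uses `shift` with a `freq` argument rather than `tshift`.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: the values 0..9 on ten consecutive days
index = pd.date_range('2020-11-01', periods=10, freq='D')
ts = pd.Series(np.arange(10.0), index=index)

# resample aggregates all values in each 5-day bin (here with a mean)
binned_mean = ts.resample('5D').mean()

# asfreq selects the value found at each 5-day point, without aggregating
picked = ts.asfreq('5D')

# shift moves the data by two rows and leaves the index unchanged
shifted_data = ts.shift(2)

# shift with freq moves the index by two days and leaves the data unchanged
shifted_index = ts.shift(2, freq='D')

# rolling computes a statistic over a sliding window of 3 observations
rolling_mean = ts.rolling(3).mean()

print(binned_mean.tolist())             # [2.0, 7.0]
print(picked.tolist())                  # [0.0, 5.0]
print(shifted_data.iloc[:3].tolist())   # [nan, nan, 0.0]
print(shifted_index.index[0])           # 2020-11-03 00:00:00
print(rolling_mean.iloc[2])             # 1.0
```

Shifting the data introduces missing values at the start of the series, while shifting the index keeps every value and moves its label.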