Date and time data comes in several varieties
Python has a built-in datetime module for working with date and time data
>>> from datetime import datetime
>>> date = datetime(year=2020, month=11, day=13)
>>> date
datetime.datetime(2020, 11, 13, 0, 0)
A datetime object has many useful methods. Example: print the day of the week:
>>> date.strftime('%A')
'Friday'
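Beyond formatting, datetime objects also support arithmetic with timedelta. A minimal sketch using only the standard library (the variable names are illustrative, not from the original notes):

```python
from datetime import datetime, timedelta

date = datetime(year=2020, month=11, day=13)

# Adding a timedelta returns a new datetime; the original is unchanged.
one_week_later = date + timedelta(weeks=1)

# weekday() counts from Monday = 0, so a Friday is 4.
weekday_number = date.weekday()

# isoformat() produces an unambiguous ISO 8601 string.
iso_string = one_week_later.isoformat()

print(one_week_later, weekday_number, iso_string)
```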
NumPy has native time series data types: datetime64 and timedelta64
>>> import numpy as np
>>> date = np.array('2020-11-13', dtype=np.datetime64)
>>> date
array('2020-11-13', dtype='datetime64[D]')
Vectorized operations are available for arrays of type datetime64
>>> date + np.arange(12)
array(['2020-11-13', '2020-11-14', '2020-11-15', '2020-11-16',
'2020-11-17', '2020-11-18', '2020-11-19', '2020-11-20',
'2020-11-21', '2020-11-22', '2020-11-23', '2020-11-24'],
dtype='datetime64[D]')
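The timedelta64 type appears as soon as two datetime64 values are subtracted. A small sketch, assuming a scalar start date rather than the 0-d array used above (names are illustrative):

```python
import numpy as np

start = np.datetime64('2020-11-13')
dates = start + np.arange(12)      # 12 consecutive days, dtype datetime64[D]

# Subtracting datetime64 values yields timedelta64 values in the same unit.
elapsed = dates - start

# Dividing by a unit timedelta converts the durations to plain floats.
elapsed_days = elapsed / np.timedelta64(1, 'D')

print(elapsed.dtype, elapsed_days)
```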
NumPy datetime64 and timedelta64 data types are built on a fundamental time unit, which imposes a trade-off between time resolution and maximum time span
>>> np.datetime64('2020-11-13')
numpy.datetime64('2020-11-13')
>>> np.datetime64('2020-11-13 12:00')
numpy.datetime64('2020-11-13T12:00')
>>> np.datetime64('2020-11-13 12:00:30.50', 'ns')
numpy.datetime64('2020-11-13T12:00:30.500000000')
Code | Meaning | Relative Time Span | Absolute Time Span |
---|---|---|---|
Y | Year | ± 9.2e18 years | [9.2e18 BC, 9.2e18 AD] |
M | Month | ± 7.6e17 years | [7.6e17 BC, 7.6e17 AD] |
W | Week | ± 1.7e17 years | [1.7e17 BC, 1.7e17 AD] |
D | Day | ± 2.5e16 years | [2.5e16 BC, 2.5e16 AD] |
h | Hour | ± 1.0e15 years | [1.0e15 BC, 1.0e15 AD] |
m | Minute | ± 1.7e13 years | [1.7e13 BC, 1.7e13 AD] |
s | Second | ± 2.9e11 years | [2.9e11 BC, 2.9e11 AD] |
ms | Millisecond | ± 2.9e8 years | [2.9e8 BC, 2.9e8 AD] |
us | Microsecond | ± 2.9e5 years | [290301 BC, 294241 AD] |
ns | Nanosecond | ± 292 years | [ 1678 AD, 2262 AD] |
ps | Picosecond | ± 106 days | [ 1969 AD, 1970 AD] |
fs | Femtosecond | ± 2.6 hours | [ 1969 AD, 1970 AD] |
as | Attosecond | ± 9.2 seconds | [ 1969 AD, 1970 AD] |
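The relative spans in the table follow from the storage format: values are 64-bit integer tick counts, so roughly 2**63 ticks of the chosen unit are available in each direction. A rough sketch of that arithmetic, assuming a mean Gregorian year of 365.2425 days (the helper names are illustrative):

```python
import numpy as np

max_ticks = 2 ** 63                          # magnitude limit of a signed 64-bit integer
seconds_per_year = 365.2425 * 24 * 3600      # assumed mean Gregorian year

unit_in_seconds = {'s': 1.0, 'ms': 1e-3, 'us': 1e-6, 'ns': 1e-9}

# Span in years = number of ticks * tick length / length of a year.
span_years = {unit: max_ticks * seconds / seconds_per_year
              for unit, seconds in unit_in_seconds.items()}

for unit, years in span_years.items():
    print(f'{unit:>2}: about +/- {years:.3g} years')

# The latest instant representable at nanosecond resolution.
latest_ns = np.datetime64(np.iinfo(np.int64).max, 'ns')
print(latest_ns)
```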
Pandas provides a Timestamp object which combines the ease of use of datetime with the efficient storage and vectorized interface of numpy.datetime64
Ease of parsing example
>>> import pandas as pd
>>> date = pd.to_datetime('13th of November, 2020')
>>> date
Timestamp('2020-11-13 00:00:00')
>>> date.strftime('%A')
'Friday'
Vectorization example
>>> date + pd.to_timedelta(np.arange(12), 'D')
DatetimeIndex(['2020-11-13', '2020-11-14', '2020-11-15', '2020-11-16',
'2020-11-17', '2020-11-18', '2020-11-19', '2020-11-20',
'2020-11-21', '2020-11-22', '2020-11-23', '2020-11-24'],
dtype='datetime64[ns]', freq=None)
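Flexible parsing is convenient, but ambiguous inputs such as day-first dates are safer with an explicit format. A hedged sketch (the input strings are made up for illustration):

```python
import pandas as pd

# An explicit format removes the day/month ambiguity of '13/11/2020'.
date = pd.to_datetime('13/11/2020', format='%d/%m/%Y')

# errors='coerce' turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(['2020-11-13', 'not a date'], errors='coerce')

print(date, parsed)
```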
A Series object can be indexed by time
>>> index = pd.DatetimeIndex(['2020-11-01', '2020-12-01',
... '2021-01-01', '2021-02-01'])
>>> data = pd.Series([0, 1, 2, 3], index=index)
>>> data
2020-11-01    0
2020-12-01    1
2021-01-01    2
2021-02-01    3
dtype: int64
Indexing a time index
>>> data['2020-11-01':'2021-01-01']
2020-11-01    0
2020-12-01    1
2021-01-01    2
dtype: int64
>>> data['2020']
2020-11-01    0
2020-12-01    1
dtype: int64
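The same selections can be written with .loc, which makes the label-based lookup explicit, or with a boolean mask built from the index. A minimal sketch reusing the series above:

```python
import pandas as pd

index = pd.DatetimeIndex(['2020-11-01', '2020-12-01',
                          '2021-01-01', '2021-02-01'])
data = pd.Series([0, 1, 2, 3], index=index)

# Partial date strings work as slice bounds; both endpoints are included.
through_new_year = data.loc['2020-12':'2021-01']

# The index exposes date components, which can drive a boolean mask.
only_2021 = data[data.index.year == 2021]

print(through_new_year)
print(only_2021)
```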
Timestamps: the Timestamp and DatetimeIndex types
Time periods: the Period and PeriodIndex types
Time deltas or durations: the Timedelta and TimedeltaIndex types
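The scalar versions of the three kinds of object can be sketched side by side (the specific dates and durations are illustrative):

```python
import pandas as pd

stamp = pd.Timestamp('2020-11-13 12:00')   # a single instant
period = pd.Period('2020-11', freq='M')    # the whole of November 2020
delta = pd.Timedelta(days=1, hours=12)     # a duration

# A Period knows its own boundaries, so membership can be tested directly.
contains = period.start_time <= stamp <= period.end_time

# Timestamp + Timedelta gives another Timestamp.
later = stamp + delta

print(contains, later)
```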
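Timestamp objects are commonly constructed with the to_datetime function, which accepts a wide variety of formats (in pandas 2.0 and later, a list that mixes several string formats may need format='mixed')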
>>> dates = pd.to_datetime([datetime(2020, 11, 10), '11th of November, 2020',
... '2020-Nov-12', '11-13-2020', '20201114'])
>>> dates
DatetimeIndex(['2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13',
'2020-11-14'],
dtype='datetime64[ns]', freq=None)
A DatetimeIndex can be converted to a PeriodIndex with the to_period function
>>> dates.to_period('D')
PeriodIndex(['2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13',
'2020-11-14'],
dtype='period[D]', freq='D')
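The conversion also works at coarser resolutions and can be reversed with to_timestamp. A small sketch with made-up dates:

```python
import pandas as pd

dates = pd.to_datetime(['2020-11-10', '2020-12-24', '2021-01-05'])

# Each date is mapped to the calendar month that contains it.
months = dates.to_period('M')

# Converting back picks the start of each period by default.
month_starts = months.to_timestamp()

print(months)
print(month_starts)
```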
A TimedeltaIndex can be created when subtracting one date from another
>>> dates - dates[0]
TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
Regular sequences are created with date_range (timestamps), period_range (periods), and timedelta_range (time deltas)
date_range examples
>>> pd.date_range('2020-11-10', '2020-11-14')
DatetimeIndex(['2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13',
'2020-11-14'],
dtype='datetime64[ns]', freq='D')
>>> pd.date_range('2020-11-10', periods=7)
DatetimeIndex(['2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13',
'2020-11-14', '2020-11-15', '2020-11-16'],
dtype='datetime64[ns]', freq='D')
period_range example
>>> pd.period_range('2020-11-10', periods=7, freq='M')
PeriodIndex(['2020-11', '2020-12', '2021-01', '2021-02', '2021-03', '2021-04',
'2021-05'],
dtype='period[M]', freq='M')
timedelta_range example
>>> pd.timedelta_range(0, periods=10, freq='H')
TimedeltaIndex(['00:00:00', '01:00:00', '02:00:00', '03:00:00', '04:00:00',
'05:00:00', '06:00:00', '07:00:00', '08:00:00', '09:00:00'],
dtype='timedelta64[ns]', freq='H')
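The range functions accept other combinations of arguments as well. The sketch below uses an offset object in place of a string code and shows ranges defined by both endpoints (all values are illustrative):

```python
import pandas as pd

# An offset object can stand in for a string frequency code.
six_hourly = pd.date_range('2020-11-13', periods=4, freq=pd.offsets.Hour(6))

# period_range also accepts a start and an end instead of a count.
months = pd.period_range('2020-11', '2021-02', freq='M')

# With start, end, and periods, the steps are evenly spaced.
steps = pd.timedelta_range(start='0 days', end='1 days', periods=5)

print(six_hourly)
print(months)
print(steps)
```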
Code | Description | Code | Description |
---|---|---|---|
D | Calendar day | B | Business day |
W | Weekly | | |
M | Month end | BM | Business month end |
Q | Quarter end | BQ | Business quarter end |
A | Year end | BA | Business year end |
H | Hours | BH | Business hours |
T | Minutes | | |
S | Seconds | | |
L | Milliseconds | | |
U | Microseconds | | |
N | Nanoseconds | | |
Adding an S suffix changes an end frequency to a start frequency
Code | Description | Code | Description |
---|---|---|---|
MS | Month start | BMS | Business month start |
QS | Quarter start | BQS | Business quarter start |
AS | Year start | BAS | Business year start |
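The difference between end and start frequencies is easiest to see side by side. Recent pandas releases have renamed some string codes (for example, month end is spelled 'ME' from pandas 2.2), so this sketch uses offset objects, which avoid depending on the spelling:

```python
import pandas as pd

# Month-end frequency: each date is the last calendar day of a month.
month_ends = pd.date_range('2020-11-01', periods=3,
                           freq=pd.offsets.MonthEnd())

# Month-start frequency: each date is the first calendar day of a month.
month_starts = pd.date_range('2020-11-01', periods=3,
                             freq=pd.offsets.MonthBegin())

print(month_ends)
print(month_starts)
```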
The month used to mark a quarterly or annual frequency can be changed by adding a three-letter month code as a suffix:
Q-JAN, BQ-FEB, QS-MAR, BQS-APR, etc.
A-JAN, BA-FEB, AS-MAR, BAS-APR, etc.
The split point of a weekly frequency can be modified with a three-letter weekday code:
W-SUN, W-MON, etc.
Frequency of 2 hours and 30 minutes
>>> pd.timedelta_range(0, periods=9, freq='2H30T')
TimedeltaIndex(['00:00:00', '02:30:00', '05:00:00', '07:30:00', '10:00:00',
'12:30:00', '15:00:00', '17:30:00', '20:00:00'],
dtype='timedelta64[ns]', freq='150T')
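Anchored codes and compound steps can be sketched as follows. The start dates are illustrative, and the compound step is passed as a Timedelta so that the example does not depend on how a given pandas version spells the hour and minute codes:

```python
import pandas as pd

# Weekly frequency anchored on Mondays, starting from a Friday.
mondays = pd.date_range('2020-11-13', periods=3, freq='W-MON')

# Quarter starts anchored so that quarters begin in Feb, May, Aug, Nov.
quarter_starts = pd.date_range('2020-01-01', periods=3, freq='QS-FEB')

# A compound step of 2 hours 30 minutes, given as a Timedelta.
steps = pd.timedelta_range('0 days', periods=3,
                           freq=pd.Timedelta(hours=2, minutes=30))

print(mondays)
print(quarter_starts)
print(steps)
```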
Business day offsets
>>> from pandas.tseries.offsets import BDay
>>> pd.date_range('2020-11-13', periods=5, freq=BDay())
DatetimeIndex(['2020-11-13', '2020-11-16', '2020-11-17', '2020-11-18',
'2020-11-19'],
dtype='datetime64[ns]', freq='B')
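Offset objects such as BDay can also be added to or subtracted from a single Timestamp. A minimal sketch (weekends are skipped; holidays are not considered):

```python
import pandas as pd
from pandas.tseries.offsets import BDay

friday = pd.Timestamp('2020-11-13')

# One business day after a Friday is the following Monday.
next_business_day = friday + BDay(1)

# Five business days back lands on the previous Friday.
five_back = friday - BDay(5)

print(next_business_day, five_back)
```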
resample: data aggregation
asfreq: data selection
shift: shifts the data
tshift: shifts the index (removed in pandas 2.0; shift with a freq argument does the same)
rolling: groupby-like operation on windows of time series data
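These operations can be compared on a small synthetic series. The sketch below assumes 14 daily values 0..13, which are made up purely for illustration:

```python
import numpy as np
import pandas as pd

index = pd.date_range('2020-11-01', periods=14, freq='D')
data = pd.Series(np.arange(14, dtype=float), index=index)

weekly_mean = data.resample('7D').mean()    # aggregate each 7-day bin
weekly_pick = data.asfreq('7D')             # select the value at each 7-day step
shifted_data = data.shift(1)                # move the values, keep the index
shifted_index = data.shift(1, freq='D')     # move the index, keep the values
rolling_mean = data.rolling(3).mean()       # trailing window of 3 observations

print(weekly_mean)
print(weekly_pick)
```

resample summarizes every value in a bin, while asfreq only reads the value at each new index point, so the two give different results on the same data.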