Pandas Time Series

CSC 223 - Advanced Scientific Programming

Date and Time Data

  • Date and time data comes in several varieties

    • Time stamps: reference particular moments in time
    • Time intervals and periods: reference a length of time between a particular beginning and end point
    • Time deltas or durations: reference an exact length of time

Dates and Times in Python

  • Python has a built-in datetime module for working with date and time data

    >>> from datetime import datetime
    >>> date = datetime(year=2020, month=11, day=13)
    >>> date
    datetime.datetime(2020, 11, 13, 0, 0)
  • datetime objects provide methods for formatting, component access, and more
  • Example: get the name of the day of the week with strftime:

    >>> date.strftime('%A')
    'Friday'
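  • Beyond strftime, datetime objects expose calendar fields and support arithmetic with timedelta. A minimal standard-library sketch (the second date is an illustrative choice):

```python
from datetime import datetime, timedelta

date = datetime(year=2020, month=11, day=13)

# weekday() counts Monday as 0, so Friday is 4
day_index = date.weekday()

# Adding a timedelta returns a new datetime; `date` itself is unchanged
one_week_later = date + timedelta(weeks=1)

# Subtracting two datetimes returns a timedelta
gap = datetime(2020, 12, 25) - date

print(day_index, one_week_later.date(), gap.days)
```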

Dates and Times in NumPy

  • NumPy has native time series data types: datetime64 and timedelta64

    >>> import numpy as np
    >>> date = np.array('2020-11-13', dtype=np.datetime64)
    >>> date
    array('2020-11-13', dtype='datetime64[D]')
  • Vectorized operations are available for arrays of type datetime64

    >>> date + np.arange(12)
    array(['2020-11-13', '2020-11-14', '2020-11-15', '2020-11-16',
           '2020-11-17', '2020-11-18', '2020-11-19', '2020-11-20',
           '2020-11-21', '2020-11-22', '2020-11-23', '2020-11-24'],
          dtype='datetime64[D]')
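  • datetime64 arithmetic also works between dates and across units. A short sketch (the dates are illustrative):

```python
import numpy as np

start = np.datetime64('2020-11-13')
holiday = np.datetime64('2020-12-25')

# Subtracting two datetime64 values gives a timedelta64
days_left = holiday - start

# Casting to a coarser unit truncates: day precision -> month precision
month = start.astype('datetime64[M]')

# Integers added to a datetime64[D] value are interpreted as days
weekly = start + np.arange(0, 28, 7)

print(days_left, month, weekly)
```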

Dates and Times in NumPy

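  • NumPy's datetime64 and timedelta64 data types are built on a fundamental time unit, which imposes a trade-off between time resolution and maximum time span: each value is stored as a 64-bit integer count of that unit, so a finer unit covers a shorter range of dates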

    >>> np.datetime64('2020-11-13')
    numpy.datetime64('2020-11-13')
    >>> np.datetime64('2020-11-13 12:00')
    numpy.datetime64('2020-11-13T12:00')
    >>> np.datetime64('2020-11-13 12:00:30.50', 'ns')
    numpy.datetime64('2020-11-13T12:00:30.500000000')
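  • The unit behind a datetime64 value can be inspected with np.datetime_data, and casting to a coarser unit drops precision. A sketch reusing the values above:

```python
import numpy as np

t = np.datetime64('2020-11-13 12:00:30.50', 'ns')

# np.datetime_data returns the (unit, count) pair of a datetime64 dtype
unit, count = np.datetime_data(t.dtype)

# Casting to whole seconds drops the fractional part
t_seconds = t.astype('datetime64[s]')

# Without an explicit unit, NumPy infers one from the string's precision
inferred = np.datetime_data(np.datetime64('2020-11-13 12:00').dtype)

print(unit, count, t_seconds, inferred)
```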

NumPy Time Resolution Codes

Code  Meaning       Relative Time Span   Absolute Time Span
Y     Year          ± 9.2e18 years       [9.2e18 BC, 9.2e18 AD]
M     Month         ± 7.6e17 years       [7.6e17 BC, 7.6e17 AD]
W     Week          ± 1.7e17 years       [1.7e17 BC, 1.7e17 AD]
D     Day           ± 2.5e16 years       [2.5e16 BC, 2.5e16 AD]
h     Hour          ± 1.0e15 years       [1.0e15 BC, 1.0e15 AD]
m     Minute        ± 1.7e13 years       [1.7e13 BC, 1.7e13 AD]
s     Second        ± 2.9e11 years       [2.9e11 BC, 2.9e11 AD]
ms    Millisecond   ± 2.9e8 years        [2.9e8 BC, 2.9e8 AD]
us    Microsecond   ± 2.9e5 years        [290301 BC, 294241 AD]
ns    Nanosecond    ± 292 years          [1678 AD, 2262 AD]
ps    Picosecond    ± 106 days           [1969 AD, 1970 AD]
fs    Femtosecond   ± 2.6 hours          [1969 AD, 1970 AD]
as    Attosecond    ± 9.2 seconds        [1969 AD, 1970 AD]
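  • The Relative Time Span column follows from storing each value as a signed 64-bit integer count of units relative to 1970-01-01. The sketch below divides the largest int64 by the number of units per year, assuming an average Gregorian year of 365.2425 days:

```python
import numpy as np

# datetime64 stores a signed 64-bit integer count of units since 1970-01-01
max_ticks = np.iinfo(np.int64).max
seconds_per_year = 365.2425 * 24 * 60 * 60   # assumed average year length

# Seconds represented by one tick of each unit
tick_seconds = {'s': 1.0, 'ms': 1e-3, 'us': 1e-6, 'ns': 1e-9}

spans = {unit: max_ticks * tick / seconds_per_year
         for unit, tick in tick_seconds.items()}

for unit, years in spans.items():
    print(f"{unit:>2}: ± {years:.3g} years")
```

Adding and subtracting the nanosecond span of about 292 years from 1970 gives the [1678 AD, 2262 AD] range in the ns row.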

Dates and Times in Pandas

  • Pandas builds on these tools: its Timestamp object combines the ease of use of datetime with the efficient storage and vectorized interface of numpy.datetime64, and a collection of Timestamp objects forms a DatetimeIndex

  • Ease of parsing example

    >>> import pandas as pd
    >>> date = pd.to_datetime('13th of November, 2020')
    >>> date
    Timestamp('2020-11-13 00:00:00')
    >>> date.strftime('%A')
    'Friday'
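  • A Timestamp also exposes calendar fields directly, so many strftime calls can be replaced by attributes. A minimal sketch:

```python
import pandas as pd

date = pd.to_datetime('13th of November, 2020')

name = date.day_name()        # weekday name as a string
weekday = date.dayofweek      # Monday is 0, so Friday is 4
quarter = date.quarter        # calendar quarter of the year
month_end = date.is_month_end # True only on the last day of a month

print(name, weekday, quarter, month_end)
```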

Dates and Times in Pandas

  • Vectorization example

    >>> date + pd.to_timedelta(np.arange(12), 'D')
    DatetimeIndex(['2020-11-13', '2020-11-14', '2020-11-15', '2020-11-16',
                   '2020-11-17', '2020-11-18', '2020-11-19', '2020-11-20',
                   '2020-11-21', '2020-11-22', '2020-11-23', '2020-11-24'],
                  dtype='datetime64[ns]', freq=None)
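  • When dates are the values of a Series rather than its index, the same vectorized operations are reached through the .dt accessor. A sketch:

```python
import pandas as pd

# A Series whose values (not index) are datetime64
s = pd.Series(pd.date_range('2020-11-13', periods=3))

names = s.dt.day_name()            # vectorized field access
plus_two = s + pd.Timedelta(days=2)  # vectorized arithmetic

print(names.tolist(), plus_two.tolist())
```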

Pandas Time Series: Indexing by Time

  • A Series can be indexed by time by giving it a DatetimeIndex

    >>> index = pd.DatetimeIndex(['2020-11-01', '2020-12-01',
    ...                           '2021-01-01', '2021-02-01'])
    >>> data = pd.Series([0, 1, 2, 3], index=index)
    >>> data
    2020-11-01    0
    2020-12-01    1
    2021-01-01    2
    2021-02-01    3
    dtype: int64

Pandas Time Series: Indexing by Time

  • Slicing with date strings includes both endpoints, and a partial string such as a year selects every matching entry

    >>> data['2020-11-01':'2021-01-01']
    2020-11-01    0
    2020-12-01    1
    2021-01-01    2
    dtype: int64
    >>> data['2020']
    2020-11-01    0
    2020-12-01    1
    dtype: int64
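  • The same selections can be written explicitly with .loc, and fields of the index (year, month, ...) can drive boolean masks. A sketch using the series above:

```python
import pandas as pd

index = pd.DatetimeIndex(['2020-11-01', '2020-12-01',
                          '2021-01-01', '2021-02-01'])
data = pd.Series([0, 1, 2, 3], index=index)

december = data.loc['2020-12']        # partial string: all of December 2020
from_2021 = data.loc['2021-01-01':]   # open-ended slice
winter = data[data.index.month.isin([12, 1, 2])]  # mask built from index fields

print(december, from_2021, winter, sep='\n')
```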

Pandas Time Series Data Structures

  • Time stamps: the Timestamp and DatetimeIndex

  • Time periods: the Period and PeriodIndex

  • Time deltas or durations: the Timedelta and TimedeltaIndex
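  • Each scalar type above matches one variety of date and time data. A sketch (the specific values are illustrative):

```python
import pandas as pd

stamp = pd.Timestamp('2020-11-13 09:30')  # a moment in time
month = pd.Period('2020-11', freq='M')    # the whole of November 2020
duration = pd.Timedelta(hours=36)         # an exact length of time

# A Period knows its boundaries and length
first = month.start_time
days = month.days_in_month

# Timestamp + Timedelta -> Timestamp; Period + int -> a later Period
later = stamp + duration
next_month = month + 1

print(first, days, later, next_month)
```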

Pandas Time Series Data Structures

  • pd.to_datetime parses a single value into a Timestamp and a list of values into a DatetimeIndex

    >>> dates = pd.to_datetime([datetime(2020, 11, 10), '11th of November, 2020',
    ...                         '2020-Nov-12', '11-13-2020', '20201114'],
    ...                        format='mixed')  # pandas >= 2.0: needed for mixed formats
    >>> dates
    DatetimeIndex(['2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13',
                   '2020-11-14'],
                  dtype='datetime64[ns]', freq=None)
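  • When all strings share one layout, an explicit format removes any month-first or day-first ambiguity, and errors='coerce' turns unparseable entries into NaT instead of raising. A sketch (the inputs are illustrative):

```python
import pandas as pd

# Explicit strftime-style format: month-day-year
us_style = pd.to_datetime(['11-13-2020', '11-14-2020'], format='%m-%d-%Y')

# Unparseable entries become NaT rather than raising an error
messy = pd.to_datetime(['2020-11-13', 'not a date'], errors='coerce')

print(us_style, messy, sep='\n')
```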

Pandas Time Series Data Structures

  • A DatetimeIndex can be converted to a PeriodIndex with the to_period method

    >>> dates.to_period('D')
    PeriodIndex(['2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13',
                 '2020-11-14'],
                dtype='period[D]', freq='D')
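  • Converting to a coarser period maps each date to the period that contains it, and to_timestamp converts back. A sketch:

```python
import pandas as pd

dates = pd.to_datetime(['2020-11-10', '2020-11-30', '2020-12-01'])

# Each date becomes the month that contains it
months = dates.to_period('M')

# Back to timestamps; by default each Period becomes its start time
starts = months.to_timestamp()

print(months, starts, sep='\n')
```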

Pandas Time Series Data Structures

  • Subtracting a date from a DatetimeIndex produces a TimedeltaIndex

    >>> dates - dates[0]
    TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
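  • A TimedeltaIndex can be converted to plain numbers, and diff gives the gap between consecutive dates. A sketch with illustrative, unevenly spaced dates:

```python
import pandas as pd

dates = pd.to_datetime(['2020-11-10', '2020-11-11', '2020-11-13'])
elapsed = dates - dates[0]             # TimedeltaIndex

whole_days = elapsed.days              # integer day counts
seconds = elapsed.total_seconds()      # float seconds
between = dates.to_series().diff()     # gap from the previous entry (first is NaT)

print(list(whole_days), list(seconds), between.tolist())
```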

Regular Sequences

  • Pandas has functions for creating regular date sequences:
    • date_range
    • period_range
    • timedelta_range
  • date_range examples

    >>> pd.date_range('2020-11-10', '2020-11-14')
    DatetimeIndex(['2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13',
                   '2020-11-14'],
                  dtype='datetime64[ns]', freq='D')
    >>> pd.date_range('2020-11-10', periods=7)
    DatetimeIndex(['2020-11-10', '2020-11-11', '2020-11-12', '2020-11-13',
                   '2020-11-14', '2020-11-15', '2020-11-16'],
                  dtype='datetime64[ns]', freq='D')
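  • date_range accepts other combinations of start, end, periods, and freq. A sketch of three of them:

```python
import pandas as pd

# Anchor on the end point instead of the start (default freq is daily)
ending = pd.date_range(end='2020-11-14', periods=3)

# Start, end, and periods with no freq: evenly spaced points
spaced = pd.date_range('2020-11-10', '2020-11-14', periods=3)

# Start, end, and an explicit step
every_other = pd.date_range('2020-11-10', '2020-11-14', freq='2D')

print(ending, spaced, every_other, sep='\n')
```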

Regular Sequences

  • period_range example

    >>> pd.period_range('2020-11-10', periods=7, freq='M')
    PeriodIndex(['2020-11', '2020-12', '2021-01', '2021-02', '2021-03', '2021-04',
                 '2021-05'],
                dtype='period[M]', freq='M')
  • timedelta_range example

    >>> pd.timedelta_range(0, periods=10, freq='H')
    TimedeltaIndex(['00:00:00', '01:00:00', '02:00:00', '03:00:00', '04:00:00',
                    '05:00:00', '06:00:00', '07:00:00', '08:00:00', '09:00:00'],
                   dtype='timedelta64[ns]', freq='H')

Frequencies and Offsets

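Code  Description      Code  Description
D     Calendar day     B     Business day
W     Weekly
M     Month end        BM    Business month end
Q     Quarter end      BQ    Business quarter end
A     Year end         BA    Business year end
H     Hours            BH    Business hours
T     Minutes
S     Seconds
L     Milliseconds
U     Microseconds
N     Nanoseconds

  • In pandas 2.2 and later, several of these aliases are deprecated in favor of new spellings, e.g. h (hours), min (minutes), s, ms, us, ns, and ME, QE, YE for month, quarter, and year end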

Frequencies and Offsets (Continued)

  • Adding an S suffix turns an end-of-period code into a start-of-period code

    Code  Description      Code  Description
    MS    Month start      BMS   Business month start
    QS    Quarter start    BQS   Business quarter start
    AS    Year start       BAS   Business year start
  • The month that anchors a quarterly or annual frequency can be changed by appending a three-letter month code

    • Q-JAN, BQ-FEB, QS-MAR, BQS-APR, etc.
    • A-JAN, BA-FEB, AS-MAR, BAS-APR, etc.
  • The weekday that anchors a weekly frequency can be changed by appending a three-letter weekday code

    • W-SUN, W-MON, etc.
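  • Each frequency code corresponds to an offset object in pd.offsets, and adding an anchored offset to a Timestamp rolls it forward to the next anchor point. A sketch; the code pairings in the comments are annotations for comparison with the tables above:

```python
import pandas as pd

date = pd.Timestamp('2020-11-13')   # a Friday

month_end = date + pd.offsets.MonthEnd()         # like code M
month_start = date + pd.offsets.MonthBegin()     # like code MS
next_sunday = date + pd.offsets.Week(weekday=6)  # like W-SUN (Monday is 0)
year_start = date + pd.offsets.YearBegin()       # like code AS

print(month_end, month_start, next_sunday, year_start)
```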

Frequencies and Offsets Examples

  • Frequency of 2 hours and 30 minutes

    >>> pd.timedelta_range(0, periods=9, freq='2H30T')
    TimedeltaIndex(['00:00:00', '02:30:00', '05:00:00', '07:30:00', '10:00:00',
                    '12:30:00', '15:00:00', '17:30:00', '20:00:00'],
                   dtype='timedelta64[ns]', freq='150T')
  • Business day offsets

    >>> from pandas.tseries.offsets import BDay
    >>> pd.date_range('2020-11-13', periods=5, freq=BDay())
    DatetimeIndex(['2020-11-13', '2020-11-16', '2020-11-17', '2020-11-18',
                   '2020-11-19'],
                  dtype='datetime64[ns]', freq='B')
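  • Business-day offsets can also skip specific holidays through pd.offsets.CustomBusinessDay. A sketch; the single holiday (Thanksgiving 2020, Thursday, November 26) is an illustrative choice, not a built-in calendar:

```python
import pandas as pd

# Business days that also skip the listed holiday
bday_no_thanksgiving = pd.offsets.CustomBusinessDay(holidays=['2020-11-26'])

# Wednesday, then Friday (holiday skipped), then Monday (weekend skipped)
days = pd.date_range('2020-11-25', periods=3, freq=bday_no_thanksgiving)

print(days)
```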

Resampling, Shifting, and Windowing

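  • Time series data can be resampled at a higher or lower frequency
    • resample: aggregates the data in each new time bin
    • asfreq: selects the value at each new time point
  • Shifting data in time
    • shift: shifts the data, leaving the index in place
    • shift with a freq argument: shifts the index instead (the older tshift method was deprecated in pandas 1.1 and removed in 2.0)
  • Rolling statistics apply an aggregation over a sliding window of consecutive observations, much like a groupby over overlapping groups
    • rolling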
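  • A minimal sketch of resample, asfreq, shift, and rolling on a small daily series (the values are made up for illustration):

```python
import pandas as pd

idx = pd.date_range('2020-11-01', periods=6, freq='D')
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Downsample to 2-day bins: resample aggregates, asfreq selects
binned_mean = s.resample('2D').mean()   # mean of each 2-day bin
picked = s.asfreq('2D')                 # value at each 2-day point

# shift moves the values (first becomes NaN); freq= moves the index instead
shifted_data = s.shift(1)
shifted_index = s.shift(1, freq='D')

# 3-day rolling mean; the first two windows are incomplete, hence NaN
rolling_mean = s.rolling(3).mean()

print(binned_mean, picked, shifted_data, shifted_index, rolling_mean, sep='\n\n')
```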