Datetime Knowledge and Setting

Datetime setting seems to be easy if you just need to click a button in FactSet PA, but to perform a backtesting in python, or on any other data analytical tools you use, grasping the gist of datetime knowledge and knowing how to set them correctly is highly necessary.

First, the ondemand  API is quite handy to obtain a series of weekdays in various locations, defined by exchanges.

fds.ExtractFormulaHistory(‘dummy’,’CONVERT_DATE(DATE(-10Y,0M,M),\’YYYYMMDD\’)’,’-10Y,0M,M’,’cal’,’NAY’)

One more example of format caveat CONVERT_DATEN(P_DATE(NOW,DATEN),”YYYY”)-FF_FOUND_DATE>=100)

Note NAY is the calendar parameter for U.S., FJH for Japan, FCS is for China. For a detailed list, reference to the link: http://fdshelp.factset.com/oa2/View.aspx?i=false&p=true&c=1&node_id=16610&lang=en-US.

There are python codes to grab weekdays or second quarter end each year etc. (to be inserted later).

Once Datetime series is downloaded, the next thing is to parse and organize so they can be used for easily. there is a well-written explanation in stackoverflow: https://stackoverflow.com/questions/311627/how-to-print-date-in-a-regular-format-in-python. The key point is that Python treats dates as an object, not a string, so with the following example, we can see sometimes the mistreatment will throw off error messages.

  • import datetime
  • mylist = []
  • today = datetime.date.today()
  • mylist.append(today)
  • print mylist[0] # print the date object, not the container 😉
  • 20081122 
  • # It’s better to always use str() because : 
  • print “This is a new day : “, mylist[0] # will work
  • This is a new day : 20081122 
  • print “This is a new day : “ + mylist[0] # will crash
  • cannot concatenate ‘str’ and ‘datetime.date’ objects
  • print “This is a new day : “ + str(mylist[0])
  • This is a new day : 20081122

Python provides a bunch of useful codes to manipulate dates, for instance:

#A timeseries with a period of every 72 hours starting with midnight jan 1st, 2011
rng = pd.date_range(‘1/1/2011′, periods=72, freq=’H’)
#index objects with dates
ts = pd.Series(np.random.randn(len(rng)), index=rng)
#change to 45 minute frquency and forward fill
converted = ts.asfreq(’45Min’, method = ‘pad’)
converted
ts.resample(‘D’).mean()

pd.Timestamp(dt.datetime(2012, 5, 1))
pd.Timestamp(‘2012-05-01′)
pd.Timestamp(’31-Dec-2016’)

pd.Period(‘2011-01’)
pd.Period(‘2011-01’, ‘M’)

#datatimeindex and periodindex

dates = [pd.Timestamp(‘2012-05-01’), pd.Timestamp(‘2012-05-02’),
pd.Timestamp(‘2012-05-03’)]
ts = pd.Series(np.random.randn(3), dates)
type(ts.index)
ts.index
ts
periods = [pd.Period(‘2012-01’), pd.Period(‘2012-02’), pd.Period(‘2012-03’)]
ts = pd.Series(np.random.randn(3), periods)
type(ts.index)
Out[19]: pandas.tseries.period.PeriodIndex
ts.index
Out[20]: PeriodIndex([‘2012-01’, ‘2012-02’, ‘2012-03′], dtype=’int64′, freq=’M’)

#convert to timestamps
pd.to_datetime(pd.Series([‘Jul 31, 2009’, ‘2010-01-10’, None])) #use Series to return series/table-look like
pd.to_datetime([‘2005/11/23’, ‘2010.12.31’]) #list converted to a Datetimeindex

pd.to_datetime([’04-01-2012 10:00′], dayfirst=True) #starts with the day
pd.to_datetime([’14-01-2012′, ’01-14-2012′], dayfirst=True) #dayfirst is not strict

#explicitly specifying a format string of ‘%Y%m%d’ takes a faster path
pd.Timestamp(‘2010/11/12’) #Timestamp does not allow passing dayfirst argument

df = pd.DataFrame({‘year’: [2015, 2016],
….: ‘month’: [2, 3],
….: ‘day’: [4, 5],
….: ‘hour’: [2, 3]})
….:
pd.to_datetime(df)
pd.to_datetime(df[[‘year’, ‘month’, ‘day’]])

#pd.to_datetime has standard designation of the datetime component, year month day, or hour, minute, secnod, millisecond, microsecond, nanosecond

#Pass errors=’coerce’ to convert invalid data to NaT (not a time)
pd.to_datetime([‘2009/07/31’, ‘asd’], errors=’raise’)
pd.to_datetime([‘2009/07/31’, ‘asd’], errors=’coerce’)
pd.to_datetime([‘2009/07/31’, ‘asd’], errors=’ignore’)

#conert integer or float epoch times
pd.to_datetime([1349720105, 1349806505, 1349892905,
….: 1349979305, 1350065705], unit=’s’)
….:

pd.to_datetime([1349720105100, 1349720105200, 1349720105300,
….: 1349720105400, 1349720105500 ], unit=’ms’)
….:
Out[32]:
DatetimeIndex([‘2012-10-08 18:15:05.100000’, ‘2012-10-08 18:15:05.200000’,
‘2012-10-08 18:15:05.300000’, ‘2012-10-08 18:15:05.400000’,
‘2012-10-08 18:15:05.500000′],
dtype=’datetime64[ns]’, freq=None)

dates = [datetime(2012, 5, 1), datetime(2012, 5, 2), datetime(2012, 5, 3)]

# Generating Ranges of Timestamps, use DatetimeIndex or Index #Note the frequency information
index = pd.DatetimeIndex(dates)
Out[37]: DatetimeIndex([‘2012-05-01’, ‘2012-05-02’, ‘2012-05-03′], dtype=’datetime64[ns]’, freq=None)

# Automatically converted to DatetimeIndex
index = pd.Index(dates)
index
Out[39]: DatetimeIndex([‘2012-05-01’, ‘2012-05-02’, ‘2012-05-03′], dtype=’datetime64[ns]’, freq=None)

#use the pandas functions date_range and bdate_range to create timestamp indexes.
#The default frequency for date_range is a calendar day while the default for bdate_range is a business day
index = pd.date_range(‘2000-1-1′, periods=1000, freq=’M’)
index = pd.bdate_range(‘2012-1-1′, periods=250)

start = dt.datetime(2011, 1, 1)
end = dt.datetime(2012, 1, 1)
rng = pd.date_range(start, end)

pd.date_range(start, end, freq=’BM’) #freq can be “W” weekly, BM bimonthly, or set periods = 20
pd.bdate_range(start=start, periods=20)

#tmiestamp limitations Since pandas represents timestamps in nanosecond resolution,
#the timespan that can be represented using a 64-bit integer is limited to approximately 584 years
pd.Timestamp.min
Out[54]: Timestamp(‘1677-09-22 00:12:43.145225’)
pd.Timestamp.max
Out[55]: Timestamp(‘2262-04-11 23:47:16.854775807′)
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts[’10/31/2011′:’12/31/2011’]
ts[‘2011-6’]

dft = pd.DataFrame(np.random.randn(100000,1),
….: columns=[‘A’],
….: index=pd.date_range(‘20130101′,periods=100000,freq=’T’))
….:
dft
dft[‘2013-1-15′:’2013-1-15 12:30:00’]
dft[‘2013’]
dft[‘2013-1′:’2013-2-28 00:00:00’]

dft[‘2013-1-15 12:30:00’] #raise key error for finding a single record/row, instead use .loc
dft.loc[‘2013-1-15 12:30:00’]

#DatetimeIndex Partial String Indexing also works on DataFrames with a MultiIndex.
dft2 = pd.DataFrame(np.random.randn(20, 1),
….: columns=[‘A’],
….: index=pd.MultiIndex.from_product([pd.date_range(‘20130101′,
….: periods=10,
….: freq=’12H’),
….: [‘a’, ‘b’]]))
….:

dft2 = dft2.swaplevel(0, 1).sort_index()
idx = pd.IndexSlice
dft2.loc[idx[:, ‘2013-01-05′], :]
ts.truncate(before=’10/31/2011′, after=’12/31/2011’)
ts[[0, 2, 6]].index

d = dt.datetime(2008, 8, 18, 9, 0)
d = pd.Timestamp(‘20141031’)
d + dt.relativedelta(months=4, days=5)

from pandas.tseries.offsets import *
d + DateOffset(months=-4, days=5)
Out[87]: Timestamp(‘2008-12-23 09:00:00’)

day = Day()
day.apply(pd.Timestamp(‘2014-01-01 09:00’))
Out[95]: Timestamp(‘2014-01-02 09:00:00’)
day = Day(normalize=True)
day.apply(pd.Timestamp(‘2014-01-01 09:00’))
Out[97]: Timestamp(‘2014-01-02 00:00:00′)

#Time series-related instance methods
#shift and lag
ts = ts[:5]
ts.shift(1)
ts.shift(5, freq=pd.datetools.bday)

ts.tshift(5, freq=’D’)#add customized days into day index

#resample is flexible to allow differnet paramaeters
rng = pd.date_range(‘1/1/2012′, periods=100, freq=’S’)
ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
ts.resample(‘5Min’).sum()
ts.resample(‘5Min’).mean()

Append hereto a table for quick reference of formatting:

datetime directive and meanign

 

 

 

 

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.