Improve your time management with Jupyter

Discover how you are spending time by parsing your calendar with Python in Jupyter.

by Dafne Cholet. CC BY-SA 2.0.

 

Python has incredibly scalable options for exploring data. With Pandas or Dask, you can scale Jupyter up to big data. But what about small data? Personal data? Private data?

JupyterLab and Jupyter Notebook provide a great environment to scrutinize my laptop-based life.

My exploration is powered by the fact that almost every service I use has a web application programming interface (API). I use many such services: a to-do list, a time tracker, a habit tracker, and more. But there is one that almost everyone uses: a calendar. The same ideas can be applied to other services, but calendars have one cool feature: CalDAV, an open standard that almost all web calendars support.

Parsing your calendar with Python in Jupyter

Most calendars provide a way to access your events over CalDAV. You may need some authentication for accessing this private data. Following your service's instructions should do the trick. How you get the credentials depends on your service, but eventually, you should be able to store them in a file. I store mine in my home directory in a file called .caldav:

import os
with open(os.path.expanduser("~/.caldav")) as fpin:
    username, password = fpin.read().split()

Never put usernames and passwords directly in notebooks! They could easily leak with a stray git push.
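A stray push is not the only way credentials leak, so it can also help to check that the file is private before reading it. Here is a minimal sketch of that idea. The `read_credentials` helper is hypothetical (it is not part of any library), and it assumes a POSIX system and a file holding the username and password separated by whitespace:

```python
import os
import stat
import tempfile


def read_credentials(path):
    """Return (username, password) from a whitespace-separated file.

    Hypothetical helper: refuses files that group or other users can access.
    """
    mode = os.stat(path).st_mode
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} is accessible to other users; try chmod 600")
    with open(path) as fpin:
        username, password = fpin.read().split()
    return username, password


# Demonstration with a throwaway file instead of a real ~/.caldav
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "caldav-demo")
    with open(demo_path, "w") as fpout:
        fpout.write("alice example-password\n")
    os.chmod(demo_path, 0o600)
    demo_credentials = read_credentials(demo_path)

print(demo_credentials[0])
```

In a notebook, you would call it with `os.path.expanduser("~/.caldav")` instead of the demo path.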

The next step is to use the convenient PyPI caldav library. I looked up the CalDAV server for my email service (yours may be different):

import caldav
client = caldav.DAVClient(url="https://caldav.fastmail.com/dav/", username=username, password=password)

CalDAV has a concept called the principal. It is not important to get into right now, except to know it's the thing you use to access the calendars:

principal = client.principal()
calendars = principal.calendars()

Calendars are, literally, all about time. Before accessing events, you need to decide on a time range. One week should be a good default:

from dateutil import tz
import datetime
now = datetime.datetime.now(tz.tzutc())
since = now - datetime.timedelta(days=7)
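If you would rather not depend on dateutil for this step, the standard library's `datetime.timezone.utc` also gives timezone-aware UTC datetimes. This is only a sketch: `week_range` is a hypothetical helper name, and its `now` parameter exists just to make the example reproducible.

```python
import datetime


def week_range(now=None, days=7):
    """Return (since, now) as timezone-aware UTC datetimes."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return now - datetime.timedelta(days=days), now


# A fixed "now" so the printed value does not change between runs
fixed_now = datetime.datetime(2020, 8, 26, 12, 0, tzinfo=datetime.timezone.utc)
example_since, example_now = week_range(fixed_now)
print(example_since.isoformat())  # 2020-08-19T12:00:00+00:00
```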

Most people use more than one calendar, and most people want all their events together. The itertools.chain.from_iterable function makes this straightforward:

import itertools

raw_events = list(
    itertools.chain.from_iterable(
        calendar.date_search(start=since, end=now, expand=True) 
        for calendar in calendars
    )
)
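To see what the flattening does without touching a server, here is a small sketch with plain lists. The lists are hypothetical stand-ins; each one plays the role of a single calendar's search results:

```python
import itertools

# Hypothetical stand-ins for the per-calendar results
work = ["standup", "planning"]
personal = ["dentist"]
empty = []

# One flat list, in calendar order; empty calendars contribute nothing
merged = list(
    itertools.chain.from_iterable(
        results
        for results in [work, personal, empty]
    )
)
print(merged)  # ['standup', 'planning', 'dentist']
```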

Reading all the events into memory in the API's raw, native format is an important practice. This means that when fine-tuning the parsing, analyzing, and displaying code, there is no need to go back to the API service to refresh the data.

And "raw" is no exaggeration. The events come through as strings in a specific format:

print(raw_events[12].data)
    BEGIN:VCALENDAR
    VERSION:2.0
    PRODID:-//CyrusIMAP.org/Cyrus 
     3.3.0-232-g4bdb081-fm-20200825.002-g4bdb081a//EN
    BEGIN:VEVENT
    DTEND:20200825T230000Z
    DTSTAMP:20200825T181915Z
    DTSTART:20200825T220000Z
    SUMMARY:Busy
    UID:
     1302728i-040000008200E00074C5B7101A82E00800000000D939773EA578D601000000000
     000000010000000CD71CC3393651B419E9458134FE840F5
    END:VEVENT
    END:VCALENDAR
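Part of what makes this format awkward is line folding: a line that starts with a space continues the previous one, as with PRODID and UID above. Here is a rough sketch of what a parser has to handle. The helper names `unfold` and `split_property` are hypothetical, the sample is shortened, and property parameters (the part after a `;` in a property name) are ignored:

```python
def unfold(text):
    """Join folded lines: a leading space or tab marks a continuation."""
    lines = []
    for line in text.splitlines():
        if lines and line[:1] in (" ", "\t"):
            lines[-1] += line[1:]
        else:
            lines.append(line)
    return lines


def split_property(line):
    """Split 'NAME:value' at the first colon."""
    name, _, value = line.partition(":")
    return name, value


# Shortened, made-up sample in the same shape as the output above
sample = (
    "BEGIN:VEVENT\r\n"
    "DTSTART:20200825T220000Z\r\n"
    "UID:\r\n"
    " 1302728i-0400\r\n"
    " 0000CD71\r\n"
    "END:VEVENT\r\n"
)

properties = [split_property(line) for line in unfold(sample)]
print(properties[2])  # ('UID', '1302728i-04000000CD71')
```

Doing this properly for every property type gets tedious fast.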

Luckily, PyPI comes to the rescue again with another helper library, vobject:

import io
import vobject

def parse_event(raw_event):
    data = raw_event.data
    parsed = vobject.readOne(io.StringIO(data))
    contents = parsed.vevent.contents
    return contents
parse_event(raw_events[12])
    {'dtend': [<DTEND{}2020-08-25 23:00:00+00:00>],
     'dtstamp': [<DTSTAMP{}2020-08-25 18:19:15+00:00>],
     'dtstart': [<DTSTART{}2020-08-25 22:00:00+00:00>],
     'summary': [<SUMMARY{}Busy>],
     'uid': [<UID{}1302728i-040000008200E00074C5B7101A82E00800000000D939773EA578D601000000000000000010000000CD71CC3393651B419E9458134FE840F5>]}

Well, at least it's a little better.

There is still some work to do to convert it to a reasonable Python object. The first step is to have a reasonable Python object. The attrs library provides a nice start:

# __future__ imports must be the first statement in the cell
from __future__ import annotations
from typing import Any
import attr
@attr.s(auto_attribs=True, frozen=True)
class Event:
    start: datetime.datetime
    end: datetime.datetime
    timezone: Any
    summary: str
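If attrs is not installed, the standard library's dataclasses module can express a similar frozen class. This sketch is a swapped-in alternative, not the class used in the rest of this article, and `EventSketch` is a hypothetical name chosen to avoid confusion with `Event`:

```python
import dataclasses
import datetime
from typing import Any


@dataclasses.dataclass(frozen=True)
class EventSketch:
    start: datetime.datetime
    end: datetime.datetime
    timezone: Any
    summary: str


utc = datetime.timezone.utc
example = EventSketch(
    start=datetime.datetime(2020, 8, 25, 22, 0, tzinfo=utc),
    end=datetime.datetime(2020, 8, 25, 23, 0, tzinfo=utc),
    timezone=utc,
    summary="Busy",
)

# Frozen instances reject attribute assignment
try:
    example.summary = "Changed"
except dataclasses.FrozenInstanceError:
    print("frozen instances reject assignment")
```

Freezing the instances means a parsed event cannot be changed by accident later in the notebook.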

Time to write the conversion code!

The first abstraction gets the value from the parsed dictionary without all the decorations:

def get_piece(contents, name):
    return contents[name][0].value
get_piece(_, "dtstart")
    datetime.datetime(2020, 8, 25, 22, 0, tzinfo=tzutc())

Calendar events always have a start, but they sometimes have an "end" and sometimes a "duration." Some careful parsing logic can harmonize both into the same Python objects:

def from_calendar_event_and_timezone(event, timezone):
    contents = parse_event(event)
    start = get_piece(contents, "dtstart")
    summary = get_piece(contents, "summary")
    try:
        end = get_piece(contents, "dtend")
    except KeyError:
        end = start + get_piece(contents, "duration")
    return Event(start=start, end=end, summary=summary, timezone=timezone)
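The end-or-duration fallback is easier to see in isolation. The sketch below uses plain dictionaries that map names directly to values, which is a simplification of the parsed contents (there, each value is wrapped in a list of objects). `resolve_end` is a hypothetical helper name:

```python
import datetime


def resolve_end(contents):
    """Return the end time from a mapping with either 'dtend' or 'duration'."""
    start = contents["dtstart"]
    try:
        return contents["dtend"]
    except KeyError:
        # No explicit end: derive it from the start and the duration
        return start + contents["duration"]


start = datetime.datetime(2020, 8, 25, 22, 0, tzinfo=datetime.timezone.utc)
with_end = {"dtstart": start, "dtend": start + datetime.timedelta(hours=1)}
with_duration = {"dtstart": start, "duration": datetime.timedelta(minutes=60)}

print(resolve_end(with_end) == resolve_end(with_duration))  # True
```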

Since it is useful to have the events in your local time zone rather than UTC, this uses the local timezone:

my_timezone = tz.gettz()
from_calendar_event_and_timezone(raw_events[12], my_timezone)
    Event(start=datetime.datetime(2020, 8, 25, 22, 0, tzinfo=tzutc()), end=datetime.datetime(2020, 8, 25, 23, 0, tzinfo=tzutc()), timezone=tzfile('/etc/localtime'), summary='Busy')

Now that the events are real Python objects, they really should have some additional information. Luckily, it is possible to add methods retroactively to classes.

But figuring out which day an event happens is not that obvious. You need the day in the local timezone:

def day(self):
    offset = self.timezone.utcoffset(self.start)
    fixed = self.start + offset
    return fixed.date()
Event.day = property(day)
print(_.day)
    2020-08-25
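Another way to get the local day is the standard `datetime.astimezone` method, which converts an aware datetime to a different timezone. The sketch below shows how the same instant lands on different days. The fixed offsets are an assumption made to keep the example deterministic:

```python
import datetime

start = datetime.datetime(2020, 8, 25, 22, 0, tzinfo=datetime.timezone.utc)

# Fixed offsets are an assumption for this example;
# a real local timezone would also account for daylight saving time.
utc_minus_7 = datetime.timezone(datetime.timedelta(hours=-7))
utc_plus_9 = datetime.timezone(datetime.timedelta(hours=9))

print(start.astimezone(utc_minus_7).date())  # 2020-08-25
print(start.astimezone(utc_plus_9).date())   # 2020-08-26
```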

Events are always represented internally as start/end, but knowing the duration is a useful property. Duration can also be added to the existing class:

def duration(self):
    return self.end - self.start
Event.duration = property(duration)
print(_.duration)
    1:00:00

Now it is time to convert all events into useful Python objects:

all_events = [from_calendar_event_and_timezone(raw_event, my_timezone)
              for raw_event in raw_events]

All-day events are a special case and probably less useful for analyzing life. For now, you can ignore them:

# ignore all-day events
all_events = [event for event in all_events if type(event.start) is not datetime.date]
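One detail worth knowing when filtering by type: `datetime.datetime` is a subclass of `datetime.date`, so an `isinstance` check against `datetime.date` matches timed events too. A short sketch with made-up start values:

```python
import datetime

timed = datetime.datetime(2020, 8, 25, 22, 0, tzinfo=datetime.timezone.utc)
all_day = datetime.date(2020, 8, 25)

# datetime.datetime is a subclass of datetime.date, so this matches both kinds
print(isinstance(timed, datetime.date))        # True
# Checking for datetime.datetime separates timed events from all-day ones
print(isinstance(all_day, datetime.datetime))  # False

starts = [timed, all_day]
timed_only = [s for s in starts if isinstance(s, datetime.datetime)]
print(len(timed_only))  # 1
```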

Events have a natural order—knowing which one happened first is probably useful for analysis:

all_events.sort(key=lambda ev: ev.start)

Now that the events are sorted, they can be broken into days:

import collections
events_by_day = collections.defaultdict(list)
for event in all_events:
    events_by_day[event.day].append(event)

And with that, you have calendar events with dates, duration, and sequence as Python objects.
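As one example of what this structure makes easy, here is a sketch that totals the scheduled time per day. The `(day, duration)` pairs are hypothetical sample data standing in for `event.day` and `event.duration`:

```python
import collections
import datetime

# Hypothetical (day, duration) pairs standing in for event.day and event.duration
samples = [
    (datetime.date(2020, 8, 24), datetime.timedelta(minutes=30)),
    (datetime.date(2020, 8, 25), datetime.timedelta(minutes=45)),
    (datetime.date(2020, 8, 25), datetime.timedelta(hours=1)),
]

# datetime.timedelta() with no arguments is zero, so it works as the default
time_by_day = collections.defaultdict(datetime.timedelta)
for day, duration in samples:
    time_by_day[day] += duration

for day, total in sorted(time_by_day.items()):
    print(day, total)
```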

Reporting on your life in Python

Now it is time to write reporting code! It is fun to have eye-popping formatting with proper headers, lists, important things in bold, etc.

This means HTML and some HTML templating. I like to use Chameleon:

template_content = """
<html><body>
<div tal:repeat="item items">
<h2 tal:content="item[0]">Day</h2>
<ul>
    <li tal:repeat="event item[1]"><span tal:replace="event">Thing</span></li>
</ul>
</div>
</body></html>"""

One cool feature of Chameleon is that it will render objects using their __html__ method. I will use it in two ways:

  • The summary will be in bold
  • For most events, I will remove the summary (since this is my personal information)

def __html__(self):
    offset = my_timezone.utcoffset(self.start)
    fixed = self.start + offset
    start_str = str(fixed).split("+")[0]
    summary = self.summary
    if summary != "Busy":
        summary = "&lt;REDACTED&gt;"
    return f"<b>{summary[:30]}</b> -- {start_str} ({self.duration})"
Event.__html__ = __html__
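The idea behind `__html__` is that an object can supply its own markup, while plain strings get escaped. The sketch below illustrates that convention with a hypothetical `render` function and `Note` class. It is a simplified stand-in, not Chameleon's actual implementation:

```python
import html


def render(value):
    """Hypothetical stand-in for a template engine's insertion step."""
    if hasattr(value, "__html__"):
        # The object vouches for its own markup
        return value.__html__()
    # Anything else is treated as text and escaped
    return html.escape(str(value))


class Note:
    def __init__(self, text):
        self.text = text

    def __html__(self):
        return f"<b>{html.escape(self.text)}</b>"


print(render("<REDACTED>"))  # &lt;REDACTED&gt;
print(render(Note("Busy")))  # <b>Busy</b>
```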

In the interest of brevity, the report will be sliced into one day's worth.

import chameleon
from IPython.display import HTML
template = chameleon.PageTemplate(template_content)
html = template(items=itertools.islice(events_by_day.items(), 3, 4))
HTML(html)
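The `itertools.islice(..., 3, 4)` call above picks out a single entry: it skips three items and yields the fourth. A small sketch with a made-up dictionary (dictionaries keep insertion order in Python 3.7 and later):

```python
import itertools

# Made-up data; keys stand in for days
by_day = {"Mon": 2, "Tue": 5, "Wed": 1, "Thu": 8, "Fri": 3}

# Skip three items, then yield only the fourth
fourth = list(itertools.islice(by_day.items(), 3, 4))
print(fourth)  # [('Thu', 8)]
```

Changing the bounds to 0 and 3, for example, would give the first three days instead.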

When rendered, it will look something like this:

2020-08-25

  • <REDACTED> -- 2020-08-25 08:30:00 (0:45:00)
  • <REDACTED> -- 2020-08-25 10:00:00 (1:00:00)
  • <REDACTED> -- 2020-08-25 11:30:00 (0:30:00)
  • <REDACTED> -- 2020-08-25 13:00:00 (0:25:00)
  • Busy -- 2020-08-25 15:00:00 (1:00:00)
  • <REDACTED> -- 2020-08-25 15:00:00 (1:00:00)
  • <REDACTED> -- 2020-08-25 19:00:00 (1:00:00)
  • <REDACTED> -- 2020-08-25 19:00:12 (1:00:00)

Endless options with Python and Jupyter

This only scratches the surface of what you can do by parsing, analyzing, and reporting on the data that various web services have on you.

Why not try it with your favorite service?

Moshe has been involved in the Linux community since 1998, helping in Linux "installation parties". He has been programming Python since 1999, and has contributed to the core Python interpreter. Moshe has been a DevOps/SRE since before those terms existed, caring deeply about software reliability, build reproducibility and other such things.


This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.