Lauris Jullien

Equipped with a decade of software and data engineering expertise, Lauris empowers teams and organizations to supercharge decision-making with their data. He brings a myriad of battle-tested best practices and concepts from software engineering to the data world. His stack includes machine learning engineering, data engineering, backend engineering, data science, infrastructure engineering, and business intelligence. Previously, he led teams working on climate tech at Cervest and logistics at Sennder, and tackled ad-tech challenges at Yelp. Currently, he is digging deep into data privacy and compliance in the LLM paradigm.


Session

07-11
15:30
30min
Caching for Jupyter Notebooks
Lauris Jullien

Caching data and calculation results in Jupyter notebooks is a great way to speed up development by making expensive cells cheaper to re-run.

Data scientists and developers who use notebooks on a daily basis can improve their workflow with low-effort changes to their notebook code, cutting the time spent waiting and reducing context switches.

This talk targets developers and data scientists of all experience levels and will cover:

Why caching in notebooks?
Setting up the context in which developers and data scientists use notebooks for exploratory work, and where caching fits into it.

What is caching?
Quick definition of caching, introducing the different types of persistence (in-memory, on disk, database, object storage …) and cache invalidation strategies (parameters, code changes, TTL, …), with some cautionary comments about data security when caching protected data.
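
To make the invalidation strategies concrete, here is a minimal sketch of an in-memory cache whose entries are invalidated by parameters, by code changes, and by a TTL. It is an illustration under stated assumptions, not material from the talk: the names `cached`, `_fingerprint`, and `slow_square` are hypothetical, the bytecode hash is only a rough proxy for "the code changed", and `repr`-based keys only suit simple parameters.

```python
import functools
import hashlib
import time

_CACHE = {}  # in-memory persistence: emptied when the kernel restarts


def _fingerprint(func, args, kwargs):
    """Build a key that changes with the parameters or the function's bytecode."""
    code = func.__code__
    # Assumption: bytecode plus constants is a good-enough stand-in for the source.
    payload = repr(
        (func.__qualname__, code.co_code, code.co_consts, args, sorted(kwargs.items()))
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def cached(ttl_seconds):
    """Cache results in memory and recompute them once they are older than the TTL."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = _fingerprint(func, args, kwargs)
            now = time.monotonic()
            hit = _CACHE.get(key)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # fresh entry: skip the expensive call
            value = func(*args, **kwargs)
            _CACHE[key] = (now, value)
            return value

        return wrapper

    return decorator


calls = []  # records every real computation


@cached(ttl_seconds=60)
def slow_square(x):
    calls.append(x)
    return x * x


slow_square(4)
slow_square(4)  # same parameters within the TTL: served from the cache
slow_square(5)  # new parameter: recomputed
```

Here `calls` ends up holding only the two real computations. Redefining `slow_square` with a different body changes the fingerprint, so stale results are not reused.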

Caching Techniques
Going through readily available options from the Python standard library and how to use them in notebooks. Introducing a few off-the-shelf options such as IPython % magics and cachetools.
Showcasing how to build your own mini caching framework that fits a specific use case, using pandas and Spark for the example.
Explaining when to stop caching and how to keep the caching framework mini, including the signs that caching has gone overboard.
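
As a rough illustration of the techniques listed above, the sketch below pairs the standard library's `functools.lru_cache` (in-memory) with a tiny hand-rolled on-disk cache. It is not the framework presented in the talk: `disk_cache`, `CACHE_DIR`, and the two example functions are hypothetical names, the cache key covers only the function name and its parameters, and `pickle` stands in for whatever format fits the data (with pandas, `DataFrame.to_parquet` and `pandas.read_parquet` would be a natural swap).

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache location; keep protected data out of shared directories.
CACHE_DIR = Path(tempfile.gettempdir()) / "notebook_cache_demo"


def disk_cache(func):
    """Persist results on disk so they survive a kernel restart."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        raw_key = repr((func.__qualname__, args, sorted(kwargs.items())))
        key = hashlib.sha256(raw_key.encode()).hexdigest()
        path = CACHE_DIR / f"{key}.pkl"
        if path.exists():
            # Only unpickle files you wrote yourself: pickle can execute code.
            with path.open("rb") as fh:
                return pickle.load(fh)
        result = func(*args, **kwargs)
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as fh:
            pickle.dump(result, fh)
        return result

    return wrapper


@functools.lru_cache(maxsize=None)
def in_memory_total(n):
    """Cached for the lifetime of the kernel only."""
    return sum(range(n))


@disk_cache
def on_disk_total(n):
    """Cached across kernel restarts, until the cache files are deleted."""
    return sum(range(n))
```

Because this key ignores code changes, editing `on_disk_total` means clearing `CACHE_DIR` by hand. Needing much more machinery than this is one plausible sign that the cache is growing past "mini".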

PyData: Software Packages & Jupyter
Terrace 2A