From Jupyter Notebooks to a Python Package: The Best of Both Worlds
2023-07-19, Terrace 2A

A Jupyter notebook is handy for rapid REPL (read-eval-print loop) style tasks
such as exploratory data analysis and data science. However, as a notebook grows
larger and its code becomes more complicated, the lack of proper software
engineering support starts to hurt. The Jupyter notebook lacks
several important features, including code sharing, refactoring support,
version control, and advanced editing. Fortunately, full-fledged IDEs
such as VS Code and PyCharm are readily available, and they support
these missing features very well.
So why not take advantage of the best of both worlds?

In this beginner-level, hands-on talk, I will demonstrate how to
transform Jupyter notebook workflows into a proper Python package using VS Code.
I will also introduce several basic but essential refactoring recommendations.
By doing so, you can reuse the package across several notebooks
and even share it with your colleagues and friends.


Notebooks, code and slides are available in this repo.

Introduction

  • Jupyter Notebook
      • Provides ideal workflows for data science
      • Pros: REPL, interactivity, integration of code / output / documentation,
        visualization, rapid prototyping, result sharing, etc.
      • Cons: lacks debugging, code sharing, refactoring, version control,
        advanced editing, etc.
  • Full-fledged IDEs
      • Designed to maximize programmer productivity
      • A single iteration can take a long time
  • We can benefit from the best of both worlds
    by using a Python package

Jupyter Notebook Data Science Workflow

  • Data Loading
  • Preprocessing
  • Exploratory Data Analysis (EDA)
  • Prediction
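
As a rough illustration, each of the four stages above can be written as a small function rather than a run of notebook cells, which is what makes it easy to move into a package later. The sketch below uses only the standard library. The function names, the toy CSV data, and the through-the-origin line fit used as the "prediction" step are all assumptions made for illustration, not material from the talk.

```python
import csv
import io
import statistics

# Hypothetical toy data standing in for a real input file.
RAW_CSV = """x,y
1,2.0
2,4.1
3,
4,8.2
"""


def load_data(text):
    """Data loading: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(rows):
    """Preprocessing: drop rows with missing values, convert to floats."""
    return [
        {"x": float(r["x"]), "y": float(r["y"])}
        for r in rows
        if r["x"] and r["y"]
    ]


def summarize(rows):
    """EDA: compute a couple of simple summary statistics."""
    ys = [r["y"] for r in rows]
    return {"count": len(ys), "mean_y": statistics.mean(ys)}


def fit_slope(rows):
    """Prediction (toy model): least-squares slope of a line through the origin."""
    num = sum(r["x"] * r["y"] for r in rows)
    den = sum(r["x"] ** 2 for r in rows)
    return num / den


def predict(slope, x):
    """Apply the fitted toy model to a new input."""
    return slope * x


# The notebook then shrinks to a few readable calls.
rows = preprocess(load_data(RAW_CSV))
summary = summarize(rows)
slope = fit_slope(rows)
```

With the stages split this way, each function can be tested on its own and reused from any notebook that needs it.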

To (Your Own) Python Package

  • What is a package and why do we want to use it?
  • How to create a (minimal) package
  • How to import and use it
  • Live refactoring examples
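
To make the idea of a minimal package concrete, here is a small sketch: a package is, at its simplest, a directory containing an `__init__.py` plus one or more modules. The script below builds such a directory in a temporary location and imports it. The package name `nb_helpers_demo`, the module name `cleaning`, and the function `drop_missing` are hypothetical names chosen for this example, and the `sys.path` manipulation is only a stand-in for a proper installation.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway project directory so the example is self-contained.
project_dir = Path(tempfile.mkdtemp())
package_dir = project_dir / "nb_helpers_demo"  # hypothetical package name
package_dir.mkdir()

# __init__.py marks the directory as a regular package; here it also
# re-exports one function so callers can use it from the package root.
(package_dir / "__init__.py").write_text(
    "from .cleaning import drop_missing\n"
)

# A module holding a function that might previously have lived in a notebook cell.
(package_dir / "cleaning.py").write_text(
    "def drop_missing(values):\n"
    '    """Return values with None entries removed."""\n'
    "    return [v for v in values if v is not None]\n"
)

# Make the project directory importable for this demo. In a real project,
# installing the package (for example in editable mode) would usually
# replace this path manipulation.
sys.path.insert(0, str(project_dir))
importlib.invalidate_caches()

nb_helpers_demo = importlib.import_module("nb_helpers_demo")

cleaned = nb_helpers_demo.drop_missing([1, None, 3])
```

In a notebook, the same function would then be available with an ordinary `import nb_helpers_demo`, so several notebooks can share one implementation.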

Wrap up / Some tips

  • Publish your awesome package
  • PyScaffold
  • VS Code

Expected audience expertise:

beginner

I work at Safran as a research engineer. My main responsibility is analyzing data obtained from airplanes and helicopters using various statistical models and machine-learning algorithms.
Previously, I worked at Samsung Electronics in South Korea for 3 years as a senior engineer. At Samsung, I developed various computer-networking algorithms and software for smartphones and IoT devices to improve the user experience.
Before joining Samsung, I completed my Ph.D. at Pohang University of Science and Technology (POSTECH) in South Korea. My thesis topic was "Traffic Engineering in Data Center Networks using Software Defined Networking."