Automated Refactoring Large Python Codebases
2022-07-15, Liffey A

Like many companies with multi-million-line Python codebases, Carta has struggled to adopt best practices such as Black formatting and type annotations. The extra work needed to do the right thing competes with the almost overwhelming demand for new development, and unclear code ownership and little insight into the size and scope of type problems add to the burden. We’ve greatly mitigated these problems by building an automated refactoring pipeline that applies Black formatting and backfills missing types through incremental GitHub pull requests. Our refactoring applications use LibCST and MonkeyType to modify the Python syntax tree, and GitPython/PyGithub to create and manage pull requests. The pipeline divides changes into small, easily reviewed pull requests and assigns the appropriate code owners to review them. After creating and merging more than 3,000 pull requests, we have fully converted our large codebase to Black format and added type annotations to more than 50,000 functions. In this talk, you’ll learn how to use LibCST to build automated refactoring tools that fix general Python code quality issues at scale, and how to use GitPython/PyGithub to automate the code review process.
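The abstract mentions limited insight into how many functions still lack types. As a rough illustration, not the pipeline described in the talk, the sketch below uses the standard-library `ast` module (swapped in for LibCST so it runs without extra dependencies) to list functions with missing parameter or return annotations. The names `AnnotationGap` and `find_annotation_gaps` are hypothetical, and skipping parameters named `self`/`cls` is an assumed heuristic. Because `ast` discards comments and formatting, it only suits reporting; rewriting code while preserving its layout is what LibCST's concrete syntax tree is for.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass


@dataclass
class AnnotationGap:
    """A function whose signature lacks at least one annotation (hypothetical helper type)."""
    name: str
    lineno: int
    missing_params: list[str]
    missing_return: bool


def find_annotation_gaps(source: str) -> list[AnnotationGap]:
    """Return every function in `source` missing a parameter or return annotation."""
    gaps: list[AnnotationGap] = []
    # ast.walk visits nodes breadth-first, so module-level functions are
    # reported before methods nested inside classes.
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        params = [*args.posonlyargs, *args.args, *args.kwonlyargs]
        # *args and **kwargs are optional ast.arg nodes (None when absent).
        params += [a for a in (args.vararg, args.kwarg) if a is not None]
        # Assumed heuristic: treat parameters named self/cls as implicitly typed.
        missing = [
            a.arg
            for a in params
            if a.annotation is None and a.arg not in ("self", "cls")
        ]
        if missing or node.returns is None:
            gaps.append(
                AnnotationGap(node.name, node.lineno, missing, node.returns is None)
            )
    return gaps


sample = '''
def add(a: int, b: int) -> int:
    return a + b

def greet(name):
    return "hi " + name

class Repo:
    def path(self) -> str:
        return "."

    def clone(self, url, depth: int = 1):
        pass
'''

for gap in find_annotation_gaps(sample):
    print(gap.name, gap.missing_params, gap.missing_return)
# greet ['name'] True
# clone ['url'] True
```

Run over every module in a repository, a report like this gives a rough count of the annotation backlog before any automated backfilling starts.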
Slides: https://www.slideshare.net/jimmy_lai/europython-2022-automated-refactoring-large-python-codebases
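The pipeline splits changes into small pull requests and routes each one to the right code owners. A minimal sketch of that planning step follows, assuming ownership is expressed as directory-prefix rules where later rules win (a simplification of GitHub's CODEOWNERS file, which also supports glob patterns). `PlannedPR`, `owner_for`, `plan_pull_requests`, the 20-file default batch size, and the `@infra` fallback owner are all hypothetical; creating the branches and pull requests themselves with GitPython/PyGithub is left out.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PlannedPR:
    """One small pull request: a batch of files reviewed by a single owner (hypothetical)."""
    owner: str
    files: list[str]


def owner_for(path: str, rules: list[tuple[str, str]], default: str) -> str:
    """Return the owner of `path` using simplified directory-prefix rules.

    As in CODEOWNERS, later matching rules take precedence over earlier ones,
    but only plain prefixes are supported here (no glob patterns).
    """
    owner = default
    for prefix, team in rules:
        if path.startswith(prefix):
            owner = team
    return owner


def plan_pull_requests(
    changed_files: list[str],
    rules: list[tuple[str, str]],
    max_files_per_pr: int = 20,
    default_owner: str = "@infra",
) -> list[PlannedPR]:
    """Group changed files by owner, then split each group into small batches."""
    if max_files_per_pr < 1:
        raise ValueError("max_files_per_pr must be at least 1")
    by_owner: dict[str, list[str]] = defaultdict(list)
    for path in sorted(changed_files):
        by_owner[owner_for(path, rules, default_owner)].append(path)
    plans: list[PlannedPR] = []
    for owner in sorted(by_owner):
        files = by_owner[owner]
        # Slice each owner's files into chunks of at most max_files_per_pr.
        for start in range(0, len(files), max_files_per_pr):
            plans.append(PlannedPR(owner, files[start:start + max_files_per_pr]))
    return plans


rules = [("payments/", "@payments-team"), ("payments/ledger/", "@ledger-team")]
changed = [
    "payments/api.py",
    "payments/ledger/entry.py",
    "payments/models.py",
    "payments/views.py",
    "scripts/cron.py",
]
for pr in plan_pull_requests(changed, rules, max_files_per_pr=2):
    print(pr.owner, pr.files)
# @infra ['scripts/cron.py']
# @ledger-team ['payments/ledger/entry.py']
# @payments-team ['payments/api.py', 'payments/models.py']
# @payments-team ['payments/views.py']
```

Sorting both the files and the owners keeps the plan deterministic, so rerunning the job on the same set of changes yields the same batches and reviewers.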


Expected audience expertise (domain): some

Expected audience expertise (Python): some

Abstract as a tweet:

Automated refactoring of large Python codebases to adopt Black formatting and type annotations through incremental changes, using the open-source LibCST, GitPython, and PyGithub.

Jimmy Lai is a Software Engineer in Instagram and Carta Infrastructure. He loves Python and likes to share that love in tech talks. His recent interest is automated refactoring, and his previous talks have covered profiling, optimization, asyncio, and type annotations.