Python Linters at Scale
2023-07-19, Terrace 2B

Black, Flake8, isort, and Mypy are useful Python linters, but they are challenging to use effectively at scale: across multiple codebases, in a large codebase, or with many developers. Linter analysis on large codebases is slow. Linters may slow down developers by asking them to fix trivial issues. Running linters in distributed CI jobs makes it hard to understand the overall developer experience.

In this talk, we'll walk you through solving those scaling problems using a reusable linter framework that releases new linter updates automatically, reuses consistent configurations, runs linters on only updated code to speed up runtime, collects logs and metrics to provide observability, and builds autofixes for common linter issues. Our linter runs are fast and scalable. Every week, they run 10k times on millions of lines of code in over 25 codebases, generating 25k suggestions for more than 200 developers. The autofixes also save 20 hours of developer time every week.


Popular Python linters overview and configuration recommendation (6 mins)
- Black, Flake8, isort, mypy
Common practice: linter versions and config files managed by version control (1.5 mins)
- dependency management
- cached dependencies in CI jobs

Scaling Challenges: (5 mins)
- Multiple codebases
  1. Inconsistent linter versions and configurations
  2. Endless effort on upgrading linters and configs
- Large codebases (1 min): Linters run slowly when analyzing millions of lines of code
- Many developers (2 mins): Linter suggestions about trivial issues slow down developers

Solutions for the challenges: (12.5 mins in total)
- The solution for inconsistent versions and configurations (2.5 mins)
  - Tradeoff between a consistent version and flexible versions
  - All codebases use the same version and configurations of linters
  - Need to fix existing linter errors while upgrading linters
  - Reusable workflow
- The solution for slow linter runs (2.5 mins)
  - Avoid repeated analysis by caching linter results
  - Only run linters on updated files
- The solution for the lack of a developer experience overview (2.5 mins)
  - Collect linter suggestions from CI jobs
  - Detect linter errors that are merged to the main branch and find the breaking change easily
  - Understand frequent linter suggestions
- The solution for slowing down developers (2.5 mins)
  - Build autofixes for frequent linter suggestions: Black and isort formatting, Flake8 unused imports, and Mypy missing return types
- The solution for providing guidance on more best practices (2.5 mins)
  - Build custom linters for best practices using PyGithub and LibCST
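
The two speed-ups in the outline above (running linters only on updated files and caching linter results) can be sketched roughly as follows. This is a minimal illustration, not the framework presented in the talk: the function names, the `origin/main` base ref, and the in-memory `cache` dict are all assumptions made for the example. Keying results by file content suits file-local linters such as Flake8; whole-program checkers such as Mypy depend on other files and would need a different cache key.

```python
import hashlib
import subprocess


def select_python_files(paths):
    """Keep only Python source files from a list of changed paths."""
    return [p for p in paths if p.endswith(".py")]


def changed_python_files(base_ref="origin/main"):
    """List .py files added, copied, modified, or renamed since base_ref.

    base_ref is an assumed default; use whatever branch your CI compares against.
    """
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", f"{base_ref}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return select_python_files(out.splitlines())


def content_key(source, linter_version):
    """Cache key: changes when either the file content or the linter version changes."""
    return hashlib.sha256(linter_version.encode() + b"\0" + source).hexdigest()


def lint_with_cache(files, linter_version, run_linter, cache):
    """Run run_linter(path, source) only for content not seen before.

    files maps path -> file bytes; cache is any dict-like store (hypothetical here).
    """
    results = {}
    for path, source in files.items():
        key = content_key(source, linter_version)
        if key not in cache:
            cache[key] = run_linter(path, source)
        results[path] = cache[key]
    return results
```

Including the linter version in the key means that a linter upgrade invalidates old results instead of silently reusing them.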

Effective linter strategies: (3 mins)
- Use linters to guide developers on best practices
  - Convention over configuration
  - With linters, developers don’t need to remember too many things
- Use autofixes to help developers fix issues
  - DRY: don’t repeat yourself
  - With autofixes, developers move fast to focus on developing products
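
To make the autofix idea concrete, here is a deliberately small sketch that removes unused imports reported by Flake8 as F401. It assumes Flake8's default `path:line:col: code message` output and only touches lines that are exactly `import <name>`; anything more complex (such as `from a import b, c`) is left for a human or a syntax-aware tool like LibCST. The function name and regex are illustrative, not the speaker's implementation.

```python
import re

# Matches Flake8's default output for an unused import, e.g.
# "app.py:1:1: F401 'os' imported but unused"
F401 = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):\d+: F401 '(?P<name>[^']+)' imported but unused"
)


def remove_unused_imports(source, flake8_lines, path):
    """Drop simple `import name` lines that Flake8 flagged as unused in `path`."""
    lines = source.splitlines(keepends=True)
    to_drop = set()
    for report in flake8_lines:
        match = F401.match(report)
        if match is None or match["path"] != path:
            continue
        index = int(match["line"]) - 1
        # Only fix the trivial case; leave multi-name or aliased imports untouched.
        if 0 <= index < len(lines) and lines[index].strip() == f"import {match['name']}":
            to_drop.add(index)
    return "".join(line for i, line in enumerate(lines) if i not in to_drop)
```

Restricting the fix to the trivial case keeps the autofix safe to apply without review, at the cost of leaving some suggestions unfixed.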

Results and Recap: (2 mins)
- Scaled out our linters to run 10k times on 25 codebases for 200 developers and provide 25k suggestions on a weekly basis
- Recap of the solutions for scaling out linters
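
One way to arrive at per-rule suggestion counts like those above is to tally linter output collected from CI jobs. The sketch below assumes Flake8's default `path:line:col: code message` format and a plain list of report lines; how the lines are collected and stored is outside its scope, and the names are hypothetical.

```python
import re
from collections import Counter

# Captures the rule code (e.g. F401, E501) from a Flake8-style report line.
REPORT = re.compile(r"^[^:]+:\d+:\d+: (?P<code>[A-Z]+\d+) ")


def count_by_code(report_lines):
    """Count linter suggestions per rule code, ignoring lines that are not reports."""
    counts = Counter()
    for line in report_lines:
        match = REPORT.match(line)
        if match:
            counts[match["code"]] += 1
    return counts
```

Calling `count_by_code(lines).most_common(5)` then surfaces the most frequent suggestions, which are natural candidates for autofixes.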


Expected audience expertise:

intermediate

Jimmy Lai is a Software Engineer at Carta Infrastructure. He loves Python and likes to share his love in tech talks. His recent interest is linters, and his previous talk topics include profiling, optimization, asyncio, type annotations, and automated refactoring.