Jonathan Hollenbeck

Jonathan Hollenbeck is a Senior Software Engineer on the ESG (Environmental, Social, and Governance) team at Bloomberg, where he delivers performance and reliability improvements for financial models and data transformations. He has five years of experience in scientific computing with Python, Julia, and R. After receiving a bachelor’s degree in computer science from UC Davis, he started his software engineering career at Learning at Cisco. Early in his career, he integrated content development workflows and learner telemetry into a highly reliable, near-real-time database system with 1,100 internal users. He then took on a strategic role, supporting cross-org learning integration and driving the inception and productionization of ML/AI training projects, one of which earned CEO recognition and a partnership with the White House. During this time, he earned a master’s degree in Computational & Applied Mathematics from Stanford University, with a focus on machine learning ethics.


Session

07-11
16:05
45min
How we used vectorization for 1000x Python speedups (no C or Spark needed!)
Justine Wezenaar, Jonathan Hollenbeck

Want to make all your code faster? With matrices, library knowledge, and a sprinkle of creativity, you can consistently speed up multivariate Python functions by 1000x!

The modal (most common) optimization requires only simple building blocks: arithmetic, checking a case, calling the right sklearn function, and so on. When that’s not sufficient, three core tricks cover the vast majority of real-world cases (90% of the ~400 functions we vectorized): converting conditional logic to set theory, stacking vectors into a matrix, and shaping data to match library expectations.
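To make two of the tricks named above concrete, here is a minimal sketch in NumPy. It is an illustration only, not code from the talk: the function names, the 0.5 threshold, the rating values, and the pillar weights are all hypothetical. The first pair of functions turns an if/elif/else chain into boolean masks (sets of rows), and the last function stacks three vectors into a matrix so a weighted sum becomes one matrix-vector product.

```python
import numpy as np


def rate_loop(emissions, revenue):
    """Row-by-row reference version with ordinary if/elif/else branching."""
    out = []
    for e, r in zip(emissions, revenue):
        if r <= 0:
            out.append(0.0)  # no usable revenue figure
        elif e > 0.5 * r:
            out.append(1.0)  # high emissions intensity (hypothetical threshold)
        else:
            out.append(2.0)  # low emissions intensity
    return np.array(out)


def rate_vectorized(emissions, revenue):
    """Same rule, with each branch expressed as a boolean mask over all rows."""
    emissions = np.asarray(emissions, dtype=float)
    revenue = np.asarray(revenue, dtype=float)

    no_revenue = revenue <= 0
    # An elif only applies to rows that failed the earlier test.
    high = ~no_revenue & (emissions > 0.5 * revenue)
    # The else branch is not(no_revenue or high); by De Morgan's law this
    # equals (not no_revenue) and (not high).
    low = ~no_revenue & ~high

    out = np.empty(emissions.shape)
    out[no_revenue] = 0.0
    out[high] = 1.0
    out[low] = 2.0
    return out


def combine_pillars(env, soc, gov, weights):
    """Stack three score vectors into an (n, 3) matrix and apply the weights
    to every row with a single matrix-vector product."""
    pillars = np.column_stack([env, soc, gov])
    return pillars @ np.asarray(weights, dtype=float)
```

The loop version is kept only as a reference for checking that the masks reproduce the branching exactly; on large inputs the masked version does the same work in a handful of array operations.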

At Bloomberg, ESG (Environmental, Social, and Governance) Scores require complex computations on large data sets. Time-series computations are fundamental to Governance scores: one UDF (user-defined function) infers board support for a policy from prior cyclical votes and other time-offset inputs. By rewriting the pandas backfill as a series of reductions on a 4-tensor, we reduced the runtime from 45 minutes to 10 milliseconds! Similarly, because of real-world complexity, finance UDFs can end up with 100+ if/else branches in one function. With a mix of De Morgan’s laws and sparse matrix representations, we simplified the cases and achieved 1000x+ speedups.
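As a rough illustration of how a backfill can be phrased as array reductions, the sketch below fills each missing value with the next available one along the last axis of an array of any rank, using a running minimum over column indices. This is one possible approach under simple assumptions (missing values are NaN, time is the last axis); it is not the implementation described in the talk, and the function name is hypothetical.

```python
import numpy as np


def backfill_last_axis(values):
    """Replace each NaN with the next non-NaN value along the last axis.

    Works for arrays of any rank, so a stack of time series (for example a
    4-D array with time as the last axis) is filled in one pass.
    """
    values = np.asarray(values, dtype=float)
    n = values.shape[-1]

    # Each valid entry is labelled with its own column index; each NaN gets
    # the sentinel n, which is larger than every real index.
    idx = np.where(np.isnan(values), n, np.arange(n))

    # A running minimum taken from the right gives, for every position, the
    # index of the nearest valid entry at or after it.
    nxt = np.minimum.accumulate(idx[..., ::-1], axis=-1)[..., ::-1]

    # Trailing NaNs have no later valid entry. Clipping points them at the
    # last column, which is itself NaN in that case, so they stay NaN.
    safe = np.minimum(nxt, n - 1)
    return np.take_along_axis(values, safe, axis=-1)
```

For example, a row `[nan, 1, nan, 3, nan]` becomes `[1, 1, 3, 3, nan]`. The whole operation is a comparison, one cumulative reduction, and one gather, with no Python-level loop over rows or time steps.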

We’ll conclude with a quick overview of cutting-edge tools, and hope you’ll leave with a concrete strategy for vectorizing financial models!

PyData: Machine Learning, Stats
Forum Hall