2024-07-10, North Hall
Ever felt like you’re navigating a data jungle, battling to survive the unexpected production problems that throw you off track? Well, you’re not alone. Staying on top of your data's health is not just smart – it's crucial. In this talk, I will share some Python tricks (methods and libraries) that you can use to defend against those wild data problems. Because, let's face it, being able to monitor your data effectively, spot sneaky anomalies, and get to the bottom of them is the key to unlocking buried treasure.
First, I'll take you through the ins and outs of observability, highlighting its importance for managing both the inputs and outputs of machine learning models, as well as for overall data quality. We'll explore a range of techniques to detect anomalies, with a focus on multivariate time series data. We'll also cover how we can keep this process as computationally efficient as possible.
But we won't stop at just finding these anomalies: we're on a mission to chase them down to their lair! The second part of the talk will equip you with the detective skills to perform root cause analysis and extract as many insights as possible. These discoveries can be an eye-opener and the first step towards new projects and strategies. We will then tackle how to distinguish real anomalies from data evolution (or drift) and how to set up effective monitoring strategies that keep your data clean and insightful.
If your interests lie in machine learning or you're simply keen on data quality, join me as we set off to unravel the mysteries of data observability. Let's learn how to keep data problems in check and, when life gives you anomalies, turn them into business opportunities!
Intermediate
Madalina Ciortan is Head of Data Science at Kiwi.com. Her academic journey includes a degree in engineering, a master's in computer science, a post-master's in bioinformatics, and a doctorate in data science. With close to two decades of professional experience, she has worked in software development, architecture, applied data science, and research. Her industry expertise spans a wide array of domains, from computer vision and natural language processing to time series analysis, unsupervised analysis, and self-supervised learning.