My name is Franz and I’m an open source and python enthuisiast:
- father of 2 girls and husband
- major in psychology
- chess hobbiyst
- former competitive ultimate frisbee player
- likes cooking and :bread: baking sourdough bread
Currently, SQL and Cloud Data Warehouses (DWH) are extremely popular for good reason. They are great for dashboarding and business intelligence (BI) use cases due to their ease-of-use. However, their combination might not be the best choice for every problem. More precisely, business-critical data pipelines with high complexity might be better suited for frameworks such as Apache Spark which greatly benefit from the tight integration with general purpose languages like Python (e.g., PySpark).
Expect an opinionated comparison between Apache Spark and seemingly easier-to-use cloud native SQL engines. By the end of this talk, you will be challenged to think about why they are complementary and when each has its justification.