Publish date
Dec 7, 2022
Duration
44 min
Difficulty
Case details
A high-level introduction to data engineering for data scientists. In this fast-paced talk, you'll learn how adopting data engineering best practices and tools can improve your data science projects and help you deliver better, more reliable results in record time. We'll discuss data architecture and design principles, and explore open source tools you can use today, including:
- Running Jupyter notebooks in production with Papermill and nbdev
- Improving data quality with Great Expectations and monitoring models with Evidently.ai
- Writing unit tests for your pandas and Spark dataframes with pandera (see the sketch after this list)
- Reusing SQL with dbt, a data transformation tool that is reshaping how data teams work
- Orchestrating workflows with Apache Airflow, a more robust approach than fragile and frustrating cron jobs or Lambdas
- Versioning your data alongside your code with DVC
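As a taste of the dataframe-testing topic above, here is a minimal pandera sketch. The column names and checks are illustrative assumptions, not taken from the talk.

```python
import pandas as pd
import pandera as pa

# Declare expectations about an orders dataframe (columns are hypothetical).
schema = pa.DataFrameSchema({
    "order_id": pa.Column(int, pa.Check.ge(0), unique=True),
    "amount": pa.Column(float, pa.Check.gt(0)),
    "status": pa.Column(str, pa.Check.isin(["pending", "shipped", "delivered"])),
})

def test_orders_conform_to_schema():
    df = pd.DataFrame({
        "order_id": [1, 2, 3],
        "amount": [9.99, 24.50, 3.75],
        "status": ["pending", "shipped", "delivered"],
    })
    # validate() raises a SchemaError on any violation, so it doubles as a unit test.
    schema.validate(df)
```

A schema like this can run in a pytest suite or be called at pipeline boundaries to catch bad data before it reaches downstream models.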