Publish date: Oct 15, 2024
Duration: 23 min
Difficulty
Case details
Anna will explore the intricacies of building a robust infrastructure that processes web-scale image data efficiently. Drawing on her experience at Amazon UK, she will detail the construction of a data processing pipeline that handles 10 billion images daily, which is crucial for Amazon's LLM training. She will highlight PySpark and EMR for scaling out without a steep learning curve, Airflow for orchestration, and nvidia-smi for monitoring GPU usage. Attendees will gain insight into the technical challenges and solutions involved in building such large-scale systems, along with practical tips for using these tools to optimise their own data processing workflows.
About Author