Scaling Image Processing for the AI Era: Building Robust Infrastructure to Power Large Language Model Training

publish date

Oct 15, 2024

duration

23

min

Difficulty

Intermediate

Beginner

Case details

Anna explores how to build a robust infrastructure that processes web-scale image data efficiently. Drawing on her experience at Amazon UK, she details the construction of a data processing pipeline that handles 10 billion images daily and is crucial for Amazon's LLM training. She highlights PySpark and EMR for scaling out without a steep learning curve, Airflow for orchestration, and nvidia-smi for monitoring GPU usage. Attendees will gain insight into the technical challenges and solutions involved in building such large-scale systems, along with practical tips for using these tools to optimise their own data processing workflows.
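
To make the GPU monitoring step mentioned above more concrete, here is a minimal Python sketch of polling nvidia-smi and flagging underused GPUs. It is an illustration only, not the pipeline described in the talk: the `GpuStat` type, the helper names, and the 50% threshold are all hypothetical. It assumes `nvidia-smi` is on the PATH and that every queried field comes back numeric; a field reported as `[N/A]` would need extra handling.

```python
import subprocess
from dataclasses import dataclass
from typing import List

# Fields requested from nvidia-smi; with "nounits", memory values are in MiB.
QUERY_FIELDS = "index,utilization.gpu,memory.used,memory.total"


@dataclass
class GpuStat:  # hypothetical container, not part of any NVIDIA API
    index: int
    utilization_pct: float
    memory_used_mib: float
    memory_total_mib: float


def parse_gpu_stats(csv_text: str) -> List[GpuStat]:
    """Parse the CSV rows printed by nvidia-smi (one row per GPU)."""
    stats = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, util, used, total = [field.strip() for field in line.split(",")]
        stats.append(GpuStat(int(index), float(util), float(used), float(total)))
    return stats


def query_gpu_stats() -> List[GpuStat]:
    """Run nvidia-smi once and return the parsed per-GPU statistics."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_stats(result.stdout)


def underused(stats: List[GpuStat], threshold_pct: float = 50.0) -> List[int]:
    """Return indices of GPUs whose utilization is below the (assumed) threshold."""
    return [s.index for s in stats if s.utilization_pct < threshold_pct]
```

On a GPU host, `underused(query_gpu_stats())` would list the devices sitting below the threshold at that instant. A real deployment would probably sample repeatedly and export the readings to a metrics system rather than rely on a single snapshot.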

About Author

Software Engineer

6 years of industry experience, proficient in backend development of Amazon Fulfilment Technology and LLM Training Infrastructure, expertise in leveraging AWS services for architecting scalable solutions.

Led Development of Large-Scale Data Processing Pipelines

Created High-Performance Image Search Engine

Optimised Automated Metrics Reporting Systems

Expertise in Scalable Solutions

In-Depth Knowledge of Cutting-Edge Technologies

Questions?

Chat with Us!

910 Foulk Road, Suite 201

Wilmington, DE 19803, USA

© 2025 Geekle. All rights reserved.
