We are seeking a highly skilled and experienced Lead Cloud Engineer to join our AI team.
As the Lead Cloud Engineer, you will be responsible for designing, implementing, and maintaining our cloud infrastructure on AWS and GCP.
This is an individual contributor role that requires hands-on expertise in cloud engineering, with a focus on supporting our AI and machine learning initiatives.
You will work closely with data scientists, MLOps engineers, and other technical teams to ensure our cloud infrastructure is scalable, secure, and optimized for AI workloads.
Primary Responsibilities
Design and implement robust, scalable, and secure cloud architecture on AWS and GCP to support our AI and machine learning platforms
Develop and maintain Infrastructure as Code (IaC) using tools like Terraform or AWS CloudFormation
Implement and manage CI/CD pipelines for cloud infrastructure deployment
Optimize cloud resource utilization and costs while ensuring high performance and reliability
Implement and maintain robust security measures, including identity and access management, encryption, and network security
Design and implement disaster recovery and backup solutions
Collaborate with data scientists and MLOps engineers to provide the necessary cloud infrastructure for AI model development, training, and deployment
Stay up to date with the latest AWS and GCP services and best practices, particularly those relevant to AI and machine learning workloads
Troubleshoot and resolve complex cloud infrastructure issues
Develop and maintain documentation for cloud architecture, processes, and best practices
Mentor junior engineers and provide technical guidance to the team
Experience and Skills Required
Bachelor's degree in Computer Science, Engineering, or a related field
6+ years of experience in cloud engineering, with at least 5 years of hands-on experience with AWS/GCP
Strong expertise in AWS services, particularly those relevant to AI and machine learning (e.g., EC2, S3, ECS, EKS, SageMaker, Lambda)
Proficiency in Infrastructure as Code (IaC) tools, preferably Terraform or AWS CloudFormation
Experience with containerization technologies such as Docker and Kubernetes
Strong scripting skills in languages such as Python, Bash, or PowerShell
Expertise in networking concepts and implementation in cloud environments
In-depth understanding of cloud security best practices and compliance requirements
Experience with CI/CD tools and methodologies
Familiarity with monitoring and logging tools such as CloudWatch, Prometheus, or ELK stack
Strong problem-solving skills and ability to work independently
Excellent communication skills and ability to collaborate with cross-functional teams
AWS certifications (e.g., AWS Certified Solutions Architect Professional, AWS Certified DevOps Engineer Professional) are highly desirable
Experience supporting AI and machine learning workloads in cloud environments is a plus
Knowledge of cost optimization strategies for cloud resources