AgileEngine - Lead DevOps Engineer - Site Reliability (6-8 yrs)
AgileEngine
posted 3+ weeks ago
Flexible timing
Job Title: SRE - DevOps Lead
Job Description:
Responsibilities:
- Lead and mentor a team of SRE and DevOps engineers, fostering a culture of collaboration, ownership, and continuous improvement.
- Architect, design, and implement scalable, reliable, and highly available infrastructure on AWS.
- Manage and maintain our Kubernetes clusters, ensuring optimal performance, security, and resource utilization.
- Develop and maintain infrastructure-as-code (IaC) using Terraform for automated provisioning and management of resources.
- Design, implement, and manage CI/CD pipelines using GitLab for automated builds, testing, and deployments.
- Configure and manage VPN tunnels and SFTP setups for secure data transfer and connectivity.
- Design and implement cloud-based networking solutions, including VPCs, subnets, routing, and security groups.
- Develop and maintain scripting solutions using Shell/Bash (and ideally Python) to automate routine tasks and system administration.
- Lead incident management processes, including root cause analysis, post-incident reviews, and preventative measures.
- Implement and maintain observability solutions, including monitoring, logging, and alerting, to proactively identify and address system issues.
- Coordinate effectively with cross-functional teams across multiple time zones to ensure smooth operations and project delivery.
- Ensure compliance with relevant industry regulations and standards, including HIPAA and GDPR.
- Train, mentor, and support junior team members, fostering their technical growth and development.
- Drive process improvement initiatives and implement automation strategies to enhance system reliability and operational efficiency.
- Participate in on-call rotations to provide 24/7 support for critical systems (approximately 2-3 days per week, including every other weekend).
- Work a Panama schedule with 8-hour shifts daily.
Requirements:
- 6+ years of experience in Site Reliability Engineering (SRE), DevOps, or infrastructure roles, with increasing levels of responsibility.
- Proven experience in leading distributed engineering or support teams, including performance management, mentoring, and team development.
- Deep knowledge of Amazon Web Services (AWS), including core services such as EC2, S3, RDS, VPC, and IAM.
- Extensive hands-on experience with Terraform for infrastructure provisioning and management.
- Strong proficiency in GitLab, including CI/CD pipeline design, implementation, and maintenance.
- Expertise in Kubernetes, including cluster management, deployment strategies, and troubleshooting.
- Solid understanding of Docker and containerization technologies.
- Practical experience with VPN tunnel configuration, SFTP setup, and cloud-based networking principles and practices.
- Familiarity with scripting languages, particularly Shell/Bash, for system automation.
- Strong incident management skills, including experience in leading incident response, conducting root cause analysis, and implementing corrective actions.
- Proven experience in implementing and utilizing observability tools and practices for monitoring, logging, and alerting.
- Excellent verbal and written communication skills, with the ability to articulate complex technical issues clearly and concisely.
- Ability to coordinate effectively with teams across multiple time zones and cultural backgrounds.
- Familiarity with working in compliance-heavy environments, with specific experience complying with HIPAA and GDPR.
- Demonstrated ability to train, mentor, and support junior team members, fostering their technical growth.
- Proven track record of driving process improvement and implementing automation solutions to enhance system reliability and efficiency.
- Willingness to work a Panama schedule with 8-hour shifts daily and participate in on-call rotations (2-3 days per week, including every other weekend).
Nice to Have:
- Familiarity with Bitbucket and Codefresh.
- Knowledge of Ansible for configuration management and automation.
- Familiarity with Python for scripting and automation tasks.
- Previous involvement in healthcare or medical device environments, with an understanding of relevant regulations and best practices.
- Strong understanding of high-availability infrastructure patterns and design principles.
Functional Areas: Other