Principal Site Reliability Engineer - Kubernetes/Docker (9-14 yrs)

Pylon Management Consulting

posted 3d ago

Job Description

About the Role :

We are seeking a highly experienced and visionary Principal Site Reliability Engineer (SRE) to lead our efforts in ensuring the reliability, scalability, and performance of our critical systems. In this role, you will be a technical leader, driving the adoption of SRE principles and practices across the organization. You will be responsible for designing and implementing robust infrastructure, automation, and monitoring solutions to maintain high availability and optimize system performance.

Responsibilities :

SRE Leadership & Strategy :

- Develop and implement SRE strategies and best practices to improve system reliability and performance.

- Lead the design and implementation of highly available and scalable infrastructure solutions.

- Define and enforce service level objectives (SLOs), service level indicators (SLIs), and service level agreements (SLAs).

- Champion a culture of observability, automation, and continuous improvement.

Infrastructure Design & Automation :

- Design and implement infrastructure-as-code (IaC) using tools like Terraform, CloudFormation, or Ansible.

- Architect and manage container orchestration platforms (Kubernetes, Docker Swarm).

- Build and maintain CI/CD pipelines for automated deployments.

- Implement and manage configuration management systems.

Monitoring & Observability :

- Design and implement comprehensive monitoring and logging solutions using tools like Prometheus, Grafana, ELK stack, or Datadog.

- Develop and maintain alerting and incident response procedures.

- Analyze metrics and logs to identify performance bottlenecks and potential issues.

- Implement distributed tracing to understand system behavior.

Incident Management & Response :

- Lead incident response efforts, ensuring timely resolution of critical issues.

- Conduct post-incident reviews to identify root causes and implement preventive measures.

- Develop and maintain runbooks and playbooks for incident response.

- Drive improvements in incident management processes.

Performance Optimization & Capacity Planning :

- Identify and resolve performance bottlenecks through profiling, tracing, and optimization.

- Conduct capacity planning and forecasting to ensure system scalability.

- Optimize resource utilization and reduce operational costs.

Security & Compliance :

- Implement and maintain security best practices across the infrastructure.

- Ensure compliance with relevant industry standards and regulations.

- Conduct security audits and vulnerability assessments.

Mentoring & Knowledge Sharing :

- Mentor and guide junior SREs, fostering a culture of learning and growth.

- Share knowledge and best practices through documentation, presentations, and training sessions.

- Act as a technical leader and subject matter expert.

Technical Skills :

Cloud Platforms :

- Deep expertise in at least one major cloud platform (AWS, Azure, GCP).

- Experience with cloud-native technologies and services.

Containerization & Orchestration :

- Expert-level knowledge of Docker and Kubernetes.

- Experience with container registry services.

Infrastructure as Code (IaC) :

- Proficiency in Terraform, CloudFormation, or Ansible.

CI/CD Tools :

- Experience with Jenkins, GitLab CI, CircleCI, or similar tools.

Monitoring & Logging :

- Expertise in Prometheus, Grafana, ELK stack, Datadog, or similar tools.

Scripting & Automation :

- Strong scripting skills in Python, Bash, or Go.

Operating Systems :

- Expert-level knowledge of Linux system administration.

Networking :

- Deep understanding of networking concepts and protocols (TCP/IP, DNS, HTTP, etc.).

Security :

- Strong understanding of security best practices and tools.

Databases :

- Experience with relational and NoSQL databases.

Distributed Systems :

- Understanding of distributed system principles and architectures.

Qualifications :

- Experience : 9-14 years of experience in Site Reliability Engineering or a related field.

- Education : Bachelor's degree in Computer Science, Software Engineering, or a related field.

- Certifications : Cloud certifications (e.g., AWS Certified DevOps Engineer - Professional, Google Cloud Professional Cloud DevOps Engineer) are highly desirable.

Soft Skills :

- Exceptional problem-solving and analytical skills.

- Strong communication and interpersonal skills.

- Excellent leadership and mentoring abilities.

- Ability to work effectively in a fast-paced environment.

- Strong sense of ownership and accountability.

- Ability to think strategically and drive innovation.

Benefits :

- Competitive salary and benefits package.

- Opportunity to work on cutting-edge technologies and challenging problems.

- Collaborative and supportive work environment.

- Opportunities for professional development and growth.

- Chance to make a significant impact on the reliability and performance of critical systems.


Functional Areas: Other

Pylon Management Consulting Benefits

Work From Home
Team Outings
Job Training
Soft Skill Training
Cafeteria
Free Food
