Fork Technologies

3.1

based on 10 Reviews

Site Reliability Engineer - System Administration & Support (7-9 yrs)

7-9 years

posted 2d ago

Job Role Insights

Flexible timing

Key skills for the job

Software Configuration Management, Linux, System Administration, Site Reliability Engineering, IT Infrastructure, Monitoring Tools, CI/CD Pipeline

Job Description

We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our dynamic team. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of our production systems. You will collaborate with cross-functional teams, identify and resolve system bottlenecks, and proactively monitor the health of our infrastructure. If you are passionate about infrastructure management, automation, and maintaining highly available systems, this role is for you!

Key Responsibilities :

System Administration and Support :

- Maintain and manage both Linux and Windows-based systems, ensuring their performance, availability, and security.

- Install, configure, and upgrade system software and hardware.

- Develop, implement, and enforce security policies to protect the infrastructure.

System Architecture and Configuration Management :

- Work with development and operations teams to design and maintain scalable, fault-tolerant, and highly available systems.

- Use tools like Ansible, Puppet, Chef, or SaltStack for configuration management and automation of tasks.

- Design and implement infrastructure-as-code (IaC) solutions using tools such as Terraform or CloudFormation.

Networking and Protocols Expertise :

- Apply a strong understanding of networking protocols such as TCP/IP, HTTP, and DNS, along with load-balancing techniques, to ensure optimal system performance and uptime.

- Manage network services, ensuring high availability and low latency.

Monitoring and Performance Optimization :

- Implement and configure monitoring tools such as Grafana, Prometheus, and Loki to track system health and performance metrics.

- Set up alerts and dashboards to proactively monitor key system metrics (e.g., CPU, memory, disk I/O, network usage).

- Analyze logs and metrics to identify patterns, detect issues early, and recommend improvements to ensure reliability and stability.

Incident Response and Troubleshooting :

- Lead incident response efforts by coordinating with development, operations, and support teams to resolve critical incidents swiftly.

- Troubleshoot issues across complex systems and services, from the application layer to the network, to restore services quickly.

- Conduct root cause investigations of incidents to implement long-term solutions and minimize recurrence.

CI/CD and Automation :

- Maintain and improve CI/CD pipelines to enable seamless software delivery and system updates with minimal downtime.

- Automate manual processes and tasks to increase efficiency and reduce the chance of human error.

- Develop and manage scripts and tools for automating deployment, monitoring, backup, and recovery operations.

Load Testing and Performance Benchmarking :

- Perform API and load testing using tools like Gatling and JMeter to assess the scalability and performance of critical services and APIs.

- Analyze performance results and recommend improvements so that systems can handle increasing traffic loads and scale seamlessly.

Collaboration and Documentation :

- Collaborate with cross-functional teams, including development, QA, and operations, to implement system improvements, optimize performance, and ensure service reliability.

- Maintain comprehensive documentation for system configurations, processes, troubleshooting steps, and operational procedures.

- Effectively communicate complex technical concepts to both technical and non-technical stakeholders.

Key Skills and Qualifications :

Extensive Knowledge of Linux and Windows Systems :

- Strong hands-on experience with system administration and troubleshooting in both Linux (e.g., Ubuntu, CentOS) and Windows environments.

System Architecture and Configuration Management :

- Experience with configuration management tools (e.g., Ansible, Puppet, Chef).

- Familiarity with containerization (e.g., Docker, Kubernetes) and cloud services (e.g., AWS, Azure, GCP).

Networking Knowledge :

- In-depth knowledge of TCP/IP, HTTP, DNS, load balancing, and related networking concepts and protocols.

Monitoring Tools and Metrics Analysis :

- Proficiency with monitoring tools such as Grafana, Prometheus, and Loki for real-time system monitoring and alerting.

- Experience with analyzing and troubleshooting system performance metrics and logs.

Incident Management and Root Cause Analysis :

- Proven experience in managing incidents, conducting post-mortems, and implementing measures to prevent future incidents.

CI/CD and Automation :

- Hands-on experience with CI/CD tools like Jenkins, GitLab CI, or CircleCI for automating deployments.

- Ability to write automation scripts in languages like Python, Bash, or Ruby to streamline operational workflows.

Performance Testing :

- Expertise in using performance testing tools like Gatling and JMeter to assess system scalability and performance under load.

Collaboration and Documentation Skills :

- Excellent interpersonal skills with the ability to work collaboratively in cross-functional teams.

- Strong technical writing skills for documenting complex systems and processes.

Preferred Qualifications :

- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent work experience).

- Certifications in relevant technologies (e.g., AWS Certified Solutions Architect, Kubernetes Administrator, Red Hat Certified Engineer).

- Experience with distributed systems, container orchestration (e.g., Kubernetes), and microservices architecture.

- Familiarity with database technologies (e.g., MySQL, PostgreSQL, MongoDB, Cassandra).