Observability & Ops Tools Engineer (4-6 yrs)
Whitefield Careers
posted 1d ago
Overview :
The Observability & Ops Tools Engineer plays a vital role in enhancing the visibility of our systems and ensuring efficient operations.
This position focuses on developing, maintaining, and optimizing observability tools that provide real-time insights into system performance and reliability.
As organizations increasingly rely on complex architectures, the need for effective monitoring and incident response has never been more critical.
The engineer will collaborate with cross-functional teams to create and implement reliable solutions that support business objectives while minimizing downtime and enhancing user experience.
This role requires a blend of technical expertise, analytical skills, and an understanding of operational frameworks in cloud environments.
By fostering a culture of observability and proactive issue resolution, the engineer will contribute significantly to our technological landscape and organizational effectiveness.
Key Responsibilities :
- Design and implement observability tools that enhance service monitoring.
- Develop automated scripts for deployment and operational tasks.
- Monitor system health and application performance metrics.
- Collaborate with development teams to embed observability into applications.
- Manage incident response processes for system outages and performance issues.
- Analyze system logs and metrics to troubleshoot issues effectively.
- Establish best practices for operational excellence.
- Utilize cloud platforms for logging and monitoring solutions.
- Integrate tools and technologies into existing operational workflows.
- Conduct regular performance tuning for applications and infrastructure.
- Facilitate workshops and training for team members on observability techniques.
- Provide recommendations for tool selections based on operational needs.
- Participate in on-call rotations to provide support during incidents.
- Continuously evaluate and improve current operational processes.
- Create comprehensive documentation for tools and procedures.
Required Qualifications :
- Bachelor's degree in Computer Science or related field.
- 4+ years of experience in operations or systems engineering.
- Strong knowledge of observability tools (e.g., Prometheus, Grafana).
- Experience with incident management frameworks (e.g., ITIL).
- Familiarity with cloud services (AWS, Azure, GCP).
- Proficiency in scripting languages (Python, Bash, etc.).
- Understanding of networking and security principles.
- Experience with containerization technologies (Docker, Kubernetes).
- Strong analytical and problem-solving abilities.
- Ability to work collaboratively in a cross-functional team environment.
- Excellent communication and interpersonal skills.
- Experience with performance monitoring tools.
- Knowledge of CI/CD practices and tools.
- Ability to handle multiple tasks and meet deadlines under pressure.
- Proactive mindset with a focus on continuous improvement.
- Certifications related to DevOps or cloud technologies are a plus.
Functional Areas: Other