Production Engineer - Prometheus/Grafana (2-7 yrs)
Dash Hire
posted 4d ago
About the Role :
We are seeking a dedicated and technically proficient Production Engineer to join our team. In this role, you will be a key contributor to maintaining the stability and efficiency of our production systems. You will utilize your expertise in scripting, observability, log management, and automation to ensure smooth operations, troubleshoot errors, and enhance our testing frameworks. We are looking for a candidate with solid engineering skills who is eager to contribute to a collaborative and fast-paced environment.
Responsibilities :
- Ensure the smooth operation of production systems through proactive monitoring and maintenance.
- Respond promptly to system alerts and incidents, minimizing downtime and impact.
- Implement and manage observability tools such as Prometheus, Grafana, ELK (Elasticsearch, Logstash, Kibana), and
- Develop and maintain dashboards and alerts to track key performance indicators (KPIs).
- Analyze monitoring data to identify trends, potential issues, and areas for optimization.
- Manage and analyze system logs to troubleshoot errors and identify root causes.
- Implement log aggregation and analysis solutions to improve error detection and resolution.
- Develop and maintain error handling procedures and documentation.
- Develop and maintain automation scripts using Python, Bash, or other scripting languages to streamline operations.
- Automate routine tasks, deployments, and infrastructure management to improve efficiency and reduce manual effort.
- Implement Infrastructure as Code (IaC) principles to manage infrastructure configurations.
- Investigate and resolve production incidents, performing root cause analysis and implementing effective solutions.
- Collaborate with development and operations teams to resolve complex issues and ensure timely resolution.
- Enhance and maintain UI automation and testing frameworks to improve testing efficiency and coverage.
- Participate in deployment processes, ensuring smooth and reliable releases.
- Implement and maintain CI/CD pipelines to automate software delivery.
- Identify and address performance bottlenecks in production systems.
- Implement performance tuning and optimization strategies to improve system efficiency and responsiveness.
- Monitor and analyze system performance metrics to identify areas for improvement.
- Create and maintain comprehensive documentation for system configurations, procedures, and troubleshooting guides.
- Share knowledge and best practices with team members through training sessions and knowledge base articles.
- Contribute to the development and improvement of internal tools and processes.
Requirements :
Essential :
- 2+ years of experience in Production Engineering, DevOps, or Testing Automation.
- Strong scripting skills in Python, Bash, or similar languages.
- Hands-on experience with observability tools like Prometheus, Grafana, ELK, or Datadog.
- Experience with log management and error handling.
- Familiarity with UI automation and testing frameworks.
- Strong problem-solving and analytical skills.
- Ability to work independently and as part of a team.
- Good communication and interpersonal skills.
Functional Areas: Manufacturing