Pluang - DevOps Engineer - Monitoring Tools (2-4 yrs)
Pluang Technologies
Posted 1 month ago
Fixed timing
Responsibilities :
- Implement and maintain robust monitoring solutions using tools like Prometheus, Grafana, ELK, New Relic, etc.
- Configure alerting mechanisms to ensure proactive identification and resolution of potential issues.
- Create and maintain Ansible playbooks to automate routine operational tasks.
- Use configuration management tools to keep systems correctly configured and compliant.
- Administer and troubleshoot Linux-based systems.
- Troubleshoot problems across a wide array of services and functional areas.
- Oversee the monitoring and stability of applications hosted on Amazon EKS (Elastic Kubernetes Service).
- Work closely with development teams to optimize application performance.
- Prepare detailed reports on infrastructure resource usage.
- Identify means to optimize infrastructure utilization and reduce costs.
- Demonstrate expertise in managing and optimizing infrastructure on AWS, GCP, and Azure.
- Collaborate with cross-functional teams to ensure seamless integration with cloud services.
- Create documentation outlining the setup, configuration, and maintenance procedures for each monitoring tool.
- Develop and document incident response plans to address system outages or performance degradation promptly.
- Maintain an incident response playbook for reference during critical situations.
- Implement and document incident reporting procedures, including the creation, categorization, and prioritization of incident tickets.
- Lead incident management efforts, ensuring timely resolution and post-incident reviews for continuous improvement.
Requirements :
- Hands-on experience with monitoring tools such as New Relic, Prometheus, Grafana, ELK, or Datadog (New Relic preferred).
- Hands-on experience with incident management tools such as Opsgenie and PagerDuty.
- Install, customize, support, and enhance system monitoring infrastructure.
- Integrate monitoring and incident management tools with the infrastructure.
- Support the day-to-day operation of our monitoring functions.
- Work with teams to design end-to-end monitoring for critical APIs and workloads.
- Hands-on experience with cloud platforms such as AWS or GCP, or with private cloud environments.
- Strong experience with container technologies (Docker, Kubernetes) and with containerizing applications.
- Very strong grasp of monitoring concepts, with experience using the ELK stack, Prometheus, and Grafana.
- Strong knowledge of Linux (Ubuntu, CentOS, and RHEL).
- System troubleshooting and problem-solving across platform and application domains.
- Proficiency in at least one programming or scripting language, such as shell scripting, Python, or Ruby.
- Experience with infrastructure-as-code (e.g., Terraform).
- Experience with continuous integration and with unit and integration testing.
- Experience with relational and NoSQL databases, such as PostgreSQL and MongoDB.
- Lead incident response efforts, providing timely resolution of system outages and performance issues.
- Ability to work independently with minimal direction; self-starter/self-motivated.
- Fintech experience is an advantage.
Functional Areas: Software/Testing/Networking