3-10 years
TQuanta Technologies - Nagios Developer - Monitoring Systems (3-10 yrs)
TQUANTA Technologies
posted 7d ago
Flexible timing
Job Description :
We are seeking a skilled Nagios Administrator / Developer to manage and operate our monitoring systems. This individual will be responsible for implementing, configuring, and maintaining Nagios monitoring for servers, applications (including SaaS), and infrastructure.
The ideal candidate will have expertise in Nagios, including the Distributed Monitoring Model, along with Python scripting, REST APIs, the ELK stack, Grafana, networking, and datacenter environments. They should be capable of integrating monitoring solutions across diverse systems and services, ensuring the infrastructure runs smoothly and effectively.
Key Responsibilities :
Nagios Administration and Customization :
- Install, configure, and maintain the Nagios monitoring platform for servers, network devices, applications, and SaaS-based solutions.
- Develop custom Nagios plugins and scripts to extend the tool's capabilities and meet organizational monitoring needs.
- Configure alerts, notifications, and escalation paths for various services, ensuring timely resolution of issues.
- Maintain the distributed monitoring model to ensure scalability and reliability.
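For candidates gauging what custom plugin work involves, here is a minimal sketch of a Nagios-style check written in Python. It is illustrative only and not part of TQuanta's codebase: the function names `evaluate_disk` and `main`, the 80%/90% thresholds, and the choice of disk usage as the metric are all assumptions. What it does rely on is the standard plugin convention, where the exit code signals state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the single output line may carry performance data after a `|`.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_disk(used_pct, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to an (exit_code, status_line) pair.

    The warn/crit defaults are hypothetical; real thresholds would come
    from the service definition or command-line arguments.
    """
    # Performance data format: label=value[UOM];warn;crit;min;max
    perfdata = f"used_pct={used_pct:.1f}%;{warn:g};{crit:g};0;100"
    if used_pct >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_pct:.1f}% used | {perfdata}"
    if used_pct >= warn:
        return WARNING, f"DISK WARNING - {used_pct:.1f}% used | {perfdata}"
    return OK, f"DISK OK - {used_pct:.1f}% used | {perfdata}"


def main(path="/"):
    """Check one filesystem path, print the status line, return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Anything that prevents the check from running is reported as UNKNOWN.
        print(f"DISK UNKNOWN - cannot read {path}: {exc}")
        return UNKNOWN
    used_pct = usage.used / usage.total * 100
    code, line = evaluate_disk(used_pct)
    print(line)
    return code
```

Deployed as a plugin, the script would end with `sys.exit(main())` so that Nagios receives the exit code. Keeping the threshold logic in `evaluate_disk` separate from the system call makes it testable without touching a real filesystem.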
Monitoring Integration & Automation :
- Integrate Nagios with other systems using REST APIs and scripting for seamless data exchange.
- Develop and maintain automation scripts using Python to automate monitoring tasks, log collection, and report generation.
- Set up status page monitoring for critical SaaS applications, ensuring uptime and performance are continuously tracked.
- Set up website content monitoring to track changes or disruptions in web-based applications, ensuring service integrity and immediate alerting on performance degradation or downtime.
Networking & Datacenter Monitoring :
- Ensure the datacenter environment is covered by robust monitoring that captures critical metrics and alerts on potential failures.
- Monitor and troubleshoot network devices (routers, switches, firewalls) and ensure proactive alerts for network issues.
- Collaborate with the datacenter team to monitor and maintain infrastructure components, including servers, storage, and network services.
Server & Application Monitoring :
- Manage and monitor on-premises and cloud-based servers, ensuring uptime, performance, and compliance with SLA requirements.
- Identify underutilized or idle servers through performance monitoring and data analysis, establishing key performance indicators (KPIs) such as CPU utilization, memory usage, disk I/O, and network traffic to assess server performance.
- Analyze these KPIs to give leadership actionable insights on resource allocation, and recommend consolidating or decommissioning unused or underutilized servers to improve efficiency and reduce operational costs.
- Implement monitoring for business-critical applications, including status page monitoring for SaaS-based platforms, to ensure early detection of performance or availability issues.
ELK Stack Administration :
- Develop and manage Logstash pipelines to ingest, filter, and transform logs from various sources (applications, servers, etc.).
- Design and develop dashboards and visualizations in Kibana to provide actionable insights into system performance and application behavior.
- Optimize Elasticsearch clusters for performance, scalability, and reliability.
Performance Tuning & Optimization :
- Continuously optimize Nagios configurations to ensure efficient resource usage and minimize alert fatigue.
- Conduct root-cause analysis for recurring incidents and develop solutions to enhance system stability and performance.
- Implement capacity planning to scale Nagios and ELK stack environments as infrastructure grows.
Required Skills & Qualifications :
- Expert knowledge of Nagios : Hands-on experience in deploying, configuring, and maintaining Nagios for various infrastructure components.
- Python scripting experience : Ability to create and maintain scripts for automation, plugin development, and integration tasks.
- Strong knowledge of REST APIs for integrating Nagios with other tools and platforms.
- Proficiency with the ELK stack (Elasticsearch, Logstash, Kibana) for log collection and analysis.
- Experience integrating Nagios with other tools such as Grafana for visualization.
- Solid understanding of networking fundamentals and monitoring network devices such as routers, switches, and firewalls.
- Experience working with datacenter environments, including server, network, and application monitoring.
- Familiarity with SaaS applications and experience with status page monitoring.
- Strong knowledge of Linux/Unix system administration and cloud-based environments (AWS, Azure, etc.).
- Experience with cloud infrastructure monitoring (AWS, Azure, GCP).
- Knowledge of ITIL processes and experience with incident and problem management in a monitoring environment.
Soft Skills :
- Excellent analytical and problem-solving skills, with the ability to diagnose and resolve complex system issues efficiently.
- Strong communication and collaboration abilities, enabling seamless interaction with cross-functional teams and stakeholders.
- Ability to work independently as well as in a team environment, taking ownership of tasks and driving them to completion.
- Adaptability and eagerness to learn and implement new technologies, staying up to date with the latest industry trends and best practices.
Mandatory Skills : Nagios
Functional Areas: Other