Tricog Health Services
Lead DevOps & Site Reliability Engineer (12-20 yrs)
Posted 2 months ago
Flexible timing
Roles & Responsibilities:
- Responsible for the stability, high availability, and scalability of the environments by using automation and self-healing/immutable microservice-based architectures/clustering on cloud platforms.
- Collaborating with developers, test engineers, product managers, and InfoSec managers on their DevSecOps and SRE requirements, and providing architectural solutions as well as their implementation.
- Championing and driving security efforts across applications, infrastructure, development environments, and the organization.
- Lead developer productivity projects in the areas of environment automation, monitoring, and updates.
- Consulting with management on the operational requirements of software solutions.
- Contributing expertise on information system options, risk, and operational impact.
- Mentoring junior SRE/DevSecOps in gaining experience and assuming DevSecOps responsibilities.
- Overseeing routine maintenance procedures, performing diagnostic tests, debugging the system/process faults, and identifying areas to save cost.
- Ensuring proper documentation and internal publishing of design tradeoffs, SRE/DevSecOps best practices, and lessons learned.
- Keeping up with software development, DevSecOps, and SRE trends and innovation.
- Design, implement, and evolve highly scalable and fault-tolerant distributed components using core DevSecOps principles.
- Hands-on experience using Docker and Kubernetes, with proper metrics instrumentation in software components to facilitate real-time and remote troubleshooting and performance monitoring.
- Design and build automated code deployment systems that simplify development work and make our work more consistent and predictable.
- Work closely with Operations & Infrastructure teams, developers, and other stakeholders for cross-functional development activities.
- Collaborate with the security team to implement and verify secure coding techniques, and integrate code security tools for Continuous Integration.
- Scanning repositories for security vulnerabilities: SonarQube, Black Duck, Zed Attack Proxy (ZAP), Vault, OWASP Dependency-Check, early threat modeling, and security design reviews.
- Automated deployment, Continuous integration, Continuous delivery, and release engineering to Development, QA, and Production environments.
- Contribute to an efficient development process pipeline by leveraging best-in-class CI/CD tools.
- Experience with configuration automation tools (Puppet/Ansible/Chef/Salt).
- Experience with middleware and database systems like Kafka, Aerospike, Cassandra, MySQL, NoSQL, etc.
- Understand and own component security analysis, test stage data, and log analysis including code and data flow review.
- Design and implement APIs, abstractions, and integration patterns to solve challenging distributed computing problems.
- Support in triaging and troubleshooting of highly distributed services in the production environment.
Skills Required:
- Extensive experience in DevSecOps engineering, team management, and collaboration; knowledge of a major programming language such as Python or Java; and experience writing code and scripts.
- Experience in working in a highly agile environment with a working familiarity with the entire software development lifecycle including version control, build process, testing, and code release.
- Experience with operating system internals, file systems, disk/storage, and networking protocols.
- Proficiency in documenting processes and monitoring performance metrics.
- Advanced knowledge of best practices related to data encryption, certificate management, and cybersecurity.
- Previous experience in the healthcare domain is a big plus.
- Hands-on experience with Splunk, Sysdig, Elasticsearch, Prometheus, Grafana, etc. is a big plus.
- Experience with Docker Networking, Service Mesh, and Proxies is a big plus.
- CNCF certifications (CKA/CKAD) are a plus.
- Experience with distributed databases, distributed computing, and high-frequency transactions is a plus.
Qualification/Experience:
- Minimum of 12 years of experience, with 10+ years in SRE/DevOps.
- BTech/BE/BS or MTech/MCA/ME/MS.
Functional Areas: Software/Testing/Networking