Education: Bachelor's (BE) or Master's (MS, MCS) degree, Computer Science preferred.

We at Altizon are on a mission to build easy-to-use products that solve hard, complex, real-world problems for our customers. We build mission-critical products and solutions that help our customers in manufacturing make data-driven decisions. Our goal is to put data insights at our users' fingertips in the simplest yet most powerful way.

We have built expertise in Digital Factory solutions over the last 11 years, and we continue to grow through continuous learning and innovation. Our customers in the Food & Beverage, Chemical, and Automobile industries love our products. Our flagship IIoT platform is a critical element of their operations playbook, whether for improving productivity and throughput on the shop floor, predictive maintenance, digital checklists, predicting quality with our advanced AI/ML algorithms, or taking a bird's-eye view of factory efficiency from a control tower.

To maintain our leadership position in these markets and continue delivering new and innovative products, we are looking to hire a Site Reliability Engineer with a strong grasp of statistical and analytics concepts and expertise in Python or R. The successful candidate will have 1 to 3 years of experience and deep knowledge of these technologies.

What you will do
Work on a cutting-edge, highly scalable technology stack that handles billions of events.
Maintain the health and integrity of the platform infrastructure and data processing pipelines, including compute, network, and storage infrastructure.
Design, develop, and employ tools, scripts, instrumentation, and dashboards that will monitor the availability and performance of each component of the platform.
Handle server and service outages as a priority and communicate any unavailability to stakeholders.
Ensure that the platform and its internal services adhere to established security guidelines and compliance requirements.
Implement backup and disaster recovery strategies.
Be responsible for continuous builds, nightly builds, and associated tooling.
Be responsible for maintaining Docker images/scripts for internal microservices.
Own DevOps scripts for deploying new stacks and updating existing ones at the required footprint (small or HA), covering both Day 1 and Day 2 operations.