You will automate and streamline our operations and processes, and build and maintain tools for deployment, monitoring, and operations. You will also be responsible for automating the complete deployment so that we can bring the whole stack up or down in a very short time.
Key Tasks and Responsibilities:
Automate and streamline all deployment activities
Monitor all production instances for incidents and trends
Implement best practices to ensure an always-up, always-available service
Ensure the SaaS environment follows security best practices and is hardened against attacks
Optimize cloud infrastructure to reduce total cost of ownership (TCO)
Work closely with Engineering to understand changes in each release and keep all tools up to date so that deployments remain automated
Plan and execute all scheduled downtime for upgrades, maintenance activities, etc.
Own all infrastructure-related troubleshooting during unplanned outages
Escalate and communicate issues promptly and strive for quick resolution
Our Ideal Candidate:
Strong Linux background
Strong security and networking background
Knowledge of Chef or Puppet
Scripting (Ruby, Python, Bash)
Monitoring through JMX
Monitoring systems (Nagios, Zabbix)
Ability to use a wide variety of open source technologies and cloud services
Experience with AWS EC2 (or other private/public cloud providers)
Knowledge of IT operations best practices for an always-up, always-available service
BE/B.Tech in computer science or a similar stream.
4 to 6 years of overall experience, with a minimum of 2 years of relevant experience managing large AWS environments.