This position contributes to project plans, schedules, and methodologies for operating and maintaining multiple system environments. The candidate will respond to system management alerts to address operational exceptions. An experienced Senior System Engineer is required to lead various project initiatives with visibility across all internal stakeholder groups. The role requires collaboration with business verticals, including Development, QA, IT Operations, and Customer Operations, to ensure successful project execution and system management.
Responsibilities
Develop scripts to automate common processes, and manage team changes and service requests effectively, ensuring proper documentation and closure of tasks.
Collaborate with support groups to enhance system monitoring and reporting, assisting in analysis and recovery from problems.
Work closely with development and support groups to coordinate operations and effectively communicate or escalate issues to meet deadlines.
Maintain a thorough understanding of all system components and application products to provide effective support.
Develop resilient infrastructure using Infrastructure-as-Code and automate processes to enhance efficiency and standardization.
Stay updated with industry best practices in cloud architecture and operations, promoting adherence to these standards.
Lead teams in identifying, researching, and coordinating the resources necessary for effective troubleshooting.
Communicate, implement, and achieve business objectives.
Use scripting for automation and utilize system and problem management tools.
Install and troubleshoot applications in a Cloud Infrastructure environment.
Requirements
Over 5 years of experience in Systems Engineering.
Bachelor's degree in Engineering, Computer Science, or equivalent experience required.
Expertise in change management.
Proficient in program installation and troubleshooting.
Comprehensive knowledge of virtual server environments.
Skilled in scripting for automation.
Advanced understanding of system and problem management tools.
Familiarity with system recovery procedures.
Strong problem-solving abilities.
Excellent oral and written communication skills.
Extensive knowledge of AWS services, including Lambda, IAM, S3, EC2, ECS, and EKS, as well as Apache Airflow, for building and managing cloud-based applications.
Experience with Terraform for Infrastructure as Code (IaC) to automate and manage infrastructure deployment.
Proficiency in implementing Continuous Integration and Continuous Deployment (CI/CD) pipelines using GitHub Actions.
Skilled in using Ansible for configuration management, application deployment, and orchestration.
Familiarity with GitHub for version control, collaboration, and managing code repositories.
Ability to utilize New Relic for application performance monitoring and optimization.
Solid experience with shell scripting to automate routine tasks and enhance operational efficiency.
Demonstrated ability to troubleshoot and resolve issues related to cloud infrastructure and application deployments.
Strong communication and collaboration skills to work effectively with cross-functional teams.