NCR Atleos, headquartered in Atlanta, is a leader in expanding financial access
Our dedicated 20,000 employees optimize the branch, improve operational efficiency and maximize self-service availability for financial institutions and retailers across the globe
TITLE: Site Reliability Engineer, G10
LOCATION: Hyderabad, India
Summary
We are looking for a Site Reliability Engineer (SRE), initially focused on production AppOps, who can manage scalable systems using automation best practices that improve reliability and velocity, and who can enable monitoring of the operational health of services throughout their lifecycle, including metrics collection, aggregation, and visualization
As a member of the SRE team, you will support NCR's Financial Services business unit, product, and technology teams to improve the design and operation of systems, focusing on making them scalable, reliable, and efficient while ensuring production performance and high availability of products and services that are primarily deployed and running in the cloud
You will influence the development and implementation of reliable production systems and services to address emerging business needs (such as Cloud-based SaaS)
SREs pride themselves on the resiliency and stability of production systems, yet at the same time are committed to innovation and operational improvement through the application of software engineering practices to operations
You will make our products easier to adopt and use by making improvements to the product, tools, processes, and documentation
You are someone who strives for six 9s or better in availability/uptime!
Key Areas of Responsibility (or where we need your support):
Maintain and scale production services and servers for complex, high-throughput cloud services
Bridge and own the union between development, quality, security, and operations
Improve the scalability, service reliability, capacity, and performance of the SaaS services
Write automation code for provisioning and operating infrastructure at a massive scale
Act as an experienced software engineer focused on application reliability and scalability
Contribute to the continuous improvement of our software delivery processes and practices in a multi-location, multidisciplinary team to empower and accelerate product development
Design, configure, manage, and monitor systems in support of our product development teams
Participate in disaster recovery planning and execution
This also includes Windows and Linux servers running in private data centers and/or using cloud PaaS providers (Azure)
Collaborate with other teams to promote code using CI/CD and AppSec tooling
Collaborate with development, support, and dependent teams, using intuition, experience, and understanding to create SLIs, SLOs, and SLAs
Implement monitoring alerts, build dashboards, and manage escalation paths
Provide prompt support during critical incidents and prepare the PIR/RCA for them, helping not only to resolve the problem but also to minimize the downtime window
Participate in on-call rota/schedules, which may require assisting with production outages during off-hours
Ideal Technical And Professional Skills
BS degree in Computer Science or a related technical field, or 5 years of prior relevant experience
Extensive experience in a DevOps / SRE role with demonstrable experience in deploying and managing large-scale production environments in Azure, AWS, GCP, and multi-data center environments
Experience developing and debugging code (i.e., one or more of the following: Ansible, Python, Shell, Perl, Golang, JavaScript, Java, C, C++, or .NET)
2+ years deploying and supporting high-traffic, scalable web applications/services
2+ years with Azure/GCP/AWS
2+ years with Docker, Kubernetes, and an early version of OpenShift
Experience with Linux, shell scripting, PKI, TLS/SSL, networking, firewalls, load balancers, and backup
Experience in designing, analyzing, and running large-scale distributed systems
Experience securely hosting and troubleshooting public-facing services in Azure, AWS, or GCP
Experience with orchestration, automation, and configuration management tools like Ansible (or Puppet, Chef, Terraform, Helm, or related technology), Git, and Fabric
Excellent analysis, debugging, root-cause identification, and troubleshooting skills
Experience with Kubernetes, system virtualization, on-prem and/or hybrid cloud computing, cloud identity, security systems, cloud monitoring and logging, and/or local/cloud storage
Experience with one or more CI/CD and related tools like Azure DevOps/Jenkins/GitHub Actions, Artifactory, Harness, CloudBuild
Experience with application disaster recovery, migration, roll-back plans, expansion, routine deployments, and system upgrades
Experience with log management, including monitoring, aggregation, alerting, and graphing
Bonus points for experience with Kafka, Elasticsearch, or Cassandra
Extra bonus points for Cloud certifications and exposure to Harness
Visit our careers site for a list of the benefits offered in your region in addition to a competitive base salary and strong work/family programs
Statement to Third Party Agencies
To ALL recruitment agencies: NCR Atleos only accepts resumes from agencies on the NCR Atleos preferred supplier list
Please do not forward resumes to our applicant tracking system, NCR Atleos employees, or any NCR Atleos facility
NCR Atleos is not responsible for any fees or charges associated with unsolicited resumes
EEO Statement
NCR Atleos is an equal opportunity employer
It is NCR Atleos' policy to hire, train, promote and pay associates based on their job-related qualifications, ability, and performance, without regard to race, colour, creed, religion, national origin, citizenship status, sex, marital status, age, physical or mental disability, sexual orientation, or veteran status
NOTE: Please review HR CMP Policy 420 concerning guidelines around internal employee transfers between roles
EEO Statement: NCR Atleos is an equal-opportunity employer
It is NCR Atleos policy to hire, train, promote, and pay associates based on their job-related qualifications, ability, and performance, without regard to race, color, creed, religion, national origin, citizenship status, sex, sexual orientation, gender identity/expression, pregnancy, marital status, age, mental or physical disability, genetic information, medical condition, military or veteran status, or any other factor protected by law
Offers of employment are conditional upon passage of screening criteria applicable to the job