About The Role
Come and join a dynamic and challenging team within the Intel Data Center and AI Group focused on engineering, developing, and supporting world-class platforms and component building blocks aligned to Intel's Data Center roadmap and strategies. We are seeking a well-rounded Site Reliability Engineer to work with a team of architects and infrastructure engineers in designing, developing, implementing, and maintaining our infrastructure services. In addition, this role will involve scaling systems sustainably through automation and evolution to accelerate our design and validation teams and achieve best-in-class reliability.

Qualifications
The candidate must have a Bachelor's degree in Computer Science, Electrical Engineering, or a related field with 6+ years of industry experience, or a Master's degree with 4+ years of industry experience.

Minimum qualifications:
5+ years of experience in the following areas:
Experience with Linux fundamentals, system administration, scripting, performance tuning, scalability, and troubleshooting.
Experience working with Datacenter/Cloud Hardware and infrastructure.
Experience with developing and deploying Cluster/Datacenter/Cloud solutions.
Experience with Kubernetes deployment and/or operation.
Experience with Elastic (ELK) deployment, operation, and optimization.
Experience with infrastructure automation tooling (for example, Ansible and/or Jenkins).

Preferred qualifications:
Experience in the following areas:
Knowledge of server platforms as demonstrated by hands-on bring-up of systems geared towards cloud/architectures.
Experience with Intel Data Center platform hardware.
Certifications in Kubernetes, DevOps, or Elastic deployment.
Experience working with monitoring and visualization tools (Prometheus, Grafana).
Experience in scripting with Python.
Experience working with containerization tooling, deployment and/or support.
Experience in system and network performance analysis and optimization.
Experience managing clusters with AI accelerators.

The ideal candidate should exhibit the following behavioral traits:
- Proven problem-solving skills.
- Strong written and verbal communication and listening skills.
- Self-directed work ethic and a collaborative, can-do attitude.
- Technical leadership skills and the ability to influence across divisions.
- Tolerance of ambiguity: highly effective in an ambiguous environment, with a demonstrated willingness to develop clear strategies and plans with appropriately managed risks.

Inside this Business Group
The Data Center & Artificial Intelligence Group (DCAI) is at the heart of Intel's transformation from a PC company to a company that runs the cloud and billions of smart, connected computing devices. The data center is the underpinning for every data-driven service, from artificial intelligence to 5G to high-performance computing, and DCAI delivers the products and technologies (spanning software, processors, storage, I/O, and networking solutions) that fuel cloud, communications, enterprise, and government data centers around the world.