Tesco
Experience: 10-15 years
Systems Engineer III - Kafka Administration (10-15 yrs)
Posted 2 months ago
Flexible timing
Key Responsibilities as Systems Engineer III
Kafka Administration :
- Install, configure, and maintain Kafka clusters, ensuring high availability and reliability.
- Manage Kafka brokers, topics, partitions, and configurations to optimize performance.
- Implement and manage Kafka security protocols, including SSL/SASL, encryption, and access control lists (ACLs), and manage Kafka quotas.
- Monitor Kafka clusters for performance issues, troubleshoot problems, and implement solutions.
- Perform regular Kafka upgrades, patching, and maintenance to ensure stability and security.
Monitoring & Alerting :
- Set up comprehensive monitoring systems to track Kafka cluster health, performance, and resource usage.
- Implement real-time alerting mechanisms for critical Kafka metrics such as lag, throughput, broker performance, and disk usage.
- Use monitoring tools such as Prometheus, Grafana, Splunk, or Datadog to create dashboards and alerts.
- Continuously refine and improve alert thresholds to minimize false positives and ensure timely issue detection.
Performance Tuning & Optimization :
- Analyse and optimize Kafka performance, including tuning broker configurations, producer/consumer settings, and JVM parameters.
- Conduct capacity planning and ensure Kafka infrastructure can handle growing data volumes.
- Troubleshoot and resolve Kafka performance issues, including slow consumers, high latency, and broker instability.
Azure Cloud Management :
- Manage and maintain Azure cloud resources, including virtual machines, storage accounts, virtual networks, databases, and other Azure services.
- Implement and manage security measures, including identity and access management, network security groups, and encryption.
- Monitor and optimize Azure performance, ensuring scalability, reliability, and cost-effectiveness.
- Plan and execute Azure resource provisioning, scaling, and disaster recovery strategies.
Middleware Platform Administration :
- Deep technical knowledge of BizTalk, RTI, SFG, Ab Initio, Kafka, Azure administration, Hadoop, and TIBCO (capacity planning, troubleshooting, performance optimisation, and deployment and management of applications).
- Very good experience with system automation and deployment tools (Chef, Puppet, Ansible).
- Very good understanding of Unix/Linux commands, shell scripting, and PowerShell scripting.
- Deep technical knowledge of networking (VPN, subnets, firewalls, SSH, GTM, LTM, etc.).
- Technical knowledge of build and CI tools such as Jenkins and Ant.
- Deep technical knowledge of architecting, designing, and integrating new solutions in a large-scale enterprise environment of highly distributed applications (i.e., an architectural sense for ensuring availability, reliability, maintainability, scalability, etc.).
- Technical knowledge of how virtual and physical machines work.
- Deep technical knowledge of hardware configuration and setup, such as racks, disk topology, RAID, etc.
- Sound knowledge of backup and restoration using Veeam or NetWorker.
Capacity Planning & Scaling :
- Perform capacity planning and forecasting to ensure that Kafka/Apicurio clusters can handle growth and increasing data volumes.
- Implement horizontal and vertical scaling strategies to meet business needs.
- Work closely with infrastructure teams to optimize hardware resources for Kafka deployments.
Automation & Scripting :
- Develop and maintain automation scripts for Kafka/Apicurio administration tasks using tools such as Ansible, Terraform, or custom scripts.
- Create and maintain monitoring and alerting scripts to ensure the health and performance of Kafka clusters.
- Implement and manage CI/CD pipelines for API deployment and management.
- Automate routine maintenance tasks.
Incident Management & Support :
- Act as the primary point of contact for Kafka-, Apicurio-, and Azure-related incidents, ensuring quick resolution and minimal downtime.
- Collaborate with development, DevOps, and SRE teams to diagnose and resolve Kafka-related issues.
- Participate in on-call rotations to provide 24/7 support for Kafka environments.
Documentation & Training :
- Document configurations, best practices, and troubleshooting procedures in Confluence.
- Provide training and mentorship to junior team members on platform administration.
- Develop and maintain runbooks for incident response.
Collaboration :
- Collaborate with infrastructure, development, DevOps, and IT teams to implement and support Kafka-based solutions.
- Participate in on-call rotations and provide support for critical issues outside of regular business hours as needed.
- Engage with stakeholders to understand requirements and design solutions that align with business goals.
Functional Areas: Other