POSITION OVERVIEW:
We are looking for an experienced and highly skilled Data Architect to join our team. The ideal candidate will be a strategic thinker, capable of designing and implementing large-scale data architectures that support efficient, scalable, and robust data pipelines. You will be responsible for building and optimizing ETL/ELT processes, designing data models, and developing microservices that integrate with various systems. You will also drive best practices in both data engineering and DevOps. If you have a strong background in Java, Python, Spark, and data engineering, along with expertise in DevOps practices, this role is for you.
RESPONSIBILITIES:
- Design and Build Scalable ETL Pipelines: Architect, design, and optimize large-scale ETL/ELT pipelines that prioritize performance, scalability, and data integrity. Ensure the seamless integration of data pipelines within the overall data ecosystem.
- Microservices Development and Maintenance: Lead the design and development of microservices for managing data processing tasks. Create reusable, maintainable services for data handling and ensure they scale efficiently to meet business needs.
- Data Integration and Transformation: Work collaboratively with cross-functional teams to design and integrate data from multiple systems, including HCM and Data Platform. Focus on high-quality data transformation, cleansing, and deduplication, ensuring efficient, maintainable pipelines.
- Optimize Data Processing: Drive the optimization of ETL pipeline performance, with a focus on real-time data processing and low-latency delivery. Troubleshoot, debug, and resolve issues in complex distributed systems.
- Database and Big Data Tools Management: Leverage SQL and big data tools (e.g., Hive, HBase, Parquet) to query and store large datasets at scale, ensuring high-performance, cost-effective storage and retrieval.
- Performance Optimization & Data Quality: Lead initiatives to maintain and improve data quality by implementing best practices for deduplication, transformation, and performance optimization across data pipelines.
- Collaboration and Mentorship: Work closely with Data Scientists, Analysts, and Engineers to ensure smooth data integration and high-quality outcomes. Provide mentorship and leadership to junior engineers, encouraging the adoption of best practices in data engineering and DevOps.
SKILLS & COMPETENCIES:
- Strong Technical Expertise:
- In-depth knowledge of Python, Java, or Scala for building scalable data pipelines and microservices.
- Proven experience with Apache Spark, Kafka, and Airflow for building data workflows and handling large datasets.
- Proficient in SQL for querying structured data and interacting with databases, ensuring efficient data operations.
- Experience in microservices architecture to build, deploy, and manage data services in a scalable, modular fashion.
- Expertise with DevOps tools such as Jenkins, Git, Docker, and Kubernetes to support continuous integration and continuous delivery (CI/CD) processes for data solutions.
- Big Data & Distributed Systems:
- Hands-on experience with big data tools like Hive, HBase, Parquet, and other data storage solutions to manage large-scale data.
- Proficient in data transformation, deduplication, and performance optimization techniques within distributed systems.
- Familiarity with cloud-based data infrastructure and data lake architectures, ensuring scalability and reliability for high-volume workloads.
- Machine Learning Integration & Optimization:
- Familiarity with integrating Machine Learning workflows into data engineering pipelines, automating ML data processing and deployment.
- Problem-Solving & Debugging:
- Root Cause Analysis & Performance Tuning: Quickly diagnose and resolve issues in data pipelines and distributed systems, optimizing for throughput, latency, and resource efficiency.
- Error Handling & Fault Tolerance: Implement robust error-handling mechanisms and fault-tolerant systems to ensure data integrity and reliable operations in production environments.
- Scalability & Real-Time Debugging: Troubleshoot scalability issues and optimize real-time data processing pipelines, ensuring low-latency and high-throughput performance at scale.
- Proactive Monitoring & Alerts: Set up and refine monitoring, logging, and alerting systems to detect and address potential issues before they affect system performance.
- Cross-Team Collaboration: Work closely with cross-functional teams to resolve complex integration and system issues, ensuring smooth data flow and operations across the architecture.
WORK EXPERIENCE & EDUCATION:
- Requires 10+ years of professional experience in system architecture, programming, and technical troubleshooting.
- Outstanding written and verbal communication skills.
- Bachelor's or Master's degree in Computer Science required.
Employment Type: Full Time, Permanent