Data Pipeline Development: Design, develop, and maintain data pipelines using Google Cloud Platform (GCP) services such as Dataflow, Dataproc, and Pub/Sub (a minimal illustrative sketch follows this list).
Data Ingestion & Transformation: Build and implement data ingestion and transformation processes using tools such as Apache Beam and Apache Spark.
Data Storage Management: Optimize and manage data storage solutions on GCP, including BigQuery, Cloud Storage, and Cloud SQL.
Security Implementation: Implement data security protocols and access controls with GCP's Identity and Access Management (IAM) and Cloud Security Command Center.
System Monitoring & Troubleshooting: Monitor and troubleshoot data pipelines and storage solutions using Cloud Monitoring and Cloud Logging (formerly Stackdriver).
Generative AI Systems: Develop and maintain scalable systems for deploying and operating generative AI models, ensuring efficient use of computational resources.
Gen AI Capability Building: Build generative AI capabilities among engineers, covering areas such as knowledge engineering, prompt engineering, and platform engineering.
Knowledge Engineering: Gather and structure domain-specific knowledge to be utilized by large language models (LLMs) effectively.
Prompt Engineering: Design effective prompts that guide generative AI models toward relevant, accurate, and creative text output (an example prompt template follows this list).
Collaboration: Work with data experts, analysts, and product teams to understand data requirements and deliver tailored solutions.
Automation: Automate data processing tasks using scripting languages such as Python.
Best Practices: Participate in code reviews and contribute to establishing best practices for data engineering within GCP.
Continuous Learning: Stay current with GCP service innovations and advancements across the core data services (Cloud Storage (GCS), BigQuery, Dataflow, etc.).
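
As a minimal, illustrative sketch of the pipeline work described above (Data Pipeline Development and Data Ingestion & Transformation), the following Apache Beam snippet reads messages from a Pub/Sub subscription, parses them, and appends rows to an existing BigQuery table. The project, subscription, table, and column names are placeholders, and Dataflow runner options would be supplied at launch; this is not a prescribed design.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder resource names -- replace with real project/subscription/table IDs.
SUBSCRIPTION = "projects/example-project/subscriptions/events-sub"
TABLE = "example-project:analytics.events"


def parse_message(message: bytes) -> dict:
    """Decode a Pub/Sub message payload into a BigQuery-ready row (assumed columns)."""
    record = json.loads(message.decode("utf-8"))
    return {"event_id": record["id"], "payload": json.dumps(record)}


def run() -> None:
    # Streaming mode is required for the Pub/Sub source; pass
    # --runner=DataflowRunner (plus project, region, temp_location) to run on Dataflow.
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
            | "ParseJson" >> beam.Map(parse_message)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                TABLE,
                # The target table is assumed to already exist with a matching schema.
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()
```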
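
As an illustration of the prompt-engineering responsibility, a simple templated prompt might look like the sketch below; the wording and the build_summary_prompt helper are hypothetical examples rather than a prescribed format.

```python
# Hypothetical prompt template illustrating structured, constrained instructions.
SUMMARY_PROMPT = """You are a data-platform assistant.
Summarize the pipeline incident report below in exactly three bullet points,
citing only facts that appear in the report.

Report:
{report_text}
"""


def build_summary_prompt(report_text: str) -> str:
    """Fill the template with domain text before sending it to an LLM."""
    return SUMMARY_PROMPT.format(report_text=report_text)
```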
Skills and Experience:
Experience: 5+ years of experience in Data Engineering or similar roles.
Proficiency in GCP: Expertise in designing, developing, and deploying data pipelines, with strong knowledge of GCP core data services (Cloud Storage (GCS), BigQuery, Dataflow, etc.).
Generative AI & LLMs: Hands-on experience with generative AI models and large language models (LLMs) such as GPT-4, Llama 3, and Gemini 1.5, with the ability to integrate these models into data pipelines and processes.
Web Scraping: Experience with web scraping and structured data extraction (an illustrative sketch follows this list).
Technical Skills:
Strong proficiency in Python and SQL for data manipulation and querying. Experience with distributed data processing frameworks like Apache Beam or Apache Spark is a plus.
Security Knowledge: Familiarity with data security and access control best practices.
Collaboration: Excellent communication and problem-solving skills, with a demonstrated ability to collaborate across teams.
Project Management: Ability to work independently, manage multiple projects, and meet deadlines.
Preferred Knowledge: Familiarity with Sustainable Finance, ESG Risk, CSRD, Regulatory Reporting, cloud infrastructure, and data governance best practices.
Bonus Skills: Knowledge of Terraform is a plus.
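
To illustrate the web-scraping and Python skills listed above, the sketch below fetches a simple HTML table with requests and BeautifulSoup and streams the rows into an existing BigQuery table with the google-cloud-bigquery client. The URL, table ID, and column mapping are hypothetical and would need to be adapted to real, permitted sources.

```python
import requests
from bs4 import BeautifulSoup
from google.cloud import bigquery

# Placeholder URL and table ID -- substitute real, permitted sources and datasets.
SOURCE_URL = "https://example.com/esg-disclosures"
TABLE_ID = "example-project.reporting.esg_disclosures"


def scrape_rows(url: str) -> list[dict]:
    """Fetch a page and extract one dict per table row (assumes a simple HTML table)."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    rows = []
    for tr in soup.select("table tbody tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 2:
            rows.append({"company": cells[0], "disclosure": cells[1]})
    return rows


def load_to_bigquery(rows: list[dict]) -> None:
    """Stream the scraped rows into an existing BigQuery table."""
    client = bigquery.Client()
    errors = client.insert_rows_json(TABLE_ID, rows)
    if errors:
        raise RuntimeError(f"BigQuery insert errors: {errors}")


if __name__ == "__main__":
    load_to_bigquery(scrape_rows(SOURCE_URL))
```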
Education:
Degree: Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
Experience: 3-5 years of hands-on experience in data engineering.