Data Engineer
Cloudsufi
posted 17d ago
CLOUDSUFI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified candidates receive consideration for employment without regard to race, colour, religion, gender, gender identity or expression, sexual orientation, or national origin. We are dedicated to providing equal opportunities in employment, advancement, and all other areas of our workplace. Please explore more at https://www.cloudsufi.com/
Experience: 7+ years
Mandatory Skills: GCP data engineering, ETL, data lakes, data cleaning
Secondary Skills: Hadoop and Big Data
What we are looking for
Core Skills
Data Pipeline Design and Implementation
Develop and maintain robust, scalable, and efficient ETL/ELT pipelines for structured and unstructured data.
Automate data ingestion from diverse sources such as databases, APIs, logs, IoT devices, and external feeds.
Implement real-time data streaming pipelines using technologies like Apache Kafka, Apache Flink, or Spark Streaming.
Data Storage and Management
Design and optimize storage solutions for structured data (e.g., relational databases, data warehouses).
Architect and maintain systems for unstructured data (e.g., document stores, object storage, NoSQL databases).
Optimize data partitioning, compression, and indexing strategies for performance.
Data Integration and Processing
Collaborate with data scientists, analysts, and stakeholders to ensure seamless integration of data into downstream systems.
Process and transform unstructured data formats such as images, audio, video, and text for ML/AI applications.
Work with tools like Apache NiFi, Airflow, or similar orchestration frameworks for scheduling and monitoring workflows.
Data Governance and Quality
Implement and enforce data governance policies, ensuring data quality, security, and compliance.
Monitor data pipelines to proactively identify and resolve issues such as data drift, missing values, and schema mismatches.
Ensure adherence to standards for metadata management and data cataloging.
Technology Leadership
Evaluate and implement emerging technologies for handling structured and unstructured data efficiently.
Provide guidance and mentorship to junior data engineers.
Contribute to architecture and design discussions for enterprise-wide data solutions.
Performance Optimization
Optimize query performance and resource utilization for both batch and real-time workloads.
Implement caching, parallel processing, and other techniques to accelerate data processing.
Documentation and Reporting
Create comprehensive documentation for data pipelines, schemas, and storage systems.
Collaborate with stakeholders to generate reports and dashboards for data visibility.
Required Skills and Qualifications
Technical Skills
Programming
Expertise in Python, Scala, Java, or similar languages.
Proficiency in SQL for querying and manipulating structured data.
Data Storage Technologies
Relational Databases: PostgreSQL, MySQL, SQL Server.
Data Warehouses: Snowflake, Redshift, BigQuery, Databricks.
NoSQL Databases: MongoDB, Cassandra, HBase.
Unstructured Data Storage: HDFS, S3, Google Cloud Storage, or Azure Blob Storage.
Big Data Tools
Hands-on experience with Apache Hadoop, Apache Spark, or similar frameworks.
Streaming and Messaging Systems
Proficiency with Apache Kafka, RabbitMQ, or Google Cloud Pub/Sub.
Data Orchestration
Experience with tools like Apache Airflow, Luigi, or Prefect.
Cloud Platforms
Deep experience with Google Cloud Platform.
Data Processing
Familiarity with processing unstructured data (e.g., NLP for text, computer vision for images).
DevOps for Data
Knowledge of CI/CD pipelines, Docker, Kubernetes, and Infrastructure as Code (IaC) tools like Terraform.
Employment Type: Full Time, Permanent