We are looking for a Big Data Intern to assist our data engineering and analytics teams in managing large-scale datasets, optimizing data pipelines, and leveraging big data technologies. This internship will provide hands-on experience in data processing, cloud computing, and distributed systems.
Key Responsibilities
Assist in building, maintaining, and optimizing big data pipelines using Apache Spark, Hadoop, or Kafka.
Work with structured and unstructured data from various sources and ensure efficient data processing.
Help design data storage solutions using SQL/NoSQL databases (e.g., MySQL, MongoDB, Cassandra).
Collaborate with data engineers to implement ETL processes for large-scale data processing (a brief illustrative sketch follows this list).
Work with cloud platforms (AWS, GCP, Azure) for data storage, computing, and analytics.
Conduct data cleaning, transformation, and validation for analytical use.
Contribute to real-time data streaming and batch processing solutions.
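For context only, here is a minimal sketch of the kind of ETL pipeline work described above, written in PySpark. The bucket paths, column names, and filtering rules are hypothetical examples chosen for illustration, not details of this role or our stack.

```python
# Purely illustrative PySpark ETL sketch -- paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("intern-etl-example").getOrCreate()

# Extract: read raw event data from object storage (hypothetical location).
raw = spark.read.json("s3://example-bucket/raw/events/")

# Transform: basic cleaning and validation -- drop rows missing key fields,
# normalize the timestamp column, and filter out test traffic.
cleaned = (
    raw.dropna(subset=["user_id", "event_time"])
       .withColumn("event_time", F.to_timestamp("event_time"))
       .filter(F.col("user_id") != "test")
)

# Load: write the result partitioned by date for downstream analytics.
(
    cleaned.withColumn("event_date", F.to_date("event_time"))
           .write.mode("overwrite")
           .partitionBy("event_date")
           .parquet("s3://example-bucket/curated/events/")
)

spark.stop()
```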
Required Qualifications
Currently pursuing or recently completed a degree in Computer Science, Data Engineering, Big Data, or a related field.
Basic knowledge of Big Data frameworks such as Hadoop, Spark, or Kafka.
Proficiency in Python, Java, or Scala for data processing.
Experience with SQL and NoSQL databases for handling large datasets.
Understanding of cloud computing and storage solutions (AWS S3, Google BigQuery, Azure Data Lake).
Strong analytical and problem-solving skills.
Preferred Qualifications
Familiarity with Apache Airflow for workflow orchestration, or with Flink and Presto for stream processing and distributed SQL queries.
Exposure to data warehousing solutions (Snowflake, Redshift, BigQuery).
Experience with containerization and orchestration tools (Docker, Kubernetes).
Knowledge of real-time data processing and streaming analytics.
Benefits
Hands-on experience with cutting-edge Big Data technologies.
Mentorship from experienced data engineers and cloud architects.
Exposure to real-world data challenges in a fast-paced environment.
Opportunity to convert to a full-time role based on performance.