10+ Greenlight Financial Technology Interview Questions and Answers

Updated 13 Mar 2025

Q1. Write code to print the reverse of a sentence, word by word.
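
A minimal sketch of one possible answer in Python, assuming "reverse word by word" means reversing the order of the words rather than the letters within each word. The function name reverse_words is illustrative only.

```python
def reverse_words(sentence: str) -> str:
    """Return the sentence with its words in reverse order."""
    # split() with no arguments collapses runs of whitespace and drops
    # leading/trailing spaces, so the result is joined with single spaces.
    return " ".join(reversed(sentence.split()))


print(reverse_words("write code to reverse words"))  # words reverse to code write
```

If the interviewer instead wants each word's letters reversed in place, the same split/join shape applies with word[::-1] on each word.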

Q2. Write code for printing duplicate numbers in a list.

Ans.

Identify and print the duplicate numbers in a list by using a dictionary to track how often each number occurs.

  • Use a dictionary to count occurrences of each number.

  • Iterate through the list and update the count in the dictionary.

  • Print numbers that have a count greater than 1.

  • Example: For the list [1, 2, 3, 2, 4, 3], the output should be 2 and 3.
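
The steps above can be sketched in Python as follows. The function name find_duplicates is illustrative, and returning the duplicates in first-seen order is an assumption, since the question does not specify an order.

```python
def find_duplicates(numbers):
    """Return the values that occur more than once, in first-seen order."""
    counts = {}
    for n in numbers:
        # Update the running count for this number.
        counts[n] = counts.get(n, 0) + 1
    # Dicts preserve insertion order, so duplicates come out in first-seen order.
    return [n for n, c in counts.items() if c > 1]


for value in find_duplicates([1, 2, 3, 2, 4, 3]):
    print(value)  # prints 2, then 3
```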

Q3. Difference between cache and persist, and between repartition and coalesce.

Ans.

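Cache stores data at the default storage level, while persist lets you choose the storage level (memory, disk, or both). Repartition redistributes data with a full shuffle; coalesce reduces the number of partitions while avoiding one.

  • Cache: Stores the DataFrame at the default storage level (MEMORY_AND_DISK for DataFrames, MEMORY_ONLY for RDDs) for faster access during subsequent operations.

  • Persist: Does the same but accepts an explicit storage level such as MEMORY_ONLY or DISK_ONLY; cache() is equivalent to persist() with the default level.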

  • Repartition: Increases or decreases the number of partitions, potentially shuffling data across nodes.

  • Coalesce: Reduces the number of partitions without a full shuffle, ...read more
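
To make the repartition/coalesce contrast concrete, here is a toy model in plain Python. It is not Spark code and not Spark's implementation: partitions are modeled as lists, and the round-robin redistribution is a simplifying assumption. It only illustrates that coalesce merges whole partitions (cheap, but possibly uneven), while repartition moves individual records (a full shuffle, but balanced).

```python
def coalesce(partitions, n):
    """Merge existing partitions into n groups; whole partitions move together."""
    if n >= len(partitions):
        # Without a shuffle, coalesce cannot increase the partition count.
        return [list(p) for p in partitions]
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Map contiguous input partitions onto the same output partition.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions, n):
    """Redistribute every record round-robin across n new partitions."""
    out = [[] for _ in range(n)]
    records = [r for part in partitions for r in part]
    for i, r in enumerate(records):
        out[i % n].append(r)
    return out


parts = [[1, 2, 3, 4, 5, 6], [7], [8], [9]]
print(coalesce(parts, 2))     # [[1, 2, 3, 4, 5, 6, 7], [8, 9]]
print(repartition(parts, 2))  # [[1, 3, 5, 7, 9], [2, 4, 6, 8]]
```

In this model the coalesced result stays lopsided because records never leave the partition group they started in, which is the price of skipping the shuffle.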

Q4. Explain Spark optimization techniques, the types of transformations, and shuffling.

Ans.

Spark optimization techniques enhance performance through efficient data processing and resource management.

  • Use DataFrames and Datasets for optimized execution plans.

  • Leverage lazy evaluation to minimize unnecessary computations.

  • Apply partitioning to distribute data evenly across nodes, e.g., using 'repartition' or 'coalesce'.

  • Minimize shuffling by using narrow transformations like 'map' and 'filter' instead of wide transformations like 'groupBy'.

  • Broadcast smaller datasets to a...read more
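
The narrow/wide distinction can be illustrated with a toy model in plain Python. This is a sketch under simplifying assumptions, not Spark code: partitions are lists of (key, value) pairs, keys are integers, and the partitioner is key modulo partition count. The helper names are hypothetical.

```python
from collections import defaultdict


def map_partitions(partitions, fn):
    """Narrow: each output partition depends on exactly one input partition."""
    return [[fn(x) for x in part] for part in partitions]


def group_by_key(partitions, num_out):
    """Wide: records with the same key must meet, so some cross partitions."""
    out = [defaultdict(list) for _ in range(num_out)]
    moved = 0
    for src, part in enumerate(partitions):
        for key, value in part:
            dst = key % num_out  # toy partitioner on integer keys
            out[dst][key].append(value)
            if dst != src:
                moved += 1  # this record would travel over the network
    return [dict(d) for d in out], moved


parts = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")]]
doubled = map_partitions(parts, lambda kv: (kv[0], kv[1] * 2))
grouped, moved = group_by_key(parts, 2)
print(doubled)         # no record leaves its partition
print(grouped, moved)  # one record had to move to join its key
```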

Q5. How will you handle data skew in Spark?

Ans.

Data skew in Spark can be handled with techniques such as partitioning, bucketing, and broadcasting.

  • Partitioning on a key column distributes the data evenly across the cluster, provided the key's values are themselves evenly distributed.

  • Bucketing divides the data into a fixed number of buckets based on a hash of the bucketing column.

  • Broadcasting small tables avoids shuffling the large table in a join, which reduces the amount of data moved across the network.

  • Using dynamic allocation can also help in handling data skewness by allocating more resources to tasks that are t...read more
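
When a single hot key dominates, partitioning on that key alone cannot balance the load. A common complement to the techniques above is key salting, which the answer does not mention; the following is a toy illustration in plain Python, not Spark code. The partitioner (key modulo partition count), the integer keys, and the constants are all assumptions chosen to keep the output reproducible.

```python
from collections import Counter

NUM_PARTITIONS = 4
NUM_SALTS = 4  # hypothetical salt range; tune to the degree of skew


def partition_sizes(keys):
    """Records per partition under a toy partitioner (key modulo partition count)."""
    sizes = Counter(k % NUM_PARTITIONS for k in keys)
    return [sizes.get(p, 0) for p in range(NUM_PARTITIONS)]


def salt(keys):
    """Fold a salt into each key so one hot key becomes NUM_SALTS distinct keys."""
    # Round-robin salts keep the output reproducible; real jobs often use a random salt.
    return [k * NUM_SALTS + (i % NUM_SALTS) for i, k in enumerate(keys)]


# 90 records share the hot key 7; keys 0..9 appear once each.
keys = [7] * 90 + list(range(10))

print(partition_sizes(keys))        # [3, 3, 2, 92]
print(partition_sizes(salt(keys)))  # [25, 25, 25, 25]
```

An aggregation over salted keys needs a second step that strips the salt and merges the partial results per original key.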

Q6. Types of Hive tables and the differences between them.

Ans.

Hive supports two types of tables: Managed and External, each with distinct data management and storage characteristics.

  • Managed Tables: Hive manages both the schema and the data. Dropping the table deletes the data.

  • External Tables: Hive manages only the schema. Dropping the table does not delete the data, which remains in the external storage.

  • Use Managed Tables for temporary data that can be recreated easily.

  • Use External Tables for data that is shared with other applications ...read more

Q7. What Azure solutions have you worked with?

Ans.

I have worked with Azure Data Factory, Azure Databricks, and Azure SQL Database.

  • Azure Data Factory for data integration and orchestration

  • Azure Databricks for big data processing and analytics

  • Azure SQL Database for relational database management

Q8. Difference between RDD, DataFrame, and Dataset.

Ans.

RDD, DataFrame, and Dataset are core abstractions in Apache Spark for handling distributed data processing.

  • RDD (Resilient Distributed Dataset) is the fundamental data structure in Spark, representing an immutable distributed collection of objects.

  • DataFrames are distributed collections organized into named columns; the schema lets Spark's Catalyst optimizer plan queries, which typically makes them faster and easier to use than RDDs.

  • Datasets combine the benefits of RDDs and DataFrames, providing type safety and the ability to use both...read more

Q9. Connecting Spark to Azure SQL Database.

Ans.

Connecting Spark to Azure SQL Database involves configuring JDBC and using Spark's DataFrame API for data operations.

  • Use the JDBC driver for Azure SQL Database to establish a connection.

  • Example connection string: 'jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>;user=<user>@<server>;password=<password>'

  • Utilize Spark's DataFrame API to read and write data: df.write.jdbc(url, table, properties).

  • Ensure that the Azure SQL Database firewall allows access fro...read more
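
As a hedged sketch of the connection setup described above, the helpers below build the JDBC URL and a properties dictionary in Python. The helper names, the server, database, and table names, and the choice to pass credentials through properties instead of embedding them in the URL are all assumptions for illustration. The Spark calls appear only as comments, assuming PySpark and an existing SparkSession.

```python
def build_jdbc_url(server: str, database: str) -> str:
    """Build an Azure SQL JDBC URL in the format quoted above, without credentials."""
    return f"jdbc:sqlserver://{server}.database.windows.net:1433;database={database}"


def build_properties(user: str, password: str) -> dict:
    """Connection properties to pass alongside the URL."""
    return {
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


url = build_jdbc_url("myserver", "sales")           # placeholder names
props = build_properties("etl_user", "<password>")  # placeholder credentials
print(url)

# With a SparkSession `spark` and a DataFrame `df` in scope (PySpark assumed):
#   df = spark.read.jdbc(url=url, table="dbo.orders", properties=props)
#   df.write.jdbc(url=url, table="dbo.orders_copy", mode="append", properties=props)
```

Keeping the password out of the URL makes it less likely to leak into logs; in practice it would come from a secret store, not a literal.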

Q10. Discuss your project and its architecture.

Ans.

A data engineering project focused on building a scalable ETL pipeline for healthcare data analytics.

  • Architecture includes data ingestion, processing, and storage layers.

  • Used Apache Kafka for real-time data streaming from various sources.

  • Implemented Apache Spark for batch processing and data transformation.

  • Stored processed data in Amazon Redshift for analytics and reporting.

  • Utilized Airflow for orchestrating ETL workflows and scheduling tasks.

Q11. What tech stack is used?

Ans.

The tech stack used includes Python, SQL, Apache Spark, Hadoop, AWS, and Docker.

  • Python for data processing and analysis

  • SQL for database querying

  • Apache Spark for big data processing

  • Hadoop for distributed storage and processing

  • AWS for cloud services

  • Docker for containerization

Q12. Types of variables in Scala.

Ans.

Scala has two kinds of variables: mutable and immutable.

  • Mutable variables are declared with the var keyword and can be reassigned.

  • Immutable variables are declared with the val keyword and cannot be reassigned once they are initialized.

  • Example: var mutableVariable = 10; val immutableVariable = 20;

Interview Process at Greenlight Financial Technology

based on 7 interviews
1 interview round
HR Round