EPAM Systems
10+ Fresenius Kabi Interview Questions and Answers
Q1. How to migrate 1000s of tables using Spark (Databricks) notebooks?
Migrate thousands of tables efficiently by combining Spark's parallel processing with automated, metadata-driven Databricks notebook runs.
Utilize Spark's parallel processing capabilities to handle large volumes of data
Leverage Databricks notebooks for interactive data exploration and transformation
Automate the migration process using scripts or workflows, as in the sketch after this list
Optimize performance by tuning Spark configurations and cluster settings
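A minimal sketch of that automation, assuming a JDBC source and Delta targets; the connection string, table list, and thread count below are hypothetical placeholders, and in practice the table list would come from a metadata table:

```python
# Minimal sketch: migrate a list of tables in parallel from a JDBC source into
# Delta tables. The JDBC URL, table names, and thread count are placeholders.
from concurrent.futures import ThreadPoolExecutor

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bulk-table-migration").getOrCreate()

JDBC_URL = "jdbc:sqlserver://source-host:1433;database=legacy"   # placeholder
TABLES = ["dbo.orders", "dbo.customers", "dbo.invoices"]         # placeholder list

def migrate_table(table: str) -> str:
    # Read the source table over JDBC and rewrite it as a managed Delta table.
    df = (
        spark.read.format("jdbc")
        .option("url", JDBC_URL)
        .option("dbtable", table)
        .option("fetchsize", 10000)
        .load()
    )
    target = "bronze." + table.split(".")[-1]
    df.write.format("delta").mode("overwrite").saveAsTable(target)
    return f"{table} -> {target}"

# Fan out several migrations at once; Spark schedules the underlying jobs on
# the cluster, the thread pool only controls driver-side concurrency.
with ThreadPoolExecutor(max_workers=8) as pool:
    for result in pool.map(migrate_table, TABLES):
        print(result)
```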
Q2. What is the process for finding the missing number from a list?
To find the missing number from a list, calculate the sum of all numbers in the list and subtract it from the expected sum of the list.
Calculate the sum of all numbers in the list using a loop or a built-in function.
Calculate the expected sum using the formula n*(n+1)/2, where n is the largest number in the complete range (one more than the length of the list when the numbers run from 1 to n).
Subtract the sum of the list from the expected sum to find the missing number, as in the example below.
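A small Python example of the sum-difference approach, assuming the list should contain the consecutive integers 1..n with exactly one value missing:

```python
def find_missing_number(nums: list[int]) -> int:
    """Return the value missing from a list that should contain 1..n."""
    n = len(nums) + 1                # the complete range has one more element
    expected_sum = n * (n + 1) // 2  # sum of 1..n
    return expected_sum - sum(nums)

# Example: 4 is missing from the range 1..6
assert find_missing_number([1, 2, 3, 5, 6]) == 4
```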
Q3. Dataflow vs Dataproc, layering processing and curated environments in GCP, data cleaning
Dataflow and Dataproc are both processing services in GCP, but with different approaches and use cases.
Dataflow is a fully managed service for executing batch and streaming data processing pipelines.
Dataproc is a managed Spark and Hadoop service for running big data processing and analytics workloads.
Dataflow provides a serverless and auto-scaling environment, while Dataproc offers more control and flexibility.
Dataflow is suitable for real-time streaming and complex data transformation pipelines, while Dataproc suits teams migrating existing Spark/Hadoop jobs; a small cleaning step that could run on Dataflow is sketched below.
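For comparison, Dataflow pipelines are written against the Apache Beam SDK. The minimal cleaning sketch below uses hypothetical file paths and a hypothetical three-column record layout; it runs locally with the DirectRunner by default and targets Dataflow once GCP pipeline options are supplied:

```python
# Minimal Apache Beam sketch of a cleaning step that could run on Dataflow.
# Paths and the three-column record layout are hypothetical placeholders.
import apache_beam as beam

def parse_and_clean(line: str):
    # Drop malformed rows and normalise the country code.
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 3 or not parts[0]:
        return  # skip bad records
    customer_id, country, amount = parts
    yield f"{customer_id},{country.upper()},{float(amount):.2f}"

with beam.Pipeline() as pipeline:  # DirectRunner by default; pass DataflowRunner options for GCP
    (
        pipeline
        | "Read raw" >> beam.io.ReadFromText("raw/customers.csv", skip_header_lines=1)
        | "Clean" >> beam.FlatMap(parse_and_clean)
        | "Write curated" >> beam.io.WriteToText("curated/customers", file_name_suffix=".csv")
    )
```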
Q4. What are some methods for optimizing Spark performance?
Optimizing Spark performance involves tuning configurations, partitioning data, caching, and using efficient transformations.
Tune Spark configurations for memory allocation, parallelism, and resource management.
Partition data properly to distribute work evenly across nodes and minimize shuffling.
Cache intermediate results in memory to avoid recomputation.
Use efficient transformations like map, filter, and reduceByKey instead of costly operations like groupByKey.
Opt for columnar file formats such as Parquet to reduce I/O; several of these points are illustrated in the sketch below.
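A short PySpark sketch of those levers, with illustrative (not prescriptive) config values and placeholder paths and column names:

```python
# Illustrative PySpark snippets for the tuning points above; paths, column
# names, and config values are placeholders rather than recommendations.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("spark-tuning-demo")
    .config("spark.sql.shuffle.partitions", "200")  # match shuffle parallelism to the cluster
    .config("spark.sql.adaptive.enabled", "true")   # let AQE coalesce and skew-split shuffles
    .getOrCreate()
)

sales = spark.read.parquet("/data/sales")           # columnar input keeps I/O low

# Repartition on the aggregation key so work is spread evenly across nodes.
sales = sales.repartition(200, "region")

# Cache a DataFrame that several downstream queries reuse.
sales.cache()

# DataFrame aggregations pre-combine per partition before the shuffle.
totals = sales.groupBy("region").agg(F.sum("amount").alias("total_amount"))

# On the RDD API, reduceByKey combines locally before the shuffle, unlike groupByKey.
pair_totals = (
    sales.select("region", "amount").rdd
    .map(lambda row: (row["region"], row["amount"]))
    .reduceByKey(lambda a, b: a + b)
)
```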
Q5. End-to-end project architecture
The end-to-end project architecture involves designing and implementing the entire data pipeline from data ingestion to data visualization.
Data ingestion: Collecting data from various sources such as databases, APIs, and files.
Data processing: Cleaning, transforming, and aggregating the data using tools like Apache Spark or Hadoop.
Data storage: Storing the processed data in data warehouses or data lakes like Amazon S3 or Google BigQuery.
Data analysis: Performing analysis on the stored data and presenting results through dashboards or BI tools (a compressed pipeline skeleton follows).
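A compressed PySpark skeleton of that flow; the bucket, column names, and table names are hypothetical:

```python
# Minimal end-to-end sketch: ingest raw files, clean and aggregate, store the
# curated result, and expose it for analysis. All names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-pipeline").getOrCreate()

# 1. Ingestion: raw CSV landed by an upstream extract (API, database dump, files).
raw = spark.read.option("header", True).csv("s3://my-bucket/landing/orders/")

# 2. Processing: fix types, drop bad rows, aggregate to daily revenue.
clean = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("order_id").isNotNull())
)
daily = clean.groupBy("order_date").agg(F.sum("amount").alias("daily_revenue"))

# 3. Storage: write curated data partitioned by date for cheap downstream reads.
daily.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://my-bucket/curated/daily_revenue/"
)

# 4. Analysis/visualization: expose the result as a table for SQL and BI tools.
daily.createOrReplaceTempView("daily_revenue")
spark.sql("SELECT * FROM daily_revenue ORDER BY order_date DESC LIMIT 10").show()
```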
Q6. Delete duplicates from a table in Spark and SQL
To delete duplicates from a table in Spark and SQL, you can use the DISTINCT keyword or the dropDuplicates() function.
In SQL, you can use the DISTINCT keyword in a SELECT statement to retrieve unique rows from a table.
In Spark, you can use the dropDuplicates() function on a DataFrame to remove duplicate rows.
Both methods compare all columns by default, but you can restrict the comparison to specific columns.
In SQL, a ROW_NUMBER() window function partitioned by the key columns lets you keep one row per group and discard the rest, as in the sketch below.
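A short illustration of both routes, assuming an orders table keyed by order_id with an updated_at column; the CREATE OR REPLACE TABLE statement assumes Delta/Databricks SQL, and all names are placeholders:

```python
# Deduplication sketch: dropDuplicates() on a DataFrame and ROW_NUMBER() in SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dedup-demo").getOrCreate()
orders = spark.table("staging.orders")  # placeholder source table

# Spark: keep one row per order_id (compares all columns if no subset is given).
deduped = orders.dropDuplicates(["order_id"])
deduped.write.mode("overwrite").saveAsTable("curated.orders")

# SQL: number the rows within each duplicate group and keep only the latest.
spark.sql("""
    CREATE OR REPLACE TABLE curated.orders_sql AS
    SELECT order_id, customer_id, amount, updated_at
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY updated_at DESC) AS rn
        FROM staging.orders
    )
    WHERE rn = 1
""")
```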
Q7. Expectations from EPAM
I expect EPAM to provide challenging projects, opportunities for growth, a collaborative work environment, and support for continuous learning.
Challenging projects that allow me to utilize my skills and knowledge
Opportunities for professional growth and advancement within the company
A collaborative work environment where teamwork is valued
Support for continuous learning through training programs and resources
Q8. Types of transformations, number of jobs, tasks, and actions
In Spark, transformations are lazy and are classified as narrow or wide; actions trigger execution, and each action produces a job that runs as stages and tasks.
Types of transformations: narrow (map, filter, withColumn) need no shuffle; wide (groupByKey, reduceByKey, join) force a shuffle and a new stage.
Number of jobs: one job per action called on a DataFrame or RDD.
Number of tasks: one task per partition in each stage, so it depends on the partition count.
Actions: collect(), count(), show(), take(), and write operations trigger the actual computation (illustrated below).
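A tiny PySpark illustration of where jobs, stages, and tasks come from:

```python
# Narrow vs wide transformations, and how actions map to jobs.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lazy-eval-demo").getOrCreate()

events = spark.range(1_000_000).withColumn("key", F.col("id") % 10)

# Narrow transformation: each output partition depends on a single input
# partition, so no shuffle is needed (filter, withColumn, map-like ops).
filtered = events.filter(F.col("id") % 2 == 0)

# Wide transformation: groupBy requires a shuffle, which adds a stage boundary.
counts = filtered.groupBy("key").count()

# Nothing has run yet - transformations are lazy. Each action below triggers one
# job, and every stage of that job runs one task per partition.
counts.count()   # action -> job 1
counts.show(5)   # action -> job 2
```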
Q9. Optimization in Spark, SQL, BigQuery, and Airflow
Optimization techniques in Spark, SQL, BigQuery, and Airflow.
Use partitioning and bucketing in Spark to optimize data processing.
Optimize SQL queries by using indexes, query rewriting, and query optimization techniques.
In BigQuery, use partitioning and clustering to improve query performance.
Leverage Airflow's task parallelism and resource allocation settings to optimize workflow execution; a few compressed examples follow.
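A few compressed examples of these levers; the table names, keys, and bucket count are hypothetical, and the BigQuery DDL is included as a string for context only:

```python
# Spark: partition and bucket the curated table so joins and date filters prune work.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("optimisation-demo").getOrCreate()
orders = spark.table("staging.orders")  # placeholder source table

(
    orders.write
    .partitionBy("order_date")
    .bucketBy(32, "customer_id")
    .sortBy("customer_id")
    .mode("overwrite")
    .saveAsTable("curated.orders")
)

# BigQuery: partitioning and clustering serve the same pruning purpose there
# (assumes order_ts is a TIMESTAMP column).
BIGQUERY_DDL = """
CREATE TABLE analytics.orders
PARTITION BY DATE(order_ts)
CLUSTER BY customer_id AS
SELECT * FROM staging.orders
"""

# Airflow: limit concurrent task instances per DAG with max_active_tasks on the
# DAG object, and size overall worker capacity with the parallelism setting in
# airflow.cfg.
```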
Q10. Architecture of Spark, Airflow, and BigQuery
Spark is a distributed processing engine, Airflow is a workflow management system, and BigQuery is a fully managed data warehouse.
Spark is designed for big data processing and provides in-memory computation capabilities.
Airflow is used for orchestrating and scheduling data pipelines.
BigQuery is a serverless data warehouse that allows for fast and scalable analytics.
Spark can be integrated with Airflow to schedule and monitor Spark jobs.
BigQuery can be used as a data source or sink for pipelines orchestrated by Airflow; a minimal DAG sketch follows.
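A minimal Airflow 2-style DAG sketch of how these pieces fit together, using BashOperator so no provider packages are assumed; the script path, bucket, and dataset names are hypothetical:

```python
# Orchestration sketch: Airflow schedules a Spark transformation, then loads the
# output into BigQuery with the bq CLI. All paths and names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="spark_to_bigquery_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Run the Spark job (spark-submit must be available on the worker).
    transform = BashOperator(
        task_id="spark_transform",
        bash_command="spark-submit /jobs/transform_events.py --date {{ ds }}",
    )

    # Load the Parquet output into BigQuery via the bq CLI.
    load = BashOperator(
        task_id="load_to_bigquery",
        bash_command=(
            "bq load --source_format=PARQUET "
            "analytics.events_{{ ds_nodash }} "
            "gs://my-bucket/events/{{ ds }}/*.parquet"
        ),
    )

    transform >> load
```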