
10+ Moringa Techsolv Interview Questions and Answers

Updated 17 Oct 2024

Q1. How do you handle a changing schema from the source? What are the common issues faced in Hadoop, and how did you resolve them?

Ans.

Handling a changing schema from the source in Hadoop

  • Use schema evolution techniques like Avro or Parquet to handle schema changes

  • Implement a flexible ETL pipeline that can handle schema changes

  • Use tools like Apache NiFi to dynamically adjust schema during ingestion

  • Common issues include data loss, data corruption, and performance degradation

  • Resolve issues by implementing proper testing, monitoring, and backup strategies


Q2. Write PySpark code to read a CSV file and show the top 10 records.

Ans.

PySpark code to read a CSV file and show the top 10 records.

  • Import the necessary libraries

  • Create a SparkSession

  • Read the CSV file using the SparkSession

  • Display the top 10 records using the show() method


Q3. What optimization techniques are applied in PySpark code?

Ans.

Optimization techniques in PySpark code include partitioning, caching, and using broadcast variables.

  • Partitioning data based on key columns to optimize join operations

  • Caching frequently accessed data in memory to avoid recomputation

  • Using broadcast variables to efficiently share small data across nodes

  • Using appropriate data types and avoiding unnecessary type conversions

  • Avoiding shuffling of data by using appropriate transformations and actions

  • Using appropriate data structures…


Q4. Write PySpark code to rename a column and divide one column by another.

Ans.

PySpark code to rename a column and divide one column by another.

  • Use 'withColumnRenamed' method to change column name

  • Use 'withColumn' method to divide one column by another column

  • Example: df = df.withColumnRenamed('old_col_name', 'new_col_name'); df = df.withColumn('ratio', df['col1'] / df['col2'])


Q5. What are columnar storage, Parquet, and Delta? Why are they used?

Ans.

Columnar storage is a data storage format that stores data in columns rather than rows, improving query performance.

  • Columnar storage stores data in a column-wise manner instead of row-wise.

  • It improves query performance by reducing the amount of data that needs to be read from disk.

  • Parquet is a columnar storage file format that is optimized for big data workloads.

  • It is used in Apache Spark and other big data processing frameworks.

  • Delta is an open-source storage layer that prov…


Q6. Given a dictionary, find the greatest number for the same key in Python.

Ans.

Find the greatest number for the same key in a Python dictionary.

  • Use max() function with key parameter to find the maximum value for each key in the dictionary.

  • Iterate through the dictionary and apply max() function on each key.

  • If the dictionary is nested, use recursion to iterate through all the keys.
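
Since a Python dict cannot literally hold the same key twice, one common reading of the question is a list of (key, value) pairs with repeated keys; a sketch under that assumption (max_per_key is a hypothetical helper name):

```python
def max_per_key(pairs):
    """Given (key, value) pairs where keys may repeat,
    return the greatest value seen for each key."""
    result = {}
    for key, value in pairs:
        # keep the larger of the stored value and the new one
        if key not in result or value > result[key]:
            result[key] = value
    return result


pairs = [("a", 3), ("b", 7), ("a", 9), ("b", 2)]
print(max_per_key(pairs))  # {'a': 9, 'b': 7}
```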


Q7. RDDs vs DataFrames: which is better, and why?

Ans.

DataFrames are better than RDDs due to their optimized performance and ease of use.

  • DataFrames are optimized for better performance than RDDs.

  • DataFrames have a schema, making it easier to work with structured data.

  • DataFrames support SQL queries and can be used with Spark SQL.

  • RDDs are more low-level and require more manual optimization.

  • RDDs are useful for unstructured data or when fine-grained control is needed.


Q8. Write a function to check if a number is an Armstrong number.

Ans.

Function to check whether a number is an Armstrong number

  • An Armstrong Number is a number that is equal to the sum of its own digits raised to the power of the number of digits

  • To check if a number is an Armstrong Number, we need to calculate the sum of each digit raised to the power of the number of digits

  • If the sum is equal to the original number, then it is an Armstrong Number
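
A sketch following the steps above:

```python
def is_armstrong(n: int) -> bool:
    """True if n equals the sum of its digits, each raised
    to the power of the digit count."""
    digits = str(n)
    power = len(digits)
    return n == sum(int(d) ** power for d in digits)


# 153 = 1**3 + 5**3 + 3**3, so it is an Armstrong number
print(is_armstrong(153), is_armstrong(154))  # True False
```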


Q9. How do you connect SQL Server to Databricks?

Ans.

To connect SQL server to Databricks, use JDBC/ODBC drivers and configure the connection settings.

  • Install the appropriate JDBC/ODBC driver for SQL server

  • Configure the connection settings in Databricks

  • Use the JDBC/ODBC driver to establish the connection
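
A hedged sketch of the connection setup; the server, database, and table names are placeholders, and the spark.read call is shown commented out because it needs a live SQL Server and real credentials (ideally pulled from a secret scope):

```python
# Hypothetical connection details -- substitute your own
server = "myserver.database.windows.net"
database = "sales"

# Standard JDBC URL shape for the Microsoft SQL Server driver
jdbc_url = f"jdbc:sqlserver://{server}:1433;databaseName={database}"

# In a Databricks notebook the read would then look like:
# df = (spark.read.format("jdbc")
#         .option("url", jdbc_url)
#         .option("dbtable", "dbo.orders")
#         .option("user", user)
#         .option("password", password)
#         .load())
print(jdbc_url)
```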


Q10. How do you copy data from on-premises to the Azure cloud?

Ans.

Data can be copied from on-premises to the Azure cloud using methods such as Azure Data Factory, Azure Storage Explorer, and Azure Database Migration Service.

  • Use Azure Data Factory to create data pipelines for moving data from on-premises to the Azure cloud

  • Utilize Azure Storage Explorer to manually copy data from on-premises to Azure Blob Storage

  • Leverage Azure Database Migration Service for migrating large volumes of data from on-premises databases to Azure SQL Database

  • Consider using Azure…


Q11. How do you initialize a SparkContext?

Ans.

To initialize a SparkContext, create a SparkConf object and pass it to the SparkContext constructor.

  • Create a SparkConf object with app name and master URL

  • Pass the SparkConf object to SparkContext constructor

  • Example: conf = SparkConf().setAppName('myApp').setMaster('local[*]'); sc = SparkContext(conf=conf)

  • Stop SparkContext using sc.stop()


Q12. Explain the project architecture in detail.

Ans.

The project architecture involves the design and organization of data pipelines and systems for efficient data processing and storage.

  • The architecture includes components such as data sources, data processing frameworks, storage systems, and data delivery mechanisms.

  • It focuses on scalability, reliability, and performance to handle large volumes of data.

  • Example: a project architecture may involve using Apache Kafka for real-time data ingestion, Apache Spark for data processing…


Q13. What is an integration runtime in ADF?

Ans.

An integration runtime in ADF is the compute infrastructure used to run activities in Azure Data Factory pipelines.

  • An integration runtime is a managed compute infrastructure in Azure Data Factory.

  • It is used to run activities within pipelines, such as data movement or data transformation tasks.

  • An integration runtime can be auto-scaled based on workload requirements.

  • It supports various data integration scenarios, including batch processing and real-time data processing.

  • Examples of …


Q14. Optimisation techniques used

Ans.

Optimisation techniques used in data engineering

  • Partitioning data to improve query performance

  • Using indexing to speed up data retrieval

  • Implementing caching mechanisms to reduce data access time

  • Optimizing data storage formats for efficient storage and processing

  • Parallel processing and distributed computing for faster data processing

  • Using compression techniques to reduce storage space and improve data transfer

  • Applying query optimization techniques like query rewriting and query…


Q15. An optimization technique that you have used

Ans.

I have used partitioning and indexing to optimize query performance.

  • Implemented partitioning on large tables to improve query performance by limiting the data scanned

  • Created indexes on frequently queried columns to speed up data retrieval

  • Utilized clustering keys to physically organize data on disk for faster access


Q16. Spark optimization techniques

Ans.

Spark optimization techniques involve partitioning, caching, and tuning resources for efficient data processing.

  • Partitioning data to distribute workload evenly

  • Caching frequently accessed data to avoid recomputation

  • Tuning resources like memory allocation and parallelism

  • Using broadcast variables for small lookup tables


Interview Process at Moringa Techsolv

based on 10 interviews
2 Interview rounds
Technical Round - 1
Technical Round - 2
Made with ❤️ in India. Trademarks belong to their respective owners. All rights reserved © 2024 Info Edge (India) Ltd.
