
20+ New Canara Refrigeration Interview Questions and Answers

Updated 20 Jan 2025

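Q1. What is a data flow, and how does it differ from an ADF pipeline?

Ans.

In Azure Data Factory (ADF), a mapping data flow is a visually designed data transformation, while a pipeline is a logical grouping of activities that orchestrates data movement and transformation.

  • A data flow offers a drag-and-drop interface for designing transformation logic (joins, aggregations, derived columns, and so on); ADF executes it on Spark clusters that it manages

  • A pipeline is a set of activities (for example Copy, Data Flow, Lookup, and ForEach) that are sequenced and scheduled to orchestrate data movement and transformation

  • The two serve different purposes rather than one being more powerful: data flows handle row- and column-level transformations, while pipelines handle control flow, dependencies, and scheduling

  • A data flow runs inside a pipeline through a Data Flow activity; it can be tested interactively in debug mode, but in production it is executed by a pipeline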


Q2. What is the difference between repartition and coalesce? What is the difference between persist and cache?

Ans.

repartition vs coalesce, persist vs cache (Spark)

  • repartition(n) can increase or decrease the number of partitions in a DataFrame and always performs a full shuffle, while coalesce(n) can only decrease the number of partitions and does so by merging existing partitions, avoiding a full shuffle

  • persist() stores a DataFrame at a storage level you choose (memory, disk, or both) so it is not recomputed each time it is reused, while cache() is shorthand for persist() with the default storage level (a memory-and-disk level for DataFrames, MEMORY_ONLY for RDDs)

  • repartition example: df.repartition(10)

  • coalesce example: df.coalesce(5)

  • persist example: df.persist(StorageLevel.MEMORY_AND_DISK), after from pyspark import StorageLevel

  • cache example: df.cache()


Q3. What are the concepts of coalesce and repartition in data processing?

Ans.

Coalesce and repartition control the number of partitions in a distributed dataset, such as a Spark DataFrame or RDD.

  • Coalesce reduces the number of partitions by merging existing ones without a full shuffle, which is usually cheaper but can leave partitions unevenly sized.

  • Repartition increases or decreases the number of partitions by shuffling the data across the cluster, which produces evenly sized partitions at the cost of network and disk I/O.

  • Coalesce is preferred over repartition when reducing partitions, to avoid an unnecessary shuffle.

  • Repartition is useful when you need more partitions (more parallelism) or need to rebalance unevenly distributed data.
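To make the difference concrete, here is a toy, single-process model in plain Python (no Spark required). It is a sketch under simplifying assumptions: partitions are plain lists, the hypothetical coalesce() helper merges whole adjacent partitions (real Spark also considers data locality), and the hypothetical repartition() helper redistributes every record round-robin, which is roughly what DataFrame.repartition(n) without columns does.

```python
from typing import List

Partition = List[int]


def coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy coalesce: merge whole existing partitions into at most n groups.

    Records are never redistributed one by one (no full shuffle), so the
    resulting partitions can be uneven.
    """
    if n >= len(partitions):
        return [list(p) for p in partitions]  # coalesce never increases the count
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Assign each input partition, as a whole, to a contiguous output slot.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy repartition: send every record to a new partition round-robin (full shuffle)."""
    out: List[Partition] = [[] for _ in range(n)]
    records = [r for part in partitions for r in part]
    for i, record in enumerate(records):
        out[i % n].append(record)
    return out


parts = [[1, 2, 3, 4, 5, 6], [7], [8], [9, 10]]
print([len(p) for p in coalesce(parts, 2)])     # [7, 3]  cheap but uneven
print([len(p) for p in repartition(parts, 2)])  # [5, 5]  even, but every record moved
```

The uneven [7, 3] split is the trade-off the answer describes: coalesce saves the shuffle but inherits whatever imbalance the input partitions had.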


Q4. What is AFD? (Also asked: building a dynamic pipeline, Spark architecture, SQL, and data flows.)

Ans.

AFD is not a commonly used term in data engineering, so ask the interviewer for more context; it may be a typo for ADF (Azure Data Factory).


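Q5. What is the SQL query to find the third highest salary from a given table?

Ans.

Sort salaries in descending order and skip the first two distinct values with ORDER BY and LIMIT/OFFSET, or rank the salaries with a window function.

  • Use ORDER BY salary DESC to sort salaries from highest to lowest

  • Use LIMIT 1 OFFSET 2 to skip the two highest salaries and return the third

  • Add DISTINCT so that duplicate salaries are not counted twice. Example: SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 2

  • Alternatively, use DENSE_RANK() OVER (ORDER BY salary DESC) and keep the rows whose rank is 3; LIMIT/OFFSET syntax varies by database (SQL Server, for example, uses OFFSET ... FETCH)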
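The queries above can be tried with Python's built-in sqlite3 module. The employees table and its rows are made up for illustration; the tie at 90 shows why the plain LIMIT/OFFSET version can return the wrong answer. The DENSE_RANK() query assumes SQLite 3.25 or newer, which added window functions.

```python
import sqlite3

# In-memory database with a hypothetical employees table (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("a", 100), ("b", 90), ("c", 90), ("d", 80), ("e", 70)],
)

# Plain version: skips two *rows*, so the duplicate 90 is counted twice.
naive = conn.execute(
    "SELECT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 2"
).fetchone()[0]

# DISTINCT version: skips two distinct salary *values*.
distinct = conn.execute(
    "SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 2"
).fetchone()[0]

# Window-function version: DENSE_RANK gives tied salaries the same rank.
ranked = conn.execute(
    """
    SELECT DISTINCT salary FROM (
        SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
        FROM employees
    ) WHERE rnk = 3
    """
).fetchone()[0]

print(naive, distinct, ranked)  # 90 80 80
```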


Q6. What are transformations, and how many types of transformations exist?

Ans.

Transformations are operations that convert data from one form to another; in Spark they are lazy and produce a new RDD or DataFrame from an existing one. There are two main types: narrow and wide.

  • Narrow transformations are those where each input partition contributes to only one output partition, so no shuffle is needed, e.g., map, filter.

  • Wide transformations are those where each input partition may contribute to multiple output partitions, which requires a shuffle, e.g., groupByKey, reduceByKey, and (typically) join.
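The difference in data movement can be sketched without Spark. In this toy model (all names are hypothetical), a partition is a list of (key, value) pairs; the narrow map touches each partition on its own, while the wide group-by-key must route records from every input partition to the output partition chosen by a simple hash partitioner, key % n, on integer keys.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[int, int]  # (key, value)
Partition = List[Record]


def narrow_map(parts: List[Partition], fn: Callable[[Record], Record]) -> List[Partition]:
    """Narrow: each output partition is computed from exactly one input partition."""
    return [[fn(r) for r in part] for part in parts]


def wide_group_by_key(parts: List[Partition], n: int) -> List[Dict[int, List[int]]]:
    """Wide: records are shuffled so that all values of a key land in one partition."""
    out: List[Dict[int, List[int]]] = [{} for _ in range(n)]
    for part in parts:              # every input partition ...
        for key, value in part:     # ... may send records to every output partition
            target = key % n        # toy hash partitioner for integer keys
            out[target].setdefault(key, []).append(value)
    return out


parts = [[(1, 10), (2, 20)], [(1, 30), (3, 40)]]
print(narrow_map(parts, lambda r: (r[0], r[1] * 2)))
# [[(1, 20), (2, 40)], [(1, 60), (3, 80)]]
print(wide_group_by_key(parts, 2))
# [{2: [20]}, {1: [10, 30], 3: [40]}]
```

Key 1 appears in both input partitions, so grouping it forces data to cross partition boundaries; that cross-partition step is the shuffle that makes a transformation wide.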


Q7. What challenges have you faced while migrating data from one system to another?

Ans.

Common challenges in data migration include data loss, compatibility issues, downtime, and security concerns.

  • Data loss: ensuring all data is transferred without loss or corruption, typically verified by comparing row counts and checksums between source and target.

  • Compatibility issues: ensuring data types, formats, and schemas map correctly between the source and target systems.

  • Downtime: minimizing downtime during migration to avoid disrupting operations.

  • Security concerns: ensuring data security and privacy are maintained throughout the migration.
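One way to check for data loss is to compare a row count and an order-independent checksum computed on both sides. The sketch below is an illustration rather than a specific tool's method: table_fingerprint() is a hypothetical helper, and it assumes rows from both systems have already been normalized to the same Python representation (for example, 1 and 1.0 hash differently). Sorting the per-row digests makes the result independent of row order while still detecting duplicated or missing rows.

```python
import hashlib
from typing import Iterable, Tuple

Row = Tuple[object, ...]


def table_fingerprint(rows: Iterable[Row]) -> Tuple[int, str]:
    """Return (row_count, checksum); the checksum does not depend on row order."""
    # Hash each row separately, then sort the digests so row order does not matter.
    digests = sorted(hashlib.sha256(repr(row).encode("utf-8")).hexdigest() for row in rows)
    combined = hashlib.sha256("".join(digests).encode("ascii")).hexdigest()
    return len(digests), combined


source = [(1, "alice", 100), (2, "bob", 90)]
target_ok = [(2, "bob", 90), (1, "alice", 100)]   # same rows, different order
target_bad = [(1, "alice", 100), (2, "bob", 9)]   # one corrupted value

print(table_fingerprint(source) == table_fingerprint(target_ok))   # True
print(table_fingerprint(source) == table_fingerprint(target_bad))  # False
```

For large tables the same idea is usually pushed down into SQL on each system (counts and per-partition hashes) so that rows do not have to be pulled into one process.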


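Q8. Is a nested ForEach possible in ADF?

Ans.

Yes, but not directly: ADF does not allow a ForEach activity to be placed inside another ForEach (or Until) activity, so nesting is done through a child pipeline.

  • Inside the outer ForEach, add an Execute Pipeline activity that calls a child pipeline and passes the current item (@item()) as a parameter.

  • The child pipeline contains the inner ForEach, which iterates over the nested array it received.

  • Example: for each customer, for each order, for each item in the order, with each loop level in its own pipeline.

  • ForEach is a pipeline control-flow activity; it is not available inside mapping data flows.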


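Q9. Difference between coalesce and repartition

Ans.

"Reparation" is most likely a typo for repartition. In Spark, repartition(n) shuffles the data to raise or lower the number of partitions, while coalesce(n) only lowers it by merging existing partitions without a full shuffle. In SQL, COALESCE is unrelated to partitioning: it returns the first non-null value among its arguments.

  • COALESCE is a standard SQL function (PySpark also provides it as pyspark.sql.functions.coalesce).

  • COALESCE returns the first non-null value among its arguments, for example COALESCE(nickname, first_name, 'unknown').

  • There is no standard SQL function called "reparation".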
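The SQL meaning of COALESCE can be checked quickly with Python's built-in sqlite3 module. The people table and its rows are hypothetical; each row falls back from nickname to first_name to a literal default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: some people have no nickname and/or no first name.
conn.execute("CREATE TABLE people (id INTEGER, nickname TEXT, first_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Bobby", "Robert"), (2, None, "Alice"), (3, None, None)],
)

# COALESCE returns the first argument that is not NULL.
rows = conn.execute(
    "SELECT id, COALESCE(nickname, first_name, 'unknown') FROM people ORDER BY id"
).fetchall()
print(rows)  # [(1, 'Bobby'), (2, 'Alice'), (3, 'unknown')]
```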


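Q10. How do you delete duplicate rows from a database table?

Ans.

Identify the duplicate rows with SQL, then delete every copy except one.

  • SELECT DISTINCT only returns unique rows; it does not remove duplicates from the table, although it can be used to copy the unique rows into a new table

  • Identify duplicates with GROUP BY on the columns that define a duplicate, together with HAVING COUNT(*) > 1

  • Delete the extra copies with a DELETE statement whose subquery keeps one row per group (for example, the row with the smallest id), or number the rows with ROW_NUMBER() and delete those numbered greater than 1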
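The two steps can be sketched with Python's built-in sqlite3 module. The emails table is hypothetical, and the query relies on SQLite's implicit rowid column to tell identical rows apart; in other databases you would use a primary key column or a ROW_NUMBER() query instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (address TEXT, source TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO emails VALUES (?, ?)",
    [("a@x.com", "web"), ("b@x.com", "app"), ("a@x.com", "web"), ("a@x.com", "web")],
)

# Step 1: find the duplicated groups.
dupes = conn.execute(
    "SELECT address, source, COUNT(*) FROM emails "
    "GROUP BY address, source HAVING COUNT(*) > 1"
).fetchall()
print(dupes)  # [('a@x.com', 'web', 3)]

# Step 2: keep the row with the smallest rowid in each group and delete the rest.
conn.execute(
    "DELETE FROM emails WHERE rowid NOT IN "
    "(SELECT MIN(rowid) FROM emails GROUP BY address, source)"
)
remaining = conn.execute("SELECT address, source FROM emails ORDER BY rowid").fetchall()
print(remaining)  # [('a@x.com', 'web'), ('b@x.com', 'app')]
```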


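Q11. Explain the Spark job process and how it is planned

Ans.

A Spark job goes through submission, planning (DAG creation), task scheduling, and task execution.

  • The application is submitted (for example with spark-submit) and the driver creates a SparkContext; a job is triggered each time an action such as count() or write is called, because transformations only build up the plan.

  • For DataFrame and SQL code, the Catalyst optimizer turns the logical plan into an optimized logical plan and then into a physical plan.

  • The DAG scheduler builds a Directed Acyclic Graph (DAG) of stages, splitting it at shuffle boundaries (wide transformations); each stage runs one task per partition.

  • Tasks are scheduled onto executors based on data locality and resource availability.

  • Tasks are executed by executors running on the worker nodes of the cluster.

  • Results are returned to the driver or written to the output sink.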
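The stage-splitting step can be illustrated with a deliberately simplified model: a job is a hypothetical, linear list of (transformation, is_wide) pairs, and a new stage starts at every wide transformation. Real Spark works on a graph of RDD dependencies and places the map side of a shuffle in the earlier stage, so treat this only as a picture of where the boundaries fall.

```python
from typing import List, Tuple

# Hypothetical, simplified plan: (transformation name, is_wide) in execution order.
Plan = List[Tuple[str, bool]]


def split_into_stages(plan: Plan) -> List[List[str]]:
    """Toy DAG scheduler: start a new stage at every wide (shuffle) transformation."""
    stages: List[List[str]] = [[]]
    for name, is_wide in plan:
        if is_wide and stages[-1]:
            stages.append([])  # shuffle boundary: downstream work goes into a new stage
        stages[-1].append(name)
    return stages


plan = [
    ("read", False),
    ("filter", False),
    ("map", False),
    ("reduceByKey", True),
    ("mapValues", False),
    ("join", True),
]
print(split_into_stages(plan))
# [['read', 'filter', 'map'], ['reduceByKey', 'mapValues'], ['join']]
```

Consecutive narrow transformations end up pipelined in the same stage, which is why Spark can run read, filter, and map as a single task per partition.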


Q12. Repartition vs coalesce, DAG vs lineage

Ans.

Repartition and coalesce both change the number of partitions; lineage and the DAG both describe how a Spark dataset is computed, at different levels.

  • Repartition: increases or decreases the number of partitions in a DataFrame or RDD

  • Coalesce: decreases the number of partitions in a DataFrame or RDD

  • DAG (Directed Acyclic Graph): the graph of stages and tasks that Spark's DAG scheduler builds for a job when an action is called

  • Lineage: the chain of transformations (dependencies) that produced an RDD or DataFrame; Spark uses it to recompute lost partitions instead of replicating data

  • Repartition is a shuffle operation and can be expensive, while coalesce merges existing partitions and avoids a full shuffle
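Lineage-based recovery can be sketched with a hypothetical ToyDataset class: it stores only its source partitions and the list of transformations applied so far, and rebuilds any partition on demand by replaying that list. This models the idea, not Spark's internal classes.

```python
from typing import Callable, List, Optional

Transform = Callable[[List[int]], List[int]]


class ToyDataset:
    """Hypothetical dataset that stores its lineage (source + transformations), not results."""

    def __init__(self, source: List[List[int]], lineage: Optional[List[Transform]] = None):
        self.source = source                # partitions of the original input
        self.lineage = list(lineage or [])  # transformations applied so far, in order

    def map_partitions(self, fn: Transform) -> "ToyDataset":
        # Lazy transformation: nothing is computed, the lineage just grows by one step.
        return ToyDataset(self.source, self.lineage + [fn])

    def compute_partition(self, i: int) -> List[int]:
        # Rebuild partition i from its source partition by replaying the lineage,
        # which is how a lost partition can be recomputed without replicated copies.
        data = self.source[i]
        for fn in self.lineage:
            data = fn(data)
        return data


ds = ToyDataset([[1, 2, 3], [4, 5, 6]])
result = ds.map_partitions(lambda p: [x * 10 for x in p]).map_partitions(
    lambda p: [x for x in p if x > 20]
)
print(result.compute_partition(0))  # [30]
print(result.compute_partition(1))  # [40, 50, 60]
```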


Q13. What is Spark? Explain its ecosystem

Ans.

Apache Spark is a fast, general-purpose cluster computing engine for big data processing.

  • Spark provides APIs in Java, Scala, Python, and R for distributed data processing.

  • On top of Spark Core, it includes Spark SQL for SQL and structured data processing, Spark Streaming and Structured Streaming for near-real-time processing, MLlib for machine learning, and GraphX for graph processing.

  • Spark can run on Hadoop YARN, Kubernetes, Apache Mesos (deprecated since Spark 3.2), or in standalone mode.

  • It keeps intermediate data in memory where possible, which speeds up iterative and interactive workloads compared with disk-based MapReduce.


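Q14. Explain OOM and driver overhead memory

Ans.

OOM stands for Out Of Memory; driver overhead memory is the extra non-heap memory reserved for the driver of a Spark application, on top of the driver's JVM heap.

  • An OOM error occurs when a process needs more memory than it has been allocated; in Spark it appears as java.lang.OutOfMemoryError or as the cluster manager killing a container that exceeds its memory limit.

  • The driver coordinates the application: it builds the plan, schedules tasks, and holds any results brought back with collect(), so collecting large datasets or building large broadcast variables are common causes of driver OOM.

  • spark.driver.memory sets the driver's heap size, while spark.driver.memoryOverhead sets the additional non-heap memory (JVM overheads, interned strings, native buffers) requested for the driver in cluster mode.

  • Tuning spark.executor.memory, spark.driver.memory, and the memory overhead settings, and avoiding collect() on large data, can help prevent OOM errors.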
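The size of the driver container follows from simple arithmetic. The sketch below assumes the default rule documented for recent Spark versions when spark.driver.memoryOverhead is not set: overhead = max(384 MiB, 0.10 × driver memory). The factor can differ (for example for non-JVM jobs on Kubernetes), and cluster managers may round the request up, so check the configuration page for your version. driver_container_mib() is a hypothetical helper, not a Spark API.

```python
from typing import Optional


def driver_container_mib(
    driver_memory_mib: int,
    overhead_factor: float = 0.10,
    min_overhead_mib: int = 384,
    explicit_overhead_mib: Optional[int] = None,
) -> int:
    """Estimate driver heap + overhead in MiB under the assumed default rule."""
    if explicit_overhead_mib is not None:
        overhead = explicit_overhead_mib  # spark.driver.memoryOverhead set explicitly
    else:
        overhead = max(min_overhead_mib, int(driver_memory_mib * overhead_factor))
    return driver_memory_mib + overhead


print(driver_container_mib(4096))                              # 4505 (409 MiB overhead)
print(driver_container_mib(2048))                              # 2432 (384 MiB floor applies)
print(driver_container_mib(4096, explicit_overhead_mib=1024))  # 5120
```

The point for OOM troubleshooting is that raising spark.driver.memory alone does not help if the container is being killed for exceeding heap plus overhead; the overhead may need to grow too.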


Q15. What is data skewness?

Ans.

Data skewness is a measure of asymmetry in the distribution of data values.

  • Positive skewness means the tail on the right side of the distribution is longer or fatter.

  • Negative skewness means the tail on the left side of the distribution is longer or fatter.

  • A symmetrical distribution has a skewness of 0.

  • In distributed processing such as Spark, "data skew" usually means an uneven distribution of records across keys or partitions, so a few tasks process far more data than the rest and slow down the whole stage.
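The statistical definition can be made concrete with the population (Fisher-Pearson) coefficient g1 = m3 / m2^1.5, where m_k is the k-th central moment. The sketch below uses only the standard library; the sample lists are made up to show each sign. Library implementations may apply a small-sample bias correction, which this sketch does not.

```python
from typing import Sequence


def skewness(values: Sequence[float]) -> float:
    """Population (Fisher-Pearson) skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


print(skewness([1, 2, 3, 4, 5]))         # 0.0  (symmetric)
print(skewness([1, 1, 1, 2, 10]) > 0)    # True (long right tail)
print(skewness([1, 9, 10, 10, 10]) < 0)  # True (long left tail)
```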


Q16. Write code to print the reverse of a string.

Ans.

Code to print the reverse of a string

  • Use a loop to iterate through the characters of the string in reverse order

  • Append each character to a new string (or list) to build the reversed string

  • Return or print the reversed string
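A minimal Python version of the loop described above. It collects characters in a list and joins them once, because repeated string concatenation can be quadratic; the slicing idiom s[::-1] gives the same result in one expression.

```python
def reverse_string(s: str) -> str:
    """Reverse s by walking it from the last character to the first."""
    chars = []
    for i in range(len(s) - 1, -1, -1):
        chars.append(s[i])  # collect characters in reverse order
    return "".join(chars)   # join once instead of concatenating repeatedly


text = "data engineer"
print(reverse_string(text))  # reenigne atad
print(text[::-1])            # same result using slicing
```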


Q17. DataFrames in PySpark

Ans.

DataFrames in PySpark are distributed collections of data organized into named columns.

  • DataFrames are similar to tables in a relational database.

  • They can be created from various data sources such as CSV, JSON, and Parquet files.

  • DataFrames support transformations through PySpark functions, and SQL queries once they are registered as a temporary view (for example with df.createOrReplaceTempView).


Q18. Are you ready to travel on site?

Ans.

Yes, I am ready to travel on site for data engineering projects.

  • I am willing to travel for client meetings, project kick-offs, and on-site troubleshooting.

  • I understand the importance of face-to-face interactions in project delivery.

  • I have previous experience traveling for work, such as attending conferences or training sessions.

  • I am flexible with my schedule and can accommodate last-minute travel if needed.


Q19. Repartition vs coalesce

Ans.

Repartition can increase or decrease the number of partitions in a DataFrame, while coalesce can only decrease it and does so without a full shuffle.

  • Repartition shuffles data across the network, which can be expensive in terms of performance and resources, but it produces evenly sized partitions.

  • Coalesce is usually cheaper because it merges existing partitions instead of redistributing every record, although the resulting partitions can be uneven.

  • Example: df.repartition(10) produces 10 partitions, while df.coalesce(n) can only reduce the current number of partitions to n.


Q20. Copy Activity in ADF

Ans.

Copy Activity in ADF copies data between supported data stores.

  • Copy Activity is a built-in activity in Azure Data Factory (ADF) pipelines

  • It can copy data between supported data stores such as Azure Blob Storage, Azure SQL Database, and Azure Data Lake Storage

  • It focuses on data movement: it can convert file formats, map columns, and compress data, but complex transformations belong in a data flow or another compute activity

  • You define the source and sink datasets, the column mapping, and settings such as parallelism in the Copy Activity

  • Example: copying data from an on-premises SQL Server to Azure Data Lake Storage using Copy Activity, with a self-hosted integration runtime to reach the on-premises source


Interview Process at New Canara Refrigeration

Based on 16 interviews: 2 interview rounds (Technical Round and HR Round).
Made with ❤️ in India. Trademarks belong to their respective owners. All rights reserved © 2024 Info Edge (India) Ltd.