20+ Street Surge Technologies Interview Questions and Answers

Updated 22 Dec 2024

Q1. What issues did you face in your project? What is a global parameter? Why do we need parameters in ADF? What are the APIs in Spark?

Ans.

These touch on common data engineering topics across ADF and Spark.

  • Issues faced in the project: data quality, scalability, performance, and integration challenges

  • Global parameter: a constant defined at the data factory level that can be referenced from any pipeline in that factory

  • Parameters in ADF: used to pass values into pipelines, datasets, and linked services at run time, making them reusable

  • APIs in Spark: the core RDD, DataFrame, and Dataset APIs, plus the Spark SQL, Spark Streaming, MLlib, and GraphX libraries
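
To make the Spark APIs concrete, here is a minimal PySpark sketch (the data and names are made up) that runs the same filter through both the DataFrame API and Spark SQL:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("api-demo").getOrCreate()

    # DataFrame API
    df = spark.createDataFrame([("a", 1), ("b", 2)], ["key", "value"])
    df.filter(df.value > 1).show()

    # Spark SQL over the same data
    df.createOrReplaceTempView("t")
    spark.sql("SELECT key, value FROM t WHERE value > 1").show()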


Q2. What is the difference between supervised and unsupervised learning?

Ans.

Supervised learning uses labeled data to train the model, while unsupervised learning uses unlabeled data.

  • Supervised learning requires a target variable to be predicted, while unsupervised learning does not.

  • In supervised learning, the model learns from labeled training data, whereas in unsupervised learning, the model finds patterns in unlabeled data.

  • Examples of supervised learning include regression and classification tasks, while clustering is a common unsupervised learning task.
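
A quick sketch of the contrast, assuming scikit-learn is installed (the dataset and model choices are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    X, y = make_classification(n_samples=100, n_features=4, random_state=0)

    # Supervised: the labels y guide training
    clf = LogisticRegression().fit(X, y)
    print(clf.predict(X[:5]))

    # Unsupervised: only X is used; the model discovers its own groups
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.labels_[:5])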


Q3. How to find delta between two tables in SQL?

Ans.

To find the delta between two tables in SQL, use the EXCEPT or MINUS set operator.

  • EXCEPT (ANSI SQL, supported by SQL Server and PostgreSQL) returns distinct rows from the first query that do not appear in the second.

  • MINUS is Oracle's equivalent of EXCEPT.

  • To capture the full delta in both directions, UNION the two queries: (A EXCEPT B) UNION (B EXCEPT A).
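
The EXCEPT pattern can be tried end to end with Python's built-in sqlite3 module (table names old_t and new_t are hypothetical):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE old_t(id INTEGER, val TEXT);
        CREATE TABLE new_t(id INTEGER, val TEXT);
        INSERT INTO old_t VALUES (1, 'a'), (2, 'b');
        INSERT INTO new_t VALUES (1, 'a'), (2, 'B'), (3, 'c');
    """)

    # Rows in new_t that are missing from old_t (one direction of the delta)
    delta = con.execute("SELECT * FROM new_t EXCEPT SELECT * FROM old_t").fetchall()
    print(delta)  # e.g. [(2, 'B'), (3, 'c')]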


Q4. What is ADLS? How can we pass parameters from ADF to Databricks?

Ans.

ADLS is Azure Data Lake Storage, a scalable and secure data lake solution in Azure.

  • ADLS is designed for big data analytics workloads

  • It supports Hadoop Distributed File System (HDFS) and Blob storage APIs

  • It provides enterprise-grade security and compliance features

  • To pass parameters from ADF to Databricks, set base parameters on the Databricks Notebook activity in ADF and read them in the notebook with dbutils.widgets.get()
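
On the Databricks side, the notebook reads those base parameters as widgets; a minimal sketch (the run_date parameter name is hypothetical, and dbutils exists only inside Databricks):

    # In ADF, define run_date under "Base parameters" on the Notebook activity.
    dbutils.widgets.text("run_date", "")        # declare the widget with a default
    run_date = dbutils.widgets.get("run_date")  # value supplied by ADF at run time
    print(f"Processing data for {run_date}")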


Q5. How did you overcome out-of-memory issues?

Ans.

I optimized code, increased memory allocation, used efficient data structures, and implemented data partitioning.

  • Optimized code by identifying and fixing memory leaks

  • Increased memory allocation for the application

  • Used efficient data structures like arrays, hashmaps, and trees

  • Implemented data partitioning to distribute data across multiple nodes
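
One generic way to trade memory for streaming is to process a large input in fixed-size chunks; a minimal sketch (the file name is made up):

    def read_in_chunks(path, chunk_size=1_000_000):
        """Yield fixed-size pieces of a large file instead of loading it whole."""
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                yield chunk

    # Only one chunk is ever held in memory at a time
    total_bytes = sum(len(chunk) for chunk in read_in_chunks("big_file.bin"))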


Q6. What optimizations can you do in Spark?

Ans.

Optimizations in Spark include partitioning, caching, broadcast variables, and using appropriate data structures.

  • Partitioning data based on key can improve performance by reducing data shuffling

  • Caching frequently accessed data in memory can avoid recomputation

  • Using broadcast variables can reduce data transfer between nodes

  • Choosing appropriate data structures like DataFrames or Datasets can optimize query execution

  • Using column pruning and predicate pushdown can reduce the amount of data read
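
A PySpark sketch combining several of these techniques (paths and column names are hypothetical):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.getOrCreate()
    facts = spark.read.parquet("/data/facts")   # large table
    dims = spark.read.parquet("/data/dims")     # small lookup table

    facts = facts.repartition("dim_id")         # partition by the join key
    facts.cache()                               # keep reused data in memory

    # Broadcasting the small table avoids shuffling the large one
    joined = facts.join(broadcast(dims), "dim_id")

    # Selecting and filtering early enables column pruning and predicate pushdown
    result = joined.select("customer_id", "amount").where("amount > 100")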


Q7. In Hadoop, what happens if the NameNode fails?

Ans.

If the NameNode fails and high availability is not configured, the HDFS cluster becomes unavailable.

  • The NameNode manages the metadata of the Hadoop file system (HDFS).

  • Without it, the cluster cannot locate or serve any data, even though the data blocks still exist on the DataNodes.

  • To handle NameNode failures, Hadoop provides high availability and automatic failover.

  • In high-availability mode, an active NameNode is paired with a standby that takes over if it fails.

  • Automatic failover ensures uninterrupted service by promoting the standby NameNode when the active one fails.


Q8. Explain the concepts of object-oriented programming in Python.

Ans.

Object Oriented Programming in Python focuses on creating classes and objects to organize code and data.

  • Python supports classes, objects, inheritance, polymorphism, and encapsulation.

  • Classes are blueprints for creating objects, which are instances of classes.

  • Inheritance allows a class to inherit attributes and methods from another class.

  • Polymorphism enables objects to be treated as instances of their parent class.

  • Encapsulation restricts access to certain components of an object.
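
A compact sketch showing all four ideas at once (the class names are invented for illustration):

    class Animal:
        def __init__(self, name):
            self._name = name        # leading underscore: encapsulation by convention

        def speak(self):             # meant to be overridden
            raise NotImplementedError

    class Dog(Animal):               # inheritance
        def speak(self):
            return f"{self._name} says woof"

    class Cat(Animal):
        def speak(self):
            return f"{self._name} says meow"

    # Polymorphism: different objects handled uniformly as Animals
    for pet in (Dog("Rex"), Cat("Muffin")):
        print(pet.speak())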


Q9. How do you connect to S3 from Databricks?

Ans.

To connect to S3 from Databricks, configure AWS credentials and read the bucket through the s3a:// file system.

  • Prefer an instance profile (IAM role) attached to the cluster; otherwise supply access keys, ideally from a Databricks secret scope

  • Reference the data with s3a:// paths in the standard Spark read/write APIs

  • You can also browse S3 data with the dbutils.fs utilities
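
A sketch of the access-key variant inside a Databricks notebook (the bucket name and secret scope are hypothetical; spark and dbutils are globals provided by the notebook):

    # Instance profiles are preferred in production; keys shown for illustration
    spark.conf.set("fs.s3a.access.key", dbutils.secrets.get("aws", "access_key"))
    spark.conf.set("fs.s3a.secret.key", dbutils.secrets.get("aws", "secret_key"))

    df = spark.read.json("s3a://my-example-bucket/events/")
    df.show()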


Q10. Word count in Spark; difference between flatMap and map

Ans.

Word count in Spark uses flatMap to produce words and map to produce countable pairs.

  • Spark is a distributed computing framework for big data processing

  • flatMap splits each input line into words: one input can produce many outputs

  • map transforms each word into a (word, 1) key-value pair for counting: exactly one output per input

  • The difference: map is one-to-one, while flatMap can emit zero, one, or many outputs per input and flattens the result
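
The classic RDD word count shows both operations in one pipeline (sample lines are made up):

    from pyspark.sql import SparkSession

    sc = SparkSession.builder.getOrCreate().sparkContext

    lines = sc.parallelize(["to be or not", "to be"])
    counts = (lines
              .flatMap(lambda line: line.split())  # one line -> many words
              .map(lambda word: (word, 1))         # one word -> one pair
              .reduceByKey(lambda a, b: a + b))
    print(counts.collect())  # e.g. [('to', 2), ('be', 2), ('or', 1), ('not', 1)]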


Q11. Explain Spark cluster mode vs client mode.

Ans.

Cluster mode runs the Spark driver on one of the worker nodes, while client mode runs the driver on the client machine.

  • In cluster mode, the driver runs on one of the worker nodes in the cluster, while in client mode, the driver runs on the machine where the Spark application is submitted.

  • Cluster mode is suitable for production environments where fault tolerance and scalability are important, while client mode is more commonly used for development and testing purposes.

  • In cluster mode, the application is not tied to the submitting machine: if the client disconnects, the driver keeps running inside the cluster.


Q12. What do you mean by CDC?

Ans.

CDC stands for Change Data Capture, a process of identifying and capturing changes made to data in a database.

  • CDC is used to track changes in data over time, allowing for real-time data integration and analysis.

  • It captures inserts, updates, and deletes made to data, providing a historical record of changes.

  • CDC is commonly used in data warehousing, data replication, and data integration processes.

  • Examples of CDC tools include Oracle GoldenGate, Qlik (Attunity) Replicate, and Informatica.
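
Real CDC tools read the database transaction log; as a rough approximation only, here is a timestamp-watermark sketch in PySpark (table and column names are hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    last_watermark = "2024-01-01 00:00:00"   # stored from the previous run

    changed = spark.sql(f"""
        SELECT * FROM source_db.orders
        WHERE updated_at > '{last_watermark}'
    """)
    changed.write.mode("append").saveAsTable("staging.orders_changes")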


Q13. Give an example of decorators in Python?

Ans.

Decorators in Python are functions that modify the behavior of other functions.

  • Decorators are defined using the @decorator_name syntax before the function definition.

  • They can be used for logging, timing, authentication, etc.

  • Example: @staticmethod decorator in Python makes a method static.
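
A small timing decorator as a sketch (the timed name is invented):

    import functools
    import time

    def timed(func):
        """Decorator that reports how long the wrapped function took."""
        @functools.wraps(func)           # preserve the wrapped function's metadata
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            print(f"{func.__name__} took {time.perf_counter() - start:.4f}s")
            return result
        return wrapper

    @timed
    def slow_add(a, b):
        time.sleep(0.1)
        return a + b

    print(slow_add(2, 3))  # prints the timing line, then 5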


Q14. What Spark optimizations did you use in your project?

Ans.

Spark optimization techniques used in project

  • Partitioning data to optimize parallel processing

  • Caching frequently accessed data to reduce computation time

  • Using broadcast variables for efficient data sharing across nodes

  • Optimizing shuffle operations to minimize data movement

  • Tuning memory and CPU settings for better performance


Q15. Difference between coalesce and repartition?

Ans.

Coalesce reduces the number of partitions without a full shuffle; repartition redistributes data with a full shuffle.

  • Coalesce merges existing partitions in place, so it avoids shuffling data

  • Repartition can increase or decrease the number of partitions but always performs a full shuffle

  • Coalesce is more efficient when you only need fewer partitions, e.g. before writing output

  • Repartition is typically used to distribute data evenly across a larger number of partitions
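
The difference is easy to see by checking partition counts (the numbers are arbitrary):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(1_000_000)

    fewer = df.coalesce(2)      # merges partitions, no full shuffle
    more = df.repartition(16)   # full shuffle, evenly distributed

    print(df.rdd.getNumPartitions(),
          fewer.rdd.getNumPartitions(),   # 2
          more.rdd.getNumPartitions())    # 16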


Q16. What is XCom in Airflow?

Ans.

XCom in Airflow is a way for tasks to exchange messages or small amounts of data.

  • XCom allows tasks to communicate with each other by passing small pieces of data

  • It can be used to share information between tasks in a DAG

  • XCom values are stored in Airflow's metadata database, so they should stay small: task status, row counts, file paths, and the like
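
A minimal sketch using the Airflow 2.x TaskFlow API, where return values are pushed to XCom automatically (the DAG and task names are invented):

    from datetime import datetime
    from airflow.decorators import dag, task

    @dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
    def xcom_demo():

        @task
        def extract():
            return {"rows": 42}   # return value is pushed to XCom

        @task
        def report(stats: dict):  # argument is pulled from XCom
            print(f"extracted {stats['rows']} rows")

        report(extract())

    xcom_demo()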


Q17. Different types of joins in SQL

Ans.

Different types of joins in SQL include inner join, left join, right join, and full outer join.

  • Inner join: Returns rows when there is a match in both tables.

  • Left join: Returns all rows from the left table and the matched rows from the right table.

  • Right join: Returns all rows from the right table and the matched rows from the left table.

  • Full outer join: Returns all rows from both tables, matched where possible and padded with NULLs elsewhere.


Q18. Different types of joins in Spark

Ans.

Join types in Spark include inner, left, right, and full outer (in Spark, "outer" and "full" are synonyms for full outer).

  • Inner join: Returns only the rows that have matching values in both datasets.

  • Left join: Returns all rows from the left dataset and the matched rows from the right dataset.

  • Right join: Returns all rows from the right dataset and the matched rows from the left dataset.

  • Full (outer) join: Returns all rows from both datasets, with nulls where there is no match.
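
A PySpark sketch exercising each join type on two tiny DataFrames (the data is invented):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    left = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "l"])
    right = spark.createDataFrame([(2, "x"), (3, "y")], ["id", "r"])

    left.join(right, "id", "inner").show()  # only id 2
    left.join(right, "id", "left").show()   # ids 1 and 2, nulls for 1
    left.join(right, "id", "right").show()  # ids 2 and 3, nulls for 3
    left.join(right, "id", "full").show()   # ids 1, 2 and 3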


Q19. Explain the architecture of Delta Lake.

Ans.

Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.

  • Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing.

  • It stores data in Parquet format and uses Apache Spark for processing.

  • Delta Lake ensures data reliability and data quality by providing schema enforcement and data versioning.

  • It supports time travel queries, allowing users to access previous versions of the data.
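
A sketch of versioning and time travel, assuming a local Spark session configured with the delta-spark package (the path is arbitrary):

    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession

    builder = (SparkSession.builder.appName("delta-demo")
               .config("spark.sql.extensions",
                       "io.delta.sql.DeltaSparkSessionExtension")
               .config("spark.sql.catalog.spark_catalog",
                       "org.apache.spark.sql.delta.catalog.DeltaCatalog"))
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    spark.range(5).write.format("delta").mode("overwrite").save("/tmp/delta/demo")
    spark.range(10).write.format("delta").mode("overwrite").save("/tmp/delta/demo")

    # Time travel: read the first version back
    v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/delta/demo")
    print(v0.count())  # 5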


Q20. Illustrate exception handling in Python.

Ans.

Exception handling in Python allows for graceful handling of errors and preventing program crashes.

  • Use try-except blocks to catch and handle exceptions.

  • Multiple except blocks can be used to handle different types of exceptions.

  • Finally block can be used to execute code regardless of whether an exception was raised or not.

  • Custom exceptions can be defined by creating a new class that inherits from the built-in Exception class.
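
All four pieces in one short sketch (InvalidRecordError and parse_amount are invented names):

    class InvalidRecordError(Exception):
        """Custom exception for records that fail validation."""

    def parse_amount(raw):
        try:
            value = float(raw)
        except ValueError:
            raise InvalidRecordError(f"not a number: {raw!r}")
        else:
            return value
        finally:
            print("parse attempt finished")  # runs whether or not an error occurred

    print(parse_amount("3.14"))
    try:
        parse_amount("abc")
    except InvalidRecordError as err:
        print(f"skipped bad record: {err}")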


Q21. Difference between partitioning and bucketing

Ans.

Partitioning splits data into subsets by column value; bucketing hashes rows into a fixed number of buckets.

  • Partitioning divides data into smaller subsets based on the values of a column or key, typically one directory per value.

  • Bucketing distributes rows across a fixed number of buckets by hashing a chosen column.

  • Partitioning is commonly used in distributed systems for better data organization and partition pruning in queries.

  • Bucketing is often used to handle data skew and to optimize query performance for joins and aggregations.
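
In Spark, the contrast shows up directly in the writer API (paths and table names are hypothetical; note that bucketBy requires saveAsTable):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, "US"), (2, "IN"), (3, "US")], ["user_id", "country"])

    # Partitioning: one directory per country value
    df.write.mode("overwrite").partitionBy("country").parquet("/tmp/users_part")

    # Bucketing: rows hashed into a fixed number of buckets
    (df.write.mode("overwrite")
       .bucketBy(4, "user_id")
       .sortBy("user_id")
       .saveAsTable("users_bucketed"))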


Q22. Optimizations used in your current project

Ans.

Various optimizations such as indexing, caching, and parallel processing were used in the project.

  • Implemented indexing on frequently queried columns to improve query performance

  • Utilized caching mechanisms to store frequently accessed data and reduce database load

  • Implemented parallel processing to speed up data processing tasks

  • Optimized algorithms and data structures for efficient data retrieval and manipulation


Q23. What is a list in Python?

Ans.

A list in Python is a collection of items that are ordered and mutable.

  • Lists are created using square brackets []

  • Items in a list can be of different data types

  • Lists can be modified by adding, removing, or changing items

  • Example: my_list = [1, 'apple', True]
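
A few lines showing the mutability in practice:

    my_list = [1, "apple", True]   # ordered, items of mixed types
    my_list.append(2.5)            # add an item
    my_list[1] = "banana"          # change an item in place
    del my_list[0]                 # remove an item
    print(my_list)                 # ['banana', True, 2.5]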


Q24. Tuning operations in Databricks

Ans.

Tuning operations in Databricks involves optimizing performance and efficiency of data processing tasks.

  • Use cluster configuration settings to allocate resources efficiently

  • Optimize code by minimizing data shuffling and reducing unnecessary operations

  • Leverage Databricks Auto Optimize to automatically tune performance

  • Monitor job performance using Databricks Runtime Metrics and Spark UI


Q25. Difference between flatMap and map

Ans.

flatMap flattens nested results into a single sequence; map transforms each element one-to-one.

  • flatMap applies a function that returns a collection for each element and flattens the results into one sequence.

  • map applies a function to each element and returns exactly one transformed element per input.

  • flatMap is common in functional settings such as JavaScript, Scala, and Spark.

  • Both are higher-order functions: they take the transformation function as an argument.


Q26. Write the binary sort program

Ans.

The steps described here are binary search, which finds a target in a sorted array by repeatedly halving the search range.

  • Look at the middle element of the current range

  • Compare the middle element with the target value

  • Repeat on the half of the range where the target may be located, until it is found or the range is empty
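
Taking the question to mean binary search, a standard implementation looks like this:

    def binary_search(arr, target):
        """Return the index of target in sorted arr, or -1 if absent."""
        lo, hi = 0, len(arr) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if arr[mid] == target:
                return mid
            elif arr[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1

    print(binary_search([2, 5, 8, 12, 21], 12))  # 3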


Q27. Spark optimization techniques

Ans.

Spark optimization techniques involve partitioning, caching, and tuning configurations.

  • Partitioning data to distribute workload evenly

  • Caching frequently accessed data to avoid recomputation

  • Tuning configurations like memory allocation and parallelism

  • Using broadcast joins for small tables

  • Avoiding shuffling operations whenever possible


Q28. Spark optimization techniques

Ans.

Optimization techniques in Spark improve performance and efficiency of data processing.

  • Partitioning data to distribute workload evenly

  • Caching frequently accessed data in memory

  • Using broadcast variables for small lookup tables

  • Avoiding shuffling operations whenever possible


Interview Process at Street Surge Technologies

Based on 33 interviews, the process consists of 2 rounds:
Technical Round - 1
Technical Round - 2