I applied via Naukri.com and was interviewed before Apr 2023. There was 1 interview round.
PySpark code execution flow involves transformations and actions. Project architecture includes components such as data sources and processing. Narrow transformations work within a single partition, while wide transformations shuffle data across partitions. A query for the second-highest salary can use window functions.
PySpark code execution flow involves defining transformations, which are lazy, and actions, which trigger execution, on RDDs or DataFrames.
Project architecture typica...
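The narrow versus wide distinction can be illustrated without a cluster. The sketch below is plain Python, not PySpark: lists stand in for partitions, and the helper names narrow_map and wide_group_by_key are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict

# Two "partitions" of (department, salary) records; plain lists stand in for Spark partitions.
partitions = [
    [("eng", 100), ("it", 80)],
    [("eng", 120), ("it", 90)],
]

def narrow_map(parts, fn):
    """Narrow: each output partition depends on exactly one input partition."""
    return [[fn(rec) for rec in part] for part in parts]

def wide_group_by_key(parts, num_out):
    """Wide: records sharing a key must be moved (shuffled) into the same partition."""
    out = [defaultdict(list) for _ in range(num_out)]
    for part in parts:
        for key, value in part:
            # Sum of character codes is a deterministic stand-in for a hash partitioner.
            target = sum(map(ord, key)) % num_out
            out[target][key].append(value)
    return [dict(d) for d in out]

raised = narrow_map(partitions, lambda rec: (rec[0], rec[1] + 10))
grouped = wide_group_by_key(raised, num_out=2)
```

In the narrow step no record leaves its partition, while the grouping step has to read every input partition to build each output partition, which is what makes a shuffle expensive.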
I applied via Naukri.com and was interviewed in Feb 2023. There were 3 interview rounds.
SQL queries and Python programs
I applied via Job Portal and was interviewed in Mar 2024. There were 3 interview rounds.
Spark cluster sizing depends on workload, data size, memory requirements, and processing speed.
Consider the size of the data being processed
Take into account the memory requirements of the Spark jobs
Factor in the processing speed needed for the workload
Scale the cluster based on the number of nodes and cores required
Monitor performance and adjust cluster size as needed
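The sizing considerations above can be turned into a rough back-of-the-envelope estimate. This is only a sketch under stated assumptions: 128 MB per partition matches Spark's default for spark.sql.files.maxPartitionBytes, 2 to 3 tasks per core follows the general guidance in the Spark tuning guide, and cores_per_node as well as the function name estimate_cluster are hypothetical.

```python
import math

def estimate_cluster(data_gb, partition_mb=128, tasks_per_core=3, cores_per_node=16):
    """Rough starting point only; real sizing should be validated by monitoring."""
    # Number of input partitions if each holds about partition_mb of data.
    partitions = math.ceil(data_gb * 1024 / partition_mb)
    # Let each core work through a few tasks rather than exactly one.
    cores = math.ceil(partitions / tasks_per_core)
    # cores_per_node is an assumed machine size, not a recommendation.
    nodes = math.ceil(cores / cores_per_node)
    return {"partitions": partitions, "cores": cores, "nodes": nodes}

estimate = estimate_cluster(data_gb=100)
```

The result is a first guess to refine after observing memory pressure, shuffle volume, and job duration on real runs.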
Implement a pipeline based on the given conditions and data requirements
Databricks is a unified data analytics platform that includes components like Databricks Workspace, Databricks Runtime, and Databricks Delta.
Databricks Workspace: Collaborative environment for data science and engineering teams.
Databricks Runtime: The software that runs on Databricks clusters, including an optimized build of Apache Spark.
Databricks Delta (now Delta Lake): Storage layer that adds ACID transactions and unified data management to data lakes.
To read a JSON file, use a programming language's built-in functions or libraries to parse the file and extract the data.
Use a programming language like Python, Java, or JavaScript to read the JSON file.
Import libraries like json in Python or json-simple in Java to parse the JSON data.
Use functions like json.load() in Python to load the JSON file and convert it into a dictionary or object.
Access the data in the JSON fi...
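A minimal Python sketch of the steps above, using only the standard library. The file name employees.json and its contents are made up for the example; the file is written first so the snippet is self-contained.

```python
import json
import os
import tempfile

# Write a small sample file so the example is self-contained; the contents are made up.
sample = {"employees": [{"name": "Asha", "salary": 50000}, {"name": "Ravi", "salary": 65000}]}
path = os.path.join(tempfile.mkdtemp(), "employees.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f)

# json.load() parses the open file into Python dicts and lists.
with open(path, encoding="utf-8") as f:
    data = json.load(f)

# Access nested values with ordinary dict and list indexing.
names = [emp["name"] for emp in data["employees"]]
```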
To find the second-highest salary in SQL, use the MAX function with a subquery, or ORDER BY with the LIMIT clause.
Use a subquery with MAX to find the highest salary, exclude it with a WHERE clause, and take the MAX of the remaining salaries.
Alternatively, sort salaries in descending order and use LIMIT with an offset of 1 to skip the highest salary (LIMIT/OFFSET syntax varies by database).
Make sure to handle cases where there may be ties for the highest salary.
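Both approaches can be tried with Python's built-in sqlite3 module. This is a sketch, assuming a hypothetical employees table with made-up rows, including a tie for the highest salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("A", 300), ("B", 300), ("C", 200), ("D", 100)],  # A and B tie for the top salary
)

# Approach 1: highest salary strictly below the overall maximum.
via_max = conn.execute(
    "SELECT MAX(salary) FROM employees "
    "WHERE salary < (SELECT MAX(salary) FROM employees)"
).fetchone()[0]

# Approach 2: DISTINCT collapses ties, then OFFSET 1 skips the top value.
via_limit = conn.execute(
    "SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchone()[0]
```

Without DISTINCT, the second approach would return 300 again here, because the tied top salary occupies the first two rows.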
Spark cluster configuration involves setting up memory, cores, and other parameters for optimal performance.
Specify the number of executors and executor memory
Set the number of cores per executor
Adjust the driver memory based on the application requirements
Configure shuffle partitions for efficient data processing
Enable dynamic allocation for better resource utilization
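The settings listed above map onto standard Spark configuration properties. The sketch below builds a spark-submit command line from them; the property names are real Spark settings, but the values, the script name job.py, and the helper to_submit_args are illustrative assumptions, not recommendations.

```python
# Property names are standard Spark settings; the values are illustrative only.
spark_conf = {
    "spark.executor.instances": "10",         # number of executors
    "spark.executor.memory": "8g",            # memory per executor
    "spark.executor.cores": "4",              # cores per executor
    "spark.driver.memory": "4g",              # driver memory
    "spark.sql.shuffle.partitions": "200",    # partitions used for shuffles in Spark SQL
    "spark.dynamicAllocation.enabled": "true",
}

def to_submit_args(conf):
    """Turn a dict of properties into repeated --conf key=value arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args

# job.py is a hypothetical application script.
command = ["spark-submit", *to_submit_args(spark_conf), "job.py"]
```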
Building a data pipeline involves extracting, transforming, and loading data from various sources to a destination for analysis.
Identify data sources and determine the data to be collected
Extract data from sources using tools like Apache NiFi or Apache Kafka
Transform data using tools like Apache Spark or Python scripts
Load data into a destination such as a data warehouse or database
Schedule and automate the pipeline fo...
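The extract, transform, and load steps can be sketched at toy scale with the Python standard library. Everything here is an assumption made for illustration: the inline CSV stands in for a real source, an in-memory SQLite database stands in for the warehouse, and the function names are hypothetical.

```python
import csv
import io
import sqlite3

# Inline CSV stands in for a real data source; the second row has a missing salary.
RAW_CSV = "name,salary\nAsha,50000\nRavi,\nMeena,72000\n"

def extract(text):
    """Read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Drop rows with a missing salary and cast salary to int."""
    return [(r["name"], int(r["salary"])) for r in rows if r["salary"]]

def load(rows, conn):
    """Write cleaned rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS salaries (name TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO salaries VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT name, salary FROM salaries ORDER BY name").fetchall()
```

In a production pipeline each stage would typically be a separate, scheduled task so that failures can be retried independently.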
I applied via Naukri.com and was interviewed in Oct 2024. There were 2 interview rounds.
Databricks is a unified analytics platform that provides a collaborative environment for data scientists, engineers, and analysts.
Databricks simplifies the process of building data pipelines and training machine learning models.
It allows for easy integration with various data sources and tools, such as Apache Spark and Delta Lake.
Databricks provides a scalable and secure platform for processing big data and running ...
Optimizing code involves identifying bottlenecks, improving algorithms, using efficient data structures, and minimizing resource usage.
Identify and eliminate bottlenecks in the code by profiling and analyzing performance.
Improve algorithms by using more efficient techniques and data structures.
Use appropriate data structures like hash maps, sets, and arrays to optimize memory usage and access times.
Minimize resource us...
A SQL window function performs a calculation across a set of table rows related to the current row.
Unlike GROUP BY aggregation, window functions return a value for every row instead of collapsing the rows into one
They can be used to calculate running totals, moving averages, rank, etc.
Examples include ROW_NUMBER(), RANK(), SUM() OVER(), etc.
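The functions named above can be tried with Python's built-in sqlite3 module, assuming the bundled SQLite is version 3.25 or newer (the first release with window functions). The sales table and its rows are made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, 30), (3, 30), (4, 20)])

rows = conn.execute(
    """
    SELECT day,
           amount,
           ROW_NUMBER() OVER (ORDER BY amount DESC, day) AS row_num,        -- unique position
           RANK()       OVER (ORDER BY amount DESC)      AS rnk,            -- ties share a rank
           SUM(amount)  OVER (ORDER BY day)              AS running_total   -- cumulative sum
    FROM sales
    ORDER BY day
    """
).fetchall()
```

Each input row appears once in the result, with the window values added as extra columns. Days 2 and 3 share rank 1, and the next rank is 3, which is how RANK() differs from ROW_NUMBER().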
I applied via Campus Placement
One good coding question and 33 MCQs
Create a database to store information about colleges, students, and professors.
Create tables for colleges, students, and professors
Include columns for relevant information such as name, ID, courses, etc.
Establish relationships between the tables using foreign keys
Use SQL queries to insert, update, and retrieve data
Consider normalization to avoid data redundancy
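One possible schema for the design described above, sketched with Python's sqlite3 module. The table and column names are assumptions for illustration; a fuller design would likely add courses and enrollment tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE colleges (
        college_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE professors (
        professor_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        college_id   INTEGER NOT NULL REFERENCES colleges(college_id)
    );
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        college_id INTEGER NOT NULL REFERENCES colleges(college_id)
    );
    """
)
conn.execute("INSERT INTO colleges VALUES (1, 'Example College')")
conn.execute("INSERT INTO students VALUES (1, 'Asha', 1)")

# A student pointing at a non-existent college violates the foreign key.
try:
    conn.execute("INSERT INTO students VALUES (2, 'Ravi', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
```

Storing the college name once in colleges and referencing it by college_id is the normalization step: renaming a college then touches a single row.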
I applied via Naukri.com and was interviewed in Sep 2023. There was 1 interview round.
I have used activities such as Copy Data, Execute Pipeline, Lookup, and Data Flow in Azure Data Factory.
Copy Data activity is used to copy data from a source to a destination.
Execute Pipeline activity is used to invoke another pipeline from the current pipeline.
Lookup activity is used to retrieve data from a specified dataset or table.
Data Flow activity is used for data transformation and processing.
To execute a second notebook from the first, you can use the %run magic command, which is available in Jupyter (IPython) and in Databricks notebooks.
Use the %run magic command followed by the path to the second notebook in the first notebook.
Ensure that the second notebook is in the same directory or provide the full path to the notebook.
Make sure to save any changes in the second notebook before executing it from the first notebook.
Data lake storage is optimized for big data analytics and can store structured, semi-structured, and unstructured data. Blob storage is a general-purpose object store, used mostly for unstructured data.
Data lake storage is designed for big data analytics and can handle structured, semi-structured, and unstructured data
Blob storage is optimized for storing unstructured data like images, videos, documents, etc.
Data lake storage allows for complex queries and ...
I applied via Naukri.com and was interviewed before Dec 2023. There were 2 interview rounds.