Synechron
Broadcast join is a type of join operation in big data processing where one of the datasets is small enough to be broadcast to all nodes in the cluster.
Used when one dataset is small enough to fit in the memory of every node
Avoids shuffling the larger dataset across the network
Improves performance because each node joins its share of the larger dataset locally against its own copy of the small one
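The idea can be sketched in plain Python without a Spark cluster. This is only an illustration under stated assumptions: the function `broadcast_join`, the `orders` partitions, and the `countries` table are all hypothetical, and a dict stands in for the broadcast copy that every node would hold.

```python
# Plain-Python sketch of a broadcast (map-side) hash join; no Spark required.
# All names and rows below are made up for illustration.

def broadcast_join(large_partitions, small_table, key_index=0):
    """Join each partition of the large dataset against a small lookup table.

    The small table is turned into a dict once (the "broadcast" copy), and
    every partition is joined locally, so rows of the large dataset never
    move between partitions.
    """
    lookup = dict(small_table)  # key -> value, small enough to copy everywhere
    joined = []
    for partition in large_partitions:
        local = [
            row + (lookup[row[key_index]],)
            for row in partition
            if row[key_index] in lookup  # inner-join semantics
        ]
        joined.append(local)
    return joined


# Large "orders" dataset, already split into two partitions: (country_code, amount)
orders = [
    [("IN", 120), ("US", 80)],
    [("IN", 45), ("FR", 60)],
]
# Small dimension table: (country_code, country_name)
countries = [("IN", "India"), ("US", "United States")]

result = broadcast_join(orders, countries)
print(result)
```

In PySpark the same effect is usually requested with the `broadcast` hint from `pyspark.sql.functions`, for example `large_df.join(broadcast(small_df), "key")`.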
Repartition increases or decreases the number of partitions in a DataFrame, while coalesce only decreases the number of partitions.
Repartition performs a full shuffle of data across the network, while coalesce avoids a full shuffle by merging existing partitions instead of redistributing every row.
Repartition is typically used when increasing the number of partitions for parallelism, while coalesce is used for decreasing partitions to optimiz...
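The difference can be mimicked with lists of lists standing in for partitions. This is a simplified model, not Spark's implementation: the helper names `coalesce` and `repartition` here are plain Python functions, and round-robin assignment is assumed as a stand-in for the shuffle.

```python
# Toy model of partitions as a list of lists; not Spark's actual algorithm.

def coalesce(partitions, n):
    """Merge neighbouring partitions into n groups; whole partitions move together."""
    if n >= len(partitions):
        return partitions  # coalesce cannot increase the partition count
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Map old partition i onto one of the n new partitions.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions, n):
    """Full shuffle: every row is reassigned individually (round-robin here)."""
    out = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for row in part:
            out[i % n].append(row)
            i += 1
    return out


parts = [[1, 2], [3], [4, 5, 6], [7]]
print(coalesce(parts, 2))     # fewer partitions, rows stay with their old neighbours
print(repartition(parts, 3))  # rows spread evenly, but every row may move
print(len(repartition(parts, 8)), len(coalesce(parts, 8)))
```

In this model only `repartition` can reach 8 partitions; `coalesce` leaves the original 4 untouched when asked for more.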
Spark is a distributed computing framework whose architecture consists of a driver program, a cluster manager, and worker nodes.
Consists of a cluster manager (e.g. Spark Standalone, YARN, Mesos) for resource management
Worker nodes execute tasks and store data in memory or disk
Supports various programming languages like Scala, Java, Python, and SQL
Uses Resilient Distributed Datasets (RDDs) for fault-tolerant distributed data processing
I applied via Naukri.com and was interviewed in Nov 2024. There were 2 interview rounds.
The aptitude test session assesses mathematical and logical reasoning abilities.
VLOOKUP is a function in Excel used to search for a value in a table and return a corresponding value from another column.
VLOOKUP stands for 'Vertical Lookup'
It is commonly used in Excel to search for a value in the leftmost column of a table and return a value in the same row from a specified column
Syntax: =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
Example: =VLOOKUP(A2, B2:D10, 3, FALSE) - searc...
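For readers more comfortable with code, the exact-match case (`range_lookup` set to FALSE) can be approximated in a few lines of Python. The function `vlookup` and the sample `staff` table are hypothetical; this is a sketch of the lookup logic, not a reimplementation of Excel.

```python
# Rough Python analogue of an exact-match VLOOKUP; sample data is made up.

def vlookup(lookup_value, table, col_index_num):
    """Search the first column of `table` and return the value from
    column `col_index_num` (1-based, as in Excel) of the matching row."""
    for row in table:
        if row[0] == lookup_value:
            return row[col_index_num - 1]
    return None  # Excel would show #N/A when nothing matches


# Each row: (employee_id, name, salary)
staff = [
    ("E01", "Asha", 52000),
    ("E02", "Ravi", 61000),
    ("E03", "Meera", 58000),
]

print(vlookup("E02", staff, 3))
print(vlookup("E09", staff, 2))
```

As in Excel, the search key must sit in the leftmost column of the table, and the first matching row wins.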
My day in my previous organization involved analyzing large datasets, creating reports, and presenting findings to stakeholders.
Reviewing and cleaning large datasets to ensure accuracy
Creating visualizations and reports to communicate insights
Collaborating with team members to identify trends and patterns
Presenting findings to stakeholders in meetings or presentations
I possess strong technical skills in data analysis, including proficiency in programming languages, statistical analysis, and data visualization tools.
Proficient in programming languages such as Python, R, SQL
Skilled in statistical analysis and data modeling techniques
Experience with data visualization tools like Tableau, Power BI
Knowledge of machine learning algorithms and techniques
A Pivot Table is a data summarization tool used in spreadsheet programs to analyze, summarize, and present data in a tabular format.
Pivot tables allow users to reorganize and summarize selected columns and rows of data to obtain desired insights.
Users can easily group and filter data, perform calculations, and create visualizations using pivot tables.
Pivot tables are commonly used in Excel and other spreadsheet program...
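What a pivot table computes can be shown with a small Python sketch: one field becomes the rows, another becomes the columns, and a third is summed in each cell. The function `pivot_sum` and the `sales` records are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Minimal pivot-with-sum over a list of records; sample data is made up.

def pivot_sum(rows, row_key, col_key, value_key):
    """Return {row_label: {column_label: summed value}}."""
    table = defaultdict(lambda: defaultdict(int))
    for record in rows:
        table[record[row_key]][record[col_key]] += record[value_key]
    # Convert nested defaultdicts to plain dicts for a clean result.
    return {row: dict(cols) for row, cols in table.items()}


sales = [
    {"region": "North", "quarter": "Q1", "amount": 100},
    {"region": "North", "quarter": "Q2", "amount": 150},
    {"region": "South", "quarter": "Q1", "amount": 80},
    {"region": "North", "quarter": "Q1", "amount": 50},
]

summary = pivot_sum(sales, "region", "quarter", "amount")
print(summary)
```

Here the two North/Q1 records collapse into a single cell, which is the summarization step a spreadsheet pivot table performs.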
To find the highest-paid employee in each department, we need to group employees by department and then select the employee with the highest salary in each group.
Group employees by department
Find the employee with the highest salary in each group
Retrieve the employee's name, salary, and department name
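The steps above could look like the following in SQL, run here through Python's built-in `sqlite3` module so the example is self-contained. The `employees` and `departments` tables, their columns, and the sample rows are assumptions made for this sketch; the interview question did not specify a schema.

```python
import sqlite3

# In-memory database with a hypothetical schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    salary INTEGER,
    department_id INTEGER REFERENCES departments(id)
);
INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Finance');
INSERT INTO employees VALUES
    (1, 'Asha', 90000, 1),
    (2, 'Ravi', 120000, 1),
    (3, 'Meera', 75000, 2),
    (4, 'John', 70000, 2);
""")

# Step 1: the subquery groups by department and finds each maximum salary.
# Step 2: joining back keeps only employees who earn that maximum.
# Step 3: the outer SELECT retrieves name, salary, and department name.
query = """
SELECT e.name, e.salary, d.name
FROM employees AS e
JOIN departments AS d ON d.id = e.department_id
JOIN (
    SELECT department_id, MAX(salary) AS max_salary
    FROM employees
    GROUP BY department_id
) AS m ON m.department_id = e.department_id AND m.max_salary = e.salary
ORDER BY d.name
"""
top_paid = conn.execute(query).fetchall()
print(top_paid)
```

If two employees tie for the highest salary in a department, this query returns both. A window function such as `RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)` is a common alternative.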
The aptitude test lasts 30 minutes and focuses on topics relevant to data engineering, including Spark, SQL, Azure, and PySpark.
The coding test is a one-hour examination on PySpark.
posted on 31 Dec 2024
Apache Spark architecture includes a cluster manager, worker nodes, and driver program.
Apache Spark architecture consists of a cluster manager, which allocates resources and schedules tasks.
Worker nodes execute tasks and store data in memory or disk.
Driver program coordinates tasks and communicates with the cluster manager.
Spark applications run as independent sets of processes on a cluster, coordinated by the SparkCon...
reduceByKey aggregates the values for each key, while groupByKey only groups the values by key.
reduceByKey is a transformation that combines the values of each key using an associative and commutative function (a neutral 'zero value' is taken by foldByKey, not reduceByKey).
groupByKey is a transformation that groups the data by key and returns, for each key, the collection of its values.
reduceByKey is more efficient for aggregating data as it reduces the data before shuffling, while groupByKey shu...
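The efficiency gap can be illustrated without Spark by counting how many key-value pairs would have to cross the network. This sketch assumes the answer refers to Spark's `reduceByKey` and `groupByKey`; the functions `group_by_key` and `reduce_by_key` below are hypothetical plain-Python stand-ins, and the "shuffle" is just a list.

```python
from collections import defaultdict
from operator import add

# Toy model: each inner list is one partition of (key, value) pairs.

def group_by_key(partitions):
    """Ship every pair to the shuffle, then group values per key."""
    shuffled = [pair for part in partitions for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return dict(grouped), len(shuffled)


def reduce_by_key(partitions, func):
    """Combine values inside each partition first, then merge the partial results."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())  # at most one pair per key per partition
    result = {}
    for key, value in shuffled:
        result[key] = func(result[key], value) if key in result else value
    return result, len(shuffled)


data = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

grouped, grouped_shuffled = group_by_key(data)
reduced, reduced_shuffled = reduce_by_key(data, add)
print(grouped, grouped_shuffled)
print(reduced, reduced_shuffled)
```

Both approaches reach the same per-key totals once the grouped values are summed, but in this model the reduce-first version moves fewer pairs through the shuffle.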
RDD is a low-level abstraction representing a distributed collection of objects, while DataFrame is a higher-level abstraction representing a distributed collection of data organized into named columns.
RDD is more suitable for unstructured data and low-level transformations, while DataFrame is more suitable for structured data and high-level abstractions.
DataFrames provide optimizations like query optimization and code...
The different modes of execution in Apache Spark include local mode, standalone mode, YARN mode, and Mesos mode.
Local mode: Spark runs on a single machine, with the driver and executor in the same process, using one or more threads.
Standalone mode: Spark runs on a cluster managed by a standalone cluster manager.
YARN mode: Spark runs on a Hadoop cluster using YARN as the resource manager.
Mesos mode: Spark runs on a Mesos cluster with Mesos as the resource manager.
posted on 20 Feb 2025
I applied via Job Fair and was interviewed in Nov 2024. There were 2 interview rounds.
I was interviewed in Dec 2024.
Data analysis is crucial for making informed decisions, identifying trends, and solving complex problems.
Helps in making informed decisions based on data-driven insights
Identifies trends and patterns that can lead to strategic business decisions
Aids in solving complex problems by analyzing data to find root causes
Improves efficiency and effectiveness of processes through data-driven optimizations
To become a data analyst, one should focus on acquiring relevant education, gaining experience with data analysis tools, and developing strong analytical and problem-solving skills.
Obtain a degree in a related field such as statistics, mathematics, computer science, or data science.
Gain experience with data analysis tools such as SQL, Python, R, and Excel.
Develop strong analytical and problem-solving skills by working ...
I applied via Naukri.com and was interviewed in Nov 2024. There was 1 interview round.
Enhanced optimization in AWS Glue improves job performance by automatically adjusting resources based on workload
Enhanced optimization in AWS Glue automatically adjusts resources like DPUs based on workload
It helps improve job performance by optimizing resource allocation
Users can enable enhanced optimization in AWS Glue job settings
Optimizing querying in Amazon Redshift involves proper table design, distribution keys, sort keys, and query optimization techniques.
Use appropriate distribution keys to evenly distribute data across nodes for parallel processing.
Utilize sort keys to physically order data on disk, reducing the need for sorting during queries.
Avoid using SELECT * and instead specify only the columns needed to reduce data transfer.
Use AN...
posted on 28 Sep 2024
I applied via Campus Placement and was interviewed in Aug 2024. There were 8 interview rounds.
Database management system, SQL and PL/SQL
Database management system, SQL and PL/SQL
Database management system
Database management system
Database management system
Database management system
Database management system
Designation | Salaries reported | Salary range
Technical Lead | 2.7k | ₹11.5 L/yr - ₹40 L/yr
Senior Associate | 1.9k | ₹8 L/yr - ₹27 L/yr
Senior Software Engineer | 1.5k | ₹12.7 L/yr - ₹32 L/yr
Senior Associate Technology L1 | 1k | ₹9 L/yr - ₹29 L/yr
Associate Specialist | 802 | ₹12.1 L/yr - ₹40.5 L/yr