Persistent Systems
I applied via Naukri.com and was interviewed in Aug 2024. There were 2 interview rounds.
I am a Senior Data Engineer with experience in developing data pipelines and optimizing data storage for various projects.
Developed data pipelines using Apache Spark for real-time data processing
Optimized data storage using technologies like Hadoop and AWS S3
Worked on a project to analyze customer behavior and improve marketing strategies
My day-to-day job in the project involved designing and implementing data pipelines, optimizing data workflows, and collaborating with cross-functional teams.
Designing and implementing data pipelines to extract, transform, and load data from various sources
Optimizing data workflows to improve efficiency and performance
Collaborating with cross-functional teams including data scientists, analysts, and business stakeholde...
DAGs handle fault tolerance by rerunning failed tasks and maintaining task dependencies.
DAGs rerun failed tasks automatically to ensure completion.
DAGs maintain task dependencies to ensure proper sequencing.
DAGs can be configured to retry failed tasks a certain number of times before marking them as failed.
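The retry-and-dependency behaviour described above can be sketched roughly in plain Python. This is not Airflow or Spark code: `run_dag`, `max_retries`, and the task names are hypothetical, and the sketch assumes only the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

def run_dag(tasks, deps, max_retries=2):
    """Run callables in dependency order, retrying each failed task.

    tasks: dict of task name -> zero-argument callable
    deps:  dict of task name -> set of task names it depends on
           (every task must appear as a key)
    Returns a dict of task name -> "success", "failed", or "skipped".
    """
    status = {}
    for name in TopologicalSorter(deps).static_order():
        # Skip a task when any upstream task did not succeed.
        if any(status.get(d) != "success" for d in deps.get(name, ())):
            status[name] = "skipped"
            continue
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                status[name] = "success"
                break
            except Exception:
                status[name] = "failed"  # stays failed if retries run out
    return status

calls = {"n": 0}

def flaky_transform():
    # Hypothetical task that fails on its first attempt only.
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient error")

tasks = {"extract": lambda: None, "transform": flaky_transform, "load": lambda: None}
deps = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
result = run_dag(tasks, deps)
```

Here `transform` fails once, is retried, and succeeds, so `load` still runs afterwards. If the retries had run out, `load` would have been marked as skipped rather than executed.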
Shuffling is the process of redistributing data across partitions in a distributed computing environment.
Shuffling is necessary when data needs to be grouped or aggregated across different partitions.
It can be handled efficiently by minimizing the amount of data being shuffled and optimizing the partitioning strategy.
Techniques like partitioning, combiners, and reducers can help reduce the amount of shuffling in MapRed
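To illustrate why a combiner reduces shuffle volume, here is a small pure-Python sketch of a word count. It only simulates the idea: the partitions are plain lists, and the function names are made up for this example rather than taken from any framework.

```python
from collections import Counter

def shuffle_without_combiner(partitions):
    """Every (word, 1) pair leaves its partition and crosses the shuffle."""
    return [(word, 1) for part in partitions for word in part]

def shuffle_with_combiner(partitions):
    """Each partition pre-aggregates locally, so at most one pair per
    distinct word per partition crosses the shuffle."""
    shuffled = []
    for part in partitions:
        shuffled.extend(Counter(part).items())
    return shuffled

def reduce_counts(pairs):
    """Final aggregation after the shuffle."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

partitions = [["a", "b", "a", "a"], ["b", "b", "c", "a"]]
plain = shuffle_without_combiner(partitions)
combined = shuffle_with_combiner(partitions)
```

Both paths produce the same final counts, but the combined path moves 5 records instead of 8 in this toy input.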
Repartition increases or decreases the number of partitions in a DataFrame, while Coalesce only decreases the number of partitions.
Repartition can increase or decrease the number of partitions in a DataFrame, leading to a shuffle of data across the cluster.
Coalesce only decreases the number of partitions in a DataFrame without performing a full shuffle, making it more efficient than repartition.
Repartition is typically...
Incremental data is handled by identifying new data since the last update and merging it with existing data.
Identify new data since last update
Merge new data with existing data
Update data warehouse or database with incremental changes
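The three steps above might look like the following minimal sketch. It assumes a watermark column (here called `updated_at`, a hypothetical name) and uses a plain dict in place of the warehouse table; a real pipeline would typically use a MERGE or upsert statement instead.

```python
def incremental_merge(target, source_rows, last_watermark):
    """Upsert rows changed after last_watermark into target.

    target:      dict keyed by record id (stands in for the warehouse table)
    source_rows: iterable of dicts with "id", "updated_at", and payload fields
    Returns the new watermark to store for the next run.
    """
    new_watermark = last_watermark
    for row in source_rows:
        if row["updated_at"] > last_watermark:   # step 1: identify new data
            target[row["id"]] = row              # steps 2-3: merge and update
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark

warehouse = {
    1: {"id": 1, "updated_at": "2024-08-01", "city": "Pune"},
    3: {"id": 3, "updated_at": "2024-07-30", "city": "Delhi"},
}
source = [
    {"id": 1, "updated_at": "2024-08-03", "city": "Nagpur"},  # updated
    {"id": 2, "updated_at": "2024-08-02", "city": "Goa"},     # new
    {"id": 3, "updated_at": "2024-07-30", "city": "Delhi"},   # unchanged
]
watermark = incremental_merge(warehouse, source, "2024-08-01")
```

The timestamps are ISO-formatted strings, so comparing them as text gives the same order as comparing the dates.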
SCD stands for Slowly Changing Dimension, a concept in data warehousing to track changes in data over time.
SCD is used to maintain historical data in a data warehouse.
The three most commonly used types of SCD are Type 1, Type 2, and Type 3; other variants (such as Type 0, Type 4, and Type 6) also exist.
Type 1 SCD overwrites old data with new data.
Type 2 SCD creates a new record for each change, preserving history.
Type 3 SCD maintains both old and new values in the same record.
SCD is important for...
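As a rough sketch of the Type 2 behaviour described above, the following Python function closes the current row and appends a new version when an attribute changes. The column names (`valid_from`, `valid_to`, `is_current`) are a common convention assumed here, not something the answer specifies, and a list of dicts stands in for the dimension table.

```python
from datetime import date

def apply_scd2(dimension, key, new_attrs, effective):
    """Close the current row for key if its attributes changed,
    then append a new current row.

    dimension: list of dict rows with "key", "attrs", "valid_from",
               "valid_to", and "is_current"
    """
    current = next(
        (r for r in dimension if r["key"] == key and r["is_current"]), None
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return  # no change, nothing to record
        current["valid_to"] = effective
        current["is_current"] = False
    dimension.append({
        "key": key, "attrs": new_attrs,
        "valid_from": effective, "valid_to": None, "is_current": True,
    })

dim = []
apply_scd2(dim, "C1", {"city": "Pune"}, date(2023, 1, 1))
apply_scd2(dim, "C1", {"city": "Pune"}, date(2023, 6, 1))    # unchanged: ignored
apply_scd2(dim, "C1", {"city": "Mumbai"}, date(2024, 1, 1))  # change: new version
```

A Type 1 update would instead overwrite `attrs` on the existing row, and a Type 3 update would keep the old value in an extra column of the same row.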
Reverse a string using SQL and Python codes.
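In SQL, use the REVERSE function to reverse a string (available in engines such as SQL Server and MySQL; check your dialect, since not every database provides it).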
In Python, use slicing with a step of -1 to reverse a string.
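The Python side of this answer fits in one line; the wrapper function name `reverse_string` is just for illustration.

```python
def reverse_string(text):
    """Return text reversed, using a slice with a step of -1."""
    return text[::-1]

reversed_word = reverse_string("spark")
```

In a SQL engine that provides REVERSE, the counterpart would be along the lines of `SELECT REVERSE('spark')`.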
Use Spark and SQL to find the top 5 countries with the highest population.
Use Spark to load the data and perform data processing.
Use SQL queries to group by country and sum the population.
Order the results in descending order and limit to top 5.
Example: SELECT country, SUM(population) AS total_population FROM table_name GROUP BY country ORDER BY total_population DESC LIMIT 5
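To try the query without a Spark cluster, the sketch below runs the same SQL against an in-memory SQLite database from Python. SQLite is swapped in here purely as a stand-in for Spark SQL, and the rows are made-up sample data (the `city` column is an assumption used to show why SUM is needed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (country TEXT, city TEXT, population INTEGER)"
)
# Made-up sample rows: several cities per country.
rows = [
    ("A", "a1", 50), ("A", "a2", 30),
    ("B", "b1", 70),
    ("C", "c1", 10), ("C", "c2", 15),
    ("D", "d1", 40),
    ("E", "e1", 5),
    ("F", "f1", 60),
]
conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)

top5 = conn.execute(
    """
    SELECT country, SUM(population) AS total_population
    FROM table_name
    GROUP BY country
    ORDER BY total_population DESC
    LIMIT 5
    """
).fetchall()
conn.close()
```

With Spark, the same statement could be passed to `spark.sql(...)` after registering the loaded data as a temporary view.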
Find which records each type of join returns, and which records do not match, using two tables.
Use the SQL query to perform different joins like INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN
Identify the key columns in both tables to join on
Select the columns from both tables and use a WHERE clause on NULL key values (for example, WHERE b.key IS NULL after a LEFT JOIN) to isolate the records that have no match in the other table
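A small, self-contained way to see which records each join returns is to run the queries against an in-memory SQLite database. The tables `a` and `b` and their contents are hypothetical; only INNER and LEFT joins are used so the sketch also works on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, dept TEXT);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO b VALUES (2, 'hr'), (3, 'it'), (4, 'ops');
    """
)

# INNER JOIN: only keys present in both tables.
inner = conn.execute(
    "SELECT a.id FROM a INNER JOIN b ON a.id = b.id ORDER BY a.id"
).fetchall()

# LEFT JOIN plus an IS NULL filter: rows of a with no match in b.
only_in_a = conn.execute(
    "SELECT a.id FROM a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL"
).fetchall()

# Same pattern with the tables swapped: rows of b with no match in a.
only_in_b = conn.execute(
    "SELECT b.id FROM b LEFT JOIN a ON a.id = b.id WHERE a.id IS NULL"
).fetchall()
conn.close()
```

A FULL JOIN result corresponds to the union of these three sets: the matched keys plus the unmatched rows from each side.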
The Catalyst optimizer is the query optimization framework in Apache Spark SQL; it improves performance by turning a query into an optimized execution plan.
Catalyst applies rule-based optimizations and, when table statistics are available, cost-based ones as well.
It uses rules to transform the logical query plan into an optimized logical plan, from which a physical plan is then selected.
The optimizer applies various optimization techniques like predicate pushdown, constant folding, and join reordering.
By o...
Used query optimization techniques to improve performance in database queries.
Utilized indexing to speed up search queries.
Implemented query caching to reduce redundant database calls.
Optimized SQL queries by restructuring joins and subqueries.
Utilized database partitioning to improve query performance.
Used query profiling tools to identify and optimize slow queries.
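For a pandas data frame, use the len() function to check the length of the data frame.
len() returns the number of rows in a pandas data frame.
If the length is 0, then the data frame is empty.
Example: if len(df) == 0: print('Data frame is empty')
A PySpark DataFrame does not support len(); there, check df.isEmpty() (Spark 3.3 and later) or whether df.head(1) returns no rows.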
Cores and worker nodes are decided based on the workload requirements and scalability needs of the data processing system.
Consider the size and complexity of the data being processed
Evaluate the processing speed and memory requirements of the tasks
Take into account the parallelism and concurrency needed for efficient data processing
Monitor the system performance and adjust cores and worker nodes as needed
Enforcing schema ensures that data conforms to a predefined structure and rules.
Ensures data integrity by validating incoming data against predefined schema
Helps in maintaining consistency and accuracy of data
Prevents data corruption and errors in data processing
Can lead to rejection of data that does not adhere to the schema
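The idea of validating incoming data against a predefined schema, and rejecting what does not conform, can be sketched as below. The schema format (field name mapped to a Python type) and the function name `enforce_schema` are assumptions made for this example; engines such as Spark express schemas with their own types.

```python
def enforce_schema(records, schema):
    """Split records into (valid, rejected) according to schema.

    schema: dict of field name -> expected Python type
    A record is valid only if it has exactly the schema's fields
    and every value has the expected type.
    """
    valid, rejected = [], []
    for rec in records:
        ok = rec.keys() == schema.keys() and all(
            isinstance(rec[field], ftype) for field, ftype in schema.items()
        )
        (valid if ok else rejected).append(rec)
    return valid, rejected

schema = {"id": int, "name": str}
records = [
    {"id": 1, "name": "a"},
    {"id": "2", "name": "b"},   # wrong type for id
    {"id": 3},                  # missing field
]
valid, rejected = enforce_schema(records, schema)
```

Keeping the rejected records, rather than silently dropping them, makes it possible to inspect or quarantine bad input later.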
I applied via Naukri.com and was interviewed before Jun 2023. There were 3 interview rounds.
It was just reasoning-type questions.
SSIS stands for SQL Server Integration Services, a tool provided by Microsoft for data integration and workflow applications.
SSIS is a platform for building high-performance data integration and workflow solutions.
It allows you to create packages that move data from various sources to destinations.
SSIS includes a visual design interface for creating, monitoring, and managing data integration processes.
You can use SSIS ...
SSIS packages are used for ETL processes in SQL Server. Union All and Merge both combine datasets vertically, while Merge Join combines them horizontally.
SSIS packages are used for Extract, Transform, Load (ETL) processes in SQL Server.
Union in SSIS (the Union All transformation) combines datasets vertically, stacking rows on top of each other.
Merge in SSIS also combines rows vertically, but it takes exactly two sorted inputs and keeps the output sorted; matching rows horizontally on specified columns is done by the Merge Join transformation.
Union All in SSIS combines datasets vertica...
I applied via Referral
Data structures are a way to organize and store data efficiently.
Data structures are used to store and manipulate data in a structured manner.
They provide different ways to access and perform operations on the data.
Examples include arrays, linked lists, stacks, queues, trees, and graphs.
Developed a web-based project management tool for tracking tasks and deadlines
Used React.js for front-end development
Implemented RESTful APIs using Node.js and Express for back-end
Utilized MongoDB for database management
Incorporated user authentication and authorization features
Integrated real-time notifications using Socket.io
My name is John Smith.
Full name is John Smith
Common name in English-speaking countries
Easy to remember and pronounce
The question is unrelated to the medical field and is not a puzzle or riddle.
The question is asking about the meaning of the interviewer's name.
You can ask the interviewer about the origin or cultural significance of their name.
You can also mention that names often have different meanings in different languages or cultures.
My strengths include problem-solving skills, attention to detail, and strong communication abilities.
Strong problem-solving skills - I enjoy tackling complex issues and finding creative solutions.
Attention to detail - I am meticulous in my work and strive for accuracy in all tasks.
Strong communication abilities - I can effectively convey ideas and collaborate with team members.
posted on 16 Dec 2015
Yes, I can remove all the PCs from the lab and keep them in another lab right now.
Ensure all the necessary equipment and tools are available for the move
Coordinate with the lab staff to ensure a smooth transition
Label and document each PC for easy identification and setup in the new lab
Ensure proper packaging and handling to prevent any damage during the move
Fibonacci series is a sequence of numbers where each number is the sum of the two preceding ones.
The first two numbers of the series are conventionally 0 and 1
The next number is the sum of the previous two numbers
The series goes on infinitely: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...
Pseudo code: 1. Initialize variables a=0, b=1, c=0 2. Print a and b 3. Repeat steps 4-6 until desired number of terms 4. c=a+b 5. Pr
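The pseudo code above is cut off in the source, so the following is a sketch of the same idea rather than a reconstruction of the missing steps. It assumes the series starts at 0 and 1; the function name `fibonacci` is chosen for this example.

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, starting from 0 and 1."""
    series = []
    a, b = 0, 1
    for _ in range(n):
        series.append(a)
        # The next number is the sum of the previous two.
        a, b = b, a + b
    return series

first_ten = fibonacci(10)
```

Tuple assignment replaces the temporary variable `c` used in the pseudo code.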
Algorithm for matrix multiplication
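Check that the number of columns in the first matrix equals the number of rows in the second, then create a result matrix with as many rows as the first matrix and as many columns as the second
Iterate through each row and column of the result matrix
For each element in the result matrix, multiply the elements of the corresponding row in the first matrix pairwise with the elements of the corresponding column in the second matrix
Add the products obtained in the previous step to get the final value for the element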
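The steps above can be sketched in plain Python using lists of rows. The function name `matmul` is chosen for this example, and the sketch assumes non-empty, rectangular inputs.

```python
def matmul(a, b):
    """Multiply an m x n matrix by an n x p matrix given as lists of rows."""
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    m, n, p = len(a), len(b), len(b[0])
    result = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            # Dot product of row i of a with column j of b.
            result[i][j] = sum(a[i][k] * b[k][j] for k in range(n))
    return result

product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

This is the straightforward triple-loop version, which takes time proportional to m * n * p.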
I worked on a project that involved sentiment analysis of customer reviews using Naive Bayes algorithm.
The project involved collecting customer reviews from various sources.
Preprocessing the data by removing stop words, stemming, and tokenizing.
Used Naive Bayes algorithm for sentiment analysis.
The algorithm was chosen because of its simplicity and effectiveness in text classification tasks.
The accuracy of the model was
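For readers unfamiliar with the approach, here is a minimal multinomial Naive Bayes sketch in plain Python with add-one (Laplace) smoothing. It is not the project's code: the training sentences are made up, the function names are hypothetical, and the stop-word removal and stemming mentioned above are left out to keep it short.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (list_of_tokens, label). Returns the model parts."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            if tok in vocab:  # ignore words never seen in training
                # Add-one smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training reviews, already tokenized by whitespace.
train = [
    ("great product love it".split(), "pos"),
    ("love the quality".split(), "pos"),
    ("terrible waste of money".split(), "neg"),
    ("poor quality terrible support".split(), "neg"),
]
model = train_nb(train)
```

Working in log space keeps the product of many small probabilities from underflowing on longer reviews.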
Developed a web-based project management system for a construction company.
Used PHP and MySQL for backend development
Implemented user authentication and authorization
Designed a responsive UI using Bootstrap
Integrated Google Maps API for location tracking
Enabled file uploads and downloads for project documents
EXTC and IT are not mutually exclusive fields. My knowledge in EXTC complements my skills in IT.
My knowledge in EXTC gives me a strong foundation in electronics and communication, which are essential in the IT industry.
I have also gained programming skills through my coursework and projects in EXTC.
IT is a rapidly growing field with a lot of opportunities, and I believe my skills and knowledge make me a strong candidat...
Yes, I am open to relocation for the right opportunity.
I am willing to relocate for a position that aligns with my career goals
I am excited about the prospect of exploring a new city and culture
I am flexible and adaptable to new environments
I am familiar with several programming languages.
Java
Python
C++
JavaScript
SQL
A leader inspires and motivates while a manager plans and organizes.
Leaders focus on the big picture while managers focus on details
Leaders lead by example while managers delegate tasks
Leaders inspire and motivate while managers enforce rules and policies
Leaders are visionaries while managers are implementers
Examples of leaders: Steve Jobs, Martin Luther King Jr. Example of a manager: Tim Cook in his earlier role as COO of Apple
Yes, I am open to relocation for the right opportunity.
I am willing to relocate for the right job opportunity
I am open to exploring new places and cultures
I understand that relocation may come with challenges, but I am prepared to face them
I am excited about the prospect of starting fresh in a new location
I'm sorry, I don't have that information.
N/A
The TCS aptitude test was challenging but fair.
The test covered a wide range of topics including math, logic, and English.
The questions were designed to test problem-solving skills and critical thinking.
Time management was crucial as there were many questions to answer in a limited time.
Overall, the test was a good indicator of one's aptitude for software engineering.
I am a software engineer with 5 years of experience in developing web applications using Java, Spring Boot, and Angular.
5 years of experience in software development
Proficient in Java, Spring Boot, and Angular
Strong problem-solving skills
MindTree's focus on innovation, culture of learning, and diverse opportunities make it an ideal fit for my career growth.
Strong focus on innovation and cutting-edge technologies
Culture of continuous learning and development
Diverse opportunities for growth and career advancement
I am currently focusing on gaining practical experience in the software engineering field, but I may consider pursuing higher studies in the future.
Currently focusing on gaining practical experience in software engineering
Open to considering higher studies in the future
Higher studies could include a Master's degree in Computer Science or related field