Fractal Analytics
100+ Fractal Analytics Interview Questions and Answers
Given two strings 'S' and 'T' with lengths 'M' and 'N', find the length of their Longest Common Subsequence.
For a string 'str' of length K, the subsequences are the strings c...read more
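A standard dynamic-programming sketch of this problem (the function name is illustrative):

```python
def lcs_length(s: str, t: str) -> int:
    """Length of the Longest Common Subsequence of s and t, O(M*N) DP."""
    m, n = len(s), len(t)
    # dp[i][j] = LCS length of s[:i] and t[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

print(lcs_length("abcde", "ace"))  # → 3  (the subsequence "ace")
```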
You are working as a cab driver. Your car moves in a straight line and moves in the forward direction only. Initially, you have 'C' empty seats for the passengers.
Now, you are given 'N' number o...read more
The Nth term of the Fibonacci series, F(n), is calculated using the following formula -
F(n) = F(n-1) + F(n-2), where F(1) = F(2) = 1
Provided N you have to find out the ...read more
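A minimal iterative sketch under the stated base cases:

```python
def fib(n: int) -> int:
    """Nth Fibonacci number with F(1) = F(2) = 1, computed iteratively."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

print(fib(10))  # → 55
```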
In one box there are 12 red and 12 green balls, and in another box there are 24 red and 24 green balls.
You choose two balls, one from each box, with replacement, such that they have the s...read more
Q5. What is truth? The one you have been taught, or the one you learn yourself? For example, your parents teaching you not to cut nails at night, or to go to the temple.
Truth is subjective and can be influenced by personal experiences and cultural beliefs.
Truth is not always objective or universal
It can be shaped by personal experiences and cultural beliefs
What is considered true in one culture may not be true in another
Truth can also change over time as new information is discovered
For example, the belief that the earth was flat was once considered true, but is now known to be false
Q6. What are adstock, decay, and due-to contributions? How do you evaluate the model? What is seasonality and the formula for seasonality? Does seasonality have any contribution?
Explanation of adstock, decay, due-to contributions, and seasonality with its formula.
Adstock is the measure of the lasting impact of advertising on consumer behavior.
Decay refers to the reduction in the effectiveness of advertising over time.
Due-to contributions is the attribution of sales to different marketing channels.
Seasonality is the pattern of sales or other metrics that repeat over a fixed period of time.
The formula for seasonality is (Value in Period / Average Value of...read more
Q7. In a word count spark program which command will run on driver and which will run on executor
Commands that run on driver and executor in a word count Spark program.
The command to read the input file and create RDD will run on driver.
The command to split the lines and count the words will run on executor.
The command to aggregate the word counts and write the output will run on driver.
Driver sends tasks to executors and coordinates the overall job.
Executor processes the tasks assigned by the driver.
Q8. What is an immutable object and why is it useful?
ACID properties in DBMS.
Explain atomicity, and what is D in ACID?
Q9. What are the key features and functionalities of Snowflake?
Snowflake is a cloud-based data warehousing platform known for its scalability, performance, and ease of use.
Snowflake uses a unique architecture called multi-cluster, which separates storage and compute resources for better scalability and performance.
It supports both structured and semi-structured data, allowing users to work with various data types.
Snowflake offers features like automatic scaling, data sharing, and built-in support for SQL queries.
It provides a web interfa...read more
Q10. Okay. How many people attended StanChart Mumbai Marathon?
The number of attendees at StanChart Mumbai Marathon is not available.
Data on the number of attendees is not available.
The organizers have not released any official figures.
It is unclear how many people participated in the marathon.
Q11. Top 5 best things that happened during the COVID lockdown for you.
Byju's business model.
Why is it better to have online coaching classes?
Q12. 1. Describe one of your projects in detail. 2. Explain Random Forest and other ML models 3. Statistics
Developed a predictive model for customer churn using Random Forest algorithm.
Used Python and scikit-learn library for model development
Performed data cleaning, feature engineering, and exploratory data analysis
Tuned hyperparameters using GridSearchCV and evaluated model performance using cross-validation
Random Forest is an ensemble learning method that builds multiple decision trees and combines their predictions
Other ML models include logistic regression, support vector mac...read more
Q13. What are the important documents to be submitted during the RFP.
The important documents to be submitted during the RFP include proposal, pricing information, technical specifications, and references.
Proposal: A detailed document outlining the solution being offered, including the approach, methodology, and deliverables.
Pricing Information: A breakdown of the costs associated with the proposed solution, including any licensing fees, implementation costs, and ongoing maintenance fees.
Technical Specifications: Detailed information about the ...read more
Q14. How many tube lights are there in the city of Mumbai
It is not possible to accurately determine the number of tube lights in the city of Mumbai.
The number of tube lights in a city is not publicly available information.
The city of Mumbai has a large population and a vast number of buildings, making it impossible to count all the tube lights.
The number of tube lights can vary greatly depending on factors such as residential, commercial, and industrial areas.
Even if we consider an average number of tube lights per household or bui...read more
Q15. The puzzle of two cans of 3 litres and 5 litres used to measure other values (a very common one).
The two cans of 3 litres and 5 litres can be used to measure other volumes, for example 4 litres.
Fill the 5L can and pour it into the 3L can, leaving 2L in the 5L can
Empty the 3L can and pour the 2L from the 5L can into it
Fill the 5L can again and top up the 3L can (which takes 1L), leaving 4L in the 5L can
A total of 4L can be measured using these two cans
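The pouring sequence can be checked with a small simulation (a sketch; the function name is illustrative):

```python
def measure_four_litres() -> int:
    """Simulate the pouring steps; returns litres left in the 5L can."""
    big, small = 5, 0  # fill the 5L can
    pour = min(big, 3 - small); big -= pour; small += pour  # pour into 3L: 2L left
    small = 0                                               # empty the 3L can
    pour = min(big, 3 - small); big -= pour; small += pour  # move the 2L across
    big = 5                                                 # refill the 5L can
    pour = min(big, 3 - small); big -= pour; small += pour  # top up 3L (takes 1L)
    return big

print(measure_four_litres())  # → 4
```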
Q16. How can you derive % share in Power BI?
To derive % share in Power BI, use the 'Group By' function and create a measure using the 'Divide' function.
Use the 'Group By' function to group the data by the desired category
Create a measure using the 'Divide' function to calculate the percentage share
Add the measure to a visual to display the % share
Example: % Share = DIVIDE(SUM(Sales[Revenue]), CALCULATE(SUM(Sales[Revenue]), ALL(Sales)))
Q17. CASE: There is a beach with uniformly distributed customers, you know that if you set up a stall there a competitor will appear. Where would you put your stall?
Q18. How can you find the common elements between two strings in python?
Finding common elements between two strings in Python.
Convert the strings into sets and use the intersection method to find common elements.
Iterate through each character in one string and check if it exists in the other string.
Use the difflib library to find the longest common substring between two strings.
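The set-intersection approach can be sketched as:

```python
def common_chars(a: str, b: str) -> set:
    """Characters that appear in both strings, via set intersection."""
    return set(a) & set(b)

print(sorted(common_chars("python", "typhoon")))  # → ['h', 'n', 'o', 'p', 't', 'y']
```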
Q19. How many mobile phones are sold each year in India
Approximately 150-200 million mobile phones are sold each year in India.
India is the second-largest smartphone market in the world after China.
The number of mobile phone users in India is expected to reach 1.25 billion by 2020.
The Indian smartphone market grew by 7% YoY in 2019.
Major players in the Indian smartphone market include Xiaomi, Samsung, and Vivo.
Q20. So why should we visit temples?
Visiting temples can provide spiritual and cultural experiences, as well as a sense of community and peace.
Temples offer a space for prayer and meditation
They can provide a sense of community and belonging
Visiting temples can offer cultural and historical insights
Many temples have beautiful architecture and artwork
Temples can provide a peaceful and calming environment
Some people believe that visiting temples can bring good luck or blessings
Q21. What metrics will you look at if you need to analyse how a retail manufacturer is performing?
Metrics to analyze retail manufacturer performance
Sales revenue
Profit margin
Inventory turnover
Customer satisfaction
Market share
Return on investment
Employee turnover rate
Q22. Joins in SQL; modelling and visualization in Power BI
Answering about joins in SQL and modeling/visualization in PowerBI
Joins in SQL are used to combine data from two or more tables based on a related column
There are different types of joins such as inner join, left join, right join, and full outer join
PowerBI is a data visualization tool that allows users to create interactive reports and dashboards
Data modeling in PowerBI involves creating relationships between tables and defining measures and calculated columns
Visualization i...read more
Q23. Cumulative sum and rank functions in spark
Explanation of cumulative sum and rank functions in Spark
Cumulative sum function calculates the running total of a column
Rank function assigns a rank to each row based on the order of values in a column
Both functions can be used with window functions in Spark
Example: df.withColumn('cumulative_sum', F.sum('column').over(Window.orderBy('order_column').rowsBetween(Window.unboundedPreceding, Window.currentRow)))
Example: df.withColumn('rank', F.rank().over(Window.orderBy('column')))
Q24. Have you worked on Data Analytics RFPs in the past?
Yes, I have worked on several Data Analytics RFPs in the past.
I have experience in analyzing large datasets and providing insights to clients
I have worked on RFPs for clients in various industries such as finance, healthcare, and retail
I have collaborated with cross-functional teams to develop proposals that meet client requirements
Q25. Explain Transformers and how they differ from previous architectures such as RNNs and LSTMs.
Transformers are a type of neural network architecture that utilizes self-attention mechanisms to process sequential data.
Transformers use self-attention mechanisms to weigh the importance of different input elements, allowing for parallel processing of sequences.
Unlike RNNs and LSTMs, Transformers do not rely on sequential processing, making them more efficient for long-range dependencies.
Transformers have been shown to outperform traditional RNNs and LSTMs in tasks such as ...read more
Q26. Slowly changing data handling in Spark
Slowly changing data handling in Spark involves updating data over time.
Slowly changing dimensions (SCD) are used to track changes in data over time.
SCD Type 1 updates the data in place, overwriting the old values.
SCD Type 2 creates a new record for each change, with a start and end date.
SCD Type 3 adds a new column to the existing record to track changes.
Spark provides functions like `from_unixtime` and `unix_timestamp` to handle timestamps.
Q27. What is the difference between canvas app and model driven apps
Canvas apps allow for more customization and flexibility in design, while model-driven apps are more structured and data-driven.
Canvas apps are more visually appealing and customizable, allowing users to drag and drop elements to create the app interface.
Model-driven apps are more structured and data-driven, with a focus on displaying and manipulating data from a data source.
Canvas apps are better suited for scenarios where the user interface design is a priority, while model...read more
Q28. What is one learning from it?
One learning from what?
Please provide context or specify what 'it' refers to
Without context, it is impossible to provide a meaningful answer
Q29. Why do you prefer Azure cloud solution as recommendations for Data Engineering pipelines? Explain data pipelines scenario you managed in the project?
I prefer Azure cloud solution for Data Engineering pipelines due to its scalability, reliability, and integration with other Microsoft services.
Azure provides a wide range of tools and services specifically designed for data engineering tasks, such as Azure Data Factory, Azure Databricks, and Azure HDInsight.
Azure offers seamless integration with other Microsoft services like Power BI, SQL Server, and Azure Machine Learning, making it easier to build end-to-end data pipelines...read more
Q30. What are all the tools used in the whole life cycle of MLOps, and were you involved in ML engineering as well?
Q31. How much do you know about fractals?
Fractals are complex geometric shapes that can be split into parts, each of which is a reduced-scale copy of the whole.
Fractals exhibit self-similarity, meaning they look similar at any scale or magnification.
Examples of fractals include the Mandelbrot set, Koch snowflake, and Sierpinski triangle.
Fractals are used in various fields such as mathematics, computer graphics, and art.
Q32. Difference between Spark and MapReduce; Spark joins such as broadcast and sort-merge
Spark is faster than MapReduce due to in-memory processing and DAG execution.
Spark uses DAG (Directed Acyclic Graph) execution while MapReduce uses batch processing.
Spark performs in-memory processing while MapReduce writes to disk after each operation.
Spark has a more flexible programming model with support for multiple languages.
Spark has built-in libraries for machine learning, graph processing, and stream processing.
MapReduce is better suited for batch processing of large...read more
Q33. How is the RFP process set up in your current org?
The RFP process in my current org involves a cross-functional team approach.
The RFP is received by the sales team and then assigned to a cross-functional team.
The team includes representatives from sales, product, engineering, legal, and finance.
The team reviews the RFP and determines if the company can meet the requirements.
If the decision is made to respond, the team works together to create a proposal.
The proposal is reviewed and approved by senior management before submis...read more
Q34. What are different types of Attention?
Different types of Attention include self-attention, global attention, and local attention.
Self-attention focuses on relationships within the input sequence itself.
Global attention considers the entire input sequence when making predictions.
Local attention only attends to a subset of the input sequence at a time.
Examples include Transformer's self-attention mechanism, Bahdanau attention, and Luong attention.
Q35. What was the challenge in end-to-end product delivery and implementation of solution roadmap?
The challenge in end-to-end product delivery and implementation of solution roadmap involved coordinating multiple teams, managing dependencies, and ensuring alignment with business goals.
Coordinating cross-functional teams to ensure timely delivery of each component of the product
Managing dependencies between different teams and components to avoid delays
Ensuring alignment of the solution roadmap with the overall business goals and objectives
Handling unexpected challenges an...read more
Q36. What are the evidences?
The evidences refer to the proof or supporting facts that validate a claim or argument.
Evidences can be in the form of data, statistics, research studies, expert opinions, eyewitness accounts, etc.
For example, in a court case, evidences can include DNA samples, fingerprints, and witness testimonies.
In scientific research, evidences can include experimental data, peer-reviewed studies, and expert analysis.
In journalism, evidences can include interviews, documents, and photogra...read more
Q37. write SQL queries for given scenario
Writing SQL queries for a given scenario
Use SELECT statement to retrieve data from tables
Use WHERE clause to filter data based on specific conditions
Use JOIN clause to combine data from multiple tables
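A runnable sketch of these three clauses together, using Python's built-in sqlite3; the schema and data are hypothetical:

```python
import sqlite3

# In-memory database with two illustrative tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT, dept_id INTEGER);
    CREATE TABLE departments (id INTEGER, name TEXT);
    INSERT INTO employees VALUES (1, 'Asha', 10), (2, 'Ravi', 20), (3, 'Meera', 10);
    INSERT INTO departments VALUES (10, 'Analytics'), (20, 'Engineering');
""")

# SELECT + JOIN + WHERE combined in one query.
rows = con.execute("""
    SELECT e.name, d.name
    FROM employees e
    JOIN departments d ON e.dept_id = d.id
    WHERE d.name = 'Analytics'
    ORDER BY e.name
""").fetchall()
print(rows)  # → [('Asha', 'Analytics'), ('Meera', 'Analytics')]
```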
Q38. Explain any FMCG MMM (marketing mix modelling) project you have done
I have implemented the FMCG MMM model for a leading consumer goods company to analyze the impact of marketing activities on sales.
Used historical sales data, marketing spend, and external factors to build the model
Identified key drivers of sales performance and optimized marketing strategies
Evaluated the effectiveness of different marketing channels and campaigns
Provided actionable insights to improve ROI and drive revenue growth
Q39. sum and sumx in power bi
Sum and SumX are DAX functions used in Power BI to calculate the sum of values in a column or table.
Sum calculates the sum of values in a column or table.
SumX calculates the sum of an expression evaluated for each row in a table.
Both functions can be used in measures and calculated columns.
Example: Sum(Sales[Revenue]) calculates the total revenue for the Sales table.
Example: SumX(Orders, [Quantity]*[Price]) calculates the total sales for each order in the Orders table.
Q40. How did you implement end to end MLOps (Dev and Deployment)
Q41. What genre of books?
I enjoy reading a variety of genres, including mystery, science fiction, and historical fiction.
Mystery
Science fiction
Historical fiction
Q42. System design tradeoffs and basic principles
System design tradeoffs involve balancing various factors to optimize performance and efficiency.
Consider scalability, reliability, latency, and cost when designing systems
Tradeoffs may involve sacrificing one aspect for the benefit of another
Examples include choosing between consistency and availability in distributed systems
Q43. What do you know about the sales cycle?
Sales cycle refers to the process of selling a product or service from initial contact with a potential customer to closing the deal.
Sales cycle involves identifying potential customers
Qualifying leads to determine if they are a good fit for the product or service
Presenting the product or service to the potential customer
Handling objections and negotiating terms
Closing the deal and following up with the customer
Sales cycle can vary in length depending on the complexity of the...read more
Q44. What is the scope of GA3 (gibberellic acid)?
GA3 is a plant hormone that regulates various growth processes in plants.
Regulates seed germination
Promotes stem elongation
Influences flowering and fruit development
Used in agriculture to increase crop yield
Can be applied externally to plants to induce growth responses
Q45. Why fractal?
Fractals offer a unique way to understand complex patterns and structures in nature and mathematics.
Fractals can be found in natural phenomena such as snowflakes, coastlines, and ferns.
They have practical applications in computer graphics, data compression, and cryptography.
Fractal geometry provides a new perspective on understanding the behavior of complex systems.
Fractals have been used to model the growth of tumors and the spread of diseases in medical research.
The study o...read more
Q46. Why is Informatica Cloud better than the Azure cloud solution?
Informatica Cloud offers more comprehensive data integration capabilities compared to Azure Cloud.
Informatica Cloud provides a wide range of data integration tools and services for various data sources and formats.
Informatica Cloud offers advanced data quality and data governance features that are not available in Azure Cloud.
Informatica Cloud has a strong focus on data security and compliance, with built-in encryption and access controls.
Informatica Cloud has a user-friendly...read more
Q47. What kind of data modelling you worked with GenAI project?
I have experience working with data modelling in the GenAI project to optimize algorithms and improve performance.
Utilized various data modelling techniques to analyze and interpret data
Developed predictive models to enhance decision-making processes
Collaborated with data scientists to refine and validate models
Implemented machine learning algorithms to improve accuracy and efficiency
Q48. What are window functions in SQL
Window functions in SQL are used to perform calculations across a set of table rows related to the current row.
Window functions are used to calculate values based on a set of rows related to the current row.
They allow you to perform calculations without grouping the rows into a single output row.
Examples of window functions include ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE().
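A small illustration using Python's built-in sqlite3 (window functions need SQLite 3.25 or later); the table and data are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE scores (name TEXT, score INTEGER);
    INSERT INTO scores VALUES ('a', 90), ('b', 90), ('c', 80);
""")

# RANK leaves gaps after ties; DENSE_RANK does not.
rows = con.execute("""
    SELECT name,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS drnk
    FROM scores ORDER BY score DESC, name
""").fetchall()
print(rows)  # → [('a', 1, 1), ('b', 1, 1), ('c', 3, 2)]
```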
Q49. Reason to switch
Seeking new challenges and opportunities for growth
Desire to work on more diverse projects
Opportunity for career advancement
Seeking a better work-life balance
Interested in learning new skills or technologies
Q50. Difference between Data scientist, ML and AI
Data scientists analyze data to gain insights, machine learning (ML) involves algorithms that improve automatically through experience, and artificial intelligence (AI) refers to machines mimicking human cognitive functions.
Data scientists analyze large amounts of data to uncover patterns and insights.
Machine learning involves developing algorithms that improve automatically through experience.
Artificial intelligence refers to machines performing tasks that typically require ...read more
Q51. Extract the first letter of the first name in a column, using any data manipulation package supported on Databricks
The function to extract the first letter of the firstname in a column varies based on the data manipulation package used.
Use SUBSTR function in SQL
Use str_extract function in R
Use the substring function in PySpark, or string slicing in plain Python
Q52. What are the underlying assumptions of logistic regression?
Q53. How is seasonality calculated?
Seasonality is calculated by analyzing historical data to identify recurring patterns or trends that occur at specific times of the year.
Identify historical data for a specific time period (e.g. monthly, quarterly)
Use statistical methods such as moving averages or regression analysis to analyze the data
Look for patterns or trends that repeat at the same time each year
Calculate the average or percentage change in data points during specific time periods
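The ratio-to-average step can be sketched as follows (the data is hypothetical):

```python
def seasonal_indices(values):
    """Seasonal index per period = value in period / average value overall."""
    avg = sum(values) / len(values)
    return [round(v / avg, 2) for v in values]

# Hypothetical quarterly sales: Q1 dips, Q4 peaks.
print(seasonal_indices([80, 100, 100, 120]))  # → [0.8, 1.0, 1.0, 1.2]
```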
Q54. Difference between GPT and BERT model
GPT is a generative model while BERT is a transformer model for natural language processing.
GPT is a generative model that predicts the next word in a sentence based on previous words.
BERT is a transformer model that considers the context of a word by looking at the entire sentence.
GPT is unidirectional, while BERT is bidirectional.
GPT is better for text generation tasks, while BERT is better for understanding the context of words in a sentence.
Q55. Documents for foreign remittance
Documents required for foreign remittance include invoices, purchase orders, and wire transfer instructions.
Invoices for goods or services being paid for
Purchase orders to verify the transaction
Wire transfer instructions to ensure proper routing of funds
Proof of identification for both parties involved
Any necessary customs or tax documents
Documentation of any applicable fees or charges
Q56. Tell me about the accounts payable process.
Accounts payable process involves receiving, verifying, and processing invoices for payment.
Receive invoices from vendors
Verify the accuracy of invoices against purchase orders and receipts
Code and enter invoices into accounting system
Obtain approval for payment
Schedule payments and issue checks or electronic payments
Reconcile vendor statements
Maintain accurate records and files
Q57. How will you resolve conflicts?
I will address conflicts by actively listening, seeking common ground, and collaborating on solutions.
Actively listen to all parties involved to understand their perspectives
Seek common ground and areas of agreement to build consensus
Collaborate with the team to find mutually beneficial solutions
Use conflict resolution techniques such as mediation or compromise
Focus on the issue at hand rather than personal differences
Q58. What are the types of transformation?
Types of transformations include filtering, sorting, aggregating, joining, and pivoting.
Filtering: Selecting a subset of rows based on certain criteria.
Sorting: Arranging rows in a specific order based on one or more columns.
Aggregating: Combining multiple rows into a single result, such as summing or averaging values.
Joining: Combining data from multiple sources based on a common key.
Pivoting: Restructuring data from rows to columns or vice versa.
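The first three transformation types can be illustrated with plain Python (the data is hypothetical):

```python
rows = [("north", 10), ("south", 5), ("north", 7), ("south", 3)]

filtered = [r for r in rows if r[1] > 4]      # filtering: keep rows meeting a condition
ordered = sorted(rows, key=lambda r: r[1])    # sorting: arrange by a column
totals = {}                                   # aggregating: sum per region
for region, qty in rows:
    totals[region] = totals.get(region, 0) + qty

print(filtered)  # → [('north', 10), ('south', 5), ('north', 7)]
print(totals)    # → {'north': 17, 'south': 8}
```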
Q59. Difference between group by and distinct
Group by is used to group rows that have the same values into summary rows, while distinct is used to remove duplicate rows from a result set.
Group by is used with aggregate functions like COUNT, SUM, AVG, etc.
Distinct is used to retrieve unique values from a column or set of columns.
Group by is used to perform operations on groups of rows, while distinct is used to filter out duplicate rows.
Group by is used in conjunction with SELECT statement, while distinct is used as a ke...read more
Q60. What is a data layer
A data layer is a software component that separates the data access logic from the business logic in an application.
It acts as an intermediary between the database and the application's business logic
Helps in managing data access, storage, and retrieval
Improves scalability and maintainability of the application
Examples include ORM frameworks like Hibernate in Java or Entity Framework in .NET
Q61. Design a complete mlops pipeline with all the steps in it.
Designing a complete MLOps pipeline with all the necessary steps.
Data collection and preprocessing
Model training and evaluation
Model deployment
Monitoring and feedback loop
Automated retraining
Version control and collaboration
Q62. What is logistic regression and its formula?
Q63. SQL query on 2nd largest
Use a subquery to find the 2nd largest value in a SQL table.
Use a subquery to find the maximum value in the table
Exclude the maximum value from the results to find the 2nd largest value
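The subquery approach can be sketched with Python's built-in sqlite3 (the table is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE salaries (amount INTEGER);
    INSERT INTO salaries VALUES (100), (300), (300), (200);
""")

# 2nd largest = the max of everything strictly below the overall max.
second = con.execute("""
    SELECT MAX(amount) FROM salaries
    WHERE amount < (SELECT MAX(amount) FROM salaries)
""").fetchone()[0]
print(second)  # → 200
```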
Q64. How do you manage client obligations?
I manage client obligations by setting clear expectations, communicating regularly, and prioritizing tasks based on deadlines and importance.
Set clear expectations with clients regarding deliverables and timelines
Communicate regularly to provide updates on progress and address any concerns
Prioritize tasks based on deadlines and importance to ensure all client obligations are met
Proactively identify potential issues and address them before they impact client obligations
Q65. Handling Jsons in python
Python provides built-in libraries like json to handle JSON data easily.
Use json module to load and parse JSON data
Use json.dumps() to convert Python objects into JSON strings
Use json.loads() to convert JSON strings into Python objects
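A minimal sketch of the round trip with the standard json module:

```python
import json

payload = '{"name": "Asha", "skills": ["SQL", "Python"]}'
obj = json.loads(payload)          # JSON string -> Python dict
obj["skills"].append("Spark")
text = json.dumps(obj, indent=2)   # Python dict -> pretty-printed JSON string
print(obj["skills"])  # → ['SQL', 'Python', 'Spark']
```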
Q66. Why Fractal, etc.?
Fractals are used in data science for analyzing complex and self-similar patterns.
Fractals are useful for analyzing data with repeating patterns at different scales.
They are used in image compression, signal processing, and financial market analysis.
Fractal analysis can help in understanding the underlying structure of data and making predictions.
Q67. What are the differences between the assumptions of linear and logistic regression?
Q68. Explain your projects in detail from data preprocessing to deployment
I have worked on various projects involving data preprocessing, model building, and deployment.
I start by cleaning and preprocessing the raw data to remove missing values and outliers.
I then perform feature engineering to create new features and select the most relevant ones for model building.
Next, I train machine learning models using algorithms like Random Forest, XGBoost, and Neural Networks.
I evaluate the models using metrics like accuracy, precision, recall, and F1 scor...read more
Q69. What is SCD and what are its types?
SCD stands for Slowly Changing Dimension. There are three types: Type 1, Type 2, and Type 3.
SCD is used in data warehousing to track changes in dimension data over time.
Type 1 SCD overwrites old data with new data, losing historical information.
Type 2 SCD creates new records for each change, preserving historical data.
Type 3 SCD keeps both old and new data in the same record, with separate columns for each version.
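A plain-Python illustration of Type 1 versus Type 2 (the rows and dates are hypothetical):

```python
# Hypothetical dimension row for a customer whose city changes.
dim = [{"id": 1, "city": "Pune", "start": "2020-01-01", "end": None}]

# SCD Type 1: overwrite in place (history is lost).
type1 = [dict(dim[0], city="Mumbai")]

# SCD Type 2: close the old row and append a new versioned row (history kept).
type2 = [dict(dim[0], end="2023-06-30"),
         {"id": 1, "city": "Mumbai", "start": "2023-07-01", "end": None}]

print(len(type1), len(type2))  # → 1 2
```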
Q70. What is Spark context
Spark context is the main entry point for Spark functionality and represents the connection to a Spark cluster.
Main entry point for Spark functionality
Represents connection to a Spark cluster
Used to create RDDs, broadcast variables, and accumulators
Q71. ML algorithms in detail
ML algorithms are tools used to analyze data, make predictions, and learn patterns from data.
ML algorithms can be categorized into supervised, unsupervised, and reinforcement learning.
Examples of supervised learning algorithms include linear regression, decision trees, and support vector machines.
Examples of unsupervised learning algorithms include k-means clustering, hierarchical clustering, and principal component analysis.
Reinforcement learning algorithms involve an agent ...read more
Q72. What are the SQL commands? Can you explain them?
SQL commands are used to interact with databases to perform various operations like querying, updating, and deleting data.
SELECT: Retrieves data from a database
INSERT INTO: Adds new records to a table
UPDATE: Modifies existing records in a table
DELETE: Removes records from a table
CREATE TABLE: Creates a new table in the database
ALTER TABLE: Modifies an existing table structure
DROP TABLE: Deletes a table from the database
Q73. Design a dashboard with pages and kpi
Design a dashboard with multiple pages and key performance indicators (KPIs)
Identify key metrics to track on the dashboard
Organize the dashboard into separate pages for different categories or departments
Use visualizations like charts, graphs, and tables to display KPIs
Include filters and interactive elements for user customization
Ensure the dashboard is user-friendly and easy to navigate
Consider the audience and their specific needs when designing the dashboard
Q74. Why does Spark use lazy execution?
Spark uses lazy execution to optimize performance by delaying computation until necessary.
Spark delays execution until an action is called to optimize performance.
This allows Spark to optimize the execution plan and minimize unnecessary computations.
Lazy evaluation helps in reducing unnecessary data shuffling and processing.
Example: Transformations like map, filter, and reduce are not executed until an action like collect or saveAsTextFile is called.
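The same principle can be illustrated with Python generators; this is an analogy, not Spark itself:

```python
# "Transformations" build a pipeline; nothing runs until an "action".
data = range(1, 6)
mapped = (x * x for x in data)            # like map: not executed yet
filtered = (x for x in mapped if x > 4)   # like filter: still not executed

result = list(filtered)  # the "action" triggers the whole pipeline at once
print(result)  # → [9, 16, 25]
```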
Q75. Why Analytics?
Analytics helps in making data-driven decisions and improving business outcomes.
Analytics provides insights into customer behavior and preferences.
It helps in identifying trends and patterns in data.
Analytics can optimize business processes and improve efficiency.
It enables businesses to make informed decisions based on data.
Analytics can help in predicting future outcomes and trends.
Examples: Predictive maintenance in manufacturing, customer segmentation in retail, fraud det...read more
Q76. Explain Sorting algorithms
Sorting algorithms are methods used to arrange elements in a specific order.
Sorting algorithms are used to rearrange elements in a specific order, such as numerical or alphabetical.
Common sorting algorithms include Bubble Sort, Selection Sort, Insertion Sort, Merge Sort, Quick Sort, and Heap Sort.
Each sorting algorithm has its own time complexity and efficiency based on the size of the input data.
Sorting algorithms can be stable (maintains the relative order of equal elements...read more
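A sketch of one stable O(n log n) algorithm, merge sort:

```python
def merge_sort(items):
    """Stable O(n log n) merge sort on a list."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    # Merge the two sorted halves, preferring the left on ties (stability).
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 5]))  # → [1, 2, 5, 5, 9]
```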
Q77. Elaborate Cash flow statement.
Cash flow statement is a financial report that shows the inflow and outflow of cash in a business over a period of time.
It shows the sources of cash inflow and the uses of cash outflow.
It is divided into three sections: operating activities, investing activities, and financing activities.
Operating activities include cash transactions related to the day-to-day business operations.
Investing activities include cash transactions related to the purchase or sale of long-term assets...read more
Q78. Debugging a Kubernetes deployment.
Debugging a Kubernetes deployment involves identifying and resolving issues in the deployment process.
Check the deployment logs for errors and warnings
Verify the configuration files for correctness
Use kubectl commands to inspect the deployment status
Check the health of the pods and containers
Use debugging tools like kubectl exec and logs to troubleshoot issues
Q79. What is inheritance
Inheritance is a concept in object-oriented programming where a class can inherit attributes and methods from another class.
Allows for code reusability by creating a new class based on an existing class
Derived class inherits properties and behaviors of the base class
Supports the concept of polymorphism and encapsulation
Example: Class 'Car' can inherit from class 'Vehicle' and inherit its attributes like 'color' and methods like 'drive()'
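The Vehicle/Car example above can be sketched in Python (class, attribute, and method names are just the illustrative ones from the example):

```python
class Vehicle:
    def __init__(self, color):
        self.color = color

    def drive(self):
        return f"A {self.color} vehicle is driving"

class Car(Vehicle):
    """Car inherits 'color' and 'drive()' from Vehicle."""

    def drive(self):
        # Overriding the base method demonstrates polymorphism
        return f"A {self.color} car is driving"

car = Car("red")
print(car.color)    # "red" - attribute inherited from Vehicle
print(car.drive())  # "A red car is driving" - overridden method
```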
Q80. What is SparkConf
SparkConf is a configuration object used in Apache Spark to set various parameters for a Spark application.
It is used to set properties such as the application name, master URL, and other Spark settings.
A SparkConf instance is created at the start of an application and passed to the SparkContext or SparkSession builder.
Example: val sparkConf = new SparkConf().setAppName("MyApp").setMaster("local")
Q81. What is overfitting and underfitting?
Overfitting and underfitting are common problems in machine learning: an overfit model performs well on training data but poorly on unseen data, while an underfit model is too simple and performs poorly on both.
Overfitting occurs when a model learns the noise in the training data rather than the underlying pattern, leading to poor generalization on unseen data.
Underfitting happens when a model is too simple to capture the underlying structure of the data, resultin...read more
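A minimal sketch of both problems, assuming NumPy is available: the data is a quadratic with small fixed perturbations standing in for noise, so a straight line underfits while a degree-9 polynomial fits the perturbations themselves.

```python
import numpy as np

# Toy data: a quadratic plus small fixed "noise" values
x_train = np.linspace(-3, 3, 10)
noise = np.array([0.5, -0.3, 0.8, -0.6, 0.2, -0.4, 0.7, -0.2, 0.3, -0.5])
y_train = x_train**2 + noise
x_test = np.linspace(-2.7, 2.7, 50)  # held-out points, no noise
y_test = x_test**2

def mse(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    test = float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    return train, test

# degree 1 underfits (high error everywhere), degree 2 matches the true
# shape, degree 9 chases the noise: near-zero train error, worse test error
for degree in (1, 2, 9):
    print(degree, mse(degree))
```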
Q82. Explain invoices and the invoicing process.
Invoices are bills sent by a vendor to a customer for goods or services provided.
Invoices include details such as the vendor's name and address, the customer's name and address, the date of the invoice, a description of the goods or services provided, and the amount due.
The process of invoicing involves creating and sending the invoice to the customer, tracking the payment status, and following up on any overdue payments.
Accounts payable managers are responsible for ensuring ...read more
Q83. What is linked service
A linked service is a connection to an external data source or destination in Azure Data Factory.
Linked services define the connection information needed to connect to external data sources or destinations.
They can be used in pipelines to read from or write to the linked data source.
Examples of linked services include Azure Blob Storage, Azure SQL Database, and Salesforce.
Linked services can store connection strings, authentication details, and other configuration settings.
Q84. Main pillars of Project management
Main pillars of Project management include scope, time, cost, quality, communication, risk, and procurement.
Scope management involves defining and controlling what is included in the project.
Time management focuses on creating and maintaining a project schedule.
Cost management involves budgeting and controlling project costs.
Quality management ensures that the project meets the required standards.
Communication management involves effective communication with stakeholders.
Risk...read more
Q85. Spark optimization techniques
Spark optimization techniques improve performance and efficiency of Spark applications.
Partitioning data to reduce shuffling
Caching frequently used data
Using broadcast variables for small data
Tuning memory allocation and garbage collection
Using efficient data formats like Parquet
Avoiding unnecessary data shuffling
Using appropriate hardware configurations
Optimizing SQL queries with appropriate indexing and partitioning
Q86. Hive: partitioning vs bucketing
Hive partitioning is dividing data into smaller, manageable parts while bucketing is dividing data into equal parts based on a hash function.
Partitioning is useful for filtering data based on a specific column
Bucketing is useful for evenly distributing data for faster querying
Partitioning can be done on multiple columns, while bucketing hashes one or more columns into a fixed number of buckets
Partitioning creates separate directories for each partition while bucketing creates separate files for each bucket
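The difference can be mimicked in plain Python (a toy sketch, not Hive itself; the column names are hypothetical): partitioning groups rows by a column's value, while bucketing hashes a column into a fixed number of buckets.

```python
rows = [
    {"country": "IN", "user_id": 101},
    {"country": "US", "user_id": 102},
    {"country": "IN", "user_id": 103},
    {"country": "US", "user_id": 104},
]

# Partitioning: one "directory" per distinct value of the partition column
partitions = {}
for row in rows:
    partitions.setdefault(row["country"], []).append(row)

# Bucketing: a fixed number of "files", chosen by hashing the bucket column
NUM_BUCKETS = 2
buckets = {b: [] for b in range(NUM_BUCKETS)}
for row in rows:
    buckets[hash(row["user_id"]) % NUM_BUCKETS].append(row)

print(sorted(partitions))  # one partition per country value
print(sum(len(v) for v in buckets.values()))  # all 4 rows land in a bucket
```

Note that the number of partitions grows with the number of distinct values, whereas the number of buckets is fixed up front.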
Q87. Hive optimization techniques
Hive optimization techniques improve query performance and reduce execution time.
Partitioning tables to reduce data scanned
Using bucketing to group data for faster querying
Using vectorization to process data in batches
Using indexing to speed up lookups
Using compression to reduce storage and I/O costs
Q88. Preferred mode of working
I prefer working in a collaborative environment that encourages open communication and feedback.
I enjoy working in a team where everyone's ideas are valued and considered.
I appreciate a work culture that fosters open communication and encourages feedback.
I am comfortable working independently as well, but I believe that collaboration leads to better results.
I am adaptable and can work in different modes depending on the situation, but my preferred mode is a collaborative one.
Q89. Performance enhancements in PySpark
Performance enhancements in PySpark involve optimizing code, tuning configurations, and utilizing efficient data structures.
Use partitioning to distribute data evenly across nodes
Cache intermediate results to avoid recomputation
Optimize joins by broadcasting smaller tables
Use efficient data formats like Parquet or ORC
Tune Spark configurations for memory and parallelism
Q90. Explain prepaid expenses
Prepaid expenses are payments made in advance for goods or services that will be received in the future.
Prepaid expenses are recorded as assets on the balance sheet
They are gradually expensed over time as the goods or services are received
Examples include prepaid rent, insurance premiums, and subscriptions
Prepaid expenses are commonly used in accounting to ensure accurate financial reporting
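The gradual expensing can be sketched with hypothetical figures (12 months of rent paid up front):

```python
# Example: a year of rent paid in advance on 1 January (hypothetical figures)
prepaid_rent = 12_000          # recorded as an asset when the cash is paid
monthly_expense = prepaid_rent / 12

# Each month, part of the asset is moved to the expense account
asset_balance = prepaid_rent
expenses = []
for month in range(1, 13):
    asset_balance -= monthly_expense
    expenses.append(monthly_expense)

print(sum(expenses))   # 12000.0 expensed in total over the year
print(asset_balance)   # 0.0 - the prepaid asset is fully consumed
```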
Q91. What is dataset
A dataset is a collection of data that is organized in a structured format for easy access and analysis.
A dataset can consist of tables, files, or other types of data sources.
It is used for storing and managing data for analysis and reporting purposes.
Examples of datasets include customer information, sales data, and sensor readings.
Datasets can be structured, semi-structured, or unstructured depending on the type of data they contain.
Q92. What do you know about ESG
Q93. How does a neural network such as a CNN work?
Q94. What is partition pruning
Partition pruning is a query optimization technique that reduces the amount of data scanned by excluding irrelevant partitions.
Partition pruning is used in partitioned tables to skip scanning partitions that do not contain data relevant to the query.
It helps improve query performance by reducing the amount of data that needs to be processed.
For example, if a query filters data based on a specific partition key, partition pruning will only scan the relevant partitions instead ...read more
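The idea can be simulated in plain Python (a toy stand-in for a partitioned table, not a real query engine): a filter on the partition key lets the query touch only one of the three partitions.

```python
# Rows stored by partition key (here: a date), as a partitioned table would be
table = {
    "2024-01-01": [("a", 1), ("b", 2)],
    "2024-01-02": [("c", 3)],
    "2024-01-03": [("d", 4), ("e", 5)],
}

def query(table, partition_key):
    """With pruning: read only the partition named in the filter."""
    scanned = [partition_key] if partition_key in table else []
    rows = table.get(partition_key, [])
    return rows, scanned

rows, scanned = query(table, "2024-01-02")
print(rows)     # [('c', 3)]
print(scanned)  # only 1 of the 3 partitions was read
```

Without pruning, the same query would have to scan every partition and filter the rows afterwards.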
Q95. Implementation of the fetch API in a sandbox
Using fetch API to make requests to a sandbox environment for testing purposes.
Use the fetch function to make HTTP requests to the sandbox URL
Handle the response using promises and the .then() method
Set the appropriate headers and request method for the API endpoint
Parse the response data using JSON methods if needed
Q96. What is a namespace in Python
Namespace in Python is a system to make sure that all the names in a program are unique and can be used without any conflict.
Namespaces are containers for mapping names to objects.
Python uses namespaces to avoid naming conflicts and to create a unique space for each variable, function, etc.
There are different types of namespaces in Python such as local, global, and built-in namespaces.
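A short demonstration of separate namespaces holding independent bindings for the same name:

```python
x = "global"  # lives in the module's global namespace

def show():
    x = "local"  # a different binding in the function's local namespace
    # globals() exposes the module namespace as a dict, so both
    # bindings of the name 'x' can be read side by side
    return x, globals()["x"]

print(show())  # ('local', 'global')

# Names like len come from a third namespace: the built-in one
import builtins
print(builtins.len is len)  # True
```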
Q97. What are scopes in Python
Scopes in Python refer to the visibility of variables within a program.
Variables defined inside a function have local scope and are only accessible within that function.
Global variables can be accessed from any part of the program.
Nonlocal variables are used in nested functions to access variables from the outer function.
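A sketch covering all three cases: a local in the inner function, an enclosing variable reached via nonlocal, and a module-level variable reached via global.

```python
counter = 0  # global scope

def outer():
    value = 10  # enclosing scope for inner()

    def inner():
        nonlocal value   # rebind the variable from the enclosing function
        global counter   # rebind the module-level variable
        value += 1
        counter += 1
        return value

    return inner()

print(outer())   # 11 - inner() modified the enclosing 'value'
print(counter)   # the global was incremented too
```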
Q98. What is k means algorithm
K-means is a clustering algorithm that partitions data into k clusters based on similarity.
K-means is an unsupervised learning algorithm
It starts by randomly selecting k centroids
Data points are assigned to the nearest centroid
Centroids are recalculated based on the mean of the assigned data points
The process is repeated until convergence or a maximum number of iterations is reached
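The steps above can be sketched as a minimal 1-D k-means in plain Python (illustrative only; real workloads would use a library such as scikit-learn):

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Minimal 1-D k-means: assign to nearest centroid, recompute means."""
    random.seed(seed)
    centroids = random.sample(points, k)  # step 1: random initial centroids
    for _ in range(iters):
        clusters = {c: [] for c in range(k)}
        for p in points:  # step 2: assign each point to its nearest centroid
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # step 3: recompute each centroid as the mean of its assigned points
        centroids = [
            sum(pts) / len(pts) if pts else centroids[c]
            for c, pts in clusters.items()
        ]
    return sorted(centroids)

# Two obvious clusters around 1 and 10
print(kmeans_1d([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], k=2))  # ~[1.0, 10.0]
```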
Q99. Longest substring without repeating character
Q100. Transformations in databricks
Transformations in Databricks involve manipulating data using functions like map, filter, reduce, etc.
Transformations are operations that are applied to RDDs in Databricks
Common transformations include map, filter, reduce, flatMap, etc.
Transformations are lazy evaluated and create a new RDD
Example: map transformation to convert each element in an RDD to uppercase
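Spark itself is not shown here, but Python's built-in map and filter are also lazy, which makes them a convenient stand-in for the transformation-vs-action distinction described above:

```python
log = []

def track(x):
    log.append(x)  # record when the function actually runs
    return x * x

data = [1, 2, 3, 4]
squares = map(track, data)                    # "transformation": nothing runs yet
evens = filter(lambda x: x % 2 == 0, squares)  # still nothing has run

print(log)            # [] - no work done until a result is demanded
result = list(evens)  # the "action": now the whole pipeline executes
print(result)         # [4, 16]
print(log)            # [1, 2, 3, 4] - track() ran once per element
```

In the same way, an RDD's map and filter only build a lineage graph; work happens when an action such as collect or count forces evaluation.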