Top 250 Data Management Interview Questions and Answers
Updated 12 Dec 2024
Q1. How do you take a backup?
Backups can be taken using various methods depending on the type of data and system. It is important to have a backup strategy in place.
Identify the data that needs to be backed up
Choose a backup method (full, incremental, differential)
Select a backup location (local, cloud, offsite)
Schedule backups regularly
Test backups to ensure data can be restored
Consider disaster recovery options
Q2. What is the checklist for data migration?
Checklist for data migration includes planning, data profiling, data cleansing, testing, and validation.
Plan the migration process
Profile the data to identify potential issues
Cleanse the data to ensure accuracy and consistency
Test the migration process thoroughly
Validate the migrated data to ensure completeness and accuracy
Q3. Difference between Data Management & Data Governance?
Data management involves the process of collecting, storing, processing, and maintaining data, while data governance is the overall management of the availability, usability, integrity, and security of data.
Data management focuses on the technical aspects of handling data, while data governance focuses on the policies, procedures, and standards for managing data
Data management involves tasks such as data entry, data cleaning, data integration, and data storage, while data gov...read more
Q4. How to roll back updates?
To roll back updates, identify the update and use the appropriate command to remove it.
Identify the update that needs to be rolled back
Use the appropriate command to remove the update
Verify that the rollback was successful
Q5. How do you operate on data?
Data can be operated on by entering, editing, organizing, analyzing, and presenting it in a meaningful way.
Enter data accurately and efficiently
Edit data for errors and inconsistencies
Organize data in a logical and structured manner
Analyze data to identify patterns and trends
Present data in a clear and concise manner
Use appropriate software and tools for data entry and analysis
Ensure data security and confidentiality
Maintain data integrity and accuracy
Q6. What is a backup, and how do you configure it?
Backup is the process of creating a copy of data to protect against loss. It can be configured using various methods.
Identify the data to be backed up
Choose a backup method (full, incremental, differential)
Select a backup location (external hard drive, cloud storage)
Schedule regular backups
Test backups to ensure data can be restored
Examples: Windows Backup and Restore, Time Machine for Mac, cloud backup services like Dropbox or Google Drive
Q7. How can you take data backup
Data backup can be taken using various methods such as cloud storage, external hard drives, and network-attached storage (NAS).
Cloud storage: Use services like Google Drive, Dropbox, or Amazon S3 to store data remotely.
External hard drives: Connect an external hard drive to the system and copy the data onto it.
Network-attached storage (NAS): Set up a dedicated storage device on the network to backup data from multiple systems.
Backup software: Utilize specialized backup softwa...read more
Q8. What is a data store?
A data store is a centralized location where data is stored and organized for easy retrieval and manipulation.
Data store is used to store and manage large amounts of data.
It can be a physical device like a hard drive or a virtual storage system.
Data can be stored in various formats such as databases, files, or cloud storage.
Data stores provide mechanisms for data access, retrieval, and modification.
Examples of data stores include databases like MySQL, file systems like NTFS, ...read more
Q9. What is data entry?
Data entry is the process of inputting, organizing, and managing data into a computer system or database.
Data entry involves accurately inputting data from various sources into a computer system.
It includes tasks such as typing, scanning, and verifying data for accuracy.
Data entry operators may work with spreadsheets, databases, or specialized software.
Examples of data entry tasks include entering customer information, updating inventory records, or transcribing documents.
Att...read more
Q10. How will you avoid material, customer, and vendor duplication in MDM?
To avoid duplication in MDM, implement data validation rules, establish unique identifiers, and regularly cleanse and merge data.
Implement data validation rules to ensure that only accurate and complete data is entered into the MDM system.
Establish unique identifiers for materials, customers, and vendors to prevent duplication.
Regularly cleanse and merge data to identify and resolve any duplicate records.
Utilize matching algorithms and fuzzy logic to identify potential duplic...read more
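To make the fuzzy-matching point concrete, below is a minimal Python sketch using the standard library's difflib; the vendor names and the 0.85 threshold are purely illustrative, and a real MDM tool would apply much richer matching rules.

from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Return a 0..1 similarity score between two normalised strings.
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

vendors = ["Acme Industries Ltd", "ACME Industries Limited", "Globex Corp"]
threshold = 0.85  # illustrative cut-off for flagging potential duplicates

for i in range(len(vendors)):
    for j in range(i + 1, len(vendors)):
        score = similarity(vendors[i], vendors[j])
        if score >= threshold:
            print(f"Possible duplicate: {vendors[i]!r} ~ {vendors[j]!r} ({score:.2f})")

Records flagged this way would then go through a review-and-merge step rather than being merged automatically.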
Q11. What is master data
Master data is the core data that is used as a base for transactional data in an organization.
Master data is static data that is not frequently changed.
It is used as a reference data for transactional data.
Examples of master data include customer data, vendor data, material data, etc.
Master data is maintained centrally and is shared across different departments in an organization.
It is critical for accurate reporting and decision-making.
Q12. Maintain complaint data on Excel
To maintain complaint data on Excel, use Excel's data entry and management tools.
Create a new Excel workbook for complaint data
Use Excel's data entry tools to input complaint data
Use Excel's sorting and filtering tools to manage and analyze complaint data
Regularly update the Excel workbook with new complaint data
Back up the Excel workbook to prevent data loss
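As a rough illustration, a complaint log can also be maintained programmatically; the sketch below assumes pandas with an Excel engine such as openpyxl is installed, and the column names are made up.

import pandas as pd

# Illustrative complaint records; in practice these would be appended over time.
complaints = pd.DataFrame(
    [
        {"Date": "2024-12-01", "Customer": "A. Sharma", "Category": "Billing", "Status": "Open"},
        {"Date": "2024-12-03", "Customer": "R. Gupta", "Category": "Delivery", "Status": "Closed"},
    ]
)

# Write the workbook that is shared and backed up.
complaints.to_excel("complaints.xlsx", index=False, sheet_name="Complaints")

# The same frame supports quick filtering and analysis.
print(complaints[complaints["Status"] == "Open"])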
Q13. Explain data integration & pipeline
Data integration & pipeline involves combining data from multiple sources and processing it to make it usable for analysis.
Data integration is the process of combining data from different sources into a single, unified view.
Data pipeline refers to the series of steps that data goes through from collection to analysis.
Data integration ensures that data is clean, consistent, and accurate before being processed in the pipeline.
Examples of data integration tools include Talend, I...read more
Q14. What is ETL? Knowledge of data warehousing
ETL stands for Extract, Transform, Load. It is a process of moving data from source systems to a target data warehouse.
Extract: Data is extracted from various sources such as databases, files, APIs, etc.
Transform: Data is transformed to fit the target data warehouse schema and to ensure data quality.
Load: Data is loaded into the target data warehouse for analysis and reporting.
ETL is a crucial process in building a data warehouse.
ETL tools such as Informatica, Talend, and SSI...read more
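As a minimal end-to-end sketch of the three steps, the snippet below uses pandas and SQLite as stand-ins for a real ETL tool and warehouse; the file, table, and column names are assumptions.

import sqlite3
import pandas as pd

# Extract: read raw data from a source file (assumed columns: order_id, amount, country).
orders = pd.read_csv("orders.csv")

# Transform: clean and conform the data to the target schema.
orders = orders.dropna(subset=["order_id"])
orders["country"] = orders["country"].str.upper()

# Load: append the conformed rows into the warehouse table.
with sqlite3.connect("warehouse.db") as conn:
    orders.to_sql("fact_orders", conn, if_exists="append", index=False)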
Q15. What is a data lake in electronic data management?
A data lake is a centralized repository that allows for the storage of large amounts of structured and unstructured data at a low cost.
Data lakes store data in its raw form, without the need to structure it beforehand.
They can store various types of data such as logs, sensor data, social media feeds, and more.
Data lakes enable organizations to perform advanced analytics and data processing on a wide range of data sources.
Q16. How do you do Data Catalog and Lineage?
Data catalog and lineage are done through metadata management and tracking data flow.
Create a metadata repository to store information about data sources, data types, and data lineage.
Track data flow through the use of data lineage tools and techniques such as data mapping and data profiling.
Ensure data quality by implementing data governance policies and procedures.
Regularly update the metadata repository to reflect changes in data sources and data flow.
Examples of data line...read more
Q17. How to use data validation?
Data validation is used to ensure that data entered into a system meets certain criteria.
Data validation can be used to prevent errors and ensure accuracy.
It can be used to restrict input to certain values or formats.
Examples include validating email addresses, phone numbers, and dates.
Data validation can also be used to ensure that required fields are filled out.
It can be implemented through programming or through built-in features in software.
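A small sketch of rule-based validation with Python's standard library is shown below; the patterns are deliberately simple and would be tightened for real use.

import re
from datetime import datetime

def is_valid_email(value: str) -> bool:
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None

def is_valid_phone(value: str) -> bool:
    return re.fullmatch(r"\+?\d{10,15}", value) is not None

def is_valid_date(value: str, fmt: str = "%Y-%m-%d") -> bool:
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

record = {"email": "user@example.com", "phone": "9876543210", "joined": "2024-12-12"}
assert is_valid_email(record["email"])
assert is_valid_phone(record["phone"])
assert is_valid_date(record["joined"])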
Q18. How to handle big data
Handling big data involves collecting, storing, analyzing, and interpreting large volumes of data to derive insights and make informed decisions.
Utilize data management tools like Hadoop, Spark, or SQL databases
Implement data cleaning and preprocessing techniques to ensure data quality
Use data visualization tools like Tableau or Power BI to present findings
Apply statistical analysis and machine learning algorithms for predictive modeling
Ensure data security and compliance wit...read more
Q19. Difference between a Data Warehouse and Data lake?
Data warehouse is structured and used for reporting and analysis, while data lake is unstructured and used for exploration and experimentation.
Data warehouse stores structured data for easy access and analysis.
Data lake stores unstructured and raw data for exploration and experimentation.
Data warehouse is typically used for reporting and business intelligence.
Data lake is used for data science and machine learning projects.
Data warehouse requires schema-on-write, meaning data...read more
Q20. What are the tools for backup?
Tools for backup include software, hardware, and cloud-based solutions.
Backup software such as Acronis, Veeam, and Backup Exec
Hardware solutions like tape drives, external hard drives, and network-attached storage (NAS)
Cloud-based backup services like Amazon S3, Google Drive, and Microsoft OneDrive
Backup generators and uninterruptible power supplies (UPS) to ensure power continuity
Backup scripts and automation tools to schedule and manage backups
Q21. How can a clean survey be updated on the system?
To update a clean survey on the system, follow these steps:
Access the system's survey module
Select the clean survey option
Enter the relevant information such as date, location, and cleanliness rating
Save the survey to update it on the system
Ensure the system is connected to the internet for real-time updates
Q22. Explain what data cleansing is
Data cleansing is the process of identifying and correcting or removing errors, inconsistencies, and inaccuracies in datasets.
Data cleansing involves identifying and handling missing values in datasets.
It also includes removing duplicate records or entries.
Data cleansing may involve correcting spelling mistakes or formatting issues in data.
It helps improve data quality and reliability for analysis and decision-making.
Example: Removing rows with missing values, standardizing d...read more
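A minimal cleansing pass with pandas, covering the steps above; the sample data and column names are illustrative.

import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Alice", "alice ", "Bob", None],
        "city": ["Delhi", "Delhi", "Mumbai", "Pune"],
        "amount": [100.0, 100.0, None, 250.0],
    }
)

df["name"] = df["name"].str.strip().str.title()   # standardise formatting
df = df.dropna(subset=["name"])                    # drop rows missing a key field
df["amount"] = df["amount"].fillna(0)              # handle missing values
df = df.drop_duplicates()                          # remove duplicate records
print(df)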
Q23. Is it possible to standardise datasources for different dashboards?
Yes, it is possible to standardise datasources for different dashboards.
Standardising datasources involves creating a unified data structure and format.
Data can be transformed and cleaned to ensure consistency across dashboards.
Using data integration tools or platforms can help automate the standardisation process.
Examples of standardising datasources include merging multiple databases into a single source or converting different file formats into a common format.
Q24. What is structured data?
Structured data is a standardized format for providing information about a webpage and its content.
Structured data helps search engines understand the content of a webpage.
It uses a specific vocabulary to label and organize information.
Common structured data formats include JSON-LD, Microdata, and RDFa.
Examples of structured data include product information, reviews, and event details.
Q25. What is a data gateway?
Data gateway is a tool that connects on-premises data sources with cloud-based applications.
Data gateway allows for secure transfer of data between on-premises and cloud environments
It acts as a bridge between on-premises data sources and cloud services like Power BI
Data gateway helps in maintaining data security and compliance
Examples of data gateways include Power BI Gateway and Azure Data Gateway
Q26. What are the steps for LO cockpit dataSource enhancement?
Steps for enhancing LO cockpit dataSource
Identify the fields to be added
Create append structure for the fields
Enhance the datasource with the append structure
Activate the datasource
Test the datasource
Q27. How to get daily updated data to my email?
To get daily updated data to your email, you can use automated scripts or tools that fetch the data and send it to your email address.
Use a programming language like Python to write a script that fetches the data from a source and sends it to your email using SMTP.
Utilize APIs provided by the data source to retrieve the data and then use an email service's API to send it to your email address.
Use third-party tools like Zapier or IFTTT that allow you to automate data retrieval...read more
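As a hedged sketch, the standard-library approach mentioned first could look like the snippet below, scheduled daily with cron or Task Scheduler; the SMTP host, credentials, and file name are placeholders.

import smtplib
from email.message import EmailMessage
from pathlib import Path

msg = EmailMessage()
msg["Subject"] = "Daily data extract"
msg["From"] = "reports@example.com"
msg["To"] = "me@example.com"
msg.set_content("Attached is today's extract.")
msg.add_attachment(
    Path("daily_extract.csv").read_bytes(),
    maintype="text",
    subtype="csv",
    filename="daily_extract.csv",
)

# Placeholder SMTP server and credentials.
with smtplib.SMTP("smtp.example.com", 587) as smtp:
    smtp.starttls()
    smtp.login("reports@example.com", "app-password")
    smtp.send_message(msg)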
Q28. Convert Excel data into normalised form.
To convert excel data into normalised form, first identify unique entities, create separate tables for them, and establish relationships between tables.
Identify unique entities in the Excel data such as customers, products, orders, etc.
Create separate tables for each entity with unique identifiers for each record.
Establish relationships between tables using foreign keys to link related data.
Normalize the data by removing redundant information and ensuring data integrity.
Examp...read more
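A rough pandas sketch of those steps is shown below; it assumes an Excel engine such as openpyxl is installed, and the workbook layout, column names, and surrogate-key scheme are made up.

import pandas as pd

# Flat extract, e.g. columns: order_id, order_date, customer_name, customer_email.
flat = pd.read_excel("orders_flat.xlsx")

# Entity table: one row per customer, with a surrogate key.
customers = (
    flat[["customer_name", "customer_email"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
customers["customer_id"] = customers.index + 1

# Fact table: orders referencing customers through the foreign key.
orders = flat.merge(customers, on=["customer_name", "customer_email"])
orders = orders[["order_id", "order_date", "customer_id"]]

customers.to_csv("customers.csv", index=False)
orders.to_csv("orders.csv", index=False)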
Q29. How do you store and manage data and documents?
I store and manage data and documents using a combination of physical filing systems and digital storage solutions.
Utilize physical filing systems for hard copies of documents
Organize digital files into folders on a secure server
Regularly backup important data to prevent loss
Implement access controls to protect sensitive information
Q30. Which efficient modelling techniques are crucial for managing large data sets?
Efficient data modelling techniques like normalization and indexing are crucial for managing large data sets.
Normalization helps reduce redundancy and improve data integrity by organizing data into separate tables and linking them through relationships.
Indexing helps improve query performance by creating indexes on columns frequently used in search conditions.
Partitioning can also be used to divide large data sets into smaller, more manageable chunks.
Data compression techniqu...read more
Q31. What transformations did you use in Informatica?
I have used various transformations in Informatica such as Filter, Router, Expression, Aggregator, Joiner, Lookup, and Sorter.
Filter transformation is used to filter rows based on a condition.
Router transformation is used to route data to different targets based on conditions.
Expression transformation is used to perform calculations or manipulate data.
Aggregator transformation is used to perform aggregate calculations like sum, average, etc.
Joiner transformation is used to jo...read more
Q32. What is Data mart?
A data mart is a subset of a larger data warehouse that is designed to serve a specific business unit or department.
Contains a subset of data from a larger data warehouse
Designed to serve a specific business unit or department
Provides a more focused view of data for decision-making
Can be used for reporting, analysis, and data mining
Examples: Sales data mart, HR data mart, Finance data mart
Q33. Difference between Trusted source and Target Source Reconciliation
Trusted source reconciliation compares data from a reliable source with the target source to identify discrepancies.
Trusted source is a reliable source of data used for comparison
Target source is the source of data being reconciled
Discrepancies are identified and resolved to ensure data accuracy
Q34. Explain end to end data flow
End to end data flow refers to the complete journey of data from its source to its destination, including all processes and systems involved.
Data is collected from various sources such as databases, applications, sensors, etc.
It is then processed and transformed through various stages like extraction, transformation, and loading (ETL).
The data is stored in a data warehouse or data lake for analysis and reporting.
Finally, the insights derived from the data are used for decisio...read more
Q35. What are the phases of CDM
The phases of CDM are data collection, data cleaning, data analysis, and data interpretation.
Data collection involves gathering relevant data from various sources.
Data cleaning involves removing errors, inconsistencies, and outliers from the collected data.
Data analysis involves applying statistical methods and techniques to analyze the cleaned data.
Data interpretation involves drawing meaningful conclusions and insights from the analyzed data.
Q36. How to handle a large amount of data?
Utilize data management tools, prioritize data based on relevance, and implement efficient data processing techniques.
Utilize data management tools such as databases, data warehouses, and data lakes to efficiently store and organize large volumes of data.
Prioritize data based on relevance to the research project or analysis to focus on key insights and reduce processing time.
Implement efficient data processing techniques such as parallel processing, data compression, and inde...read more
Q37. How to take an online data backup?
Online data backup can be achieved through various methods and technologies.
Use cloud storage services like Microsoft Azure, Amazon S3, or Google Cloud Storage.
Implement backup software solutions like Veeam, Acronis, or Commvault.
Utilize network-attached storage (NAS) devices for local backups.
Create redundant copies of critical data to ensure data integrity.
Regularly test and verify the backup and restore processes.
Consider implementing a disaster recovery plan to handle dat...read more
Q38. What is the import process
The import process involves bringing in data or goods from another country or system.
Identify the source of the import
Ensure compliance with import regulations and tariffs
Arrange for transportation and delivery
Complete necessary documentation and customs clearance
Inspect and verify the quality and quantity of the imported goods
Q39. How many days do online sites take to create a backup?
The number of days taken by online sites to create a backup varies depending on the size of the site and the frequency of backups.
The backup process can take anywhere from a few minutes to several hours or even days.
Factors that affect backup time include the amount of data being backed up, the speed of the internet connection, and the backup method used.
For example, a small site with minimal data may only take a few minutes to back up, while a large site with terabytes of da...read more
Q40. How to ensure data center management and cost simulation?
To ensure data center management and cost simulation, a comprehensive approach is needed.
Regular monitoring and analysis of data center performance and costs
Implementing cost-saving measures such as virtualization and energy-efficient hardware
Using simulation tools to model different scenarios and optimize resource allocation
Collaborating with stakeholders to align data center strategy with business goals
Regularly reviewing and updating data center policies and procedures
Inve...read more
Q41. How to manage large datasets
Large datasets can be managed by using efficient data storage techniques, data indexing, data partitioning, and utilizing parallel processing.
Utilize efficient data storage techniques such as using databases optimized for large datasets like Hadoop, MongoDB, or Cassandra
Implement data indexing to quickly retrieve specific data points without scanning the entire dataset
Partition large datasets into smaller chunks to distribute the workload and improve query performance
Utilize ...read more
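One simple illustration of the "smaller chunks" idea is chunked reading with pandas, sketched below; the file name and column are illustrative.

import pandas as pd

total = 0.0
row_count = 0
# Stream the file 100,000 rows at a time instead of loading it all into memory.
for chunk in pd.read_csv("large_file.csv", chunksize=100_000):
    total += chunk["amount"].sum()
    row_count += len(chunk)

print(f"Processed {row_count} rows, total amount = {total}")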
Q42. What is your DWM?
DWM stands for Dynamic Workload Manager.
DWM is a software tool used in production engineering to manage and optimize workloads.
It helps in balancing the workload across multiple resources to ensure efficient utilization.
DWM monitors resource usage, predicts workload demands, and adjusts resource allocation accordingly.
It can prioritize critical tasks, allocate resources based on priority, and dynamically adjust as needed.
For example, in a manufacturing plant, DWM can allocate...read more
Q43. Why the offline data is so important for any company?
Offline data is important for companies as it provides insights into customer behavior and preferences.
Offline data can help companies understand customer behavior and preferences
It can be used to identify trends and patterns in customer data
Offline data can also be used to improve customer experience and personalize marketing efforts
Examples of offline data include in-store purchases, customer service interactions, and surveys
Offline data can be combined with online data to ...read more
Q44. How to perform a file backup?
To perform a file backup, use backup software or manually copy the files to a backup location; a short Python sketch of the manual approach follows this list.
Use a backup software like Acronis True Image or EaseUS Todo Backup
Manually copy the files to an external hard drive or cloud storage
Ensure that the backup location is secure and easily accessible
Schedule regular backups to avoid data loss
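A minimal manual-backup sketch using only the Python standard library; the source and destination paths are placeholders, and the timestamped folder name keeps successive backups separate.

import shutil
from datetime import datetime
from pathlib import Path

source = Path("C:/data/reports")
destination = Path("D:/backups") / datetime.now().strftime("reports_%Y%m%d_%H%M%S")

# Copy the whole folder tree into a new, timestamped backup folder.
shutil.copytree(source, destination)
print(f"Backed up {source} to {destination}")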
Q45. Difference between structured data and unstructured data
Structured data is organized and easily searchable, while unstructured data lacks a predefined format and is harder to analyze.
Structured data is organized into a predefined format, such as tables or databases.
Unstructured data does not have a specific format and includes text documents, images, videos, etc.
Structured data is easily searchable and can be analyzed using traditional methods.
Unstructured data requires advanced techniques like natural language processing to extra...read more
Q46. What are the backup strategies?
Backup strategies are plans and procedures put in place to protect data in case of loss or corruption.
Regularly scheduled backups to ensure data is up to date
Offsite backups to protect against physical damage or theft
Incremental backups to save storage space and time
Automated backups to reduce human error
Testing backups to ensure they can be restored successfully
Q47. What is the lifecycle of data
The lifecycle of data refers to the stages of data from its creation to its disposal.
Data creation
Data storage
Data processing and analysis
Data sharing and dissemination
Data archiving and disposal
Q48. How do you implement incremental refresh?
Incremental refresh in Power BI allows for loading only new or modified data to improve performance.
Set up incremental refresh policy in Power BI Service
Define a range of values for the refresh policy
Use parameters to filter data based on the refresh policy
Schedule regular refreshes to update the dataset
Q49. What do you know about Global Data?
Global Data is a leading provider of business intelligence and analytics solutions.
Global Data offers market research reports, data analytics, and consulting services to various industries.
It provides insights into market trends, competitive landscapes, and consumer behavior.
Global Data's clients include Fortune 500 companies, government agencies, and academic institutions.
The company has a global presence with offices in North America, Europe, Asia-Pacific, and the Middle Ea...read more
Q50. What is backup agent
A backup agent is a software component that facilitates the backup and restore process by managing communication between the backup server and the devices being backed up.
Backup agent acts as an intermediary between the backup server and the devices being backed up.
It facilitates the transfer of data from the devices to the backup server and vice versa.
Backup agents often provide features like compression, encryption, and deduplication to optimize the backup process.
Examples ...read more
Q51. Do you know about backups
Yes, I am familiar with backups.
I understand the importance of regular backups to prevent data loss.
I am experienced in setting up and managing backup systems.
I am knowledgeable about different types of backups, such as full, incremental, and differential backups.
I am familiar with backup software and tools, such as Veeam, Acronis, and Backup Exec.
I am aware of best practices for backup storage and retention, including offsite backups and disaster recovery plans.
Q52. How to load data incrementally?
Loading data incrementally involves updating only new or changed data instead of reloading entire dataset.
Identify the key field that determines new or updated records
Use timestamp or versioning to track changes
Implement a process to extract, transform, and load only new or updated data
Consider using tools like Change Data Capture (CDC) or Incremental Load in ETL processes
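A small sketch of a timestamp-watermark incremental load using SQLite; the table and column names are illustrative, and a real pipeline would typically rely on a CDC or ETL tool instead.

import sqlite3

def incremental_load(source_db: str, target_db: str) -> None:
    with sqlite3.connect(source_db) as src, sqlite3.connect(target_db) as tgt:
        tgt.execute(
            "CREATE TABLE IF NOT EXISTS customers "
            "(id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
        )
        # Watermark: the latest timestamp already present in the target.
        last = tgt.execute(
            "SELECT COALESCE(MAX(updated_at), '') FROM customers"
        ).fetchone()[0]
        # Pull only rows changed since the watermark.
        rows = src.execute(
            "SELECT id, name, updated_at FROM customers WHERE updated_at > ?", (last,)
        ).fetchall()
        # Upsert new or changed rows into the target.
        tgt.executemany(
            "INSERT OR REPLACE INTO customers (id, name, updated_at) VALUES (?, ?, ?)",
            rows,
        )
        tgt.commit()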
Q53. How would you reduce the data size?
Reduce data size by removing unnecessary columns, aggregating data, using data compression techniques, and optimizing data storage.
Remove unnecessary columns that are not being used in analysis
Aggregate data by grouping similar data points together
Use data compression techniques like gzip or snappy to reduce file size
Optimize data storage by using efficient data structures and algorithms
Consider using data deduplication to remove redundant data
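A short pandas sketch applying several of these steps together; the file, column names, and dtypes are assumptions.

import pandas as pd

df = pd.read_csv("sales.csv")

df = df.drop(columns=["free_text_notes"], errors="ignore")    # drop unused columns
df = df.drop_duplicates()                                     # deduplicate rows
df["region"] = df["region"].astype("category")                # cheaper dtype for repeated values
df["units"] = pd.to_numeric(df["units"], downcast="integer")  # downcast numeric columns

# Write a gzip-compressed output; Parquet with snappy is another common choice.
df.to_csv("sales_small.csv.gz", index=False, compression="gzip")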
Q54. How to migrate Salesforce CPQ data from an external system to Salesforce?
Salesforce CPQ data can be migrated from an external system to Salesforce using Data Loader or third-party tools.
Export data from external system in CSV format
Map fields in CSV file to Salesforce CPQ fields
Use Data Loader or third-party tools to import data into Salesforce
Validate data after import to ensure accuracy
Q55. How can we manage data on the website to reduce load
Data management techniques like caching, compression, and database optimization can help reduce website load.
Implement caching mechanisms to store frequently accessed data and reduce server load.
Use data compression techniques like Gzip to reduce the size of files transferred between the server and client.
Optimize database queries to reduce the amount of data retrieved and processed during each request.
Consider implementing lazy loading for images and videos to only load them...read more
Q56. What are the data management tools?
Data management tools are software applications used to collect, store, organize, and analyze data for decision-making purposes.
Database management systems (DBMS) like MySQL, Oracle, SQL Server
Data visualization tools like Tableau, Power BI, QlikView
Data integration tools like Informatica, Talend, SnapLogic
Data quality tools like Trillium, Informatica Data Quality
Big data tools like Hadoop, Spark, Kafka
Q57. Why use data pages, activities, and data transforms?
Data pages, activities, and data transforms are essential components in Pega for efficient data handling and processing.
Data pages are used to efficiently retrieve and store data from external systems or databases, reducing the need for repeated database calls.
Activities are used to define business logic and automate tasks, such as data manipulation, decision-making, and integration with external systems.
Data transforms are used to transform and manipulate data within Pega, s...read more
Q58. difference between EDI & EDM
EDI is Electronic Data Interchange used for exchanging business documents electronically, while EDM is Electronic Document Management used for managing digital documents.
EDI is used for exchanging structured business documents like invoices, purchase orders, etc.
EDM is used for managing and organizing digital documents like contracts, reports, etc.
EDI involves the direct computer-to-computer exchange of business documents in a standard electronic format.
EDM focuses on the sto...read more
Q59. Manage data in Google Sheets
Managing data in Google Sheets involves organizing, analyzing, and updating information in a collaborative online spreadsheet platform.
Use Google Sheets to create and edit spreadsheets
Organize data into rows and columns
Apply formulas and functions for data analysis
Share and collaborate with team members
Use filters and sorting options for data organization
Import and export data from other sources
Q60. How to maintain data on Excel?
Maintaining data on Excel involves organizing information in rows and columns, using formulas and functions, and ensuring data accuracy.
Organize data in rows and columns for easy access and analysis
Use formulas and functions to perform calculations and manipulate data
Ensure data accuracy by double-checking entries and using validation tools
Regularly update and backup data to prevent loss
Use filters and sorting options to quickly find specific information
Q61. What precautions or improvements can be taken to improve data management?
Precautions and improvements for better data management in drug safety
Implementing standardized data entry protocols to ensure consistency and accuracy
Regularly conducting data quality checks and audits to identify and correct errors
Utilizing advanced data management software and tools for efficient data processing
Ensuring data security and confidentiality measures are in place to protect sensitive information
Providing training and ongoing support for staff involved in d...read more
Q62. Create data in Excel and Power BI
Creating a data visualization in Excel and Power BI
Collect and organize the data you want to visualize
Open Excel and input the data into a spreadsheet
Create charts or graphs to represent the data visually
Save the Excel file
Open Power BI and import the Excel file
Create interactive visualizations using the imported data
Q63. How to plan a data migration, and what are the best practices?
Data migration planning requires a thorough understanding of the existing data, target system, and potential risks.
Identify the scope and objectives of the migration
Analyze the existing data and identify any data quality issues
Choose the appropriate migration method (e.g. ETL, manual)
Develop a detailed migration plan with timelines and milestones
Test the migration process thoroughly before executing it
Ensure data security and compliance throughout the migration process
Train u...read more
Q64. How did you use data management and analytics in your last role?
I utilized data management and analytics to track project progress, identify trends, and make data-driven decisions.
Implemented data management systems to organize and store project data efficiently
Utilized analytics tools to analyze project performance and identify areas for improvement
Generated reports and dashboards to track key metrics and communicate findings to stakeholders
Used data insights to make informed decisions and drive project success
Q65. Tell me the overall plan for Informatica Installation and Upgrade
The overall plan for Informatica Installation and Upgrade involves several steps.
Assess the current system and determine the appropriate version to upgrade to
Ensure all prerequisites are met, including hardware and software requirements
Back up all data and configurations before beginning the installation or upgrade process
Install or upgrade the Informatica software
Configure the system and test functionality
Migrate data and configurations from the previous version if necessary...read more
Q66. How to load data from a CSV into BigQuery?
Use Google Cloud Storage to load CSV data into BigQuery
Upload the CSV file to Google Cloud Storage
Create a BigQuery table with the appropriate schema
Use the 'bq load' command to load the data from the CSV file into the BigQuery table
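The same load can be scripted with the google-cloud-bigquery Python client instead of the bq CLI; in this sketch the project, dataset, table, and bucket names are placeholders, and credentials are assumed to come from the environment.

from google.cloud import bigquery

client = bigquery.Client()  # uses credentials from the environment

table_id = "my-project.my_dataset.sales"  # placeholder target table
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # skip the header row
    autodetect=True,       # infer the schema from the file
)

# Load a CSV that has already been uploaded to Cloud Storage.
load_job = client.load_table_from_uri(
    "gs://my-bucket/sales.csv", table_id, job_config=job_config
)
load_job.result()  # wait for the job to finish
print(client.get_table(table_id).num_rows, "rows loaded")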
Q67. How do you manage data in the laboratory, and what is the responsibility of vendor management?
I manage laboratory data through proper organization, documentation, and storage. Vendor management involves selecting, evaluating, and maintaining relationships with suppliers.
Implementing a robust data management system to ensure accuracy and traceability
Regularly backing up data to prevent loss
Establishing protocols for data entry, analysis, and reporting
Reviewing and updating data management procedures as needed
Vendor management includes selecting reliable suppliers, nego...read more
Q68. Do you understand data well?
Yes, I have a strong understanding of data and its analysis.
I have experience in collecting, organizing, and analyzing data from various sources.
I am proficient in using statistical tools and software to interpret data.
I have successfully presented data-driven insights to stakeholders in previous roles.
Q69. How to create a hierarchy in the RPD?
Hierarchies can be created in the RPD by defining parent-child relationships between columns.
Identify the columns that will be part of the hierarchy
Create a logical table source for each column
Define the parent-child relationship between the columns using the 'Hierarchy' tab in the 'Physical Layer'
Create a presentation hierarchy in the 'Business Model and Mapping' layer
Test the hierarchy in the 'Answers' section
Q70. What is IR, and what is the difference between a dataset and a linked service?
IR stands for Integration Runtime. Dataset is a representation of data, while linked service is a connection to the data source.
IR is a compute infrastructure used to provide data integration capabilities
Dataset is a structured representation of data used in data engineering tasks
Linked service is a connection to a data source, providing access to the data
IR enables data movement and transformation between different data sources
Dataset defines the schema and structure of the ...read more
Q71. How many files are received in a month?
The average number of files received per month varies depending on the department and workload.
The number of files received per month can range from a few hundred to several thousand depending on the department.
The workload and seasonality can also affect the number of files received.
It is important to have a system in place to manage the influx of files and ensure timely processing.
For example, in a busy HR department, the number of files received per month can be around 100...read more
Q72. What are the constraints faced in data validation? Explain each of them with an illustration.
Constraints faced in Data Validation with illustrations
1. Format constraints: Ensuring data follows a specific format (e.g. date in MM/DD/YYYY format)
2. Range constraints: Validating data falls within a specified range (e.g. age between 18-65)
3. Mandatory constraints: Ensuring required fields are not empty (e.g. email address field)
4. Consistency constraints: Checking data consistency across multiple fields (e.g. start date before end date)
5. Uniqueness constraints: Verifying...read more
Q73. How can data verification be done?
Data verification can be done by comparing data against a trusted source or using software tools.
Verify data accuracy by comparing it against a trusted source
Use software tools like data validation rules or checksums to ensure data integrity
Perform data cleansing to remove duplicates or errors
Conduct manual checks to identify anomalies or inconsistencies
Implement data security measures to prevent unauthorized access or modification
Q74. How to onboard data into Splunk from different sources?
Data onboarding in Splunk involves configuring data inputs from various sources.
Identify the data sources and their formats
Configure data inputs using Splunk Web or configuration files
Use Splunk Add-ons for specific data sources
Validate data inputs and troubleshoot any issues
Monitor data inputs for changes and adjust configurations accordingly
Q75. What are RTO and RPO?
RTO (Recovery Time Objective) is the targeted duration of time within which a business process must be restored after a disaster. RPO (Recovery Point Objective) is the maximum tolerable period in which data might be lost due to a disaster.
RTO is the maximum acceptable downtime for a business process.
RPO is the maximum amount of data loss that is acceptable for a business process.
RTO and RPO are key metrics in disaster recovery planning.
For example, if a company has an RTO of ...read more
Q76. What is filter context?
Filter context determines which rows of data are visible to calculations in DAX formulas.
Filter context is dynamic and changes based on user interactions.
It can be set by slicers, filters, or relationships between tables.
Filter context is used to calculate results based on the current filter selections.
It helps in determining which rows of data are included in calculations.
Q77. Explain the types of backup.
Type of backup refers to the method used to back up data, such as full, incremental, or differential backups.
Full Backup: A complete copy of all data is made.
Incremental Backup: Only changes made since the last backup are saved.
Differential Backup: Only changes made since the last full backup are saved.
Mirror Backup: An exact copy of the data is created.
Snapshot Backup: A point-in-time copy of data is taken.
Cloud Backup: Data is stored on remote servers.
Offline Backup: Data i...read more
Q78. Difference between retention and archival
Retention is keeping data for a specific period while archival is keeping data indefinitely.
Retention is for a specific period, while archival is indefinite
Retention is for compliance and legal purposes, while archival is for historical purposes
Retention is usually automated, while archival requires manual intervention
Retention policies can be applied to specific content types, while archival is applied to entire sites or collections
Q79. What is deduplication in Commvault?
Deduplication on Commvault is a data reduction technique that eliminates redundant data to save storage space.
It identifies and eliminates duplicate data blocks within and across backup jobs.
It reduces the amount of data that needs to be stored and transferred, improving backup and recovery times.
It can be applied at the client, media agent, or storage policy level.
For example, if multiple users have the same file on their computers, Commvault will only store one copy of the ...read more
Q80. What if a customer refuses to share logs and data?
Explain importance of logs, offer alternative solutions, emphasize need for troubleshooting
Explain the importance of logs in diagnosing and resolving technical issues
Offer alternative solutions such as remote troubleshooting or guiding the customer through the process
Emphasize the need for collaboration and transparency to effectively resolve the issue
Q81. How to handle incremental load
Incremental load can be handled by identifying new or updated data and merging it with existing data.
Identify new or updated data using timestamps or unique identifiers
Extract and transform the new data
Merge the new data with existing data using a join or union operation
Load the merged data into the target system
Q82. Difference between test and OOT data?
Test data is used to evaluate the performance of a model during training, while out-of-time (OOT) data is used to evaluate the model's performance on unseen data.
Test data is typically a subset of the original dataset used to train the model.
OOT data is data that was not available at the time of model training and is used to simulate real-world scenarios.
Test data helps assess how well the model generalizes to new, unseen data, while OOT data helps evaluate the model's perfor...read more
Q83. What are MDM and data quality?
MDM stands for Master Data Management, which is the process of creating and managing a single, accurate and complete view of an organization's data.
MDM involves the processes, governance, policies, standards and tools that consistently define and manage the critical data of an organization.
Data quality refers to the state of completeness, consistency, accuracy, and reliability of data.
Data quality is crucial for effective decision-making, operational efficiency, and regulator...read more
Q84. How to check data set & mount point information in human readable format ?
To check data set & mount point info in human readable format, use the 'df' command.
Open the terminal and type 'df -h' to display the information in human-readable format.
The 'df' command shows the file system disk space usage, including the mount point and file system type.
The '-h' option displays the sizes in a human-readable format, such as 'K' for kilobytes, 'M' for megabytes, and 'G' for gigabytes.
You can also use the 'mount' command to display the mounted file systems a...read more
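If the check needs to be scripted rather than run interactively, Python's shutil.disk_usage reports the same totals for a given mount point; the small formatter below is only a sketch.

import shutil

def human(n_bytes: float) -> str:
    # Convert a byte count into a short human-readable string.
    for unit in ("B", "K", "M", "G", "T"):
        if n_bytes < 1024:
            return f"{n_bytes:.1f}{unit}"
        n_bytes /= 1024
    return f"{n_bytes:.1f}P"

usage = shutil.disk_usage("/")  # any mount point can be passed here
print(f"total={human(usage.total)} used={human(usage.used)} free={human(usage.free)}")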
Q85. How do you find whether any IDocs are failing?
IDOC failures can be monitored through system logs and monitoring tools.
Monitor system logs for any IDOC failure messages
Use monitoring tools like SAP Solution Manager to track IDOC status
Set up alerts for immediate notification of IDOC failures
Regularly check IDOC processing status in SAP transaction codes like WE02 or WE05
Q86. What are the kinds of MDM tools?
Mobile Device Management (MDM) tools include cloud-based, on-premises, and hybrid solutions.
Cloud-based MDM tools: ManageEngine Mobile Device Manager Plus, VMware AirWatch
On-premises MDM tools: Microsoft Intune, IBM MaaS360
Hybrid MDM tools: Citrix XenMobile, MobileIron
Q87. How will you manage any difficulties regarding data mismatch?
I will verify the source of data and cross-check with other sources to resolve any discrepancies.
Verify the source of data
Cross-check with other sources
Communicate with relevant parties to resolve discrepancies
Maintain accurate records of all data entries and changes
Q88. How do you handle data when the address arrives late?
Handle late address data by setting up a process for updating records and communicating with stakeholders.
Establish a protocol for updating address information once it is received.
Communicate with relevant parties to ensure the updated address is properly recorded.
Update any necessary documentation or systems with the new address.
Track the progress of late address updates to ensure completion.
Implement measures to prevent delays in address updates in the future.
Q89. What is PIM?
PIM stands for Product Information Management. It is a system used to manage and centralize product data for e-commerce businesses.
PIM helps businesses organize and maintain accurate and up-to-date product information.
It allows businesses to manage product attributes, descriptions, images, and other relevant data in a centralized database.
PIM systems enable businesses to efficiently distribute product information across multiple sales channels, such as websites, marketplaces,...read more
Q90. What is Delta Lake and its benefits
Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.
Provides ACID transactions for big data workloads
Ensures data reliability and quality by enabling schema enforcement and data versioning
Supports batch and streaming data processing
Supports time travel, i.e. querying earlier versions of a table for audits or rollbacks
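A minimal PySpark sketch of writing and time-travelling a Delta table, assuming a Spark session already configured with the delta-spark package; the table path is a placeholder.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-demo").getOrCreate()

events = spark.createDataFrame([(1, "login"), (2, "logout")], ["user_id", "event"])

# Writes are ACID, and the schema is enforced on later appends.
events.write.format("delta").mode("overwrite").save("/tmp/delta/events")

# Time travel: read an earlier version of the table.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/delta/events")
v0.show()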
Q91. Explain concepts around data governance and data architecture implementation
Data governance involves establishing policies and procedures for managing data assets, while data architecture implementation focuses on designing and implementing the structure and organization of data within an organization.
Data governance involves defining roles and responsibilities for managing data, establishing data quality standards, and ensuring compliance with regulations.
Data architecture implementation includes designing data models, creating data storage solution...read more
Q92. What is the Data source you use
We primarily use SQL databases for storing and retrieving data.
SQL databases like MySQL, PostgreSQL, and Microsoft SQL Server are commonly used
We also utilize NoSQL databases like MongoDB for certain projects
Data may also be sourced from APIs, external services, or flat files
Q93. How do we manage large data.
Large data can be managed through data storage solutions, data processing techniques, and data visualization tools.
Utilize data storage solutions such as databases, data lakes, and cloud storage to store large volumes of data.
Implement data processing techniques like data cleaning, transformation, and aggregation to analyze and extract insights from large datasets.
Use data visualization tools like Tableau, Power BI, or matplotlib to present large data in a visually appealing ...read more
Q94. What is the process of DAM?
DAM stands for Digital Asset Management, which is the process of organizing, storing, and retrieving digital assets such as images, videos, documents, and more.
Organizing digital assets in a centralized repository
Adding metadata to assets for easy search and retrieval
Controlling access to assets based on user permissions
Ensuring version control and asset security
Integrating with other systems for seamless workflows
Q95. What is difference between Snapshot and backup?
Snapshot is a point-in-time copy of data, while backup is a complete copy of data stored separately.
Snapshot captures the current state of data at a specific moment, while backup stores a complete copy of data at a separate location.
Snapshots are usually faster to create and restore compared to backups.
Backups are typically stored for longer periods and provide more comprehensive data protection.
Snapshots are often used for quick recovery of data in case of errors or failures...read more
Q96. What is data governance? What is data stewardship? How do you ensure data governance is achieved?
Data governance is the overall management of the availability, usability, integrity, and security of data within an organization. Data stewardship is the responsibility for managing and ensuring the quality of data within a specific domain or department.
Data governance involves defining policies, procedures, and standards for data management.
Data stewardship involves implementing and enforcing those policies within a specific area of the organization.
To ensure data governance...read more
Q97. How would you deal with datasets having lots of categories
Utilize feature engineering techniques like one-hot encoding or target encoding to handle datasets with many categories.
Use feature engineering techniques like one-hot encoding to convert categorical variables into numerical values
Consider using target encoding to encode categorical variables based on the target variable
Apply dimensionality reduction techniques like PCA or LDA to reduce the number of features
Use tree-based models like Random Forest or XGBoost which can handle...read more
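A short pandas sketch of the two encoding approaches mentioned above; the sample data is illustrative.

import pandas as pd

df = pd.DataFrame(
    {"city": ["Delhi", "Mumbai", "Delhi", "Pune"], "sales": [10, 20, 30, 40]}
)

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df, columns=["city"])

# Target encoding: replace each category with the mean of the target variable.
df["city_encoded"] = df.groupby("city")["sales"].transform("mean")

print(one_hot)
print(df)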
Q98. How to copy the Business Phone on an account to the associated contacts and opportunities?
To copy Business phone on account to associated contacts and opportunities, use process builder or workflow rule.
Create a process builder or workflow rule on account object
Add criteria to check if Business phone is not null
Add immediate action to update Business phone on associated contacts and opportunities
Use field update action to update Business phone field on contacts and opportunities
Map the Business phone field from account to contacts and opportunities
Q99. How to add data mappings
Data mappings can be added by defining the relationships between different data elements.
Identify the source and target data elements that need to be mapped
Create a mapping document specifying the relationships between the source and target data
Implement the mappings using ETL tools or custom scripts
Test the mappings to ensure data is accurately transformed
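A minimal sketch of applying a mapping document with pandas, treating the mapping as a simple source-to-target dictionary; the field names are made up.

import pandas as pd

source = pd.DataFrame({"cust_nm": ["Alice"], "cust_phn": ["9876543210"]})

# Mapping document expressed as a dict: source field -> target field.
field_map = {"cust_nm": "customer_name", "cust_phn": "customer_phone"}

target = source.rename(columns=field_map)
print(target.columns.tolist())  # ['customer_name', 'customer_phone']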
Q100. What is the backup strategy you have?
Our backup strategy includes full backups weekly, differential backups daily, and transaction log backups every 15 minutes.
Weekly full backups
Daily differential backups
Transaction log backups every 15 minutes
Backups stored on separate disk
Regular testing of backups for restoration