10+ Aark Cargo Care Interview Questions and Answers
Q1. Convert a string of multiple lines containing 'n' words into multiple arrays of fixed size k, with no overlap of elements across arrays.
Convert a string of multiple lines with 'n' words to multiple arrays of fixed size without overlap.
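Split the string on any whitespace (spaces and newlines) to get a flat list of words
Slice the list into consecutive arrays of size 'k'; each word lands in exactly one slice, so no element overlaps
If 'n' is not divisible by 'k', the last array holds the remaining n mod k words (pad or drop it if every array must be exactly 'k' long)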
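The steps above can be sketched in Python as follows. This is a minimal version that assumes words are separated by whitespace and that the last array may be shorter than k. `chunk_words` is a hypothetical helper name.

```python
def chunk_words(text: str, k: int) -> list[list[str]]:
    """Split text (possibly multi-line) into consecutive, non-overlapping arrays of k words."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    words = text.split()  # no argument: splits on any whitespace, including newlines
    # Each slice starts where the previous one ended, so no word appears twice.
    return [words[i:i + k] for i in range(0, len(words), k)]


text = """the quick brown
fox jumps over
the lazy dog"""
print(chunk_words(text, 4))
# [['the', 'quick', 'brown', 'fox'], ['jumps', 'over', 'the', 'lazy'], ['dog']]
```

If every array must be exactly k long, pad the last slice (for example with None) or drop it, depending on the requirement.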
Q2. Find columns from a table where data type is timestamp.
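Use a SQL query to find columns with a timestamp data type in a table.
In MySQL, use a query like: SHOW COLUMNS FROM table_name WHERE Type LIKE 'timestamp%';
Alternatively, query information_schema.columns (available in MySQL and PostgreSQL): SELECT column_name FROM information_schema.columns WHERE table_name = 'table_name' AND data_type LIKE 'timestamp%';
Check for related types as well, such as datetime or PostgreSQL's 'timestamp with time zone', depending on what counts as a timestamp for your use case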
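SQLite has no information_schema, so this sketch uses its `pragma_table_info()` table-valued function as a stand-in to show the same idea: read each column's declared type and filter for timestamp-like names. The table, its columns, and the helper `timestamp_columns` are hypothetical. Which prefixes count as "timestamp" is an assumption to adjust per schema.

```python
import sqlite3

# Declared types treated as timestamp-like (an assumption; adjust for your schema).
TIMESTAMP_PREFIXES = ("TIMESTAMP", "DATETIME")


def timestamp_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Return names of columns whose declared type looks like a timestamp."""
    # pragma_table_info() yields one row per column, in column order.
    rows = conn.execute("SELECT name, type FROM pragma_table_info(?)", (table,)).fetchall()
    return [name for name, col_type in rows if col_type.upper().startswith(TIMESTAMP_PREFIXES)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, created_at TIMESTAMP, updated_at DATETIME, note TEXT)")
print(timestamp_columns(conn, "events"))  # ['created_at', 'updated_at']
```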
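Q3. How can we handle unacknowledged messages in Pub/Sub?
Unacknowledged messages in Pub/Sub can be handled with retries, dead-letter topics (queues), and monitoring.
In Google Cloud Pub/Sub, a message that is not acknowledged before its ack deadline is redelivered; configure a retry policy (for example, exponential backoff) and extend the ack deadline when processing legitimately takes longer.
Attach a dead-letter topic with a maximum number of delivery attempts so messages that repeatedly fail are moved aside instead of being redelivered indefinitely.
Monitor metrics such as the number of undelivered messages and the age of the oldest unacknowledged message to spot stuck or slow subscribers.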
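To make the redelivery and dead-letter flow concrete, here is a toy in-memory simulation. It is not the Google Cloud Pub/Sub client library, and the class and method names are hypothetical. In real Pub/Sub the service itself redelivers after the ack deadline and forwards messages to the dead-letter topic after the configured maximum delivery attempts. This sketch only models that behavior, treating a handler exception as a missing ack.

```python
from collections import deque


class ToySubscription:
    """Hypothetical stand-in for a subscription: failed (unacked) messages are
    redelivered until max_delivery_attempts, then moved to a dead-letter list."""

    def __init__(self, max_delivery_attempts: int = 5):
        self.max_delivery_attempts = max_delivery_attempts
        self.queue = deque()
        self.attempts = {}
        self.acked = []
        self.dead_letter = []

    def publish(self, msg_id, data):
        self.queue.append((msg_id, data))
        self.attempts[msg_id] = 0

    def drain(self, handler):
        while self.queue:
            msg_id, data = self.queue.popleft()
            self.attempts[msg_id] += 1
            try:
                handler(data)
            except Exception:
                # No ack: dead-letter after too many attempts, otherwise redeliver later.
                if self.attempts[msg_id] >= self.max_delivery_attempts:
                    self.dead_letter.append(msg_id)
                else:
                    self.queue.append((msg_id, data))
            else:
                self.acked.append(msg_id)


calls = {}


def handler(data):
    calls[data] = calls.get(data, 0) + 1
    # "bad" always fails; "flaky" fails only on its first delivery.
    if data == "bad" or (data == "flaky" and calls[data] < 2):
        raise RuntimeError("processing failed")


sub = ToySubscription(max_delivery_attempts=3)
for i, d in enumerate(["ok", "flaky", "bad"]):
    sub.publish(i, d)
sub.drain(handler)
print(sub.acked, sub.dead_letter)  # [0, 1] [2]
```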
Q4. Find the second index of "l" in the string "Hello". Write Python code.
Find the second index of 'l' in the string 'Hello'.
Use the find() method to find the first index of 'l'.
Use the find() method again starting from the index after the first 'l' to find the second index.
Handle cases where the first 'l' is not found or there is no second 'l'.
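A short Python version of these steps follows. `second_index` is a hypothetical helper name. Like `str.find()`, it returns -1 when there is no first or no second occurrence.

```python
def second_index(s: str, ch: str) -> int:
    """Return the index of the second occurrence of ch in s, or -1 if there isn't one."""
    first = s.find(ch)
    if first == -1:
        return -1  # no first occurrence, so no second one either
    # Start searching just after the first match.
    return s.find(ch, first + 1)


print(second_index("Hello", "l"))  # 3
```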
Q5. Convert a string (multiple lines) to a list
Use the splitlines() method to convert a string with multiple lines into a list of strings, one per line.
split('\n') also works, but a trailing newline leaves an empty string at the end of the list, and '\r\n' line endings leave a '\r' on each line.
Example: 'Hello\nWorld\n'.splitlines() -> ['Hello', 'World']
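This snippet compares the two approaches on a string with a trailing newline and on Windows-style line endings.

```python
text = "Hello\nWorld\n"

by_newline = text.split("\n")            # ['Hello', 'World', '']: trailing newline adds ''
lines = text.splitlines()                # ['Hello', 'World']
windows = "Hello\r\nWorld".splitlines()  # ['Hello', 'World']: '\r\n' handled too

print(by_newline, lines, windows)
```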
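Q6. What is the difference between count(*) and count(1)?
count(*) and count(1) return the same result: both count every row, including rows that contain NULLs. Counting only non-null values is what count(column_name) does.
count(*) counts all rows in the table (or in each group)
count(1) evaluates the constant 1 for every row; since 1 is never NULL, every row is counted
count(column_name) counts only the rows where that column is not NULL
Performance is generally the same in major databases (MySQL's documentation, for example, states that InnoDB handles COUNT(*) and COUNT(1) the same way), so count(*) is usually preferred because it states the intent most clearly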
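A quick check with Python's built-in `sqlite3` module shows the difference on a small hypothetical table where one row has a NULL email.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com")],
)

# COUNT(*) and COUNT(1) count all rows; COUNT(email) skips the NULL.
row = conn.execute("SELECT COUNT(*), COUNT(1), COUNT(email) FROM users").fetchone()
print(row)  # (3, 3, 2)
```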
Q7. How do you implement incremental load in a table?
Implementing incremental load in a table involves updating only new or changed data without reloading the entire dataset.
Identify a column in the table that can be used to track changes, such as a timestamp or a version number
Use this column to select only the records added or changed since the last successful load, and store that high-water mark after each run
Merge the new data into the target table with an upsert (for example, SQL MERGE or INSERT ... ON CONFLICT) or an ETL tool
Ensure data integrity by handling any conflicts or duplicates that may arise during the incremental load
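The watermark-plus-merge approach can be sketched with `sqlite3`. The `source`/`target` tables, their columns, and `incremental_load` are hypothetical. The merge uses SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` upsert (SQLite 3.24+), and timestamps are stored as ISO-8601 text so string comparison matches time order.

```python
import sqlite3


def incremental_load(conn: sqlite3.Connection, last_watermark: str) -> str:
    """Upsert rows changed since last_watermark from source into target.

    Returns the new watermark (largest updated_at seen), or the old one if nothing changed.
    """
    changed = conn.execute(
        "SELECT id, name, updated_at FROM source WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Insert new ids; overwrite existing ids with the latest values.
    conn.executemany(
        "INSERT INTO target (id, name, updated_at) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name, updated_at = excluded.updated_at",
        changed,
    )
    conn.commit()
    return max((row[2] for row in changed), default=last_watermark)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?)",
    [(1, "alpha", "2024-01-01T10:00"), (2, "beta", "2024-01-01T11:00")],
)

wm = incremental_load(conn, "")  # first run: everything is newer than ""
conn.execute("UPDATE source SET name = 'beta2', updated_at = '2024-01-02T09:00' WHERE id = 2")
conn.execute("INSERT INTO source VALUES (3, 'gamma', '2024-01-02T09:30')")
wm = incremental_load(conn, wm)  # second run: picks up only ids 2 and 3
print(wm, conn.execute("SELECT * FROM target ORDER BY id").fetchall())
```

This simple version does not capture deleted source rows. It also assumes `updated_at` is always set when a row changes.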
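Q8. How to drop duplicate rows from a table
DISTINCT removes duplicates from a query result only; to remove them from the table itself, delete all but one row in each group of duplicates.
Use SELECT DISTINCT to retrieve unique rows (for example, to copy them into a new, deduplicated table); this does not modify the original table
Use GROUP BY on the columns that define a duplicate to find groups with COUNT(*) > 1, or to keep one row per group
Use ROW_NUMBER() OVER (PARTITION BY the duplicate columns ORDER BY ...) to number the rows within each group, then delete the rows whose row number is greater than 1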
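The ROW_NUMBER() approach can be sketched in SQLite (window functions need SQLite 3.25+). SQLite's implicit `rowid` serves as the tiebreaker that decides which copy survives. Other databases would use a primary key or another unique column instead. The `people` table and `drop_duplicates` helper are hypothetical. Table and column names are interpolated directly into the SQL, so only pass trusted identifiers.

```python
import sqlite3


def drop_duplicates(conn: sqlite3.Connection, table: str, key_cols: list[str]) -> None:
    """Keep the first row (lowest rowid) of each duplicate group and delete the rest."""
    cols = ", ".join(key_cols)
    conn.execute(f"""
        DELETE FROM {table} WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY {cols} ORDER BY rowid) AS rn
                FROM {table}
            ) WHERE rn > 1
        )""")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "Pune"), ("Bob", "Delhi"), ("Ann", "Pune"), ("Ann", "Pune"), ("Bob", "Mumbai")],
)
drop_duplicates(conn, "people", ["name", "city"])
remaining = conn.execute("SELECT name, city FROM people ORDER BY rowid").fetchall()
print(remaining)  # [('Ann', 'Pune'), ('Bob', 'Delhi'), ('Bob', 'Mumbai')]
```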
Q9. Difference between drop, truncate and delete
Drop removes a table from the database, truncate removes all rows from a table, and delete removes specific rows from a table.
DROP: Removes the entire table structure and data from the database.
TRUNCATE: Removes all rows from a table but keeps the table structure.
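DELETE: Removes rows that match a WHERE condition (or every row if WHERE is omitted); because it processes rows individually, it is usually slower than TRUNCATE for emptying a large table.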
Example: DROP TABLE table_name;
Example: TRUNCATE TABLE table_name;
Example: DELETE FROM table_name WHERE condition;
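The three behaviors can be observed with `sqlite3` on a hypothetical `orders` table. SQLite has no TRUNCATE TABLE statement, so the sketch uses DELETE without a WHERE clause, its closest equivalent, to stand in for TRUNCATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "open"), (2, "cancelled"), (3, "open")])

# DELETE with a condition: only matching rows go.
conn.execute("DELETE FROM orders WHERE status = 'cancelled'")
after_delete = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Stand-in for TRUNCATE: all rows go, but the table still exists.
conn.execute("DELETE FROM orders")
after_empty = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# DROP: the table definition and its data are removed.
conn.execute("DROP TABLE orders")
tables_left = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

print(after_delete, after_empty, tables_left)  # 2 0 []
```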
Q10. What is Hadoop in big data?
Hadoop is an open-source framework used for distributed storage and processing of large data sets across clusters of computers.
Its core components are HDFS (Hadoop Distributed File System) for storage, YARN for cluster resource management, and MapReduce for processing.
It splits large datasets into blocks and processes them in parallel across multiple nodes in a cluster.
Hadoop scales out on commodity hardware and is fault-tolerant because HDFS replicates each block across several nodes (three copies by default).
Tools such as Apache Hive, Apache Pig, and Apache Spark can run on top of Hadoop, using HDFS for storage and YARN for resources.
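To illustrate the MapReduce processing model, here is a pure-Python word count split into map, shuffle, and reduce phases. This is not Hadoop's API. In Hadoop, map and reduce tasks run in parallel across the cluster and the framework performs the shuffle. Here all three phases run sequentially in one process, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list[tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    # Group all values emitted for the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine the grouped values for one key.
    return key, sum(values)


lines = ["big data big clusters", "data on clusters"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 2, 'clusters': 2, 'on': 1}
```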
Q11. How can you optimize DAGs?
Optimizing DAGs involves removing unnecessary tasks, parallelizing independent tasks, and optimizing resource allocation.
Identify and remove unnecessary tasks to streamline the workflow.
Parallelize tasks to reduce overall execution time.
Optimize resource allocation by scaling up or down based on task requirements.
Use caching and memoization techniques to avoid redundant computations.
Implement data partitioning and indexing for efficient data retrieval.
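The "parallelize tasks" point can be sketched generically: start each task as soon as all of its upstream tasks have finished, so independent branches overlap. This uses Python's standard `graphlib.TopologicalSorter` (3.9+) with a thread pool. It is not the API of Airflow or any other orchestrator, which have their own schedulers and concurrency settings. `run_dag` and the task names are hypothetical.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(graph: dict, tasks: dict, max_workers: int = 4) -> list:
    """Run tasks as soon as their dependencies finish; return completion order.

    graph maps task name -> set of upstream task names; tasks maps name -> callable.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    order = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}
        while sorter.is_active():
            # Submit every task whose upstream tasks are all done.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                name = running.pop(fut)
                fut.result()  # re-raise any exception from the task
                order.append(name)
                sorter.done(name)  # may unlock downstream tasks
    return order


graph = {
    "extract_a": set(),
    "extract_b": set(),
    "transform": {"extract_a", "extract_b"},
    "load": {"transform"},
}
tasks = {name: (lambda: time.sleep(0.1)) for name in graph}
order = run_dag(graph, tasks)
print(order)  # the two extracts (in either order), then 'transform', then 'load'
```

The two extract tasks have no dependency on each other, so they run concurrently; only `transform` waits for both.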
Q12. Difference between TOP and LIMIT.
TOP and LIMIT both restrict the number of rows a query returns; they differ in which SQL dialects support them and where they appear in the statement.
TOP is used in SQL Server (and MS Access), while LIMIT is used in MySQL, PostgreSQL, and SQLite.
TOP goes right after SELECT, while LIMIT goes at the end of the query. Without an ORDER BY, neither guarantees which rows are returned; the ANSI-standard form is FETCH FIRST n ROWS ONLY.
Example: SELECT TOP 5 * FROM table_name; (SQL Server) vs. SELECT * FROM table_name LIMIT 5; (MySQL)
Q13. Delete column from a table
To delete a column from a table, use the ALTER TABLE command.
Use the ALTER TABLE command followed by DROP COLUMN to delete a column from a table.
Specify the name of the column you want to delete after the DROP COLUMN keywords.
Example: ALTER TABLE table_name DROP COLUMN column_name;
Make sure to carefully consider the impact of deleting a column on the data and any dependent objects.