10+ MILEAGE LOGISTICS Interview Questions and Answers

Updated 31 Aug 2024

Q1. What will happen if a job has failed in the pipeline and the data processing cycle is over?

Ans.

If a job in the pipeline fails and the data processing cycle is already over, the output for that cycle can be incomplete or inaccurate.

  • Incomplete data may affect downstream processes and analysis

  • Data quality may be compromised if errors are not addressed

  • Monitoring and alerting systems should be in place to detect and handle failures

  • Re-running the failed job for the missed cycle (a backfill) repairs the gap, and retries plus error handling mechanisms help prevent similar issues in the future
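One way to make the "re-run and handle errors" advice concrete is a retry wrapper with exponential backoff that raises an alert once retries are exhausted. This is a minimal Python sketch under stated assumptions: `run_with_retries`, its parameters and the `alert` callback are hypothetical names, not part of any scheduler's API. In practice an orchestrator's built-in retry and alerting settings would usually play this role, and a failure that outlives the processing window is then repaired by a backfill run for the missed date.

```python
import time
from typing import Callable


def run_with_retries(
    job: Callable[[], None],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    alert: Callable[[str], None] = print,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `job`, retrying with exponential backoff (hypothetical helper).

    Returns True on success. If every attempt fails, calls `alert` once and
    returns False so the caller can schedule a backfill for the missed cycle.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            job()
            return True
        except Exception as exc:  # a real pipeline would catch narrower error types
            if attempt == max_attempts:
                alert(f"job failed after {attempt} attempts: {exc}")
                return False
            # Wait 1x, 2x, 4x ... the base delay before the next attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))
    return False


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_job() -> None:
        # Hypothetical job that fails twice with a transient error, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("transient error")

    print(run_with_retries(flaky_job, sleep=lambda s: None))  # True
```

Injecting `sleep` and `alert` keeps the wrapper testable without real waiting or a real notification channel.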


Q2. What volume of data have you handled in your POCs?

Ans.

I have handled terabytes of data in my POCs, including data from various sources and formats.

  • Handled terabytes of data in POCs

  • Worked with data from various sources and formats

  • Used tools like Hadoop, Spark, and SQL for data processing


Q3. Write SQL to get city1, city2 and distance from a table where the same pair of cities can repeat

Ans.

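SQL to return city1, city2 and distance when the same pair of cities can appear more than once (for example, as A to B and B to A)

  • Use a self join on the table to match each row with its reversed pair (a.city1 = b.city2 and a.city2 = b.city1)

  • Keep only one row per pair, for example the row where city1 < city2, and use DISTINCT to drop exact duplicates

  • Calculate the distance with an appropriate formula only if the table stores coordinates rather than a distance column

  • Consider using a subquery if needed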
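A sketch of the self-join approach, run through Python's built-in `sqlite3` module so it can be executed locally. The table name `routes` and the sample rows are hypothetical; the query assumes the repeats are reversed pairs (A to B and B to A) with the same distance, and it keeps the copy whose `city1` sorts first alphabetically.

```python
import sqlite3

# Hypothetical sample data: each route may appear in both directions.
ROWS = [
    ("Delhi", "Mumbai", 1400),
    ("Mumbai", "Delhi", 1400),
    ("Pune", "Mumbai", 150),
    ("Mumbai", "Pune", 150),
    ("Delhi", "Jaipur", 280),
]

# Self join: match each row with its reversed twin (if any). Keep the row when
# no twin exists, or when it is the alphabetically ordered copy of the pair.
# DISTINCT removes exact duplicate rows.
DEDUP_SQL = """
SELECT DISTINCT a.city1, a.city2, a.distance
FROM routes AS a
LEFT JOIN routes AS b
  ON a.city1 = b.city2 AND a.city2 = b.city1
WHERE b.city1 IS NULL OR a.city1 < a.city2
ORDER BY 1, 2
"""


def unique_routes(rows):
    """Load rows into an in-memory table and return one row per city pair."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE routes (city1 TEXT, city2 TEXT, distance INTEGER)")
        conn.executemany("INSERT INTO routes VALUES (?, ?, ?)", rows)
        return conn.execute(DEDUP_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in unique_routes(ROWS):
        print(row)
    # ('Delhi', 'Jaipur', 280)
    # ('Delhi', 'Mumbai', 1400)
    # ('Mumbai', 'Pune', 150)
```

In engines that provide `LEAST` and `GREATEST` (for example PostgreSQL or MySQL), `SELECT DISTINCT LEAST(city1, city2), GREATEST(city1, city2), distance` is a common alternative; SQLite spells these as the multi-argument `MIN` and `MAX` scalar functions.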


Q4. What is the difference between repartition and coalesce?

Ans.

Repartition can increase or decrease the number of partitions in a DataFrame and always performs a full shuffle, while coalesce only reduces the number of partitions and avoids a full shuffle.

  • Repartition involves a full shuffle of the data across the cluster, which can be expensive.

  • Coalesce minimizes data movement by merging existing partitions instead of redistributing every record, so the resulting partitions can be uneven in size.

  • Repartition is typically used when increasing parallelism or evenly distributing data, while coalesce is used for reducing the number of partitions without a full shuffle.

  • Example: …
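The difference can be illustrated with a toy, pure-Python model in which a partition is just a list of records. This is a simplified sketch, not Spark's implementation: `repartition` here deals every record round-robin into new partitions (modelling the full shuffle), while `coalesce` only merges whole adjacent partitions, which is why it cannot increase the partition count and may leave partitions uneven.

```python
from typing import List

Partitions = List[List[int]]


def repartition(parts: Partitions, n: int) -> Partitions:
    """Toy full shuffle: every record is moved, round-robin, into one of n new partitions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    out: Partitions = [[] for _ in range(n)]
    i = 0
    for part in parts:
        for record in part:
            out[i % n].append(record)
            i += 1
    return out


def coalesce(parts: Partitions, n: int) -> Partitions:
    """Toy coalesce: merge whole adjacent partitions; no record is reshuffled."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n >= len(parts):
        # Without a shuffle the partition count can only stay the same or shrink.
        return [list(p) for p in parts]
    out: Partitions = [[] for _ in range(n)]
    for idx, part in enumerate(parts):
        # Map input partition idx to an output group so groups stay contiguous.
        out[idx * n // len(parts)].extend(part)
    return out


if __name__ == "__main__":
    parts = [[1, 2], [3], [4, 5, 6], [7]]
    print(repartition(parts, 2))  # [[1, 3, 5, 7], [2, 4, 6]]
    print(coalesce(parts, 2))     # [[1, 2, 3], [4, 5, 6, 7]]
    print(coalesce(parts, 6))     # [[1, 2], [3], [4, 5, 6], [7]] (unchanged)
```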


Q5. How would you design or configure a cluster if you were given 10 petabytes of data?

Ans.

Designing/configuring a cluster for 10 petabytes of data involves considerations for storage capacity, processing power, network bandwidth, and fault tolerance.

  • Consider using a distributed file system like HDFS or object storage like Amazon S3 to store and manage the large volume of data.

  • Implement a scalable processing framework like Apache Spark or Hadoop to efficiently process and analyze the data in parallel.

  • Utilize a cluster management system like Apache Mesos or Kubernetes…
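The storage side of the sizing is mostly arithmetic. Below is a back-of-the-envelope Python sketch in which every number is an assumption to be replaced with real figures: 3x replication (the HDFS default), 25% free-space headroom, no compression, and hypothetical nodes with 12 disks of 8 TB each. It uses decimal units (1 PB = 1000 TB), and `size_storage_nodes` is an illustrative name only.

```python
import math
from typing import Dict


def size_storage_nodes(
    raw_tb: float,
    replication: int = 3,            # HDFS default replication factor
    headroom: float = 0.25,          # assumed fraction of disk kept free
    compression_ratio: float = 1.0,  # assumed: > 1.0 means data shrinks on disk
    disks_per_node: int = 12,        # hypothetical node spec
    disk_tb: float = 8.0,            # hypothetical disk size
) -> Dict[str, float]:
    """Rough number of storage nodes needed to hold raw_tb of data (decimal TB)."""
    stored_tb = raw_tb / compression_ratio * replication
    required_tb = stored_tb / (1 - headroom)  # keep `headroom` of every disk free
    node_tb = disks_per_node * disk_tb
    nodes = math.ceil(required_tb / node_tb)
    return {"stored_tb": stored_tb, "required_tb": required_tb,
            "node_tb": node_tb, "nodes": nodes}


if __name__ == "__main__":
    plan = size_storage_nodes(raw_tb=10_000)  # 10 PB
    print(plan)
    # {'stored_tb': 30000.0, 'required_tb': 40000.0, 'node_tb': 96.0, 'nodes': 417}
```

Under these assumptions 10 PB turns into about 40,000 TB of raw disk, or roughly 417 storage nodes. With object storage such as Amazon S3, redundancy is handled by the service, so the replication term drops out and compute nodes can be sized separately from storage.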


Q6. When would you decide to use repartition versus coalesce?

Ans.

Repartition is used to increase the number of partitions for parallelism or to redistribute data evenly, while coalesce is used to decrease the number of partitions without a full shuffle.

  • Repartition is used when there is a need for more partitions to increase parallelism.

  • Coalesce is used when there are too many (often small) partitions and they need to be reduced without a full shuffle.

  • Example: Repartition can be used before a join operation to evenly distribute data across partitions for better performance.

  • Example: Coalesce can be used after a filter…
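A rule of thumb often used for this decision is a target partition size. The sketch below assumes about 128 MB per partition (the same value as Spark's default `spark.sql.files.maxPartitionBytes`), and `plan_partitioning` is a hypothetical helper. It only picks the operation and the count; skew or an upcoming join may still favour repartitioning on a column.

```python
import math
from typing import Tuple

TARGET_PARTITION_BYTES = 128 * 1024 * 1024  # assumed target: ~128 MB per partition


def plan_partitioning(data_bytes: int, current_partitions: int,
                      target_bytes: int = TARGET_PARTITION_BYTES) -> Tuple[str, int]:
    """Suggest ('repartition' | 'coalesce' | 'keep', partition_count) for a dataset."""
    target = max(1, math.ceil(data_bytes / target_bytes))
    if target > current_partitions:
        return "repartition", target  # more partitions always needs a shuffle
    if target < current_partitions:
        return "coalesce", target     # fewer partitions: merge without a full shuffle
    return "keep", current_partitions


if __name__ == "__main__":
    GiB = 1024 ** 3
    print(plan_partitioning(10 * GiB, current_partitions=8))   # ('repartition', 80)
    print(plan_partitioning(1 * GiB, current_partitions=200))  # ('coalesce', 8)
```

The second call mirrors the "after a filter" case: a filter can shrink the data a lot while leaving the partition count unchanged.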


Q7. How is data partitioned in a pipeline?

Ans.

Data partitioning in a pipeline involves dividing data into smaller chunks for processing and analysis.

  • Data can be partitioned based on a specific key or attribute, such as date, location, or customer ID.

  • Partitioning helps distribute data processing tasks across multiple nodes or servers for parallel processing.

  • Common partitioning techniques include range partitioning, hash partitioning, and list partitioning.

  • Example: Partitioning sales data by region to analyze sales performance…
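The three techniques named above can be sketched as functions that map a record's key to a partition. This is illustrative Python only, assuming string keys; the function names, the monthly date boundaries and the region lists are hypothetical, and real engines (databases, Spark, Hive) implement partitioning inside their storage or execution layer.

```python
import bisect
import zlib
from typing import Dict, List, Sequence


def range_partition(value: str, boundaries: Sequence[str]) -> int:
    """Range partitioning: partition i holds values in [boundaries[i-1], boundaries[i])."""
    return bisect.bisect_right(boundaries, value)


def hash_partition(key: str, num_partitions: int) -> int:
    """Hash partitioning: crc32 is stable across runs, unlike Python's salted str hash()."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def list_partition(value: str, lists: Dict[str, List[str]], default: str = "other") -> str:
    """List partitioning: each partition is defined by an explicit list of values."""
    for name, members in lists.items():
        if value in members:
            return name
    return default


if __name__ == "__main__":
    # Hypothetical monthly boundaries on ISO dates (string order = date order).
    months = ["2024-02-01", "2024-03-01"]
    print(range_partition("2024-01-15", months))  # 0
    print(range_partition("2024-02-01", months))  # 1
    print(hash_partition("CUST-001", 4))          # a stable value in range(4)
    regions = {"north": ["Delhi", "Chandigarh"], "west": ["Mumbai", "Pune"]}
    print(list_partition("Pune", regions))        # west
```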


Q8. Command to find files that are 30 days old in Linux

Ans.

Use the find command with the -mtime option to find files that are 30 days old in Linux.

  • Use the find command with the -mtime option to specify the number of days.

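  • -mtime 30 matches files last modified between 30 and 31 days ago, because find counts age in whole 24-hour periods and discards the fraction: find /path/to/directory -mtime 30

  • To find files more than 30 full days old (31 days or more): find /path/to/directory -mtime +30

  • To find files modified less than 30 days ago: find /path/to/directory -mtime -30

  • Add -type f to match only regular files, for example: find /path/to/directory -type f -mtime +30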
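To make the 24-hour rounding concrete, here is a Python sketch that approximates `find DIR -type f -mtime N` with the standard library. The function name and the `'+30'` / `'-30'` / `'30'` argument style are hypothetical, chosen to mirror find's syntax; it is an approximation, not a reimplementation of find.

```python
import os
import stat
import time
from typing import Iterator, Optional


def _matches(age_days: int, sign: str, n: int) -> bool:
    """Compare whole-day age with n the way find interprets +n, -n and n."""
    if sign == "+":
        return age_days > n
    if sign == "-":
        return age_days < n
    return age_days == n


def find_by_mtime(root: str, spec: str, now: Optional[float] = None) -> Iterator[str]:
    """Approximate `find root -type f -mtime spec` (spec like '+30', '-30' or '30').

    Like find, age is measured in whole 24-hour periods with the fraction
    discarded, so '+30' means 31 or more full days and '30' means 30 to <31 days.
    """
    now = time.time() if now is None else now
    sign = spec[0] if spec[0] in "+-" else ""
    n = int(spec.lstrip("+-"))
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # do not follow symlinks, as find does by default
            if not stat.S_ISREG(st.st_mode):
                continue  # -type f: regular files only
            age_days = int((now - st.st_mtime) // 86400)
            if _matches(age_days, sign, n):
                yield path
```

For example, `list(find_by_mtime("/var/log", "+30"))` lists the same files as `find /var/log -type f -mtime +30`, subject to the caveats above.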


Q9. 1. What are transformations and actions in Spark? 2. How can shuffling be reduced? 3. Questions related to the project

Ans.

Transformations and actions in Spark, reducing shuffling, and project-related questions.

  • Transformations in Spark are lazy operations that define a new RDD from an existing one, while actions trigger execution and either return a value to the driver program or write output to storage.

  • Examples of transformations include map, filter, and reduceByKey, while examples of actions include count, collect, and saveAsTextFile.

  • To reduce shuffling in Spark, you can use techniques like partitioning, caching, and using appropriate…
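One standard way to reduce shuffling is to prefer `reduceByKey` over `groupByKey` for aggregations, because `reduceByKey` combines values inside each partition before the shuffle. The toy Python model below is not Spark code (partitions are plain lists and the function names are hypothetical); it counts how many records each strategy would send across the network for a small word count.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def group_then_sum(partitions: List[List[Pair]]) -> Tuple[Dict[str, int], int]:
    """groupByKey-style: every (key, value) record is shuffled, then summed."""
    shuffled = [record for part in partitions for record in part]
    totals: Dict[str, int] = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


def combine_then_sum(partitions: List[List[Pair]]) -> Tuple[Dict[str, int], int]:
    """reduceByKey-style: sum inside each partition first, then shuffle partial sums."""
    shuffled: List[Pair] = []
    for part in partitions:
        local: Dict[str, int] = defaultdict(int)
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())  # one record per key per partition
    totals: Dict[str, int] = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


if __name__ == "__main__":
    words = [
        [("spark", 1), ("spark", 1), ("data", 1)],
        [("spark", 1), ("data", 1), ("data", 1)],
    ]
    print(group_then_sum(words))    # ({'spark': 3, 'data': 3}, 6)
    print(combine_then_sum(words))  # ({'spark': 3, 'data': 3}, 4)
```

Both strategies produce the same totals; only the number of shuffled records differs, and the gap grows with the number of repeated keys per partition.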


Q10. Use of VACUUM on Delta tables in terms of performance

Ans.

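VACUUM on a Delta table removes data files that are no longer referenced by the current table version and are older than the retention threshold (7 days by default). Its main benefit is reclaimed storage rather than faster queries.

  • VACUUM does not compact small files; compaction is done by OPTIMIZE, which rewrites small files into larger ones and leaves the old files for VACUUM to delete later.

  • Queries read only the files referenced in the transaction log, so removing unreferenced files lowers storage cost rather than the amount of data a query scans.

  • VACUUM can be scheduled to run periodically (for example, after OPTIMIZE) to keep storage under control.

  • Running it after major deletions or updates frees the most space; after VACUUM, time travel to versions older than the retention period is no longer possible.

  • Example: VACUUM delta.`tabl…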
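The retention rule can be illustrated with a toy model in Python. It is not Delta Lake code: `DataFile`, `vacuum` and the timestamps are hypothetical, and the real command compares the table's storage directory with the transaction log. The sketch only shows which files become eligible for deletion under the default 7-day retention.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class DataFile:
    path: str
    # None = still referenced by the current table version; otherwise the time
    # the file was logically removed (for example by DELETE, UPDATE or OPTIMIZE).
    removed_at: Optional[datetime] = None


def vacuum(files: List[DataFile], now: datetime,
           retention: timedelta = timedelta(days=7)) -> Tuple[List[DataFile], List[DataFile]]:
    """Split files into (kept, deleted) using a retention threshold (toy model)."""
    cutoff = now - retention
    kept: List[DataFile] = []
    deleted: List[DataFile] = []
    for f in files:
        if f.removed_at is not None and f.removed_at < cutoff:
            deleted.append(f)  # unreferenced and older than retention: deleted
        else:
            kept.append(f)     # live file, or still needed for time travel
    return kept, deleted


if __name__ == "__main__":
    now = datetime(2024, 8, 31)
    files = [
        DataFile("part-000.parquet"),  # live
        DataFile("part-001.parquet", removed_at=now - timedelta(days=10)),
        DataFile("part-002.parquet", removed_at=now - timedelta(days=2)),
    ]
    kept, deleted = vacuum(files, now)
    print([f.path for f in deleted])  # ['part-001.parquet']
```

Note that the live file is never touched, which is why VACUUM does not change what a query has to read.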


Q11. Command to copy data from AWS S3 to Redshift

Ans.

Use the COPY command in Redshift to load data from AWS S3.

  • Run the COPY command in Redshift to load data from an S3 bucket.

  • Specify an IAM role with the necessary permissions using the IAM_ROLE clause of the COPY command.

  • Provide the S3 path, the target Redshift table name and the data format in the COPY command.

  • Ensure the IAM role is associated with the Redshift cluster and has read access to the S3 bucket.
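A small Python helper can assemble the statement from its parts. This is a sketch: `build_copy_sql` is a hypothetical name, the table, bucket path and role ARN in the example are placeholders, and only a few standard COPY options are used (`IAM_ROLE`, `FORMAT AS`, `IGNOREHEADER`, `REGION`). The resulting string would still have to be executed against the cluster with a SQL client or driver of your choice.

```python
from typing import Optional


def build_copy_sql(table: str, s3_uri: str, iam_role_arn: str,
                   data_format: str = "CSV", ignore_header: int = 0,
                   region: Optional[str] = None) -> str:
    """Return a Redshift COPY statement that loads s3_uri into table."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    for value in (table, s3_uri, iam_role_arn):
        if "'" in value or ";" in value:
            # No escaping is done here, so reject values that could break the SQL.
            raise ValueError(f"unsafe character in {value!r}")
    lines = [
        f"COPY {table}",
        f"FROM '{s3_uri}'",
        f"IAM_ROLE '{iam_role_arn}'",
        f"FORMAT AS {data_format}",
    ]
    if ignore_header:
        lines.append(f"IGNOREHEADER {ignore_header}")  # skip header rows in CSV files
    if region:
        lines.append(f"REGION '{region}'")  # needed when the bucket is in another region
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(build_copy_sql(
        "sales",
        "s3://my-bucket/sales/2024/",
        "arn:aws:iam::123456789012:role/MyRedshiftRole",
        ignore_header=1,
    ))
```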


Interview Process at MILEAGE LOGISTICS

Based on 11 interviews in the last year, the process has 2 interview rounds:
  • Coding Test Round
  • Technical Round
Made with ❤️ in India. Trademarks belong to their respective owners. All rights reserved © 2024 Info Edge (India) Ltd.
