Data Engineer - PySpark (6-8 yrs)
LONG FINCH TECHNOLOGIES
posted 11hr ago
Key skills for the job
Role: Data Engineer (5+ years of experience)
- Strong command of Python as the base language.
- Good knowledge of Python fundamentals and OOP concepts.
- Experience with data manipulation in Python.
- Data processing libraries such as PyArrow, pandas, NumPy, Dask, etc. (good to have).
- Strong command of SQL.
- Different types of Joins.
- Common Table Expressions.
- Window Functions.
- Sub Query.
- Distributed Computing.
- Good understanding of horizontally and vertically scalable systems.
- Good understanding of distributed storage systems.
- Memory Management in Distributed Computing.
- Spark Architecture.
- Major Spark Components and Working.
- Different computational modes in Spark.
- Different types of nodes/executors and their requirement design.
- Functions of driver vs functions of executors.
- RDD/DataFrame Fundamentals.
- Job/Stage/Task.
- Actions/Triggers.
- Transformations: Wide/Narrow.
- Cache/Persistence.
- Broadcasting.
- Shuffle/Repartitioning.
- Spark SQL.
- Good hands-on experience with the Spark SQL library.
- Thorough understanding of the functions and datatypes in Spark.
- Spark UI.
- Good experience with the Spark UI: its different pages and functions.
- Good Debugging and Optimization Capabilities.
- Enabling History Servers.
- Optimization.
- Infrastructure-level Design and Optimization.
- Spark Configuration and Code Optimization.
- Data Formats and Storage Systems.
- Understanding of different data formats like CSV, Parquet and ORC.
- Understanding of row-based and column-based storage, their applications, and their advantages/disadvantages.
- Understanding of different data sources such as HDFS, S3, SFTP, Apache Iceberg, etc.
- Basic understanding of other packages in Spark.
- Basic functionalities of MLlib (good to have).
- Basic functionalities of Spark Streaming and GraphX (good to have).
Functional Areas: Software/Testing/Networking