Data munging using Python and Large Scale data analytics tools to combine different datasets.
Investigate data munging results when the output does not match expectations. This involves performing root cause analysis for code bugs and devising solutions for them.
Maintain and improve the existing data transformation pipeline, built using Python and Spark.
Assemble large, complex data sets that meet functional / non-functional business requirements.
Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL/Python/Large Scale data analytics tools and AWS big data technologies.
Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
Work with data and analytics experts to strive for greater functionality in our data systems.
Handle project management and stakeholder management activities and take full ownership of quality and timeliness of product deliverables.
Gather knowledge about the maritime industry and its interrelationship with various economic and policy decisions as they relate to clients in the financial markets and the shipping industry.
Basic Qualifications:
5 to 8 years of total experience in a data engineering role, with 3+ years of experience in Large Scale data analytics tools and Python.
Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
Experience building processes that support data transformation, data structures, metadata, dependency and workload management.
A successful history of manipulating, processing and extracting value from large, disconnected datasets.
Ability to work with a team of data engineers and data scientists and to implement new and innovative algorithms.
Understanding of data visualization principles and experience with data visualization tools.
Bachelor's or higher degree in a quantitative field such as Computer Science, Statistics, Informatics, or Information Systems from a reputed institution.
Must Have:
- PYTHON (incl. experience with PySpark and Pandas)
- AURORA POSTGRES RDS and SQL Server
- AWS S3, AWS RDS, AWS ECS
- DATABRICKS
- GENERAL PROGRAMMING CONCEPTS
- DESIGN PATTERNS
Good to Have:
- Jenkins (CI/CD, DevOps)
- AWS CloudFormation / Terraform
- Understanding of ELK stack
- NoSQL (such as MongoDB, Redis, Elasticsearch)
- Experience with other AWS Services like AWS Lambda, Redshift etc.
- Any other language like JAVA, .NET, shell scripting etc.
Beneficial:
Experience in leading complex analytical projects and managing stakeholders
Experience working with geospatial data and unstructured/semi-structured data
Experience using big data tools for data manipulation and modeling (BigQuery, Hive, MongoDB, Cassandra etc.)
Excellent logical thinking and problem-solving skills
Ability to confidently present your own ideas and solutions, as well as guide technical discussions.
Welcoming and approachable attitude and the ability to connect with people and build strong professional relationships.
What We're Looking For:
We are looking for a savvy Data Engineer to join our growing team of analytics experts. The hire will be responsible for expanding and optimizing our data and data pipeline architecture, as well as optimizing data flow and collection for cross-functional teams. The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building them from the ground up. The Data Engineer will support our software developers, database architects, data analysts and data scientists on data initiatives and will ensure that an optimal data delivery architecture is applied consistently across ongoing projects. They must be self-directed and comfortable supporting the data needs of multiple teams, systems and products. The right candidate will be excited by the prospect of optimizing or even re-designing our company's data architecture to support our next generation of products and data initiatives. The person will need to coordinate with Product Managers, Industry Analysts and Technology experts to develop and deliver the product in accordance with customer requirements and the agreed timeline.