Create new data source integration pipelines and enhance the functionality of current pipelines to operate effectively at scale
Provide the team with technical direction
Enhance colleagues' ability to access and use data for business and product purposes
Deliver data governance tools, real-time streaming pipelines, and task orchestration abstractions for managing AWS resources as part of new design and architecture projects
Partner with the team to build tools for the data science and marketing teams
Job Requirements:
Bachelor's or Master's degree in Engineering or Computer Science (or equivalent experience)
4+ years of relevant experience as a data engineer
4+ years of professional experience in data architecture, data engineering, and/or data warehousing, including hands-on use of Spark, Hadoop, Hive, and similar tools, with a firm grasp of streaming pipelines
1+ years of experience developing streaming pipelines
Experience with at least one high-level programming language, such as Scala
Solid knowledge of databases
Experience building large-scale data products and an understanding of the trade-offs involved in designing them
Solid knowledge of algorithms, data structures, and system design
Excellent understanding of Hadoop MapReduce, Spark, and other distributed computing technologies
Solid familiarity with AWS infrastructure, including EMR, S3, and Redshift
Strong data governance and quality knowledge
Must be a self-driven, highly motivated team player who enjoys learning new things
Fluent in spoken and written English