Build out data infrastructure systems that can handle ever-growing volumes of data and the demands the team places on them
Help design and build highly available data processing pipelines that self-monitor and report anomalies
Shape the future of data infrastructure while improving the existing metric-driven development and machine-learning capabilities
Understand how to make the necessary trade-offs to ship without sacrificing quality
Improve the mission-critical production data pipeline, data warehouses, and data systems by designing and implementing changes
Recognize patterns and generalizations in the data flow, and automate as much as possible to boost efficiency
Extend our logging and monitoring procedures to detect anomalies and problems early and fix them
Create cutting-edge data and automation solutions using Python, Spark, and Flink
Maintain, manage, and monitor infrastructure components such as Kafka, Kubernetes, Spark, Flink, Jenkins, general OLAP and RDBMS databases, S3 buckets, and permissions
Improve the efficiency, accuracy, and consistency of our ETL processes
Job Requirements:
Bachelor's/Master's degree in Engineering or Computer Science (or equivalent experience)
12+ years of relevant experience in data engineering, software development, DevOps, and/or SRE roles
5+ years of experience working with data engineering, data systems, pipelines, and stream processing
Extensive experience with SQL and related technologies like Redshift, PostgreSQL, MySQL, Presto, Spark SQL, and Hive
Proficiency in high-level programming languages such as Python, Scala, Java, Kotlin, and Go
Experience with CI/CD (continuous integration and continuous deployment) and workflow management systems such as Airflow, Oozie, Luigi, and Azkaban
Experience implementing data governance, e.g., access management policies, data retention, IAM, etc.
Prior experience operating in a DevOps-like capacity while working with AWS, Kubernetes, Jenkins, Terraform, etc.
Experience with automation, alerting, monitoring, security, and declarative infrastructure
Extensive experience with ETL processes and the various data stores that serve data rapidly and securely to all internal and external stakeholders
Experience with Kafka, Kubernetes, Python, Spark, Flink, Jenkins, general OLAP and RDBMS databases, S3 buckets, and permissions
Prior experience maintaining and managing Kafka is preferred
Experience maintaining and managing OLAP/HA database systems is nice to have
Familiarity with operating Kubernetes clusters for various jobs, apps, and high-throughput workloads is desirable
Technical knowledge of data exchange and serialization formats such as Protobuf, Avro, or Thrift is nice to have
Some prior experience creating and deploying Spark (Scala) and/or Flink applications is preferred