Softpath Technologies - Data Engineer - Spark/Hadoop (4-6 yrs)
Position Title : Data Engineer
Experience : 4+ Years
Education : Bachelor's Degree in Engineering (BE) or Master of Computer Applications (MCA)
Project Type : Contract
Duration : 12+ months
Role Overview :
We are seeking a highly skilled and motivated Data Engineer with 4+ years of experience to join our dynamic team. The ideal candidate will be responsible for monitoring, maintaining, and optimizing our data ingestion pipelines to ensure the efficient flow and processing of data across various systems. This role requires a strong foundation in Big Data concepts, SQL, and data pipeline operations, as well as the ability to collaborate with cross-functional teams in an agile environment. A basic understanding of Data Science principles, BigQuery, and DevOps practices is also essential, and experience in web development for building reports and dashboards is a plus.
As a Data Engineer, you will work closely with the data science, engineering, and DevOps teams to design and implement scalable, efficient data solutions. You will ensure the operational success of data pipelines and play a pivotal role in enabling data-driven decision-making across the organization.
Key Responsibilities :
Data Pipeline Monitoring & Support :
- Monitor and ensure the smooth operation of data ingestion pipelines.
- Troubleshoot and resolve issues related to data flow, data quality, and pipeline failures.
- Conduct regular performance checks and implement improvements to optimize pipeline efficiency.
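For illustration, a minimal Python sketch of the kind of freshness check these monitoring duties imply; the landing directory and alert threshold are hypothetical, and a real deployment would page an on-call channel rather than print:

```python
# Minimal sketch of a pipeline freshness check, assuming ingested files
# land in a local directory; the path and threshold are illustrative only.
import os
import time

LANDING_DIR = "/data/landing"      # hypothetical landing zone
MAX_AGE_SECONDS = 6 * 60 * 60      # alert if the newest file is older than 6h

def newest_file_age(directory: str) -> float:
    """Return the age in seconds of the most recently modified file."""
    mtimes = [
        os.path.getmtime(os.path.join(directory, f))
        for f in os.listdir(directory)
    ]
    if not mtimes:
        raise RuntimeError(f"no files found in {directory}")
    return time.time() - max(mtimes)

if __name__ == "__main__":
    age = newest_file_age(LANDING_DIR)
    if age > MAX_AGE_SECONDS:
        # In practice this would raise an alert, not print.
        print(f"ALERT: ingestion appears stalled ({age / 3600:.1f}h since last file)")
    else:
        print(f"OK: last ingestion {age / 3600:.1f}h ago")
```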
Collaborate with Cross-functional Teams :
- Work in an agile development environment, collaborating with data scientists, data analysts, and DevOps engineers to ensure data pipelines meet business requirements.
- Participate in sprint planning, stand-ups, and retrospectives, ensuring that data requirements are met in a timely and efficient manner.
Data Visualization & Reporting :
- Assist in the development and optimization of reports and dashboards for effective data visualization.
- Work with business analysts and stakeholders to understand data needs and translate them into actionable insights.
- Develop user-friendly dashboards and reports using BI tools or custom web applications.
Big Data & Data Science Understanding :
- Maintain a solid understanding of Big Data technologies and concepts, including distributed computing frameworks like Hadoop, Spark, and cloud-based solutions.
- Apply data science concepts, such as data cleaning, transformation, and model integration, to optimize data pipelines.
- Ensure data is structured in a way that supports future analytical and machine learning workflows.
BigQuery & Cloud Data Solutions :
- Use Google BigQuery for data storage, querying, and optimization of large-scale datasets.
- Design and implement efficient data models for reporting and analytics in BigQuery.
- Collaborate with cloud infrastructure teams to ensure proper integration of BigQuery with other cloud-based services.
DevOps Practices & Automation :
- Apply DevOps best practices for continuous integration, deployment, and monitoring of data pipelines.
- Automate repetitive tasks, such as pipeline testing, data validation, and deployment processes, to reduce manual intervention and improve efficiency (see the sketch after this list).
- Ensure the scalability and reliability of the data infrastructure by utilizing appropriate monitoring tools and alerting systems.
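As an illustrative sketch of the automation item above, a small Python validation step that could run in a CI/CD pipeline and fail the build on bad data; the file path, column names, and rules are assumptions:

```python
# Minimal sketch of an automated data-validation step of the kind that
# might run in CI/CD; the CSV path and validation rules are illustrative.
import csv
import sys

def validate(path: str, required_columns: list[str], min_rows: int) -> list[str]:
    """Return a list of human-readable validation failures (empty if clean)."""
    errors = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = [c for c in required_columns if c not in (reader.fieldnames or [])]
        if missing:
            errors.append(f"missing columns: {missing}")
        rows = list(reader)
    if len(rows) < min_rows:
        errors.append(f"expected at least {min_rows} rows, got {len(rows)}")
    if any(not row.get("id") for row in rows):
        errors.append("null or empty values in 'id' column")
    return errors

if __name__ == "__main__":
    failures = validate("daily_extract.csv", ["id", "event_ts"], min_rows=1)
    if failures:
        print("\n".join(failures))
        sys.exit(1)  # non-zero exit fails the CI job
```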
Documentation & Knowledge Sharing :
- Maintain clear, comprehensive documentation on data pipeline processes, architecture, and troubleshooting steps.
- Share knowledge and best practices with other team members to enhance overall data engineering capabilities within the organization.
Required Skills :
SQL :
- Strong proficiency in SQL for data manipulation, querying, and optimizing large datasets.
- Ability to write efficient queries to extract, transform, and load data from multiple sources.
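For illustration, a self-contained sketch of the pattern described: pushing aggregation and filtering into the query rather than into application code. It runs against an in-memory SQLite database so it is runnable as-is; the table and thresholds are made up:

```python
# Minimal sketch of an extract/transform query, run against an in-memory
# SQLite database to stay self-contained; names and values are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL, order_date TEXT);
    INSERT INTO orders VALUES
        (1, 100, 25.0, '2024-01-01'),
        (2, 100, 40.0, '2024-01-02'),
        (3, 200, 15.0, '2024-01-02');
""")

# Aggregate per customer and filter with HAVING so the database does the
# heavy lifting instead of post-processing rows in application code.
query = """
    SELECT customer_id,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_spend
    FROM orders
    GROUP BY customer_id
    HAVING SUM(amount) > 20
    ORDER BY total_spend DESC;
"""
for row in conn.execute(query):
    print(row)  # e.g. (100, 2, 65.0)
```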
Big Data :
- Hands-on experience with Big Data technologies such as Hadoop, Spark, and Kafka.
- Understanding of distributed computing and data processing principles.
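A minimal PySpark sketch of the distributed aggregation referenced above, assuming a local Spark installation (pip install pyspark); the input path is hypothetical:

```python
# Minimal PySpark sketch: read a dataset and aggregate in parallel.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-rollup").getOrCreate()

# Read a (hypothetical) set of JSON files and count events per type.
events = spark.read.json("/data/events/*.json")
rollup = (
    events
    .groupBy("event_type")
    .agg(F.count("*").alias("n_events"))
    .orderBy(F.desc("n_events"))
)
rollup.show()
spark.stop()
```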
BigQuery :
- Experience working with Google BigQuery, including querying large datasets, creating views, and optimizing performance.
- Ability to design efficient data models and schema in BigQuery for optimized querying and reporting.
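For illustration, a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, and table names are placeholders, and credentials are assumed to be configured in the environment. Filtering on a partition column and selecting only the needed fields is the usual first step in controlling scan costs:

```python
# Minimal BigQuery sketch (pip install google-cloud-bigquery); all
# project/dataset/table names below are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# Parameterized query that scans only one (assumed) date partition.
sql = """
    SELECT user_id, COUNT(*) AS sessions
    FROM `my-project.analytics.events`
    WHERE event_date = @day
    GROUP BY user_id
"""
job = client.query(
    sql,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("day", "DATE", "2024-01-01")]
    ),
)
for row in job.result():
    print(row.user_id, row.sessions)
```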
Web Development (React.js, Node.js) (Nice to Have) :
- Experience in web development, particularly in building dashboards and reports using front-end technologies like React.js.
- Familiarity with server-side development using Node.js for back-end services.
- Understanding of RESTful API design to integrate data pipelines with web-based interfaces for real-time reporting and visualization.
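The posting names Node.js for the back end; purely to keep this document's examples in one language, here is an equivalent Python/Flask sketch (a stand-in, not the stack named above) of a REST endpoint a dashboard front end could poll; the metric values are hard-coded placeholders:

```python
# Minimal Flask sketch (pip install flask) of a reporting endpoint;
# a real service would query pipeline metadata instead of returning
# hard-coded placeholder values.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/pipeline-status")
def pipeline_status():
    return jsonify({"pipeline": "daily_ingest", "status": "ok", "rows_loaded": 12345})

if __name__ == "__main__":
    app.run(port=8080)
```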
Data Pipeline & ETL :
- Experience in designing, developing, and maintaining ETL (Extract, Transform, Load) processes for data ingestion and transformation.
- Familiarity with tools such as Apache Airflow, Talend, or custom Python-based solutions for orchestrating data pipelines.
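A minimal sketch of an Airflow DAG of the kind mentioned above, using Airflow 2.x-style imports and parameters; the DAG name and task bodies are placeholders:

```python
# Minimal Airflow 2.x DAG sketch (pip install apache-airflow);
# each task body is a placeholder for a real ETL step.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull raw data from source")      # placeholder step

def transform():
    print("clean and reshape the extract")  # placeholder step

def load():
    print("write to the warehouse")         # placeholder step

with DAG(
    dag_id="daily_etl",                      # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3   # run the steps in order
```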
Cloud Platforms (Google Cloud Platform, AWS, Azure) :
- Experience working with cloud-based platforms for data storage, processing, and analytics.
- Knowledge of cloud-based data solutions such as Google Cloud Storage, BigQuery, AWS Redshift, or Azure Synapse Analytics.
Agile Methodology :
- Strong experience working in agile environments, using agile frameworks such as Scrum or Kanban.
- Comfortable with fast-paced development cycles and the ability to adapt to changing business priorities.
Desired Skills and Experience :
DevOps Practices :
- Familiarity with version control systems (e.g., Git) and CI/CD pipelines.
- Knowledge of containerization technologies (Docker, Kubernetes) and cloud-native architectures.
Data Quality & Governance :
- Experience with data quality frameworks and tools for ensuring the accuracy and integrity of data.
- Knowledge of data governance practices and how to implement them in data engineering workflows.
Communication & Collaboration :
- Strong written and verbal communication skills, with the ability to explain complex technical concepts to non-technical stakeholders.
- Ability to collaborate across teams and help guide the organization's data strategy.
Preferred Qualifications :
- Experience with machine learning model integration into production environments.
- Familiarity with data orchestration tools like Apache Airflow or Luigi.
- Experience in real-time data streaming with technologies like Kafka or AWS Kinesis (see the sketch after this list).
- Knowledge of data lakes and data warehouse architectures.
- Exposure to containerized environments and orchestration using Kubernetes.
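As an illustration of the real-time streaming item above, a minimal consumer sketch using the kafka-python package; the broker address and topic name are placeholders:

```python
# Minimal Kafka consumer sketch (pip install kafka-python);
# broker and topic are hypothetical.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",                                  # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    # Each message would feed a downstream transform or sink in practice.
    print(message.value)
```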
Functional Areas: Software/Testing/Networking