Data Engineer - Web Crawling & Scraping (2-5 yrs)
Awign Enterprises
posted 24d ago
Flexible timing
Experience : 2+ years
Location : Pune
Duration : Permanent Opportunity
The successful candidate will be intelligent, accomplished, and energetic, as demonstrated by their professional credentials, and passionate about working with data. This position requires creative, proactive critical thinking and an insatiable appetite for exploring new technologies related to real-time web automation and scraping. Candidates with additional experience in document and image analysis, business analytics, financial data analysis, or risk management will be preferred.
Responsibilities :
- Work closely with the product team to fetch real-time data and design complex scraping flows that extract information from multiple sources.
- Independently research new data sources and document scraping methods, infrastructure requirements, and scaling and monitoring strategies.
- Continuously optimize the products for performance and cost per transaction.
- Acquire, clean, standardize, transform, structure, and store data (a minimal fetch-clean-store sketch follows this list).
- Develop modules to extract data from documents and identify entities and relationships.
- Perform exploratory analysis on datasets to identify potential insights.
- Help other team members with optimizing data models and analytics.
- Maintain data integrity and consistency across multiple databases and applications.
- Build dashboards and visualizations to convey status, changes, and analysis of data.
- Research and learn new frameworks, languages, and technologies as needed.
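As an illustration of the fetch-clean-store flow mentioned above, here is a minimal sketch in Python. The URL, CSS selector, database file, and table name are hypothetical placeholders; a real flow would add retries, scheduling, and monitoring.

import sqlite3

import requests
from bs4 import BeautifulSoup

# Fetch: pull the page (the URL is a placeholder for a real data source).
resp = requests.get("https://example.com/news", timeout=10)
resp.raise_for_status()

# Clean and standardize: parse the HTML, strip whitespace, drop empty rows.
soup = BeautifulSoup(resp.text, "html.parser")
headlines = [h.get_text(strip=True) for h in soup.select("h2.headline")]
headlines = [h for h in headlines if h]

# Store: persist the structured records (the table name is illustrative).
conn = sqlite3.connect("scraped.db")
conn.execute("CREATE TABLE IF NOT EXISTS headlines (text TEXT)")
conn.executemany("INSERT INTO headlines (text) VALUES (?)", [(h,) for h in headlines])
conn.commit()
conn.close()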
Requirements :
- Expertise in web crawling and scraping (e.g., Scrapy, Selenium, BeautifulSoup); a minimal spider sketch follows this list.
- Working knowledge of Airflow.
- Experience working with page models, JS rendering, pop-ups, tabs, IP proxies, and CAPTCHAs.
- Knowledge of SQL and NoSQL databases (e.g., PostgreSQL, MongoDB/DynamoDB, Neo4j).
- Knowledge of API development using frameworks like Flask.
- Knowledge of machine learning libraries/frameworks is essential.
- Demonstrated experience in self-directed, primary-source research.
- Experience extracting, cleaning, and structuring data from unstructured or semi-structured sources such as PDFs, text files, and log files.
- Proficiency in Python is a must.
- Good to have: knowledge of serverless/container technologies for scraping, such as Docker, Cloud Functions, Google Cloud Run, or similar.
- Good to have: knowledge of search frameworks such as Elasticsearch.
- Good to have: knowledge of queue systems such as Kafka and RabbitMQ for data pipelines.
- Familiarity with data visualization tools and libraries (e.g., D3.js, Seaborn).
- Experience with GCP or AWS.
- Professional engineering habits, including TDD, design patterns, code comments, design documentation, and version control (Git).
- Bachelor's degree in a related field, or equivalent self-study and demonstrated technical proficiency.
- Bonus : Knowledge of image processing and OCR for data extraction (e.g., Tesseract); a short OCR sketch follows this list.
- Bonus : Knowledge of text analysis using NLP frameworks.
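For the web-crawling requirement above, a minimal Scrapy spider might look like the sketch below. The start URL and CSS selectors are assumptions for a hypothetical listing page, not a real target site.

import scrapy

class ListingSpider(scrapy.Spider):
    name = "listing"
    # Placeholder start URL; point this at an approved data source.
    start_urls = ["https://example.com/listings"]

    def parse(self, response):
        # Yield one item per listing row; the CSS selectors are assumptions.
        for row in response.css("div.listing"):
            yield {
                "title": row.css("h2::text").get(),
                "price": row.css("span.price::text").get(),
            }
        # Follow pagination until no "next" link remains.
        next_page = response.css("a.next::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

Run it with a command like `scrapy runspider listing_spider.py -o items.json`; JS-heavy pages would additionally need a rendering layer such as scrapy-playwright or Selenium.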
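For the OCR bonus item, a minimal sketch using pytesseract (a Python wrapper around Tesseract) could look like this. It assumes Tesseract is installed locally, and "scan.png" is a placeholder file name.

from PIL import Image
import pytesseract

# Load the scanned page (the file name is a placeholder).
image = Image.open("scan.png")
# Run Tesseract over the image and return the recognized text.
text = pytesseract.image_to_string(image)
print(text)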
Functional Areas: Software/Testing/Networking