We are looking for a Data Science Intern in our EO Applied Data Science (EO Data) Team.
Our mission is realized through foundational research and development in applied machine learning.
Having solved a wide range of Geospatial Data Science use cases so far, such as Land Use and Land Cover (LULC), Crop Classification, Sowing and Harvest Progression, Change Detection, Route Optimization, Satellite Image Time-Series (SITS) classification, Image2Image (I2I) Translation, and Cross-Modal Fusion, we are now focusing on advancing next-generation Machine Learning (ML) applications and surpassing the State-Of-The-Art (SOTA), especially in more ambiguous and complex geographies.
We look forward to applying our research to critical products that touch the lives of millions of users, via revolutionary real-time and near-real-time large-scale software systems that process terabytes of data.
At the core of such systems, we envision foundational geospatial data science models that are season-, modality-, and ground-agnostic.
We have been at the forefront of adaptable and efficient models, as evidenced by our publications at top ML/GRS conferences.
Key Responsibilities:
Work in collaboration with applied data scientists, MLOps engineers, geospatial experts, and platform engineers to envision solutions to real-world, ambiguous business use cases with low-latency and high-throughput requirements.
Focus on identifying and solving assigned problems with simple and elegant solutions, working backwards from the desired requirements.
Quickly propose and validate hypotheses to direct the science roadmap. Own time-bound, End-to-End (E2E) solutions for ML applications, ranging from resource and requirements gathering, data collection, cleaning, and annotation, to model development and validation.
Brainstorm, deep-dive into, implement, and debug the fundamentals of the systems (e.g., architectures, losses, efficiency, serving), while writing clean code.
Define appropriate Data Science metrics for model outputs.
Clearly communicate findings verbally and in writing to stakeholders of varied backgrounds, with close attention to detail.
Initiate and engage in collaborative efforts to meet ambitious goals in applied research and product/client delivery.
Innovate and advance State-Of-The-Art (SOTA) in-house solutions, and communicate findings as intellectual property (patents, papers), as deemed applicable by the business.
About You:
Pursuing an M.Tech, MS (Research), or PhD in a technical field (e.g., CS, EE, EC, Remote Sensing), preferably from leading academic or industrial labs, institutes, or corporates. Undergraduate or Dual-Degree students with the research experience described below may also be considered.
A proven track record of relevant experience in computer vision, NLP, learning theory, optimization, ML systems, foundational models, or related areas.
Technical familiarity with some or most of the following, as evidenced by problem-solving skills in novel scenarios: Convolutional Neural Networks (CNNs), LSTMs/RNNs/GRUs, Transformers, UNet, YOLO, RCNN, Encoder-Decoder Architectures, Generative Models (GAN, VAE, Diffusion), Contrastive Learning, Self-Supervised Learning, Semi-Supervised Learning, Representation Learning, Image Super Resolution, Traditional Machine Learning (Classification, Regression, Clustering), Active Learning, Learning with Noisy Labels, Multimodal Learning, Synthetic Aperture Radar (SAR)/VV-VH bands, Normalized Difference Vegetation Index (NDVI), False Colour Composite (FCC), Dimensionality Reduction (PCA, UMAP, Isomap), Time-Series Modeling/Forecasting, Model Compression (Distillation, Pruning, Quantization), Automatic Mixed Precision training, Fourier Neural Operator (FNO), Climate+AI, Domain Adaptation, Domain Generalization, Anomaly Detection, etc.
Industry experience (0-1 years), if applicable, will also be considered.
Candidates with prior publications in the main tracks or workshops of ICLR, CVPR, ICCV, ECCV, NeurIPS, ICML, AAAI, IJCAI, ACL, EMNLP, TACL, NAACL, TMLR, IGARSS, InGARSS, IEEE Transactions, etc., will have an edge, with preference given to first-authored papers.
Proficiency in at least one general-purpose programming language (preferably Python), along with strong hands-on experience using ML frameworks (e.g., PyTorch) to train large, optimized, and scalable ML models.
Experience with SQL, large-scale distributed systems (e.g., Spark), and MLOps is a plus.