Career Summary

A data-centric software engineer skilled at building robust, efficient and scalable applications and data pipelines for the cloud. Areas of interest: Data Science and Engineering, Machine Learning and AI, Cloud Platform Development.

Education

MS in Computer Science

2019 - Exp. May 2021
University of Wisconsin - Madison, GPA 3.68/4.00
  • Fall 2019: Machine Learning, Image Processing, Security and Privacy in Data Science
  • Spring 2020: Operating Systems, Data Management for Machine Learning
  • Fall 2020: Advanced Operating Systems, Big Data Systems, Advanced Deep Learning
  • Spring 2021: Introduction to Computer Security

BE (Hons.) in Computer Science

2013 - 2017
Birla Institute of Technology and Science, Pilani - Goa Campus, GPA 7.98/10.00

Experience

Data Engineer Intern

May 2020 - Aug 2020, Jan 2021 - Present
American Family Insurance, Madison, WI
  • SQL Templating: Developed a framework that templates SQL queries so they can be reused, driven by YAML config files and Python f-strings.
  • Ingestion: Data ingestion pipeline loading terabytes of data from various raw sources into BigQuery, orchestrated with Apache Airflow. GitLab CI/CD for deployment and Terraform for infrastructure as code.
  • Serverless: Cloud Run service performing data transformation for the in-house analytics engine.
  • ETL: Worked on migrating the Extract, Transform, Load (ETL) pipeline for a predictive model from AWS to GCP, and designed and developed the new ETL pipeline on GCP.
    GCP: Python, BigQuery, Apache Airflow. AWS: PySpark, Hadoop, AWS EMR, AWS Athena, AWS Glue.
  • Docker: Migrated the containerized predictive model from AWS ECR to GCP GCR, changed all AWS references to GCP, and migrated the remaining dependencies such as configurations. Tested end-to-end execution of the pipeline and model inference, and published the results for business consumption.

Software Engineer

Jul 2017 - May 2019
Oracle, Bengaluru, India
  • ALCM: Lead developer for improving the robustness of lifecycle operations of the Java-based monitoring agent. Result: a 100% success rate in completing lifecycle tasks, which became effectively fire-and-forget operations for the end user.
    • Robustness improvements: re-entrant execution, fault tolerance against network failures and host restarts, improved error handling, and an enhanced ALCM microservice (REST API redesigned for scalability and efficiency).
  • DB Design: Modified the existing schema to meet multi-tenant database requirements and added hash-based partitioning and indexing, achieving a 3x improvement in data retrieval speed.
  • Dockerized ALCM: Initiated and owned lifecycle operations for the containerized monitoring agent. Developed a one-click solution that deploys hundreds of containerized agents in parallel on VMs, replacing manual deployment of one agent at a time.
  • DirCompare: Designed and developed an in-house directory comparison tool in Python. Used it to identify and remove redundant and outdated components from the software bundle, reducing the package footprint by 50%.

Software Engineer Intern

Jan 2017 - Jun 2017
Retailio, Mumbai, India
  • Scraping: Built Python data scrapers using beautifulsoup4 and requests, with Selenium for browser automation. Collected data on 260,000 pharmacies in India from various websites.
  • MDM: Developed REST APIs for a Master Data Management tool using Django.
  • DevOps: Managed AWS deployments using Python scripts. Helped reduce costs by shutting down non-essential compute instances overnight.

Software Engineer Intern

May 2015 - Jul 2015
Atomic Energy Regulatory Board, Mumbai, India
  • MVC Webapp: Designed and developed a Java web app to log courier packages entering and leaving the organisation. Technologies: Struts, Hibernate ORM, MySQL.

Projects

AllReduce - Efficient AllReduce implementations using PyTorch DDP (Distributed Data Parallel) with the Gloo backend. Implemented Ring, Tree, Butterfly, and Recursive Doubling and Halving topologies with sparse communication (COO format). Showed that Ring is optimal for dense tensors, whereas for sparse tensors the best topology varies with tensor size and network bandwidth.
Active Learning with Weak Supervision - Combined Active Learning with Weak Supervision to form a novel semi-supervised method. Evaluated it on sentiment analysis using the IMDb and Yelp datasets and found that Supervised Learning >= Active Learning > Active Weak Learning (our method) > Weak Supervision. Terms: Scikit-learn, Pandas, Snorkel, Active Learning, Weak Supervision.
eBPF - Traced and profiled the Zoom video conferencing app using bpftrace. Identified the syscalls it makes and their frequencies, and used them to analyze the behavior of the Zoom client under various conditions. Course project for Advanced Operating Systems, Fall 2020.
Multiplex - Configuration manager built with argparse and configurator as part of the Data Management for Machine Learning course. Features: dynamically generated CLI, hierarchical configurations, custom config store.
Adversarial ML - Implemented decision-based adversarial attacks against speech-to-text models such as Mozilla’s DeepSpeech.
Sign Language Translation - Developed a tool to detect hand gestures in video and translate them to text. Terms: Histogram of Oriented Gradients (HOG), SVC, Random Forest Classifier. Accuracy: 85%.
Feature Selection Algorithm - Designed and developed a new global feature selection algorithm. Achieved 10% higher accuracy than Chi-Square (CHI) on datasets such as 20 Newsgroups and Reuters.
VLIW Architecture - Designed a VLIW architecture for a given assembly-level instruction set, along with a write-through cache memory using counter-based LRU replacement to given specifications. Simulated the design and evaluated the results in Verilog using ModelSim.

Skills

Programming Languages


Java, Python, Go, Perl, C, Bash, Windows Batch, HTML, CSS, JavaScript, SQL, PL/SQL

Tools & Frameworks


JavaEE, Django, Spring Boot, Struts, Hibernate, Redis, Oracle, MySQL, PostgreSQL, Docker, NumPy, Pandas, scikit-learn, Kubernetes, Spark, Hadoop, MapReduce

Cloud Technologies


GCP Cloud Run, GCP BigQuery, Apache Airflow, GCS, GCR, AWS S3, AWS Athena, AWS EMR, AWS Glue

Misc


GitLab CI/CD, Linux, Windows, Git, Jira, Confluence, MS Office