Abhishek Jangalwa

Data Scientist and Full-Stack Engineer Specializing in Real-Time Big Data Processing, Scalable Systems, and Machine Learning.

Hi,
My name is Abhishek Jangalwa and I am a seasoned Data Scientist with over seven years of experience in creating scalable big data applications. Currently, I lead innovative projects at Oracle, specializing in machine learning, cloud technologies, and data security. My journey in data science has been driven by a passion for solving complex problems and leveraging data to drive impactful business decisions.

Throughout my career, I've designed and implemented systems that process terabyte-scale data, improve performance, and significantly reduce operational costs. My expertise spans a wide range of technologies, including Scala, Python, Spark, Kubernetes, and various databases and frameworks.

I hold a Master of Science in Data Informatics from the University of Southern California, where I also contributed to research in human movement pattern analysis. My professional experience is complemented by a robust foundation in software development, having founded and directed a software company earlier in my career.

On this website, you'll find details about my professional journey, projects, publications, and more. Feel free to explore and connect with me on LinkedIn to discuss data science, technology, and potential collaborations.

Thank you for visiting!

Technical Skills

  • Programming Languages
    Scala, Python, Java, PHP, SQL, JavaScript, C, C++, HTML, CSS, Shell
  • Cloud Platforms
    Amazon Web Services (AWS), Oracle Cloud Infrastructure (OCI), Google Cloud Platform (GCP)
  • Databases & Warehouses
    Postgres, Oracle, MongoDB, MySQL, Firebase, DynamoDB, Apache Hive, Elasticsearch, Google BigQuery, AWS S3, OCI Object Storage, Apache Jena (Triplestore)
  • Data Science Tech
    Machine Learning, Recommender Systems, Entity Resolution and Linking, Deep Learning, LLMs, Federated Machine Learning, Differential Privacy
  • Libraries & Frameworks
    Spark, Hadoop, jQuery, Ajax, Kafka, scikit-learn, Pandas, Flask, Matplotlib, TensorFlow, Keras, RDF, Luigi, MapReduce
  • Systems & IDEs
    Kubernetes, Unix, Git, Docker, Six Sigma, IntelliJ, Grafana, Prometheus

Senior Data Scientist, Identity Graph Core Team July 2021 – Present

Oracle Advertising, Denver, CO

  • Designed and developed entity resolution systems, creating anonymized profiles for targeted advertising using terabyte-scale data and cloud technologies, coordinating with multiple stakeholders, internal teams and external clients.
  • Created a pipeline for Connected TV devices, processing 5B+ records in minutes with Scala & Spark on OCI Dataflow and AWS, boosting graph reach by 20%.
  • Led multiple projects, including a synthetic data generator with differential privacy that ensures data security and compliance and produces multiple joinable datasets at the scale of billions of records, generated with regular expressions as well as machine learning.
  • Reduced operating cost by 30% by implementing testing frameworks for Python and Scala libraries.
  • Improved legacy reporting and ingest systems, enhancing performance and reducing failures by 70%.
  • Developed a knowledge graph POC using RDF and ontology for data integration, and mentored junior engineers.

Data Scientist, Audience Modeling Team Aug 2019 – July 2021

Oracle Advertising, Denver, CO

  • Built scalable applications for targeted ads using machine learning and graph processing on AWS and OCI.
  • Productionized a research prototype generating $20M annually by optimizing Python & Spark scripts.
  • Introduced Kubernetes for end-to-end ML pipelines, improving orchestration and resource management.
  • Enhanced orchestration (workflow management) and metadata storage projects with Akka, Elasticsearch, and Postgres.

Research Engineer Intern Jan 2019 - May 2019

Evidation Health Inc., Santa Barbara, CA

  • Developed a data lake and ETL pipeline for storing and serving US pollen data using Python and AWS.
  • Programmed continuous scraping of various open-source data sources to fetch relevant data.
  • Created a real-time Flask-based REST API for data access, ensuring efficient querying.
  • Conducted research to optimize data storage solutions for the Pollen-data-lake project.

Machine Learning Pipeline Developer (Graduate Research Assistant) November 2017 to Dec 2018

USC Information Sciences Institute, Marina del Rey, CA

  • Developed an ML pipeline using Spark for the TILES (Tracking Individual Performance with Sensors) study that enabled data scientists to extract features from raw sensor data and start creating models within minutes, and that also provided means for model evaluation, cross-validation and hyper-parameter optimization.
  • Wrote and refactored Python code for managing, cleaning and transforming data collected from 6 sensors and various surveys.
  • Constructed custom PySpark modules for analyzing terabytes of time series data within minutes.

Web Developer (Student Worker) November 2017 to December 2017

USC Information Sciences Institute, Marina del Rey, CA

  • Created and managed the TILES (Tracking Individual Performance with Sensors) study website using PHP, MongoDB, HTML, CSS, JavaScript and jQuery, hosted at sail.usc.edu/tiles.

Full Stack Developer, Founder Director January 2010 to April 2017

Paritosh Software Pvt. Ltd., Mandsaur, Madhya Pradesh, India

  • Responsible for day-to-day running of business with an emphasis on product and business development.
  • Spearheaded expansion and development initiatives in Mandsaur city, on ground as well as online.
  • Identified and suggested new technologies and tools for enhancing product value and team productivity.
  • Conceptualized, created and marketed engaging websites and LAN-based software applications for over 100 clients.
  • Trained and mentored over 100 engineering students and helped them start careers in web development.

Academic Projects

  • Created a GUI-based application for predicting liver transplant recipients using machine learning, based on a dataset provided by UNOS and Keck Medical Center of USC, built with Python, scikit-learn, TensorFlow and Keras.
  • Implemented user-based and item-based recommendation systems using Jaccard and cosine similarity, as well as a model-based recommendation system using Spark MLlib. Built these on the Amazon Review Dataset using Scala and Spark.
  • Built a knowledge graph based on the art collection of the Smithsonian American Art Museum, using Scrapy and Beautiful Soup for website crawling and Snorkel and spaCy for information extraction.
  • Achieved 89.1% accuracy with a multi-class image classifier built from feed-forward neural networks in Python, implementing the backpropagation algorithm without using any libraries.
  • Designed the website for Adventure Gurus, built using Flask (Python web framework) and MongoDB and deployed on Amazon EC2.
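
For readers unfamiliar with the similarity measures named in the recommendation-system project above, the following is a minimal illustrative sketch in plain Python. It is not the original Scala and Spark code; the function names and the toy ratings are hypothetical and exist only to show how Jaccard and cosine similarity compare two users.

```python
from math import sqrt


def jaccard_similarity(items_a, items_b):
    """Size of the intersection divided by size of the union of two item sets."""
    a, b = set(items_a), set(items_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def cosine_similarity(ratings_a, ratings_b):
    """Cosine of the angle between two sparse rating vectors (dicts item -> rating)."""
    shared = set(ratings_a) & set(ratings_b)
    dot = sum(ratings_a[item] * ratings_b[item] for item in shared)
    norm_a = sqrt(sum(v * v for v in ratings_a.values()))
    norm_b = sqrt(sum(v * v for v in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_user(target, ratings):
    """Return the other user whose rated-item set has the highest Jaccard similarity."""
    others = [user for user in ratings if user != target]
    return max(
        others,
        key=lambda user: jaccard_similarity(ratings[target], ratings[user]),
    )


# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"book": 5.0, "lamp": 3.0},
    "bob": {"book": 4.0, "lamp": 3.0, "desk": 1.0},
    "carol": {"desk": 5.0},
}

nearest = most_similar_user("alice", ratings)
```

In a user-based recommender, items rated highly by the nearest users but not yet seen by the target user become candidate recommendations; an item-based variant applies the same measures to item vectors instead.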

Full-Stack Projects & Websites

While I was a full-stack engineer at Paritosh Software, India, I designed many websites. Some of those are still online and some have been redesigned. I also created many software applications using Linux-Apache-MySQL-PHP (LAMP) for billing, inventory management, online ticketing and other commercial use cases.

Official Website of Krantikari Reporter February 2017

Full Stack Developer @ Paritosh Software Private Limited, India
  • Built a fully responsive, dynamic website from scratch with three levels of news article presentation.
  • Employed PHP, MySQL, HTML, CSS, JavaScript and jQuery for website development.
  • Developed custom Admin Panel for updating and maintaining the website hassle-free.

Official Website of Agriculture Produce Marketing Committee, Mandsaur September 2011 – December 2011

Full Stack Developer @ Paritosh Software Private Limited, India
  • Performed data collection for APMC Mandsaur, a government organization.
  • Reached 5,000+ daily active users (DAU) and a strong social following on Facebook, serving over 1.2M users.
  • Integrated a bulk SMS system to reach more than twenty thousand users without access to the internet.

LAN Based Software Projects January 2010 – April 2017

Full Stack Developer @ Paritosh Software Private Limited, India
  • Programmed solutions using Linux-Apache-MySQL-PHP (LAMP) for billing, inventory management, online ticketing and other commercial use cases.
  • Conceptualized and created software solutions for around 50 government and private clients including Administrations, Educational Institutes, Manufacturers, Hospitals and Agricultural Produce Market Committees.

Find more on Portfolio.

Memberships

Contact Me

  jangalwa@usc.edu
  +1 213-210-1939