Raunak Shah

I like working on hard and impactful problems in the data/systems/ML space. I am currently working on optimizing the ML systems that power the multimodal, generative foundation models at Adobe Firefly.

Prior to this, I graduated with an MSCS degree from the University of Illinois, Urbana-Champaign. I was advised by Prof. Yongjoo Park and worked on building an improved, quantization-aware data format that reduces model size and saving/loading time for foundation models. During my master's I also interned at LanceDB. A blog describing my experience and the work I did can be found here. I worked on Lance, an open source file format built on top of Apache Arrow in Rust and designed as a next-generation data format for AI/ML workloads. Specifically, I added features to improve decompression, the speed of full scans, and random access to data.

Previously, I was a Research Engineer at the Data-driven Systems, Insights, and Experiences Group at Adobe for two years, where I worked on reducing storage and compute costs by using intelligent algorithms and ML for dynamic deduplication, load balancing, and storage tiering (Spark/Scala/Python). This involved working with enterprise customer data at large scale (TBs to PBs) and processing system-level metrics.

I received my undergraduate degree in Electrical Engineering from IIT Kanpur in 2021. Earlier, in 2019, I visited the University of California, San Diego (UCSD), where I worked on deep learning for autonomous driving applications, specifically unsupervised pixel-level depth estimation in driving scenes. I was advised by Prof. Dinesh Bharadia and Dr. Gaurav Bansal.

In college, I also co-founded and led the IIT Kanpur Consulting Group, where we helped social organizations and government entities draw better insights from their data through data science and machine learning.

My Google Scholar can be found here. Feel free to reach out for any discussions, collaborations, or opportunities.

Publications and Preprints

  1. VLDB’26
    QStore: Quantization-Aware Compressed Model Storage
    Raunak Shah, Zhaoheng Li, and Yongjoo Park
    International Conference on Very Large Data Bases (VLDB) 2026
  2. SIGMOD’24
    R2D2: Reducing Redundancy and Duplication in Data Lakes
    Raunak Shah, Koyel Mukherjee, Atharv Tyagi, Dhruv Joshi, Subrata Mitra, Shivam Bhosale, and Sai Karnam
    International Conference on Management of Data (SIGMOD) 2024
  3. ICDE’23
    Towards Optimizing Storage Costs on the Cloud
    Koyel Mukherjee*, Raunak Shah*, Shiv Saini, Karanpreet Singh, Khushi, Harsh Kesarwani, Kavya Barnwal, and Ayush Chauhan
    International Conference on Data Engineering (ICDE) 2023
  4. ECCV’20
    S3Net: Semantic-Aware Self-supervised Depth Estimation with Monocular Videos and Synthetic Data
    Bin Cheng, Inderjot Singh Saggu, Raunak Shah, Gaurav Bansal, and Dinesh Bharadia
    European Conference on Computer Vision (ECCV) 2020
  5. ICML Workshop
    AI-based Monitoring and Response System for Hospital Preparedness towards COVID-19 in Southeast Asia
    Tushar Goswamy, Naishadh Parmar, Ayush Gupta, Raunak Shah, Vatsalya Tandon, Varun Goyal, Sanyog Gupta, Karishma Laud, and 3 more authors
    ICML Workshop on Healthcare Systems, Population-Health, and the Role of Health-Tech, 2020
  6. ICML Workshop
    IIT Kanpur Consulting Group: Using Machine Learning and Management Consulting for Social Good
    Tushar Goswamy*, Vatsalya Tandon*, Naishadh Parmar*, Raunak Shah*, and Ayush Gupta*
    ICML Workshop on Healthcare Systems, Population-Health, and the Role of Health-Tech, 2020
  7. Preprint
    DELFI: Deep Mixture Models for Long-term Air Quality Forecasting in the Delhi National Capital Region
    Naishadh Parmar, Raunak Shah, Tushar Goswamy, Vatsalya Tandon, Ravi Sahu, Ronak Sutaria, Purushottam Kar, and Sachchida Nand Tripathi
    arXiv preprint arXiv:2210.15923 2021

Patents

  1. Automated Generation of Labels for Data Governance
    Tathagato Roy, Nikhil Manjrekar, Koyel Mukherjee, Atharv Tyagi, and Raunak Shah
    USPTO 12536215. Granted on Jan 27, 2026
  2. Relating Data in Data Lakes
    Raunak Shah, Koyel Mukherjee, Subrata Mitra, Sai Keerthana Karanam, Dhruv Joshi, and Shivam Bhosale
    USPTO 12436926. Granted on Oct 7, 2025
  3. Optimizing Storage-Related Costs with Compression in a Multi-Tiered Storage Device
    Raunak Shah, Koyel Mukherjee, Shiv Kumar Saini, Karanpreet Singh, Khushi, Kavya Barnwal, Harsh Kesarwani, and Ayush Chauhan
    USPTO 11907531. Granted on Feb 20, 2024
  4. Computing Resource Allocation Mechanism Testing and Deployment
    Raunak Shah, Shiv Saini, and Atanu Sinha
    US Patent App. 18/178,715. Filed on Mar 6, 2023
  5. Fraud Detection in NFT Exchanges
    Raunak Shah, Harshvardhan, Mohit Kumar, Shambhavi Pardhi, Alakh Dixit, Shaddy Garg, Shiv Saini, and Ramasuri Narayanam
    US Patent App. 18/181,018. Filed on Mar 9, 2023
  6. Generating Fact Trees for Data Storytelling
    Raunak Shah, Vibhor Porwal, Koyel Mukherjee, Iftikhar Burhanuddin, Fan Du, Saurabh Mahapatra, and Annamalai Annamalai
    US Patent App. 18/471,996. Filed on Sep 16, 2023