Raunak Shah
I like working on hard and impactful problems in the data/systems/ML space. I am currently working on optimizing the ML systems that power the multimodal, generative foundation models at Adobe Firefly.
Prior to this, I graduated with an MSCS degree from the University of Illinois Urbana-Champaign. I was advised by Prof. Yongjoo Park and worked on building an improved, quantization-aware data format to reduce model size and save/load time for foundation models. During my master's, I also interned at LanceDB. A blog describing my experience and the work I did can be found here. I worked on Lance, an open-source file format built in Rust on top of Apache Arrow and designed as a next-generation data format for AI/ML workloads. Specifically, I added features to improve decompression and to speed up full scans and random access to data.
Previously, I was a Research Engineer in the Data-driven Systems, Insights, and Experiences Group at Adobe for 2 years, where I worked on reducing storage and compute costs by using intelligent algorithms and ML for dynamic deduplication, load balancing, and storage tiering (Spark/Scala/Python). This involved working with enterprise customer data at large scale (TBs to PBs) and processing system-level metrics.
I received my undergraduate degree in Electrical Engineering from IIT Kanpur in 2021. In 2019, I visited the University of California, San Diego (UCSD), where I worked on deep learning for autonomous driving; more specifically, on unsupervised pixel-level depth estimation in driving scenes. I was advised by Prof. Dinesh Bharadia and Dr. Gaurav Bansal.
At college, I also co-founded and led the IIT Kanpur Consulting Group, where we helped social organizations and government entities derive better insights from their data through data science and machine learning.
My Google Scholar profile can be found here. Feel free to reach out for any discussions, collaborations, or opportunities.
Publications and Preprints
Patents
- Automated Generation of Labels for Data Governance. USPTO 12536215. Granted on Jan 27, 2026
- Relating Data in Data Lakes. USPTO 12436926. Granted on Oct 7, 2025
- Optimizing Storage-Related Costs with Compression in a Multi-Tiered Storage Device. USPTO 11907531. Granted on Feb 20, 2024
- Computing Resource Allocation Mechanism Testing and Deployment. US Patent App. 18/178,715. Filed on Mar 6, 2023
- Fraud Detection in NFT Exchanges. US Patent App. 18/181,018. Filed on Mar 9, 2023
- Generating Fact Trees for Data Storytelling. US Patent App. 18/471,996. Filed on Sep 16, 2023