Raunak Shah

I am currently in the final year of my Master's in Computer Science at the University of Illinois, Urbana-Champaign, where I am advised by Prof. Yongjoo Park. I am working on developing an improved data storage format for LLMs.
I recently finished my summer internship as a Software Engineer Intern at LanceDB. A blog post describing my experience and the work I did can be found here. I worked on Lance, an open source file format built in Rust on top of Apache Arrow and designed as a next-generation data format for AI/ML workloads. Specifically, I added features to improve decompression, full-scan speed, and random access to data.
Previously, I was a Research Engineer II at the Data-driven Systems, Insights, and Experiences Group at Adobe for 2 years, where I designed and implemented solutions to reduce storage/compute costs, improve data management & governance, and optimize the performance of data-driven systems in Adobe’s Experience Platform. This involved working with data at large scales (TBs-PBs) and applying ML and data analytics on enterprise customer data and system level metrics.
I received my undergraduate degree in Electrical Engineering from IIT Kanpur in 2021. Earlier, in 2019, I visited the University of California, San Diego (UCSD), where I worked on deep learning for autonomous driving applications, specifically unsupervised pixel-level depth estimation in driving scenes. I was advised by Prof. Dinesh Bharadia and Dr. Gaurav Bansal.
In college, I also co-founded and led the IIT Kanpur Consulting Group, where we helped social organizations and government entities draw better insights from their data through data science and machine learning.
My CV can be found here. Feel free to reach out for any discussions or collaborations.
Publications and Preprints
Patents
- Optimizing Storage-Related Costs with Compression in a Multi-Tiered Storage Device. USPTO 11907531. Granted on Feb 20, 2024
- Computing Resource Allocation Mechanism Testing and Deployment. US Patent App. 18/178,715. Filed on Mar 6, 2023
- Fraud Detection in NFT Exchanges. US Patent App. 18/181,018. Filed on Mar 9, 2023
- Relating Data in Data Lakes. US Patent App. 18/319,748. Filed on May 18, 2023
- Generating Fact Trees for Data Storytelling. US Patent App. 18/471,996. Filed on Sep 16, 2023
- Automated Generation of Labels for Data Governance. US Patent App. 19/416,567. Filed on May 22, 2024