I am Soumi Das, a post-doctoral researcher at the Max Planck Institute for Software Systems, Saarbruecken, advised by Professor Krishna Gummadi. I am currently working on understanding various aspects of large language models, such as knowledge estimation, privacy-utility-efficiency trade-offs, and language learning. During my PhD, I worked under the supervision of Dr. Sourangshu Bhattacharya in the Department of Computer Science and Engineering (C.S.E), Indian Institute of Technology (IIT) Kharagpur. I joined the Complex NEtwork Research Group (CNeRG) of this department in May 2017. My PhD research broadly spanned Machine Learning and Computer Vision, with a focus on subset selection problems in data-centric AI.
I completed my B.Sc. (Computer Science Honours) at St. Xavier's College in 2014 and my M.Sc. (Computer Science) at the Institute of Science, Banaras Hindu University (B.H.U.) in 2016. I submitted my PhD thesis, entitled 'Algorithms for online subset selection and data valuation in data-centric AI', at IIT Kharagpur in July 2023. A quick overview of these works can be found on this projects page.
-
Qinyuan Wu, Mohammad Aflah Khan, Soumi Das, Vedant Nanda, Bishwamittra Ghosh, Camila Kolling, Till Speicher, Laurent Bindschaedler, Krishna P. Gummadi, and Evimaria Terzi, Towards Reliable Latent Knowledge Estimation in LLMs: Zero-Prompt Many-Shot Based Factual Knowledge Extraction, International Conference on Web Search and Data Mining (WSDM 2025) [paper] [code]
-
Soumi Das, Manasvi Sagarkar, Suparna Bhattacharya, and Sourangshu Bhattacharya, CheckSelect: Online Checkpoint Selection for Flexible, Accurate, Robust, and Efficient Data Valuation, IEEE Transactions on Artificial Intelligence (IEEE TAI 2024) [paper] [code]
-
Kiran Purohit, Soumi Das, Sourangshu Bhattacharya, and Santu Rana, LearnDefend: Learning to Defend against Targeted Model-Poisoning Attacks on Federated Learning, European Conference on Artificial Intelligence (ECAI 2024) [paper] [Slides]
-
Soumi Das, Shubhadip Nag, Shreyyash Sharma, Suparna Bhattacharya, and Sourangshu Bhattacharya, VTruST: Controllable value function based subset selection for Data-Centric Trustworthy AI, Data-centric Machine Learning Research (DMLR) @ ICLR 2024 [paper] [code] [Poster]
-
Kiran Purohit, Anurag Parvathgiri, Soumi Das, and Sourangshu Bhattacharya, Accurate and Efficient Channel pruning via Orthogonal Matching Pursuit, AIML Systems 2022 [paper]
-
Soumi Das, Harikrishna Patibandla, Suparna Bhattacharya, Kshounis Bera, Niloy Ganguly, and Sourangshu Bhattacharya, TMCOSS: Thresholded Multi-Criteria Online Subset Selection for Data-Efficient Autonomous Driving, ICCV 2021 [paper] [code] [Recording] [Slides] [Poster]
-
Soumi Das, Arshdeep Singh, Saptarshi Chatterjee, Suparna Bhattacharya, and Sourangshu Bhattacharya, Finding High-Value Training Data Subset through Differentiable Convex Programming, ECML-PKDD 2021 [paper] [code] [Recording] [Slides] [Poster]
-
Soumi Das, Sayan Mandal, Ashwin Bhoyar, Madhumita Bharde, Niloy Ganguly, Suparna Bhattacharya, and Sourangshu Bhattacharya, Multi-criteria online frame-subset selection for autonomous vehicle videos, Pattern Recognition Letters (2020) [paper]
-
Soumi Das, Camila Kolling, Mohammad Aflah Khan, Mahsa Amani, Bishwamittra Ghosh, Qinyuan Wu, Till Speicher, and Krishna P. Gummadi, Revisiting Privacy, Utility, and Efficiency Trade-offs when Fine-Tuning Large Language Models [paper]
-
Till Speicher, Mohammad Aflah Khan, Qinyuan Wu, Vedant Nanda, Soumi Das, Bishwamittra Ghosh, Krishna P. Gummadi, and Evimaria Terzi, Understanding Memorisation in LLMs: Dynamics, Influencing Factors, and Implications [paper]
-
Soumi Das, Rajath Nandan Kalava, Kolli Kiran Kumar, Akhil Kandregula, Kalpam Suhaas, Sourangshu Bhattacharya, and Niloy Ganguly, Map Enhanced Route Travel Time Prediction using Deep Neural Networks [paper]
-
Are Emergent Abilities of Large Language Models a Mirage?, NeurIPS 2023
-
PhD Thesis - Algorithms for online subset selection and data valuation in data-centric AI
You can reach me by email at soumid.04 [at] gmail [dot] com
LinkedIn: soumi-das
Twitter handle: soumi_das0407