Long before being introduced to proper losses, I'd known about, and completely accepted, the basic premise as to why density ratio estimation was different to class-probability estimation: the latter only indirectly models the ratio, which is generally a sub-optimal strategy. Sometime in late 2015, I re-read the cool work on LSIF. This time, with link functions and partial losses fresh in my mind, I noticed that one could think of this as involving a particular proper loss plus link function. This was mildly interesting, but the point remained: using anything but the link that directly yields the density ratio must be a bad idea.
This was something of a defeat for my running theory at the time, that almost everything can be attacked with some version of logistic regression. To better understand this failure mode, I was looking at Sugiyama's elegant unified Bregman view of density ratios. I noticed that logistic regression was mentioned, but didn't pay much attention to it; after all, the Bregman view of regret under proper losses immediately led to the KL minimisation view of logistic regression.
It was only on a re-read that I realised there was a bit more to this innocuous result: it was a statement of Bregman minimisation not to the true class-probability, but to the true density ratio. Now that I didn't know was true for logistic regression. Studying the surprisingly simple proof, I saw it could be viewed as an unexpected equality between the KL divergence on probabilities and the KL divergence on ratios.
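To make that equality concrete, here is a sketch of the calculation in my own notation, which needn't match the paper's: write \(p, q\) for the positive and negative class-conditional densities, \(\eta = p/(p+q)\) for the class-probability under equal priors, and \(r = p/q\) for the density ratio, with estimates linked by \(\hat r = \hat\eta/(1-\hat\eta)\). Pointwise,

\[
\underbrace{r \log\frac{r}{\hat r} \;-\; (1+r)\log\frac{1+r}{1+\hat r}}_{B_f(r,\hat r),\;\; f(t)\,=\,t\log t - (1+t)\log(1+t)}
\;=\; (1+r)\left[\eta\log\frac{\eta}{\hat\eta} + (1-\eta)\log\frac{1-\eta}{1-\hat\eta}\right],
\]

and since \(1 + r = (p+q)/q\), the factor \((1+r)\) is exactly a change of measure: taking expectations over \(x \sim q\) on the left and over the mixture \(m = (p+q)/2\) on the right gives

\[
\mathbb{E}_{x\sim q}\big[B_f(r(x), \hat r(x))\big] \;=\; 2\,\mathbb{E}_{x\sim m}\big[\mathrm{KL}\big(\eta(x)\,\|\,\hat\eta(x)\big)\big],
\]

where \(\mathrm{KL}(\eta\|\hat\eta)\) is the bracketed binary KL term above. That is, KL regret on class-probabilities over the mixture equals the Bregman divergence on density ratios, with the logistic-type generator \(f\), over the negatives.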
I was sure this couldn't be true in general; but why? I worked through the case of a general divergence, trying to find the step at which the proof would break.
Half an hour later, after staring at my working several times over, I was still incredulous that everything seemed to go through in the general case, too. And so was Lemma 2 born.
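For the record, the general pattern my working followed is this, again in assumed notation and as a sketch rather than the paper's statement of Lemma 2: given a convex generator \(f\) on \([0,1]\) for the class-probability side, define its perspective-style transform \(\tilde f(t) = (1+t)\,f\!\big(t/(1+t)\big)\) on ratios. The same pointwise calculation as above then gives

\[
B_{\tilde f}(r, \hat r) \;=\; (1+r)\, B_f(\eta, \hat\eta),
\qquad\text{and hence}\qquad
\mathbb{E}_{x\sim q}\big[B_{\tilde f}(r, \hat r)\big] \;=\; 2\,\mathbb{E}_{x\sim m}\big[B_f(\eta, \hat\eta)\big];
\]

the key line is \((1+r)(\eta - \hat\eta) = (1-\hat\eta)(r - \hat r)\), which is what lets the linear term of the Bregman divergence transform cleanly. The KL case is recovered with \(f(u) = u\log u + (1-u)\log(1-u)\), whose transform is exactly the \(t\log t - (1+t)\log(1+t)\) generator above.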