About Me

I am an Assistant Professor in the NLP group at IT University of Copenhagen.

I completed my PhD at the University of Edinburgh, where my thesis on neural planning for long-document generation received SICSA's award for the best dissertation in Scotland. During my PhD, I also interned with the Summarization team at Google Research, London.

My research interests include:

  • Planning and Long-Context Modeling: I work on improving models’ ability to plan and operate over long contexts, in both general NLP and scientific domains. This includes neural planning for text generation (TACL’21, TACL’22), long-context architectures for summarization and sequence modeling (ACL’23, ICLR’25 W), and genome modeling via task-specific self-pretraining (ICML-GenBio’25).

  • Multilinguality, Transfer Learning, and Interpretability: I explore methods to make LLMs effective for low-resource and non-Roman-script languages, through romanization (RomanSetu, ACL’24) and language-relatedness-based chunking (DecoMT, EMNLP’23). I also study how LLMs internally represent such multilingual data, including latent romanization (RomanLens, ACL’25).

  • Reasoning: I study mathematical reasoning in open-weights LLMs. Our work (VerityMath, ICML-AI4Math’24) identifies unit consistency as a key challenge and introduces Unit Consistency Programs (UCPs) as a solution.

News

  • 12 Jun 2025: Paper on self-pretraining for genome modeling accepted to ICML 2025 Workshop on Generative AI for Biology - Paper
  • 16 May 2025: RomanLens to appear in Findings of ACL 2025 - Paper
  • 11 Feb 2025: RomanLens paper on latent romanization in multilingual LLMs - Paper
  • 25 Sep 2024: Paper on vocabulary expansion and initialization strategies for LLMs accepted to CoNLL 2024 - Paper
  • 13 Jun 2024: VerityMath paper accepted to the AI4Math workshop at ICML 2024 - Paper
  • 16 May 2024: Two papers accepted to ACL 2024: RomanSetu and a paper on Indic MT evaluation - RomanSetu Preprint - Indic MT Eval Preprint
  • 25 Jan 2024: Introducing Airavata, a Hindi instruction-tuned LLM - Blog
  • 24 Jan 2024: RomanSetu, on unlocking the multilingual capabilities of Large Language Models via romanization - Preprint
  • 21 Nov 2023: IndicTrans2 accepted to Transactions on Machine Learning Research (TMLR) - Preprint
  • 13 Nov 2023: VerityMath, on applying unit consistency checks to math problem solving - Preprint
  • 9 Oct 2023: Two papers accepted to EMNLP 2023: DecoMT to the main conference and CTQScorer to Findings - DecoMT Preprint - CTQScorer Preprint

Selected Publications

For my latest publications, please visit my Google Scholar profile.