About Me

I am an Assistant Professor in the NLP group at the IT University of Copenhagen.

I completed my PhD at the University of Edinburgh, where my thesis focused on neural planning for generating long documents from tabular data. It received the Best Dissertation in Scotland award from SICSA. During my PhD, I also interned with the Summarization team at Google Research, London.

My research interests include:

  • Structured and Long-Context Modeling: I work on improving models’ ability to process structured data and long sequences. This includes neural planning for generation from tabular inputs (TACL’21, TACL’22), long-context modeling for summarization and sequences (ACL’23, ICLR’25 Workshop), and genome modeling via task-specific self-pretraining (ICML-GenBio’25).

  • Multilinguality, Transfer Learning, and Interpretability: I explore methods to make LLMs effective for low-resource and non-Roman script languages through romanization (RomanSetu, ACL’24) and language-relatedness-based chunking (DecoMT, EMNLP’23). I also study how LLMs internally represent such multilingual data, including latent romanization (RomanLens, ACL’25).

  • Mathematical Reasoning: I study mathematical reasoning in open-weight LLMs, focusing on challenges like unit consistency (VerityMath, ICML-AI4Math’24).

News

  • 12 Jun 2025: Paper on self-pretraining for genome modeling accepted to ICML 2025 Workshop on Generative AI for Biology - Paper
  • 16 May 2025: RomanLens to appear in Findings of ACL 2025 - Paper
  • 11 Feb 2025: RomanLens paper on latent romanization in multilingual LLMs - Paper
  • 25 Sep 2024: Paper on vocabulary expansion and initialization strategies for LLMs accepted to CoNLL 2024 - Paper
  • 13 Jun 2024: VerityMath paper accepted to AI4Math workshop at ICML 2024 - Paper
  • 16 May 2024: Two papers accepted to ACL 2024: RomanSetu and a paper on Indic MT Eval - RomanSetu Preprint - Indic MT Eval Preprint
  • 25 Jan 2024: Introducing Airavata, a Hindi instruction-tuned LLM - Blog
  • 24 Jan 2024: RomanSetu for unlocking multilingual capabilities of Large Language Models via Romanization - Preprint
  • 21 Nov 2023: IndicTrans2 accepted to Transactions on Machine Learning Research (TMLR) - Preprint
  • 13 Nov 2023: VerityMath, which applies unit consistency checks to math problem solving - Preprint
  • 9 Oct 2023: Two papers accepted to EMNLP 2023: DecoMT to the main conference and CTQScorer to Findings - DecoMT Preprint - CTQScorer Preprint

Selected Publications

For my latest publications, please visit my Google Scholar profile.