Education

University of Oregon
Ph.D. in Computer Science, advised by Prof. Thien Huu Nguyen.
Fall 2023 - Present
Hanoi University of Science and Technology
B.S. in Computer Science, advised by Dr. Linh Ngo Van.
Fall 2018 - Spring 2023

Employment

VinAI Research
Research Resident in the Natural Language Processing Group. Worked on Information Extraction tasks and Large Language Models (LLMs).
Mar 2022 - Sep 2023
Hanoi, Vietnam
VietAI
Lecturer for the courses "Advances in Natural Language Processing" and "LLMs & Industry Practices".
Mar 2023 - Present
Hanoi, Vietnam
ICOMM Media and Tech., Jsc
Research Intern on the R&D team. Developed and deployed an open-domain Question Answering system for the Vietnamese language.
Jun 2019 - Sep 2020
Hanoi, Vietnam

Selected Publications

CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
We introduced a large multilingual dataset with 6.3 trillion tokens in 167 languages, readily usable for the development of Large Language Models (LLMs).
LREC-COLING 2024
Transitioning Representations between Languages for Cross-lingual Event Detection via Langevin Dynamics
We explored a novel alignment method for cross-lingual transfer learning in Event Detection.
EMNLP 2023 (Findings)
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning
A framework that introduces resources and models for instruction tuning of LLMs with RLHF in 26 languages.
EMNLP 2023 (Demonstration)
A Spectral Viewpoint on Continual Relation Extraction
A novel method for Continual Relation Extraction (CRE) with Feature Decorrelation.
EMNLP 2023 (Findings)
Retrieving Relevant Context to Align Representations for Cross-lingual Event Detection
C. Nguyen, L. Van, T. Nguyen
A new approach to the cross-lingual transfer learning problem in Event Detection using a retrieval-augmented method.
ACL 2023 (Findings)
Contextualized Soft Prompts for Extraction of Event Arguments
C. Nguyen, H. Man, T. Nguyen
A novel approach for document-level Event Argument Extraction (EAE) using graph-based soft prompts with better customizability and stability.
ACL 2023 (Findings)