CV | Jonggeun Lee

Contact Information

Name	Jonggeun Lee
Professional Title	M.S. Student in Data Science
Email	jonggeun.lee@snu.ac.kr

Professional Summary

M.S. student at Seoul National University working on multi-modal foundation models, voice assistants, tool-augmented agents, and post-training & reinforcement learning.

Experience

2024 - 2024

Hwasung, Korea
Software Engineering Intern, ML Brain SW Development Team

Samsung Electronics
- Researched Retrieval-Augmented Generation (RAG) for an internal chatbot system
- Fine-tuned retriever and re-ranker using contrastive learning
- Improved Hit@1 on internal document retrieval for user queries from 20% to 71%

2023 - 2024

Daejeon, Korea
Undergraduate Research Intern

KAIST, Data Science and Artificial Intelligence Lab
- Advisor: Prof. Chanyoung Park
- Reviewed the literature and implemented representative papers in recommender systems
- Explored LLM-based recommendation approaches (LLM4Rec, TALLRec)

2023 - 2023

Seoul, Korea
Research Intern

LG AI Research, EXAONE Lab
- Advisor: Dr. Hyeongu Yun
- Built a multi-modal (vision + text) data extraction pipeline to curate large-scale pretraining data for the EXAONE foundation model
- Combined vision-based layout detection models with rule-based parsing to extract structured text, tables, and images from PDF documents
- Produced 73GB+ of pretraining-grade data, validating scalability toward web-scale corpora

2023 - 2023

Seoul, Korea
Software Engineering Intern

Kounosoft
- Advisor: Dr. Woongmyung Kim
- Constructed an Arduino-related Q&A dataset for an education platform
- Performed supervised fine-tuning of the KoGPT2 model on the custom dataset
- Developed a complete chat interface and system using Vue3.js and FastAPI

Education

2024 - present

Seoul, Korea
M.S.

Seoul National University

Data Science
- GPA: 4.08/4.3
- Advisor: Prof. Yohan Jo
- Research interests: Multi-modal Foundation Models, Voice Assistants, Tool-augmented Agents, Post-training & Reinforcement Learning

2019 - 2024

Seoul, Korea
B.S.

Korea University

Industrial Management Engineering
- GPA: 4.08/4.5 (Major: 4.14/4.5)
- Dean’s List in College of Engineering (GPA: 4.5/4.5; 2023 Fall)
- Graduated with Great Honors, one semester early

Publications

2026

Don't Adapt Small Language Models for Tools; Adapt Tool Schemas to the Models

ACL 2026 Main (acceptance rate: 19%)

Proposed PA-Tool, a training-free method that adapts tool schemas to align with models’ pretrained knowledge, improving tool-use performance by up to 17% and reducing schema misalignment errors by 80%.

2026

SpeakerSleuth: Can LALMs Judge Speaker Consistency across Multi-turn Dialogues?

ACL 2026 Main (acceptance rate: 19%)

Introduced a benchmark evaluating whether Large Audio-Language Models can reliably judge speaker consistency across multi-turn conversations, revealing a significant bias toward text over acoustic cues.

2026

SpokenUS: A Spoken User Simulator for Task-Oriented Dialogue

Under review; Machine Learning for Audio Workshop @ ICML 2026

Developed a spoken user simulator that jointly generates text and speech tokens, modeling realistic spoken behaviors (cross-turn slots, barge-in, disfluency, emotion-aware speech) for task-oriented dialogue systems.

2026

SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents

ICLR 2026 Oral (acceptance rate: 1.13%)

Developed a time-accelerated smart home simulation environment with 600 benchmark episodes, revealing that even top models struggle with temporal scheduling and state verification.

2026

Quantifying Data Contamination in Psychometric Evaluations of LLMs

EACL 2026 Findings (acceptance rate: 36.2%)

Proposed a framework to systematically measure data contamination in psychometric evaluations of LLMs, providing evidence of strong contamination in popular inventories.

2025

Tool-Augmented Agents: Evolution from Autonomy to Interaction

Korean Institute of Information Scientists and Engineers, Vol. 43, No. 11, pp. 14-25

A comprehensive survey of the evolution of tool-augmented agents, focusing on the shift from autonomous capabilities to human-centered, interactive paradigms.

Projects

Knowledge Graph Construction from Messenger Conversations

Industry-academic research project with Samsung Electronics on dynamically extracting user information from multi-session messenger conversations and constructing knowledge graphs for hyper-personalization.
- Fine-tuned the Llama3-8B-Instruct model for knowledge graph extraction from messenger dialogues
- Achieved 29.4% higher F1-Score than GPT-4o on the extraction benchmark
