CV
Curriculum vitae.
Contact Information
| Name | Jonggeun Lee |
| Professional Title | M.S. Student in Data Science |
| Email | jonggeun.lee@snu.ac.kr |
Professional Summary
M.S. student at Seoul National University working on multi-modal foundation models, voice assistants, tool-augmented agents, and post-training & reinforcement learning.
Experience
-
2024 Hwaseong, Korea
Software Engineering Intern, ML Brain SW Development Team
Samsung Electronics
- Researched Retrieval-Augmented Generation (RAG) for an internal chatbot system
- Fine-tuned a retriever and a re-ranker using contrastive learning
- Improved internal document retrieval on user queries from 20% to 71% Hit@1
-
2023 - 2024 Daejeon, Korea
Undergraduate Research Intern
KAIST, Data Science and Artificial Intelligence Lab
- Advisor: Prof. Chanyoung Park
- Reviewed the recommender-systems literature and implemented representative papers
- Explored LLM-based recommendation approaches (LLM4Rec, TALLRec)
-
2023 Seoul, Korea
Research Intern
LG AI Research, EXAONE Lab
- Advisor: Dr. Hyeongu Yun
- Built a multi-modal (vision + text) data extraction pipeline to curate large-scale pretraining data for the EXAONE foundation model
- Combined vision-based layout detection models with rule-based parsing to extract structured text, tables, and images from PDF documents
- Produced 73GB+ of pretraining-grade data, validating scalability toward web-scale corpora
-
2023 Seoul, Korea
Software Engineering Intern
Kounosoft
- Advisor: Dr. Woongmyung Kim
- Constructed an Arduino Q&A dataset for an education platform
- Performed supervised fine-tuning of KoGPT2 on the custom dataset
- Developed the complete chat system, including its interface, using Vue.js 3 and FastAPI
Education
-
2024 - present Seoul, Korea
M.S.
Seoul National University
Data Science
- GPA: 4.08/4.3
- Advisor: Prof. Yohan Jo
- Research interests: Multi-modal Foundation Models, Voice Assistants, Tool-augmented Agents, Post-training & Reinforcement Learning
-
2019 - 2024 Seoul, Korea
B.S.
Korea University
Industrial and Management Engineering
- GPA: 4.08/4.5 (Major: 4.14/4.5)
- Dean’s List, College of Engineering (Fall 2023; GPA: 4.5/4.5)
- Graduated with Great Honors, one semester early
Publications
-
2026 Don't Adapt Small Language Models for Tools; Adapt Tool Schemas to the Models
ACL 2026 Main (acceptance rate: 19%)
Proposed PA-Tool, a training-free method that adapts tool schemas to align with models’ pretrained knowledge, improving tool-use performance by up to 17% and reducing schema misalignment errors by 80%.
-
2026 SpeakerSleuth: Can LALMs Judge Speaker Consistency across Multi-turn Dialogues?
ACL 2026 Main (acceptance rate: 19%)
Introduced a benchmark evaluating whether Large Audio-Language Models can reliably judge speaker consistency across multi-turn conversations, revealing a significant bias toward textual over acoustic cues.
-
2026 SpokenUS: A Spoken User Simulator for Task-Oriented Dialogue
Under review; Machine Learning for Audio Workshop @ ICML 2026
Developed a spoken user simulator that jointly generates text and speech tokens, modeling realistic spoken behaviors (cross-turn slots, barge-in, disfluency, emotion-aware speech) for task-oriented dialogue systems.
-
2026 SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents
ICLR 2026 Oral (acceptance rate: 1.13%)
Developed a time-accelerated smart home simulation environment with 600 benchmark episodes, revealing that even top models struggle with temporal scheduling and state verification.
-
2026 Quantifying Data Contamination in Psychometric Evaluations of LLMs
EACL 2026 Findings (acceptance rate: 36.2%)
Proposed a framework to systematically measure data contamination in psychometric evaluations of LLMs, finding strong evidence of contamination in popular psychometric inventories.
-
2025 Tool-Augmented Agents: Evolution from Autonomy to Interaction
Korean Institute of Information Scientists and Engineers, Vol. 43, No. 11, pp. 14-25
A comprehensive survey of the evolution of tool-augmented agents, focusing on the shift from autonomous operation to interactive, human-centered paradigms.
Projects
-
Knowledge Graph Construction from Messenger Conversations
Industry-academic research project with Samsung Electronics on dynamically extracting user information from multi-session messenger conversations and constructing knowledge graphs for hyper-personalization.
- Fine-tuned the Llama3-8B-Instruct model to extract knowledge graphs from messenger dialogues
- Achieved a 29.4% higher F1 score than GPT-4o on the extraction benchmark