Zhen Yang    杨珍

I am a postdoctoral researcher in the Department of Computer Science and Technology at Tsinghua University, working with Prof. Jie Tang. Prior to that, I received my Ph.D. from the Knowledge Engineering Group (KEG), Department of Computer Science and Technology, Tsinghua University, where I was fortunate to be advised by Prof. Jie Tang. I obtained my master's and bachelor's degrees from Tsinghua University and Xidian University, respectively.

My research interests lie in vision-language models, large language models, and graph representation learning. Currently, I focus on multimodal coding agents and agentic reinforcement learning, aiming to build agentic systems that can reason, plan, and generate executable code across modalities.

If these areas align with your interests, I welcome you to reach out via email. I am always open to discussing potential collaborations and exploring new ideas together.

Email: yang-zhen [at] mail.tsinghua.edu.cn  /  Google Scholar  /  GitHub

News
  • January 2026: Our WebSeer paper about Deeper Search Agents is accepted by ICLR 2026!
  • December 2025: Released GLM-4.6V, the newest models in our open-source GLM-V series, including GLM-4.6V (106B-A12B) and GLM-4.6V-Flash (9B).
  • November 2025: Our MathSE paper about Multimodal Mathematical Reasoning is accepted by AAAI 2026!
  • August 2025: Released GLM-4.5V, an improved vision-language model across multiple benchmarks.
  • July 2025: Open-sourced GLM-4.1V-Thinking, designed to advance general-purpose multimodal understanding and reasoning.
  • August 2024: Check out our specialized mathematical multi-modal large language model MathGLM-Vision!
  • August 2024: Check out our multi-modal scientific benchmark VisScience!
  • June 2024: Check out our GLM Team paper ChatGLM!
  • February 2024: Our survey paper about Negative Sampling is accepted by TPAMI 2024!
  • December 2023: Our paper TriSampler is accepted by AAAI 2024!
  • September 2023: Check out our specialized mathematical model MathGLM!
  • September 2023: I started an internship at the ChatGLM group of ZhipuAI!
  • August 2023: Our paper ViLTA gets accepted to ICCV 2023!
  • May 2023: Our paper BatchSampler gets accepted to KDD 2023!
  • February 2022: Our paper RecNS gets accepted to TKDE 2022!
  • January 2022: Our paper STAM gets accepted to WWW 2022!
  • May 2021: Our paper MixGCF gets accepted to KDD 2021!
  • May 2020: Our paper MCNS gets accepted to KDD 2020!
Publications
UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation
Zhen Yang*, Wenyi Hong*, Mingde Xu, Xinyue Fan, Weihan Wang, Jiele Cheng, Xiaotao Gu, Jie Tang
arXiv, 2025
paper, code

UI2Code^N is a visual language foundation model trained through staged pretraining, fine-tuning, and reinforcement learning to achieve foundational improvements in multimodal coding. It unifies three key capabilities: UI-to-code generation, UI editing, and UI polishing.

WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation
Mingde Xu*, Zhen Yang*, Wenyi Hong, Lihang Pan, Xinyue Fan, Xiaotao Gu, Bin Xu, Jie Tang
arXiv, 2025
paper, code

WebVIA is the first agentic framework for interactive and verifiable UI-to-code generation. While prior vision-language models produce only static HTML/CSS layouts, WebVIA enables executable and interactive web interfaces. The framework consists of three modules: WebVIA-Agent, WebVIA-UI2Code, and a validation module.

MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning
Jinhao Chen*, Zhen Yang*, Jianxin Shi, Tianyu Wo, Jie Tang
AAAI 2026
paper, code

MathSE unifies distilled supervision, an Outcome Reward Model (ORM), and reflection-driven data refresh to progressively enhance math reasoning in multimodal LLMs.

WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection
Guanzhong He, Zhen Yang, Jinxin Liu, Bin Xu, Lei Hou, Juanzi Li
ICLR 2026
paper, code

WebSeer is a reinforcement learning framework for training intelligent web-based search agents capable of deeper reasoning, longer tool-use chains, and self-reflective correction. Unlike traditional Retrieval-Augmented Generation (RAG) systems, WebSeer integrates self-reflection into every stage of reasoning, enabling agents to backtrack, reformulate queries, and iteratively improve answers in real-world web environments.

GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
GLM-V Team
Technical Report, 2025
paper, code

Vision-language models (VLMs) have become a cornerstone of intelligent systems. As real-world AI tasks grow increasingly complex, VLMs urgently need reasoning capabilities beyond basic multimodal perception, with greater accuracy, comprehensiveness, and intelligence, to enable complex problem solving, long-context understanding, and multimodal agents. Through our open-source work, we aim to explore the technological frontier together with the community while empowering more developers to create exciting and innovative applications. This open-source repository contains our GLM-4.6V, GLM-4.5V, and GLM-4.1V series models. For performance and details, see Model Overview; for known issues, see Fixed and Remaining Issues.

MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model
Zhen Yang*, Jinhao Chen*, Zhengxiao Du, Wenmeng Yu, Weihan Wang, Wenyi Hong, Zhihuan Jiang, Bin Xu, Yuxiao Dong, Jie Tang
Manuscript, 2024
paper, code

We construct a fine-tuning dataset, MathVL, and develop MathGLM-Vision, a series of specialized mathematical MLLMs with backbones of various parameter scales.

VisScience: An Extensive Benchmark for Evaluating K12 Educational Multi-modal Scientific Reasoning
Zhihuan Jiang*, Zhen Yang*, Jinhao Chen, Zhengxiao Du, Weihan Wang, Bin Xu, Yuxiao Dong, Jie Tang
Manuscript, 2024
paper, code

We meticulously construct a comprehensive benchmark, VisScience, to assess multi-modal scientific reasoning across three disciplines: mathematics, physics, and chemistry.

Does Negative Sampling Matter? A Review with Insights into its Theory and Applications
Zhen Yang, Ming Ding, Tinglin Huang, Yukuo Cen, Junshuai Song, Bin Xu, Yuxiao Dong, Jie Tang
TPAMI, 2024
paper

We explore the history of negative sampling, categorize the strategies used to select negative samples, and examine their practical applications.

TriSampler: A Better Negative Sampling Principle for Dense Retrieval
Zhen Yang, Zhou Shao, Yuxiao Dong, Jie Tang
AAAI, 2024
paper

We design the quasi-triangular principle and introduce TriSampler to selectively sample more informative negatives within a prescribed constrained region.

GPT Can Solve Mathematical Problems Without a Calculator
Zhen Yang*, Ming Ding*, Qingsong Lv, Zhihuan Jiang, Zehai He, Yuyi Guo, Jinfei Bai, Jie Tang
Manuscript, 2023
paper, code

We propose MathGLM, a 2-billion-parameter language model that performs multi-digit arithmetic operations with almost 100% accuracy, without data leakage.

BatchSampler: Sampling Mini-batches for Contrastive Learning in Vision, Language, and Graphs
Zhen Yang*, Tinglin Huang*, Ming Ding*, Yuxiao Dong, Rex Ying, Yukuo Cen, Yangliao Geng, Jie Tang
KDD, 2023
paper, code

We present BatchSampler, which samples mini-batches of hard-to-distinguish instances (i.e., instances that are hard and true negatives of each other) by leveraging a proximity graph and a random walk with restart, as sketched below.
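
For intuition, here is a minimal Python sketch of the idea (illustrative only, not the paper's actual algorithm; the helper names build_knn_graph and sample_batch and all parameter values are made up for this example): build a brute-force kNN proximity graph over instance embeddings, then run a random walk with restart and use the visited nodes as one mini-batch.

    import numpy as np

    def build_knn_graph(embeddings, k):
        # Proximity graph: link each instance to its k nearest neighbors
        # by cosine similarity (brute force; fine for a toy example).
        normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        sims = normed @ normed.T
        np.fill_diagonal(sims, -np.inf)  # exclude self-links
        return [np.argsort(-row)[:k].tolist() for row in sims]

    def sample_batch(graph, batch_size, restart_p=0.15, max_steps=100_000, seed=0):
        # Random walk with restart: the visited nodes form a mini-batch of
        # mutually similar instances, which then serve as hard negatives
        # for one another inside the contrastive loss.
        rng = np.random.default_rng(seed)
        start = int(rng.integers(len(graph)))
        batch, current = {start}, start
        for _ in range(max_steps):
            if len(batch) >= batch_size:
                break
            if rng.random() < restart_p:       # teleport back to the seed
                current = start
            else:                              # step to a random neighbor
                current = int(rng.choice(graph[current]))
            batch.add(current)
        return sorted(batch)

    embeddings = np.random.default_rng(0).normal(size=(1000, 64))  # toy data
    graph = build_knn_graph(embeddings, k=10)
    print(sample_batch(graph, batch_size=32))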

ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation
Weihan Wang*, Zhen Yang*, Bin Xu, Juanzi Li, Yankui Sun
ICCV, 2023
paper

We propose ViLTA, a novel method that utilizes cross-distillation to generate soft labels, enhancing the robustness of the model.

STAM: A Spatiotemporal Aggregation Method for Graph Neural Network-based Recommendation
Zhen Yang, Ming Ding, Bin Xu, Hongxia Yang, Jie Tang
WWW, 2022
paper, code

We propose STAM, a spatiotemporal aggregation method that efficiently incorporates temporal information into neighbor embedding learning.

Region or Global? A Principle for Negative Sampling in Graph-based Recommendation
Zhen Yang, Ming Ding, Xu Zou, Jie Tang, Bin Xu, Chang Zhou, Hongxia Yang
TKDE, 2022, Long oral
paper, code

We design a three-region principle for selecting negative candidates and propose the RecNS method to synthesize hard negatives.

MixGCF: An Improved Training Method for Graph Neural Network-based Recommender Systems
Tinglin Huang, Yuxiao Dong, Ming Ding, Zhen Yang, Wenzheng Feng, Xinyu Wang, Jie Tang
KDD, 2021
paper, code

We present MixGCF, which leverages both the user-item graph structure and the aggregation process of GNNs, designing a hop-mixing technique to synthesize hard negatives.

Understanding Negative Sampling in Graph Representation Learning
Zhen Yang*, Ming Ding*, Chang Zhou, Hongxia Yang, Jingren Zhou, Jie Tang
KDD, 2020
paper, code

We develop a theory and show that a good negative sampling distribution should satisfy \( p_n(u|v) \propto p_d(u|v)^\alpha \) with \( 0 < \alpha < 1 \). Additionally, we propose Markov chain Monte Carlo Negative Sampling (MCNS), an effective and scalable negative sampling strategy for various tasks in graph representation learning; a toy sketch of the principle follows.
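
As a concrete illustration, a static version of this principle can be written in a few lines of Python, using node degree as a stand-in for the unknown p_d (an assumption of this sketch, in the spirit of word2vec's 0.75-power unigram heuristic). MCNS itself approximates the distribution adaptively with Metropolis-Hastings sampling; negative_sampling_probs below is a hypothetical helper, not the released code.

    import numpy as np

    def negative_sampling_probs(degrees, alpha=0.75):
        # Sub-linear weighting: weight proportional to degree^alpha with
        # 0 < alpha < 1, so popular nodes are down-weighted relative to
        # sampling exactly proportionally to the data distribution.
        weights = np.asarray(degrees, dtype=float) ** alpha
        return weights / weights.sum()

    degrees = [1, 5, 20, 100]                  # toy node degrees
    probs = negative_sampling_probs(degrees)
    rng = np.random.default_rng(0)
    negatives = rng.choice(len(degrees), size=8, p=probs)  # draw 8 negatives
    print(probs.round(3), negatives)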

Academic Services

  • Reviewer of Journals: TKDE
  • Reviewer of Conferences: ICLR 2024/2025/2026, ICCV 2025, AAAI 2025/2026, KDD 2021/2022, WWW 2023/2024/2025

Awards and Funds

  • Young Scientists Fund of the National Natural Science Foundation of China (国家自然科学基金青年基金项目C类), 2025.
  • General Fund of the China Postdoctoral Science Foundation (中国博士后科学基金面上资助), 2025.
  • China National Postdoctoral Program for Innovative Talents (博士后创新人才支持计划), 2025.
  • Shuimu Tsinghua Scholar (清华大学水木学者), 2025.
  • Beijing Outstanding Graduate, 2024.
  • Huawei Scholarship, 2023.
  • Excellent Graduate of Beijing, 2019.
  • Outstanding Academic Paper Award, Tsinghua University, 2019.
  • 129 Scholarship of Tsinghua University, 2018.
  • National Scholarship, Ministry of Education of China, 2018.
  • National Scholarship, Ministry of Education of China, 2014.

Thanks to Jon Barron