Publications

You can also find my articles on my Google Scholar profile.

- ⭐: Co-first Author
- 🚩: Corresponding Author
- 💭: Under Review

Conference Papers (16)

[ICLR 2026] YuE: Scaling Open Foundation Models for Long-Form Music Generation

Published in ICLR, 2026-04

Long-Form Music (Full-Song) Generation, Audio Language Model, Music In-Context Learning, Controllability, Human Preference Evaluation

Download Paper

🚩 [AAAI 2026 (Oral)] O-DisCo-Edit: Object Distortion Control for Unified Realistic Video Editing

Published in AAAI, 2026-01

Video Editing, Diffusion Models, Unified Control Signal, Object Distortion Control

Download Paper

🚩 [ACL 2025] Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective

Published in ACL, 2025-07

Multi-Paradigm, Large Language Model (LLM), Progressive Paradigm Training (PPT), Zero-shot Generalization

Download Paper

⭐ [ICLR 2025] ChartMimic: Evaluating LMM’s Cross-Modal Reasoning Capability via Chart-to-Code Generation

Published in ICLR, 2025-04

Large Multimodal Models (LMMs), Chart Understanding, Code Generation, Cross-modal Reasoning

Download Paper

[EMNLP 2024] HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing

Published in EMNLP, 2024-11

Screenwriting, Large Language Models (LLMs), Role Playing, Creative Generation, Multi-Agent Collaboration

Download Paper

⭐ [EMNLP 2024] ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models

Published in EMNLP, 2024-11

ToolBeHonest, Hallucination, LLM, Multi‑level Diagnostic

Download Paper

⭐ [SIGIR-AP 2024 Best Paper Nomination] Data-Efficient Massive Tool Retrieval: A Reinforcement Learning Approach for Query-Tool Alignment with Language Models

Published in SIGIR-AP, 2024-10

Massive Tool Retrieval (MTR), Query‑Tool Alignment (QTA), Massive Tool Retrieval Benchmark, DPO

Download Paper

[COLM 2024] StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

Published in COLM, 2024-02

Structured Knowledge Grounding (SKG), Instruction Tuning, Generalist Model, Tables / Graphs / Databases

Download Paper

⭐ [SIGIR-AP 2023] EALM: Introducing Multidimensional Ethical Alignment in Conversational Information Retrieval

Published in SIGIR-AP, 2023-11

Multidimensional Ethics, Ethical Judgment, Large Multimodal Models (LMMs), Binary & Multi‑label Classification

Download Paper

[ACL 2023] UniEX: An Effective and Efficient Framework for Unified Information Extraction via a Span-extractive Perspective

Published in ACL, 2023-07

Information Extraction, Unified Across IE Tasks, Triaffine Attention, Span‑extractive Framework, Low‑resource Transferability

Download Paper

⭐ [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models

Published in ACL, 2023-07

System 1 & System 2, Cooperative Reasoning (CoRe), Stepwise Feedback, Math Word Problems

Download Paper

⭐ [CVPR 2023] MAP: Modality-Agnostic Uncertainty-Aware Vision-Language Pre-training Model

Published in CVPR, 2023-06

Multimodality, Uncertainty Modeling, Vision-Language Pre-training, Probability Distribution Encoder (PDE)

Download Paper

⭐ [EMNLP 2022] Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective

Published in EMNLP, 2022-10

Zero‑Shot Learning, Multiple Choice Format, Pre‑trained Masked Language Model (PMLM), Unified Multiple Choice Model

Download Paper

[ACM MM 2022] Breaking Isolation: Multimodal Graph Fusion for Multimedia Recommendation by Edge-wise Modulation

Published in ACM MM, 2022-10

Multimedia Recommendation, Graph Fusion, Edge-wise Modulation, Graph Convolutional Network (GCN)

Download Paper

⭐ [EMNLP 2021] MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering

Published in EMNLP, 2021-10

Multimodal Interaction, Trilinear Transformer, Visual Question Answering (VQA), Two-Stage Workflow

Download Paper

⭐ [NTCIR 15] SKYMN at the NTCIR-15 DialEval-1 Task

Published in NTCIR, 2020-12

Dialogue Quality, BiLSTM + Attention, Convolutional Neural Network (CNN), Pre‑trained Language Model, MoE

Download Paper

Journal Articles (1)

⭐ [ACM TOIS 2024] SSR: Solving Named Entity Recognition Problems via a Single-stream Reasoner

Published in ACM TOIS, 2024-05

Named Entity Recognition (NER), Machine Reading Comprehension (MRC), Single-Stream Reasoner (SSR), Multi-choice Input Format

Download Paper

Arxiv Papers (11)

💭⭐ SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature

Published in Arxiv, 2026-01

Multimodal Large Language Models, Long-Context Understanding, Evidence Chain, Scientific Literature, Benchmark

Download Paper

💭🚩 ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

Published in Arxiv, 2026-01

Probability-Guided Token Selection, High-Value Semantic Signals, Gradient Interference, Training Stability & Rapid Convergence

Download Paper

💭 Nav-R^2 Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal Navigation

Published in Arxiv, 2025-12

Object-Goal Navigation, Chain-of-Thought, Dual-Relation Reasoning, Similarity-Aware Memory

Download Paper

💭 A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Published in Arxiv, 2025-08

AI-Generated Code Security, Repository-Level Benchmark, Large Language Models (LLMs), Common Vulnerabilities and Exposures (CVEs)

Download Paper

💭 VeriGUI: Verifiable Long-Chain GUI Dataset

Published in Arxiv, 2025-08

Long-Chain Complexity, GUI Agents, Subtask-Level Verifiability, POMDP

Download Paper

💭 AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning

Published in Arxiv, 2025-07

Controllable Captioning, Omni-modal Intelligence, Plug-and-play Framework, Multimodal Large Language Models

Download Paper

⭐🚩 PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

Published in Arxiv, 2024-06

Knowledge‑intensive, Paired and Interleaved Documents, Multimodal Datasets, Data Format

Download Paper

CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark

Published in Arxiv, 2024-01

Multimodal Understanding, Subject Knowledge Reasoning, University Exam Questions, LMM Performance Evaluation

Download Paper

GameGPT: Multi-agent Collaborative Framework for Game Development

Published in Arxiv, 2023-10

Game Development, Multi-agent Collaboration, LLM, Redundancy

Download Paper

⭐ Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence

Published in Arxiv, 2022-09

Chinese Pre-trained Models, Foundation Models, Open-Source Ecosystem, Cognitive Intelligence

Download Paper

⭐ Towards No.1 in CLUE Semantic Matching Challenge: Pre-trained Language Model Erlangshen with Propensity-Corrected Loss

Published in Arxiv, 2022-08

Semantic Matching, Pre-trained Language Model (PLM), Propensity‑Corrected Loss (PCL), CLUE Semantic Matching Challenge

Download Paper

Junjie Wang (王军杰)
