🚩 [AAAI 2026 (Oral)] O-DisCo-Edit: Object Distortion Control for Unified Realistic Video Editing
Published in AAAI, 2026-01
Video Editing, Diffusion Models, Unified Control Signal, Object Distortion Control
Published in ACL, 2025-07
Multi-Paradigm, Large Language Model (LLM), Progressive Paradigm Training (PPT), Zero-shot Generalization
Published in ICLR, 2025-04
Large Multimodal Models (LMMs), Chart Understanding, Code Generation, Cross-modal Reasoning
Published in EMNLP, 2024-11
Screenwriting, Large Language Models (LLMs), Role Playing, Creative Generation, Multi-Agent Collaboration
Published in EMNLP, 2024-11
ToolBeHonest, Hallucination, LLM, Multi‑level Diagnostic
Published in SIGIR-AP, 2024-10
Massive Tool Retrieval (MTR), Query‑Tool Alignment (QTA), Massive Tool Retrieval Benchmark, DPO
Published in COLM, 2024-02
Structured Knowledge Grounding (SKG), Instruction tuning, Generalist model, Tables / Graphs / Databases
Published in SIGIR-AP, 2023-11
Multidimensional Ethics, Ethical Judgment, Large Multimodal Models (LMMs), Binary & Multi‑label Classification
Published in ACL, 2023-07
Information Extraction, Unified Across IE Tasks, Triaffine Attention, Span‑extractive Framework, Low‑resource Transferability
Published in ACL, 2023-07
System 1 & System 2, Cooperative Reasoning (CoRe), Stepwise Feedback, Math Word Problems
Published in CVPR, 2023-06
Multimodality, Uncertainty Modeling, Vision-Language Pre-training, Probability Distribution Encoder (PDE)
Published in EMNLP, 2022-10
Zero‑Shot Learning, Multiple Choice Format, Pre‑trained Masked Language Model (PMLM), Unified Multiple Choice model
Published in ACM MM, 2022-10
Multimedia Recommendation, Graph Fusion, Edge-wise Modulation, Graph Convolutional Network (GCN)
Published in EMNLP, 2021-10
Multimodal Interaction, Trilinear Transformer, Visual Question Answering (VQA), Two-Stage Workflow
Published in NTCIR, 2020-12
Dialogue Quality, BiLSTM + Attention, CNN (Convolutional Neural Network), Pre‑trained Language Model, MoE
Published in ACM TOIS, 2024-05
Named Entity Recognition (NER), Machine Reading Comprehension (MRC), Single-Stream Reasoner (SSR), Multi-choice Input Format
Published in arXiv, 2025-12
Object-Goal Navigation, Chain-of-Thought, Dual-Relation Reasoning, Similarity-Aware Memory
Published in arXiv, 2025-08
AI-Generated Code Security, Repository-Level Benchmark, Large Language Models (LLMs), Common Vulnerabilities and Exposures (CVEs)
Published in arXiv, 2025-08
Long-Chain Complexity, GUI Agents, Subtask-Level Verifiability, POMDP
Published in arXiv, 2025-07
Controllable Captioning, Omni-modal Intelligence, Plug-and-play Framework, Multimodal Large Language Models
Published in arXiv, 2024-06
Knowledge‑intensive, Paired and Interleaved Documents, Multimodal Datasets, Data Format
Published in arXiv, 2024-01
Multimodal Understanding, Subject Knowledge Reasoning, University Exam Questions, LMM Performance Evaluation
Published in arXiv, 2023-10
Game Development, Multi-agent Collaboration, LLM, Redundancy
Published in arXiv, 2022-09
Massive Tool Retrieval (MTR), Query‑Tool Alignment (QTA), Massive Tool Retrieval Benchmark, DPO
Published in arXiv, 2022-08
Semantic Matching, Pre-trained Language Model (PLM), Propensity‑Corrected Loss (PCL), LUE Semantic Matching Challenge