Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Posts
Comprehensive LaTeX Template for Waseda University PhD Theses
Published:
Github: https://github.com/wanng-ide/phd_thesis_template_waseda_university
📚 Paper Reading
Published:
A collection of papers I have read.
🤔 Some Thoughts
Published:
A record of some simple thoughts 💬.
Markdown Guide
Published:
📒 This page is from Academic Pages.
Academic Pages: A Concise Bilingual Guide
Published:
Quick Guide for Academic Pages (Chinese-English Bilingual)
From Visual Question Answering (VQA) to Multimodal Representation Learning: A Brief Survey
Published:
This post is mainly a survey of the VQA task as a whole.
Portfolio
YuE: An Open-Source Foundation Model for Music
YuE: Scaling Open Foundation Models for Long-Form Music Generation
Open‑Sora
Open-Sora: Democratizing Efficient Video Production for All
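PIN: The Largest Open-Source Interleaved Image-Text Dataset
PIN (Paired and INterleaved multimodal documents)
Fengshenbang Open-Source System
Open-source models, open-source frameworks, and open-source leaderboards
Some Automation Scripts
A record of some simple scripting projects.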
Publications
⭐ [NTCIR 15] SKYMN at the NTCIR-15 DialEval-1 Task
Published in NTCIR, 2020-12
Dialogue Quality, BiLSTM + Attention, CNN (Convolutional Neural Network), Pre‑trained Language Model, MoE
⭐ [EMNLP 2021] MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering
Published in EMNLP, 2021-10
Multimodal Interaction, Trilinear Transformer, Visual Question Answering (VQA), Two-Stage Workflow
⭐ Towards No.1 in CLUE Semantic Matching Challenge: Pre-trained Language Model Erlangshen with Propensity-Corrected Loss
Published in arXiv, 2022-08
Semantic Matching, Pre-trained Language Model (PLM), Propensity‑Corrected Loss (PCL), CLUE Semantic Matching Challenge
⭐ Fengshenbang 1.0: Being the Foundation of Chinese Cognitive Intelligence
Published in arXiv, 2022-09
Fengshenbang, Chinese Cognitive Intelligence, Open-source Models, Open-source Framework, Open-source Benchmark
[ACM MM 2022] Breaking Isolation: Multimodal Graph Fusion for Multimedia Recommendation by Edge-wise Modulation
Published in ACM MM, 2022-10
Multimedia Recommendation, Graph Fusion, Edge-wise Modulation, Graph Convolutional Network (GCN)
⭐ [EMNLP 2022] Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective
Published in EMNLP, 2022-10
Zero‑Shot Learning, Multiple Choice Format, Pre‑trained Masked Language Model (PMLM), Unified Multiple Choice model
⭐ [CVPR 2023] MAP: Modality-Agnostic Uncertainty-Aware Vision-Language Pre-training Model
Published in CVPR, 2023-06
Multimodality, Uncertainty Modeling, Vision-Language Pre-training, Probability Distribution Encoder (PDE)
⭐ [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models
Published in ACL, 2023-07
System 1 & System 2, Cooperative Reasoning (CoRe), Stepwise Feedback, Math Word Problems
[ACL 2023] UniEX: An Effective and Efficient Framework for Unified Information Extraction via a Span-extractive Perspective
Published in ACL, 2023-07
Information Extraction, Unified Across IE Tasks, Triaffine Attention, Span‑extractive Framework, Low‑resource Transferability
⭐ [SIGIR-AP 2023] EALM: Introducing Multidimensional Ethical Alignment in Conversational Information Retrieval
Published in SIGIR-AP, 2023-11
Multidimensional Ethics, Ethical Judgment, Large Multimodal Models (LMMs), Binary & Multi‑label Classification
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark
Published in arXiv, 2024-01
Multimodal Understanding, Subject Knowledge Reasoning, University Exam Questions, LMM Performance Evaluation
[COLM 2024] StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Published in COLM, 2024-02
Structured Knowledge Grounding (SKG), Instruction tuning, Generalist model, Tables / Graphs / Databases
⭐ [ACM TOIS 2024] SSR: Solving Named Entity Recognition Problems via a Single-stream Reasoner
Published in ACM TOIS, 2024-05
Named Entity Recognition (NER), Machine Reading Comprehension (MRC), Single-Stream Reasoner (SSR), Multi-choice Input Format
⭐🚩 PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents
Published in arXiv, 2024-06
Knowledge‑intensive, Paired & Interleaved, Large Multimodal Models (LMMs), Scalability
⭐ [SIGIR-AP 2024 Best Paper Nomination] Data-Efficient Massive Tool Retrieval: A Reinforcement Learning Approach for Query-Tool Alignment with Language Models
Published in SIGIR-AP, 2024-10
Massive Tool Retrieval (MTR), Query‑Tool Alignment (QTA), Massive Tool Retrieval Benchmark, DPO
⭐ [EMNLP 2024] ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models
Published in EMNLP, 2024-11
ToolBeHonest, Hallucination, LLM, Multi‑level Diagnostic
[EMNLP 2024] HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing
Published in EMNLP, 2024-11
Screenwriting, Large Language Models (LLMs), Role Playing, Creative Generation, Multi-Agent Collaboration
⭐ [ICLR 2025] ChartMimic: Evaluating LMM’s Cross-Modal Reasoning Capability via Chart-to-Code Generation
Published in ICLR, 2025-04
Large Multimodal Models (LMMs), Chart Understanding, Code Generation, Cross-modal Reasoning
🚩 [ACL 2025] Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective
Published in ACL, 2025-07
Multi-Paradigm, Large Language Model (LLM), Progressive Paradigm Training (PPT), Zero-shot Generalization
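Talks
The Original IDEA Research Institute Team Explains the Fengshenbang System: Committed to Becoming the Infrastructure for Chinese Cognitive Intelligence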
Published:
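As large models rise across fields such as natural language processing and computer vision, cognitive intelligence is undergoing a paradigm shift. With large-scale data and enormous parameter counts, these models have shown that they can handle a wide range of tasks effectively, and they are being deployed into specialized domains at a remarkable pace, with far-reaching effects on society and the economy. The Chinese community, however, has seen a degree of stagnation: model sizes have leapt from millions of parameters to hundreds of billions, and some universities and traditional companies lack both the compute and the infrastructure needed to train and use such models. Solid infrastructure is therefore especially important for advancing AI technology further.
The Technology Behind the First Chinese Stable Diffusion Model: Revealed by the IDEA Research Institute Fengshenbang Team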
Published:
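A full walkthrough of the Taiyi series of models, from production to application. The talk covers training, fine-tuning, and acceleration to show how the Taiyi series (the multimodal series), part of the Fengshenbang open-source system, is produced. Building on the weights the team trained and open-sourced, it explains how to accelerate inference and how to deploy the models in applications such as webui and dreambooth.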
Large Models bridge the Digital-Real World Gap: from Understanding to Generation
Published:
NCAA 2023 tutorial speaker
SIGIR-AP 2023 in Beijing
Published:
EALM: Introducing Multidimensional Ethical Alignment in Conversational Information Retrieval