📚 Paper Reading

Published: June 21, 2025

A collection of papers I have read.

Genie: Generative Interactive Environments

Gemma: Open Models Based on Gemini Research and Technology

LLM360: Towards Fully Transparent Open-Source LLMs

Video generation models as world simulators

PaperMage: a unified toolkit for processing, representing, and manipulating visually rich scientific documents

UnifiedSKG: unifying and handling multiple structured knowledge grounding (SKG) tasks

Gemini: A Family of Highly Capable Multimodal Models

An open-source multimodal document dataset, OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

Common mistakes found after reading more than 200 English-language papers written by Chinese authors

Reading a 166-page paper, The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

Notes on the GPT-4V system card: beyond intelligence, how can it benefit society?

Scaling Laws for Neural Language Models

Kosmos-2.5: A Multimodal Literate Model, which generates spatially aware text blocks and markdown-formatted output

PDF recognition for scientific documents, Nougat: Neural Optical Understanding for Academic Documents

Comprehensive LaTeX Template for Waseda University PhD Theses

🤔 Some Thoughts

Markdown Guide

Academic Pages: A Concise Bilingual Guide