GenAI papers published between Mar'25 and May'25
- Shashank Shekhar
- May 22
- 3 min read

These are three papers that were published recently.
1. Generalization Bias in Large Language Model Summarization of Scientific Research
Authors: Uwe Peters, Benjamin Chin-Yee
Published: March 28, 2025
Link: (arXiv)
Summary:
This study investigates the propensity of large language models (LLMs) to overgeneralize when summarizing scientific research. The authors evaluated 4,900 summaries generated by 10 prominent LLMs, including ChatGPT-4o, ChatGPT-4.5, DeepSeek, LLaMA 3.3 70B, and Claude 3.7 Sonnet.
The analysis revealed that, even when explicitly prompted for accuracy, most LLMs produced summaries that generalized the original research findings beyond their warranted scope. Specifically, models like DeepSeek, ChatGPT-4o, and LLaMA 3.3 70B overgeneralized in 26% to 73% of cases.(Apollo)
Furthermore, when comparing LLM-generated summaries to human-authored ones, the study found that LLM summaries were nearly five times more likely to contain broad generalizations (odds ratio = 4.85, 95% CI [3.06, 7.70]). Notably, newer models tended to perform worse in generalization accuracy than earlier versions.(PubMed)
The authors highlight the risk of large-scale misinterpretations of research findings due to this bias and suggest mitigation strategies, including lowering LLM temperature settings and benchmarking LLMs for generalization accuracy.(arXiv)
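As a refresher on the odds-ratio statistic reported above, it can be computed from a 2x2 contingency table of summary counts. The sketch below uses made-up counts chosen only to illustrate the calculation, not the study's actual data:

```python
def odds_ratio(llm_general, llm_specific, human_general, human_specific):
    """Odds ratio from a 2x2 table of summary counts.

    Rows: LLM vs. human-authored summaries; columns: contains a broad
    generalization vs. stays within the original scope.
    """
    return (llm_general / llm_specific) / (human_general / human_specific)

# Hypothetical counts, for illustration only:
# 30 of 100 LLM summaries overgeneralize vs. 8 of 100 human summaries.
or_value = odds_ratio(30, 70, 8, 92)
print(round(or_value, 2))  # 4.93, in the neighborhood of the paper's 4.85
```

An odds ratio of 1.0 would mean LLM and human summaries generalize at the same rate; values well above 1 indicate the bias the study reports.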
2. InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework
Authors: Jiale Tao, Yanbing Zhang, Qixun Wang, Yiji Cheng, Haofan Wang, Xu Bai, Zhengguang Zhou, Ruihuang Li, Linqing Wang, Chunyu Wang, Qin Lin, Qinglin Lu
Published: April 16, 2025
Paper: (InstantCharacter)
Summary:
The paper introduces InstantCharacter, a scalable framework for character customization built upon a foundation diffusion transformer (DiT). Traditional learning-based subject customization approaches, predominantly relying on U-Net architectures, suffer from limited generalization ability and compromised image quality. Optimization-based methods require subject-specific fine-tuning, which degrades textual controllability.(InstantCharacter)
InstantCharacter addresses these challenges with three fundamental advantages:(arXiv)
- Open-domain personalization: it handles diverse character appearances, poses, and styles while maintaining high-fidelity results.
- A scalable adapter: stacked transformer encoders process open-domain character features and interact seamlessly with the latent space of modern diffusion transformers.
- Large-scale training data: the authors constructed a character dataset of over 10 million samples, organized into paired (multi-view character) and unpaired (text-image combination) subsets. This dual-data structure enables simultaneous optimization of identity consistency and textual editability through distinct learning pathways.
Qualitative experiments demonstrate InstantCharacter's advanced capabilities in generating high-fidelity, text-controllable, and character-consistent images, setting a new benchmark for character-driven image generation.(InstantCharacter)
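The adapter idea can be caricatured in a few lines. The sketch below is a schematic stand-in only: plain linear layers take the place of real transformer encoder blocks, and all names and shapes are my own invention, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder_block(x, W):
    # Stand-in for one transformer encoder layer: linear projection + ReLU.
    return np.maximum(x @ W, 0.0)

def character_adapter(patch_features, weights, num_tokens=4):
    # Stack the encoder blocks, then pool down to a fixed number of
    # character tokens that a diffusion transformer could attend to.
    h = patch_features
    for W in weights:
        h = encoder_block(h, W)
    # Split the patches into num_tokens groups and mean-pool each group.
    groups = np.array_split(h, num_tokens, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])

patches = rng.normal(size=(256, 64))  # 256 image patches, 64-dim features
weights = [rng.normal(size=(64, 64)) * 0.1 for _ in range(3)]  # 3 stacked blocks
tokens = character_adapter(patches, weights)
print(tokens.shape)  # (4, 64): fixed-size character conditioning
```

The point of the stacked design is that a variable number of image patches is reduced to a fixed-size set of conditioning tokens the DiT's latent space can consume.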
3. Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
Authors: Lvmin Zhang, Maneesh Agrawala
Published: April 17, 2025
Link: (Lllyasviel)
Summary:
This paper presents FramePack, a neural network structure designed to train next-frame (or next-frame-section) prediction models for video generation. FramePack compresses input frames to maintain a fixed transformer context length, enabling the processing of a large number of frames using video diffusion with computational efficiency similar to image diffusion. This approach allows for significantly larger training video batch sizes, comparable to those in image diffusion training.(arXiv)
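To see why compressing input frames keeps the context length bounded, here is a toy calculation. The halving-by-age schedule is my own illustrative assumption; the paper describes more general compression kernels:

```python
def packed_context_tokens(num_frames, base_tokens=1024):
    # Older frames get progressively fewer tokens (halving per step of age,
    # an illustrative schedule), so the total context stays bounded no
    # matter how many frames are packed in.
    total = 0
    for age in range(num_frames):
        total += max(1, base_tokens // (2 ** age))
    return total

print(packed_context_tokens(100))  # 2136 tokens with packing
print(100 * 1024)                  # 102400 tokens without packing
```

Because the per-frame token counts form a (nearly) geometric series, the packed context grows far slower than the naive one, which is what makes long videos tractable at image-diffusion-like cost.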
Additionally, the authors propose an anti-drifting sampling method that generates frames in inverted temporal order with early-established endpoints to avoid exposure bias (error accumulation over iterations). This technique helps maintain consistency and quality over long video sequences.
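A toy rendering of the inverted-order idea (purely schematic; the names and structure here are my invention, not the paper's sampler):

```python
def anti_drifting_schedule(num_sections):
    # Establish the endpoint first, then generate sections in inverted
    # temporal order. Each step can condition on already-generated later
    # frames, so errors cannot accumulate toward the end of the video.
    return ["endpoint"] + [f"section_{i}" for i in range(num_sections - 1, -1, -1)]

print(anti_drifting_schedule(3))
# ['endpoint', 'section_2', 'section_1', 'section_0']
```

Contrast this with plain forward sampling, where each section conditions only on earlier generated frames and small errors compound over the sequence.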
The study demonstrates that existing video diffusion models can be fine-tuned with FramePack, potentially improving visual quality due to more balanced diffusion schedulers with less extreme flow shift timesteps.(arXiv)
