Presented at MLVC Lab, Kyung Hee University Global Campus
Seminar on the use of large language models and vision-language approaches for visual reasoning and restoration, covering recent innovations in step-wise reasoning and multimodal agents for complex image tasks.
Presents a multi-task learning perspective on modern diffusion models, discussing task transferability, the challenge of negative transfer, and mixture-of-experts architectures that handle denoising synergistically.
A deep dive into diffusion-based models tailored for specific restoration tasks, including Ambient Diffusion, ResShift for super-resolution, and DiffBIR’s two-stage blind restoration pipeline.
Highlights from two AAAI-24 papers: Any-Size-Diffusion (ASD), a two-stage pipeline for text-to-image synthesis at arbitrary resolutions, and ResDiff, which combines CNN priors with diffusion-based residual generation for efficient single-image super-resolution.
Summary session from NeurIPS 2024, with analyses of several key papers presented at the conference.