Challenge 2022

CVPR 2024

hotdigi 2024. 6. 18. 14:15

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
Yixuan Zhu, Ao Li, Yansong Tang, Wenliang Zhao, Jie Zhou, Jiwen Lu [pdf] [supp] [arXiv]


HEAL-SWIN: A Vision Transformer On The Sphere
Oscar Carlsson, Jan E. Gerken, Hampus Linander, Heiner Spieß, Fredrik Ohlsson, Christoffer Petersson, Daniel Persson [pdf] [supp]


3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation
Dale Decatur, Itai Lang, Kfir Aberman, Rana Hanocka [pdf] [supp] [arXiv]


Guided Slot Attention for Unsupervised Video Object Segmentation
Minhyeok Lee, Suhwan Cho, Dogyoon Lee, Chaewon Park, Jungho Lee, Sangyoun Lee [pdf] [arXiv]


Programmable Motion Generation for Open-Set Motion Control Tasks
Hanchao Liu, Xiaohang Zhan, Shaoli Huang, Tai-Jiang Mu, Ying Shan [pdf] [supp] [arXiv]


SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation
Kejia Yin, Varshanth Rao, Ruowei Jiang, Xudong Liu, Parham Aarabi, David B. Lindell [pdf] [supp]


LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion
Pancheng Zhao, Peng Xu, Pengda Qin, Deng-Ping Fan, Zhicheng Zhang, Guoli Jia, Bowen Zhou, Jufeng Yang [pdf]


TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process
Zhiyuan Ren, Minchul Kim, Feng Liu, Xiaoming Liu [pdf] [supp]


ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering
Haokai Pang, Heming Zhu, Adam Kortylewski, Christian Theobalt, Marc Habermann [pdf] [supp] [arXiv]


ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation
Dar-Yen Chen, Hamish Tennent, Ching-Wen Hsu [pdf] [supp] [arXiv]


Activity-Biometrics: Person Identification from Daily Activities
Shehreen Azad, Yogesh Singh Rawat [pdf] [supp]


Z*: Zero-shot Style Transfer via Attention Reweighting
Yingying Deng, Xiangyu He, Fan Tang, Weiming Dong [pdf] [supp]


Learning Continuous 3D Words for Text-to-Image Generation
Ta-Ying Cheng, Matheus Gadelha, Thibault Groueix, Matthew Fisher, Radomir Mech, Andrew Markham, Niki Trigoni [pdf] [arXiv]


MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
Sadeep Jayasumana, Daniel Glasner, Srikumar Ramalingam, Andreas Veit, Ayan Chakrabarti, Sanjiv Kumar [pdf] [supp] [arXiv]


HashPoint: Accelerated Point Searching and Sampling for Neural Rendering
Jiahao Ma, Miaomiao Liu, David Ahmedt-Aristizabal, Chuong Nguyen [pdf] [supp]


MFP: Making Full Use of Probability Maps for Interactive Image Segmentation
Chaewon Lee, Seon-Ho Lee, Chang-Su Kim [pdf] [supp] [arXiv]


StyLitGAN: Image-Based Relighting via Latent Control
Anand Bhattad, James Soole, D.A. Forsyth [pdf] [supp]


MoMask: Generative Masked Modeling of 3D Human Motions
Chuan Guo, Yuxuan Mu, Muhammad Gohar Javed, Sen Wang, Li Cheng [pdf] [supp] [arXiv]


Fitting Flats to Flats
Gabriel Dogadov, Ugo Finnendahl, Marc Alexa [pdf] [supp]


Coupled Laplacian Eigenmaps for Locally-Aware 3D Rigid Point Cloud Matching
Matteo Bastico, Etienne Decencière, Laurent Corté, Yannick Tillier, David Ryckelynck [pdf] [supp]


Scaling Up Video Summarization Pretraining with Large Language Models
Dawit Mureja Argaw, Seunghyun Yoon, Fabian Caba Heilbron, Hanieh Deilamsalehy, Trung Bui, Zhaowen Wang, Franck Dernoncourt, Joon Son Chung [pdf] [arXiv]


Continuous Optical Zooming: A Benchmark for Arbitrary-Scale Image Super-Resolution in Real World
Huiyuan Fu, Fei Peng, Xianwei Li, Yejun Li, Xin Wang, Huadong Ma [pdf]


Sharingan: A Transformer Architecture for Multi-Person Gaze Following
Samy Tafasca, Anshul Gupta, Jean-Marc Odobez [pdf] [supp]


Open-Vocabulary Segmentation with Semantic-Assisted Calibration
Yong Liu, Sule Bai, Guanbin Li, Yitong Wang, Yansong Tang [pdf] [arXiv]


Towards a Perceptual Evaluation Framework for Lighting Estimation
Justine Giroux, Mohammad Reza Karimi Dastjerdi, Yannick Hold-Geoffroy, Javier Vazquez-Corral, Jean-François Lalonde [pdf] [arXiv]


On Exact Inversion of DPM-Solvers
Seongmin Hong, Kyeonghyun Lee, Suh Yoon Jeon, Hyewon Bae, Se Young Chun [pdf] [supp] [arXiv]


CAMEL: CAusal Motion Enhancement Tailored for Lifting Text-driven Video Editing
Guiwei Zhang, Tianyu Zhang, Guanglin Niu, Zichang Tan, Yalong Bai, Qing Yang [pdf] [supp]


FocSAM: Delving Deeply into Focused Objects in Segmenting Anything
You Huang, Zongyu Lan, Liujuan Cao, Xianming Lin, Shengchuan Zhang, Guannan Jiang, Rongrong Ji [pdf] [supp] [arXiv]


PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Fei Deng, Qifei Wang, Wei Wei, Tingbo Hou, Matthias Grundmann [pdf] [supp] [arXiv]


Task-Customized Mixture of Adapters for General Image Fusion
Pengfei Zhu, Yang Sun, Bing Cao, Qinghua Hu [pdf] [supp] [arXiv]


Artist-Friendly Relightable and Animatable Neural Heads
Yingyan Xu, Prashanth Chandran, Sebastian Weiss, Markus Gross, Gaspard Zoss, Derek Bradley [pdf] [supp] [arXiv]


From Feature to Gaze: A Generalizable Replacement of Linear Layer for Gaze Estimation
Yiwei Bao, Feng Lu [pdf]


Boosting Image Restoration via Priors from Pre-trained Models
Xiaogang Xu, Shu Kong, Tao Hu, Zhe Liu, Hujun Bao [pdf] [arXiv]


VRetouchEr: Learning Cross-frame Feature Interdependence with Imperfection Flow for Face Retouching in Videos
Wen Xue, Le Jiang, Lianxin Xie, Si Wu, Yong Xu, Hau San Wong [pdf] [supp]


Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
Jinseok Kim, Tae-Kyun Kim [pdf] [supp] [arXiv]


Cache Me if You Can: Accelerating Diffusion Models through Block Caching
Felix Wimbauer, Bichen Wu, Edgar Schoenfeld, Xiaoliang Dai, Ji Hou, Zijian He, Artsiom Sanakoyeu, Peizhao Zhang, Sam Tsai, Jonas Kohler, Christian Rupprecht, Daniel Cremers, Peter Vajda, Jialiang Wang [pdf] [supp] [arXiv]


Identifying Important Group of Pixels using Interactions
Kosuke Sumiyasu, Kazuhiko Kawamoto, Hiroshi Kera [pdf] [supp] [arXiv]


DIOD: Self-Distillation Meets Object Discovery
Sandra Kara, Hejer Ammar, Julien Denize, Florian Chabot, Quoc-Cuong Pham [pdf] [supp]


GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh
Jing Wen, Xiaoming Zhao, Zhongzheng Ren, Alexander G. Schwing, Shenlong Wang [pdf] [supp] [arXiv]


Neural Redshift: Random Networks are not Random Functions
Damien Teney, Armand Mihai Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad [pdf] [supp] [arXiv]


HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
Xian Liu, Xiaohang Zhan, Jiaxiang Tang, Ying Shan, Gang Zeng, Dahua Lin, Xihui Liu, Ziwei Liu [pdf] [supp] [arXiv]


CosmicMan: A Text-to-Image Foundation Model for Humans
Shikai Li, Jianglin Fu, Kaiyuan Liu, Wentao Wang, Kwan-Yee Lin, Wayne Wu [pdf] [supp] [arXiv]


JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients
Woo Kyoung Han, Sunghoon Im, Jaedeok Kim, Kyong Hwan Jin [pdf] [supp] [arXiv]


HOI-M^3: Capture Multiple Humans and Objects Interaction within Contextual Environment
Juze Zhang, Jingyan Zhang, Zining Song, Zhanhe Shi, Chengfeng Zhao, Ye Shi, Jingyi Yu, Lan Xu, Jingya Wang [pdf] [supp]


Interactive3D: Create What You Want by Interactive 3D Generation
Shaocong Dong, Lihe Ding, Zhanpeng Huang, Zibin Wang, Tianfan Xue, Dan Xu [pdf] [supp] [arXiv]


OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos
Dongyoung Choi, Hyeonjoong Jang, Min H. Kim [pdf] [supp] [arXiv]


Semantic Human Mesh Reconstruction with Textures
Xiaoyu Zhan, Jianxin Yang, Yuanqi Li, Jie Guo, Yanwen Guo, Wenping Wang [pdf] [supp] [arXiv]


PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Yiming Zhang, Zhening Xing, Yanhong Zeng, Youqing Fang, Kai Chen [pdf] [supp] [arXiv]


NeRF Analogies: Example-Based Visual Attribute Transfer for NeRFs
Michael Fischer, Zhengqin Li, Thu Nguyen-Phuoc, Aljaz Bozic, Zhao Dong, Carl Marshall, Tobias Ritschel [pdf] [supp] [arXiv]


Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On
Xu Yang, Changxing Ding, Zhibin Hong, Junhao Huang, Jin Tao, Xiangmin Xu [pdf] [supp] [arXiv]


Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach
Guoqiang Liang, Kanghao Chen, Hangyu Li, Yunfan Lu, Lin Wang [pdf] [supp] [arXiv]


From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration
Zekun Qian, Ruize Han, Wei Feng, Song Wang [pdf] [supp]


Enhancing Video Super-Resolution via Implicit Resampling-based Alignment
Kai Xu, Ziwei Yu, Xin Wang, Michael Bi Mi, Angela Yao [pdf] [supp] [arXiv]


Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model
Zelin Peng, Zhengqin Xu, Zhilin Zeng, Lingxi Xie, Qi Tian, Wei Shen [pdf] [supp] [arXiv]


Masked and Shuffled Blind Spot Denoising for Real-World Images
Hamadi Chihaoui, Paolo Favaro [pdf] [supp] [arXiv]


DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars
Tobias Kirschstein, Simon Giebenhain, Matthias Nießner [pdf] [supp]


Data-Free Quantization via Pseudo-label Filtering
Chunxiao Fan, Ziqi Wang, Dan Guo, Meng Wang [pdf]


Generative Powers of Ten
Xiaojuan Wang, Janne Kontkanen, Brian Curless, Steven M. Seitz, Ira Kemelmacher-Shlizerman, Ben Mildenhall, Pratul Srinivasan, Dor Verbin, Aleksander Holynski [pdf] [supp] [arXiv]


Text-conditional Attribute Alignment across Latent Spaces for 3D Controllable Face Image Synthesis
Feifan Xu, Rui Li, Si Wu, Yong Xu, Hau San Wong [pdf]


Correcting Diffusion Generation through Resampling
Yujian Liu, Yang Zhang, Tommi Jaakkola, Shiyu Chang [pdf] [supp] [arXiv]


AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings
Jamie Watson, Filippo Aleotti, Mohamed Sayed, Zawar Qureshi, Oisin Mac Aodha, Gabriel Brostow, Michael Firman, Sara Vicente [pdf]


Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains
Bang-Dang Pham, Phong Tran, Anh Tran, Cuong Pham, Rang Nguyen, Minh Hoai [pdf] [supp] [arXiv]


Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches
Qing Yu, Mikihiro Tanaka, Kent Fujiwara [pdf] [supp] [arXiv]


Clustering for Protein Representation Learning
Ruijie Quan, Wenguan Wang, Fan Ma, Hehe Fan, Yi Yang [pdf] [supp] [arXiv]


CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation
Boyuan Sun, Yuqi Yang, Le Zhang, Ming-Ming Cheng, Qibin Hou [pdf] [arXiv]


Estimating Extreme 3D Image Rotations using Cascaded Attention
Shay Dekel, Yosi Keller, Martin Cadik [pdf] [supp]


Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration
Shihao Zhou, Duosheng Chen, Jinshan Pan, Jinglei Shi, Jufeng Yang [pdf]


VINECS: Video-based Neural Character Skinning
Zhouyingcheng Liao, Vladislav Golyanik, Marc Habermann, Christian Theobalt [pdf] [supp] [arXiv]


Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
Nikita Starodubcev, Dmitry Baranchuk, Artem Fedorov, Artem Babenko [pdf] [supp] [arXiv]


SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Seokju Yun, Youngmin Ro [pdf] [supp] [arXiv]


CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images
Aaron Gokaslan, A. Feder Cooper, Jasmine Collins, Landan Seguin, Austin Jacobson, Mihir Patel, Jonathan Frankle, Cory Stephenson, Volodymyr Kuleshov [pdf] [supp]


Prompt-Driven Referring Image Segmentation with Instance Contrasting
Chao Shang, Zichen Song, Heqian Qiu, Lanxiao Wang, Fanman Meng, Hongliang Li [pdf]


Image Sculpting: Precise Object Editing with 3D Geometry Control
Jiraphon Yenphraphai, Xichen Pan, Sainan Liu, Daniele Panozzo, Saining Xie [pdf] [supp] [arXiv]


PFStorer: Personalized Face Restoration and Super-Resolution
Tuomas Varanka, Tapani Toivonen, Soumya Tripathy, Guoying Zhao, Erman Acar [pdf] [supp] [arXiv]


TextureDreamer: Image-Guided Texture Synthesis Through Geometry-Aware Diffusion
Yu-Ying Yeh, Jia-Bin Huang, Changil Kim, Lei Xiao, Thu Nguyen-Phuoc, Numair Khan, Cheng Zhang, Manmohan Chandraker, Carl S Marshall, Zhao Dong, Zhengqin Li [pdf] [supp] [arXiv]


Boosting Image Quality Assessment through Efficient Transformer Adaptation with Local Feature Enhancement
Kangmin Xu, Liang Liao, Jing Xiao, Chaofeng Chen, Haoning Wu, Qiong Yan, Weisi Lin [pdf] [supp]


Attention Calibration for Disentangled Text-to-Image Personalization
Yanbing Zhang, Mengping Yang, Qin Zhou, Zhe Wang [pdf] [supp] [arXiv]


One-Shot Structure-Aware Stylized Image Synthesis
Hansam Cho, Jonghyun Lee, Seunggyu Chang, Yonghyun Jeong [pdf] [supp] [arXiv]


MR-VNet: Media Restoration using Volterra Networks
Siddharth Roheda, Amit Unde, Loay Rashid [pdf]


Single Mesh Diffusion Models with Field Latents for Texture Generation
Thomas W. Mitchel, Carlos Esteves, Ameesh Makadia [pdf] [supp] [arXiv]


SAI3D: Segment Any Instance in 3D Scenes
Yingda Yin, Yuzheng Liu, Yang Xiao, Daniel Cohen-Or, Jingwei Huang, Baoquan Chen [pdf] [supp] [arXiv]


TexOct: Generating Textures of 3D Models with Octree-based Diffusion
Jialun Liu, Chenming Wu, Xinqi Liu, Xing Liu, Jinbo Wu, Haotian Peng, Chen Zhao, Haocheng Feng, Jingtuo Liu, Errui Ding [pdf] [supp]


Anatomically Constrained Implicit Face Models
Prashanth Chandran, Gaspard Zoss [pdf] [supp] [arXiv]


Capturing Closely Interacted Two-Person Motions with Reaction Priors
Qi Fang, Yinghui Fan, Yanjun Li, Junting Dong, Dingwei Wu, Weidong Zhang, Kang Chen [pdf] [supp]


RobustSAM: Segment Anything Robustly on Degraded Images
Wei-Ting Chen, Yu-Jiet Vong, Sy-Yen Kuo, Sizhou Ma, Jian Wang [pdf] [supp]


In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing
Yiran Xu, Zhixin Shu, Cameron Smith, Seoung Wug Oh, Jia-Bin Huang [pdf] [supp]


Combining Frame and GOP Embeddings for Neural Video Representation
Jens Eirik Saethre, Roberto Azevedo, Christopher Schroers [pdf] [supp]


Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM
Pingping Zhang, Tianyu Yan, Yang Liu, Huchuan Lu [pdf] [supp] [arXiv]


Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Yazhou Xing, Yingqing He, Zeyue Tian, Xintao Wang, Qifeng Chen [pdf] [supp] [arXiv]


Objects as Volumes: A Stochastic Geometry View of Opaque Solids
Bailey Miller, Hanyu Chen, Alice Lai, Ioannis Gkioulekas [pdf] [supp] [arXiv]


Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance
Kelvin C.K. Chan, Yang Zhao, Xuhui Jia, Ming-Hsuan Yang, Huisheng Wang [pdf] [supp] [arXiv]


Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik [pdf] [supp] [arXiv]


ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image
Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu [pdf] [supp] [arXiv]


Restoration by Generation with Constrained Priors
Zheng Ding, Xuaner Zhang, Zhuowen Tu, Zhihao Xia [pdf] [supp] [arXiv]


Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring
Huicong Zhang, Haozhe Xie, Hongxun Yao [pdf] [supp]


DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion
Tom Van Wouwe, Seunghwan Lee, Antoine Falisse, Scott Delp, C. Karen Liu [pdf] [supp] [arXiv]


MANUS: Markerless Grasp Capture using Articulated 3D Gaussians
Chandradeep Pokhariya, Ishaan Nikhil Shah, Angela Xing, Zekun Li, Kefan Chen, Avinash Sharma, Srinath Sridhar [pdf] [supp] [arXiv]


BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation
Qihang Zhang, Yinghao Xu, Yujun Shen, Bo Dai, Bolei Zhou, Ceyuan Yang [pdf] [supp] [arXiv]


3D Facial Expressions through Analysis-by-Neural-Synthesis
George Retsinas, Panagiotis P. Filntisis, Radek Danecek, Victoria F. Abrevaya, Anastasios Roussos, Timo Bolkart, Petros Maragos [pdf] [supp] [arXiv]


Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding
Zhiheng Cheng, Qingyue Wei, Hongru Zhu, Yan Wang, Liangqiong Qu, Wei Shao, Yuyin Zhou [pdf] [supp] [arXiv]


Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network
Sizhe Zheng, Pan Gao, Peng Zhou, Jie Qin [pdf] [supp]


Towards Progressive Multi-Frequency Representation for Image Warping
Jun Xiao, Zihang Lyu, Cong Zhang, Yakun Ju, Changjian Shui, Kin-Man Lam [pdf]


Learning to Control Camera Exposure via Reinforcement Learning
Kyunghyun Lee, Ukcheol Shin, Byeong-Uk Lee [pdf] [supp] [arXiv]


RNb-NeuS: Reflectance and Normal-based Multi-View 3D Reconstruction
Baptiste Brument, Robin Bruneau, Yvain Quéau, Jean Mélou, François Bernard Lauze, Jean-Denis Durou, Lilian Calvet [pdf] [supp]


Scaling Up Dynamic Human-Scene Interaction Modeling
Nan Jiang, Zhiyuan Zhang, Hongjie Li, Xiaoxuan Ma, Zan Wang, Yixin Chen, Tengyu Liu, Yixin Zhu, Siyuan Huang [pdf] [supp] [arXiv]


Semantic-aware SAM for Point-Prompted Instance Segmentation
Zhaoyang Wei, Pengfei Chen, Xuehui Yu, Guorong Li, Jianbin Jiao, Zhenjun Han [pdf] [supp] [arXiv]


Make Pixels Dance: High-Dynamic Video Generation
Yan Zeng, Guoqiang Wei, Jiani Zheng, Jiaxin Zou, Yang Wei, Yuchen Zhang, Hang Li [pdf] [supp] [arXiv]


A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
Ruichen Ma, Guanchao Qiao, Yian Liu, Liwei Meng, Ning Ning, Yang Liu, Shaogang Hu [pdf]


Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations
Daan de Geus, Gijs Dubbelman [pdf] [supp]


From Activation to Initialization: Scaling Insights for Optimizing Neural Fields
Hemanth Saratchandran, Sameera Ramasinghe, Simon Lucey [pdf] [supp] [arXiv]


DiffAvatar: Simulation-Ready Garment Optimization with Differentiable Simulation
Yifei Li, Hsiao-yu Chen, Egor Larionov, Nikolaos Sarafianos, Wojciech Matusik, Tuur Stuyck [pdf] [supp] [arXiv]


AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning
Duojun Huang, Xinyu Xiong, Jie Ma, Jichang Li, Zequn Jie, Lin Ma, Guanbin Li [pdf] [supp]


Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei [pdf] [arXiv]


Denoising Point Clouds in Latent Space via Graph Convolution and Invertible Neural Network
Aihua Mao, Biao Yan, Zijing Ma, Ying He [pdf] [supp]


HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models
Li Pang, Xiangyu Rui, Long Cui, Hongzhong Wang, Deyu Meng, Xiangyong Cao [pdf] [supp]


FreeDrag: Feature Dragging for Reliable Point-based Image Editing
Pengyang Ling, Lin Chen, Pan Zhang, Huaian Chen, Yi Jin, Jinjin Zheng [pdf] [supp] [arXiv]


Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)
Tsu-Ching Hsiao, Hao-Wei Chen, Hsuan-Kung Yang, Chun-Yi Lee [pdf] [arXiv]


DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation
Xiaoliang Ju, Zhaoyang Huang, Yijin Li, Guofeng Zhang, Yu Qiao, Hongsheng Li [pdf] [supp] [arXiv]


MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling
Xuzhe Zhang, Yuhao Wu, Elsa Angelini, Ang Li, Jia Guo, Jerod M. Rasmussen, Thomas G. O'Connor, Pathik D. Wadhwa, Andrea Parolin Jackowski, Hai Li, Jonathan Posner, Andrew F. Laine, Yun Wang [pdf] [supp] [arXiv]


DaReNeRF: Direction-aware Representation for Dynamic Scenes
Ange Lou, Benjamin Planche, Zhongpai Gao, Yamin Li, Tianyu Luan, Hao Ding, Terrence Chen, Jack Noble, Ziyan Wu [pdf] [supp] [arXiv]


SfmCAD: Unsupervised CAD Reconstruction by Learning Sketch-based Feature Modeling Operations
Pu Li, Jianwei Guo, Huibin Li, Bedrich Benes, Dong-Ming Yan [pdf] [supp]


Learning Degradation-unaware Representation with Prior-based Latent Transformations for Blind Face Restoration
Lianxin Xie, Csbingbing Zheng, Wen Xue, Le Jiang, Cheng Liu, Si Wu, Hau San Wong [pdf]


Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Youngjoon Jang, Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung [pdf] [supp] [arXiv]


DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach
Dayi Tan, Hansheng Chen, Wei Tian, Lu Xiong [pdf] [supp]


Memory-Scalable and Simplified Functional Map Learning
Robin Magnet, Maks Ovsjanikov [pdf] [supp] [arXiv]


Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
Yuelang Xu, Benwang Chen, Zhe Li, Hongwen Zhang, Lizhen Wang, Zerong Zheng, Yebin Liu [pdf] [supp] [arXiv]


Stratified Avatar Generation from Sparse Observations
Han Feng, Wenchao Ma, Quankai Gao, Xianwei Zheng, Nan Xue, Huijuan Xu [pdf] [supp] [arXiv]


Rewrite the Stars
Xu Ma, Xiyang Dai, Yue Bai, Yizhou Wang, Yun Fu [pdf] [supp] [arXiv]


PairDETR: Joint Detection and Association of Human Bodies and Faces
Ammar Ali, Georgii Gaikov, Denis Rybalchenko, Alexander Chigorin, Ivan Laptev, Sergey Zagoruyko [pdf] [supp]


SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation
Jiaben Chen, Huaizu Jiang [pdf] [arXiv]


Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction
Junuk Cha, Jihyeon Kim, Jae Shin Yoon, Seungryul Baek [pdf] [supp] [arXiv]


MACE: Mass Concept Erasure in Diffusion Models
Shilin Lu, Zilan Wang, Leyang Li, Yanzhu Liu, Adams Wai-Kin Kong [pdf] [supp] [arXiv]


PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution
Honghao Chen, Xiangxiang Chu, Yongjian Ren, Xin Zhao, Kaiqi Huang [pdf] [supp] [arXiv]


AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation
Qingping Sun, Yanjun Wang, Ailing Zeng, Wanqi Yin, Chen Wei, Wenjia Wang, Haiyi Mei, Chi-Sing Leung, Ziwei Liu, Lei Yang, Zhongang Cai [pdf] [supp] [arXiv]


Design2Cloth: 3D Cloth Generation from 2D Masks
Jiali Zheng, Rolandos Alexandros Potamias, Stefanos Zafeiriou [pdf] [supp] [arXiv]


Amodal Completion via Progressive Mixed Context Diffusion
Katherine Xu, Lingzhi Zhang, Jianbo Shi [pdf] [supp] [arXiv]


Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features
Niladri Shekhar Dutt, Sanjeev Muralikrishnan, Niloy J. Mitra [pdf] [supp] [arXiv]


Cinematic Behavior Transfer via NeRF-based Differentiable Filming
Xuekun Jiang, Anyi Rao, Jingbo Wang, Dahua Lin, Bo Dai [pdf] [arXiv]


Text-Driven Image Editing via Learnable Regions
Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Lu Jiang, Ming-Hsuan Yang [pdf] [arXiv]


Relation Rectification in Diffusion Model
Yinwei Wu, Xingyi Yang, Xinchao Wang [pdf] [supp] [arXiv]


Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera
Jiye Lee, Hanbyul Joo [pdf] [supp] [arXiv]


Fast ODE-based Sampling for Diffusion Models in Around 5 Steps
Zhenyu Zhou, Defang Chen, Can Wang, Chun Chen [pdf] [supp] [arXiv]


CLiC: Concept Learning in Context
Mehdi Safaee, Aryan Mikaeili, Or Patashnik, Daniel Cohen-Or, Ali Mahdavi-Amiri [pdf] [supp] [arXiv]


CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention
Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali, Kseniya Cherenkova, Anis Kacem, Djamila Aouada [pdf] [supp]


CLIB-FIQA: Face Image Quality Assessment with Confidence Calibration
Fu-Zhao Ou, Chongyi Li, Shiqi Wang, Sam Kwong [pdf]


Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models
Kota Sueyoshi, Takashi Matsubara [pdf] [supp] [arXiv]


MoML: Online Meta Adaptation for 3D Human Motion Prediction
Xiaoning Sun, Huaijiang Sun, Bin Li, Dong Wei, Weiqing Li, Jianfeng Lu [pdf] [supp]


CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model
Jianhao Zeng, Dan Song, Weizhi Nie, Hongshuo Tian, Tongtong Wang, An-An Liu [pdf] [supp]


Synergistic Global-space Camera and Human Reconstruction from Videos
Yizhou Zhao, Tuanfeng Yang Wang, Bhiksha Raj, Min Xu, Jimei Yang, Chun-Hao Paul Huang [pdf] [supp] [arXiv]


3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation
Zidu Wang, Xiangyu Zhu, Tianshuo Zhang, Baiqin Wang, Zhen Lei [pdf] [supp] [arXiv]


FreeU: Free Lunch in Diffusion U-Net
Chenyang Si, Ziqi Huang, Yuming Jiang, Ziwei Liu [pdf] [supp]


ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models
Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner [pdf] [supp]


Diffusion Models Without Attention
Jing Nathan Yan, Jiatao Gu, Alexander M. Rush [pdf] [arXiv]


Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion
Kiran Chhatre, Radek Daněček, Nikos Athanasiou, Giorgio Becherini, Christopher Peters, Michael J. Black, Timo Bolkart [pdf] [supp]


Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Daichi Horita, Naoto Inoue, Kotaro Kikuchi, Kota Yamaguchi, Kiyoharu Aizawa [pdf] [supp] [arXiv]


InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
Jing Shi, Wei Xiong, Zhe Lin, Hyun Joon Jung [pdf] [supp] [arXiv]


SD2Event: Self-supervised Learning of Dynamic Detectors and Contextual Descriptors for Event Cameras
Yuan Gao, Yuqing Zhu, Xinjun Li, Yimin Du, Tianzhu Zhang [pdf]


PaReNeRF: Toward Fast Large-scale Dynamic NeRF with Patch-based Reference
Xiao Tang, Min Yang, Penghui Sun, Hui Li, Yuchao Dai, Feng Zhu, Hojae Lee [pdf] [supp]


Affine Equivariant Networks Based on Differential Invariants
Yikang Li, Yeqing Qiu, Yuxuan Chen, Lingshen He, Zhouchen Lin [pdf] [supp]


Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
Jimyeong Kim, Jungwon Park, Wonjong Rhee [pdf] [supp] [arXiv]


Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Jiayi Guo, Xingqian Xu, Yifan Pu, Zanlin Ni, Chaofei Wang, Manushree Vasu, Shiji Song, Gao Huang, Humphrey Shi [pdf] [supp] [arXiv]


FlowIE: Efficient Image Enhancement via Rectified Flow
Yixuan Zhu, Wenliang Zhao, Ao Li, Yansong Tang, Jie Zhou, Jiwen Lu [pdf] [supp]


Improving Training Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architecture
Huijie Zhang, Yifu Lu, Ismail Alkhouri, Saiprasad Ravishankar, Dogyoon Song, Qing Qu [pdf] [supp]


In-Context Matting
He Guo, Zixuan Ye, Zhiguo Cao, Hao Lu [pdf] [supp] [arXiv]


DemoCaricature: Democratising Caricature Generation with a Rough Sketch
Dar-Yen Chen, Ayan Kumar Bhunia, Subhadeep Koley, Aneeshan Sain, Pinaki Nath Chowdhury, Yi-Zhe Song [pdf] [supp] [arXiv]


CapHuman: Capture Your Moments in Parallel Universes
Chao Liang, Fan Ma, Linchao Zhu, Yingying Deng, Yi Yang [pdf] [supp] [arXiv]


SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation
Sichen Chen, Yingyi Zhang, Siming Huang, Ran Yi, Ke Fan, Ruixin Zhang, Peixian Chen, Jun Wang, Shouhong Ding, Lizhuang Ma [pdf] [supp] [arXiv]


Authentic Hand Avatar from a Phone Scan via Universal Hand Model
Gyeongsik Moon, Weipeng Xu, Rohan Joshi, Chenglei Wu, Takaaki Shiratori [pdf] [supp] [arXiv]


Open-World Semantic Segmentation Including Class Similarity
Matteo Sodano, Federico Magistri, Lucas Nunes, Jens Behley, Cyrill Stachniss [pdf] [supp] [arXiv]


Towards Memorization-Free Diffusion Models
Chen Chen, Daochang Liu, Chang Xu [pdf] [supp] [arXiv]


IQ-VFI: Implicit Quadratic Motion Estimation for Video Frame Interpolation
Mengshun Hu, Kui Jiang, Zhihang Zhong, Zheng Wang, Yinqiang Zheng [pdf]


KeyPoint Relative Position Encoding for Face Recognition
Minchul Kim, Yiyang Su, Feng Liu, Anil Jain, Xiaoming Liu [pdf] [supp] [arXiv]


Hyper-MD: Mesh Denoising with Customized Parameters Aware of Noise Intensity and Geometric Characteristics
Xingtao Wang, Hongliang Wei, Xiaopeng Fan, Debin Zhao [pdf] [supp]


Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion
Litu Rout, Yujia Chen, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu [pdf] [supp] [arXiv]


Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
Yuchao Gu, Xintao Wang, Yixiao Ge, Ying Shan, Mike Zheng Shou [pdf] [supp] [arXiv]


Continuous Pose for Monocular Cameras in Neural Implicit Representation
Qi Ma, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool [pdf] [supp] [arXiv]


D^4: Dataset Distillation via Disentangled Diffusion Model
Duo Su, Junjie Hou, Weizhi Gao, Yingjie Tian, Bowen Tang [pdf] [supp]


360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model
Qian Wang, Weiqi Li, Chong Mou, Xinhua Cheng, Jian Zhang [pdf] [supp] [arXiv]


RankMatch: Exploring the Better Consistency Regularization for Semi-supervised Semantic Segmentation
Huayu Mai, Rui Sun, Tianzhu Zhang, Feng Wu [pdf]


DuPL: Dual Student with Trustworthy Progressive Learning for Robust Weakly Supervised Semantic Segmentation
Yuanchen Wu, Xichen Ye, Kequan Yang, Jide Li, Xiaoqiang Li [pdf] [supp] [arXiv]


SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering
Tao Hu, Fangzhou Hong, Ziwei Liu [pdf] [supp] [arXiv]


Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
Zhiwu Qing, Shiwei Zhang, Jiayu Wang, Xiang Wang, Yujie Wei, Yingya Zhang, Changxin Gao, Nong Sang [pdf] [supp] [arXiv]


PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis
Zhengyao Lv, Yuxiang Wei, Wangmeng Zuo, Kwan-Yee K. Wong [pdf] [supp] [arXiv]


Exploring Efficient Asymmetric Blind-Spots for Self-Supervised Denoising in Real-World Scenarios
Shiyan Chen, Jiyuan Zhang, Zhaofei Yu, Tiejun Huang [pdf] [supp] [arXiv]


Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring
Xin Gao, Tianheng Qiu, Xinyu Zhang, Hanlin Bai, Kang Liu, Xuan Huang, Hu Wei, Guoying Zhang, Huaping Liu [pdf] [supp] [arXiv]


MaskPLAN: Masked Generative Layout Planning from Partial Input
Hang Zhang, Anton Savov, Benjamin Dillenburger [pdf] [supp]


HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations
Peng Dai, Yang Zhang, Tao Liu, Zhen Fan, Tianyuan Du, Zhuo Su, Xiaozheng Zheng, Zeming Li [pdf] [supp]


Flexible Biometrics Recognition: Bridging the Multimodality Gap through Attention Alignment and Prompt Tuning
Leslie Ching Ow Tiong, Dick Sigmund, Chen-Hui Chan, Andrew Beng Jin Teoh [pdf] [supp]


Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition
Zihan Wang, Siyang Song, Cheng Luo, Songhe Deng, Weicheng Xie, Linlin Shen [pdf] [supp] [arXiv]


EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams
Christen Millerdurai, Hiroyasu Akada, Jian Wang, Diogo Luvizon, Christian Theobalt, Vladislav Golyanik [pdf] [supp] [arXiv]


A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark
Jakub Paplhám, Vojtěch Franc [pdf] [supp]


CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection
Jiayi Zhu, Qing Guo, Felix Juefei-Xu, Yihao Huang, Yang Liu, Geguang Pu [pdf] [arXiv]


MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation
Sumanth Udupa, Prajwal Gurunath, Aniruddh Sikdar, Suresh Sundaram [pdf] [supp]


MotionEditor: Editing Video Motion via Content-Aware Diffusion
Shuyuan Tu, Qi Dai, Zhi-Qi Cheng, Han Hu, Xintong Han, Zuxuan Wu, Yu-Gang Jiang [pdf] [supp] [arXiv]


Doubly Abductive Counterfactual Inference for Text-based Image Editing
Xue Song, Jiequan Cui, Hanwang Zhang, Jingjing Chen, Richang Hong, Yu-Gang Jiang [pdf] [supp] [arXiv]


Normalizing Flows on the Product Space of SO(3) Manifolds for Probabilistic Human Pose Modeling
Olaf Dünkel, Tim Salzmann, Florian Pfaff [pdf] [supp]


ReGenNet: Towards Human Action-Reaction Synthesis
Liang Xu, Yizhou Zhou, Yichao Yan, Xin Jin, Wenhan Zhu, Fengyun Rao, Xiaokang Yang, Wenjun Zeng [pdf] [supp] [arXiv]


A Simple Baseline for Efficient Hand Mesh ReconstructionZhishan Zhou, Shihao Zhou, Zhi Lv, Minqiang Zou, Yao Tang, Jiajun Liang[pdf] [arXiv] 


PhotoMaker: Customizing Realistic Human Photos via Stacked ID EmbeddingZhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan[pdf] [supp] [arXiv] 


Score-Guided Diffusion for 3D Human RecoveryAnastasis Stathopoulos, Ligong Han, Dimitris Metaxas[pdf] [supp] [arXiv] 


Check Locate Rectify: A Training-Free Layout Calibration System for Text-to-Image GenerationBiao Gong, Siteng Huang, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu[pdf] [supp] [arXiv] 


Pose-Transformed Equivariant Network for 3D Point Trajectory PredictionRuixuan Yu, Jian Sun[pdf] [supp] 


Revisiting Sampson Approximations for Geometric Estimation ProblemsFelix Rydell, Angélica Torres, Viktor Larsson[pdf] [supp] [arXiv] 


Fixed Point Diffusion ModelsXingjian Bai, Luke Melas-Kyriazi[pdf] [supp] [arXiv] 


Residual Learning in Diffusion ModelsJunyu Zhang, Daochang Liu, Eunbyung Park, Shichao Zhang, Chang Xu[pdf] 


Beyond Textual Constraints: Learning Novel Diffusion Conditions with Fewer ExamplesYuyang Yu, Bangzhen Liu, Chenxi Zheng, Xuemiao Xu, Huaidong Zhang, Shengfeng He[pdf] [supp] 


Exploiting Style Latent Flows for Generalizing Deepfake Video DetectionJongwook Choi, Taehoon Kim, Yonghyun Jeong, Seungryul Baek, Jongwon Choi[pdf] [supp] [arXiv] 


Video-P2P: Video Editing with Cross-attention ControlShaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia[pdf] [supp] 


Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic SegmentationFeilong Tang, Zhongxing Xu, Zhaojun Qu, Wei Feng, Xingjian Jiang, Zongyuan Ge[pdf] [supp] [arXiv] 


PIE-NeRF: Physics-based Interactive Elastodynamics with NeRFYutao Feng, Yintong Shang, Xuan Li, Tianjia Shao, Chenfanfu Jiang, Yin Yang[pdf] [supp] 


FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian EmbeddingJun Xiang, Xuan Gao, Yudong Guo, Juyong Zhang[pdf] [supp] [arXiv] 


ZERO-IG: Zero-Shot Illumination-Guided Joint Denoising and Adaptive Enhancement for Low-Light ImagesYiqi Shi, Duo Liu, Liguo Zhang, Ye Tian, Xuezhi Xia, Xiaojing Fu[pdf] [supp] 


FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion ModelsJinglin Xu, Yijie Guo, Yuxin Peng[pdf] [arXiv] 


DreamPropeller: Supercharge Text-to-3D Generation with Parallel SamplingLinqi Zhou, Andy Shih, Chenlin Meng, Stefano Ermon[pdf] [supp] [arXiv] 


Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMsHao Fei, Shengqiong Wu, Wei Ji, Hanwang Zhang, Tat-Seng Chua[pdf] [supp] 


General Object Foundation Model for Images and Videos at ScaleJunfeng Wu, Yi Jiang, Qihao Liu, Zehuan Yuan, Xiang Bai, Song Bai[pdf] [supp] [arXiv] 


Inlier Confidence Calibration for Point Cloud RegistrationYongzhe Yuan, Yue Wu, Xiaolong Fan, Maoguo Gong, Qiguang Miao, Wenping Ma[pdf] [supp] 


Readout Guidance: Learning Control from Diffusion FeaturesGrace Luo, Trevor Darrell, Oliver Wang, Dan B Goldman, Aleksander Holynski[pdf] [supp] [arXiv] 


A Unified Approach for Text- and Image-guided 4D Scene GenerationYufeng Zheng, Xueting Li, Koki Nagano, Sifei Liu, Otmar Hilliges, Shalini De Mello[pdf] [supp] 


GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D GaussiansLiangxiao Hu, Hongwen Zhang, Yuxiang Zhang, Boyao Zhou, Boning Liu, Shengping Zhang, Liqiang Nie[pdf] [supp] [arXiv] 


Mosaic-SDF for 3D Generative ModelsLior Yariv, Omri Puny, Oran Gafni, Yaron Lipman[pdf] [supp] [arXiv] 


Diffusion Handles Enabling 3D Edits for Diffusion Models by Lifting Activations to 3DKarran Pandey, Paul Guerrero, Matheus Gadelha, Yannick Hold-Geoffroy, Karan Singh, Niloy J. Mitra[pdf] [supp] [arXiv] 


Friendly Sharpness-Aware MinimizationTao Li, Pan Zhou, Zhengbao He, Xinwen Cheng, Xiaolin Huang[pdf] [supp] [arXiv] 


BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion ModelsFengyuan Shi, Jiaxi Gu, Hang Xu, Songcen Xu, Wei Zhang, Limin Wang[pdf] [supp] [arXiv] 


NC-TTT: A Noise Constrastive Approach for Test-Time TrainingDavid Osowiechi, Gustavo A. Vargas Hakim, Mehrdad Noori, Milad Cheraghalikhani, Ali Bahri, Moslem Yazdanpanah, Ismail Ben Ayed, Christian Desrosiers[pdf] [supp] 


Small Scale Data-Free Knowledge DistillationHe Liu, Yikai Wang, Huaping Liu, Fuchun Sun, Anbang Yao[pdf] 


CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofingAjian Liu, Shuai Xue, Jianwen Gan, Jun Wan, Yanyan Liang, Jiankang Deng, Sergio Escalera, Zhen Lei[pdf] 


Open Vocabulary Semantic Scene Sketch UnderstandingAhmed Bourouis, Judith E. Fan, Yulia Gryaditskaya[pdf] [supp] [arXiv] 


IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray TracingShaofei Wang, Bozidar Antic, Andreas Geiger, Siyu Tang[pdf] [supp] [arXiv] 


Efficient Detection of Long Consistent Cycles and its Application to Distributed SynchronizationShaohan Li, Yunpeng Shi, Gilad Lerman[pdf] [supp] 


Vlogger: Make Your Dream A VlogShaobin Zhuang, Kunchang Li, Xinyuan Chen, Yaohui Wang, Ziwei Liu, Yu Qiao, Yali Wang[pdf] [supp] [arXiv] 


Neural 3D Strokes: Creating Stylized 3D Scenes with Vectorized 3D StrokesHao-Bin Duan, Miao Wang, Yan-Xun Li, Yong-Liang Yang[pdf] [supp] [arXiv] 


Multi-Object Tracking in the DarkXinzhe Wang, Kang Ma, Qiankun Liu, Yunhao Zou, Ying Fu[pdf] [arXiv] 


UniHuman: A Unified Model For Editing Human Images in the WildNannan Li, Qing Liu, Krishna Kumar Singh, Yilin Wang, Jianming Zhang, Bryan A. Plummer, Zhe Lin[pdf] [supp] [arXiv] 


DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language ModelLirui Zhao, Yue Yang, Kaipeng Zhang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Rongrong Ji[pdf] [arXiv] 


In Search of a Data Transformation That Accelerates Neural Field TrainingJunwon Seo, Sangyoon Lee, Kwang In Kim, Jaeho Lee[pdf] [supp] [arXiv] 


Zero-Painter: Training-Free Layout Control for Text-to-Image SynthesisMarianna Ohanyan, Hayk Manukyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi[pdf] [supp] 


Towards 3D Vision with Low-Cost Single-Photon CamerasFangzhou Mu, Carter Sifferman, Sacha Jungerman, Yiquan Li, Mark Han, Michael Gleicher, Mohit Gupta, Yin Li[pdf] [supp] [arXiv] 


WonderJourney: Going from Anywhere to EverywhereHong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann[pdf] [supp] [arXiv] 


4D-fy: Text-to-4D Generation Using Hybrid Score Distillation SamplingSherwin Bahmani, Ivan Skorokhodov, Victor Rong, Gordon Wetzstein, Leonidas Guibas, Peter Wonka, Sergey Tulyakov, Jeong Joon Park, Andrea Tagliasacchi, David B. Lindell[pdf] [supp] 


FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any ConditionSicheng Mo, Fangzhou Mu, Kuan Heng Lin, Yanli Liu, Bochen Guan, Yin Li, Bolei Zhou[pdf] [supp] [arXiv] 


VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion ModelsHyeonho Jeong, Geon Yeong Park, Jong Chul Ye[pdf] [supp] [arXiv] 


DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion ModelsMuyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Kai Li, Song Han[pdf] [arXiv] 


AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture SearchJunghyup Lee, Bumsub Ham[pdf] [supp] 


Improving Physics-Augmented Continuum Neural Radiance Field-Based Geometry-Agnostic System Identification with Lagrangian Particle OptimizationTakuhiro Kaneko[pdf] [supp] 


Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual LossJaeha Kim, Junghun Oh, Kyoung Mu Lee[pdf] [supp] [arXiv] 


XCube: Large-Scale 3D Generative Modeling using Sparse Voxel HierarchiesXuanchi Ren, Jiahui Huang, Xiaohui Zeng, Ken Museth, Sanja Fidler, Francis Williams[pdf] [supp] 


Reconstruction-free Cascaded Adaptive Compressive SensingChenxi Qiu, Tao Yue, Xuemei Hu[pdf] 


USE: Universal Segment Embeddings for Open-Vocabulary Image SegmentationXiaoqi Wang, Wenbin He, Xiwei Xuan, Clint Sebastian, Jorge Piazentin Ono, Xin Li, Sima Behpour, Thang Doan, Liang Gou, Han-Wei Shen, Liu Ren[pdf] [supp] 


Functional DiffusionBiao Zhang, Peter Wonka[pdf] [arXiv] 


Wired Perspectives: Multi-View Wire Art Embraces Generative AIZhiyu Qu, Lan Yang, Honggang Zhang, Tao Xiang, Kaiyue Pang, Yi-Zhe Song[pdf] [supp] [arXiv] 


Leveraging Camera Triplets for Efficient and Accurate Structure-from-MotionLalit Manam, Venu Madhav Govindu[pdf] [supp] 


SimDA: Simple Diffusion Adapter for Efficient Video GenerationZhen Xing, Qi Dai, Han Hu, Zuxuan Wu, Yu-Gang Jiang[pdf] [supp] [arXiv] 


Multi-view Aggregation Network for Dichotomous Image SegmentationQian Yu, Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu[pdf] [arXiv] 


A Recipe for Scaling up Text-to-Video Generation with Text-free VideosXiang Wang, Shiwei Zhang, Hangjie Yuan, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, Nong Sang[pdf] [supp] [arXiv] 


Molecular Data Programming: Towards Molecule Pseudo-labeling with Systematic Weak SupervisionXin Juan, Kaixiong Zhou, Ninghao Liu, Tianlong Chen, Xin Wang[pdf] [supp] 


Residual Denoising Diffusion ModelsJiawei Liu, Qiang Wang, Huijie Fan, Yinong Wang, Yandong Tang, Liangqiong Qu[pdf] [supp] [arXiv] 


Towards Accurate and Robust Architectures via Neural Architecture SearchYuwei Ou, Yuqi Feng, Yanan Sun[pdf] [arXiv] 


Closely Interactive Human Reconstruction with Proxemics and Physics-Guided AdaptionBuzhen Huang, Chen Li, Chongyang Xu, Liang Pan, Yangang Wang, Gim Hee Lee[pdf] [supp] [arXiv] 


Taming Stable Diffusion for Text to 360 Panorama Image GenerationCheng Zhang, Qianyi Wu, Camilo Cruz Gambardella, Xiaoshui Huang, Dinh Phung, Wanli Ouyang, Jianfei Cai[pdf] [supp] 


Modular Blind Video Quality AssessmentWen Wen, Mu Li, Yabin Zhang, Yiting Liao, Junlin Li, Li Zhang, Kede Ma[pdf] [arXiv] 


RELI11D: A Comprehensive Multimodal Human Motion Dataset and MethodMing Yan, Yan Zhang, Shuqiang Cai, Shuqi Fan, Xincheng Lin, Yudi Dai, Siqi Shen, Chenglu Wen, Lan Xu, Yuexin Ma, Cheng Wang[pdf] [supp] [arXiv] 


One-Class Face Anti-spoofing via Spoof Cue Map-Guided Feature LearningPei-Kai Huang, Cheng-Hsuan Chiang, Tzu-Hsien Chen, Jun-Xiong Chong, Tyng-Luh Liu, Chiou-Ting Hsu[pdf] 


InteractDiffusion: Interaction Control in Text-to-Image Diffusion ModelsJiun Tian Hoe, Xudong Jiang, Chee Seng Chan, Yap-Peng Tan, Weipeng Hu[pdf] [supp] [arXiv] 


Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language ModelsJiayun Luo, Siddhesh Khandelwal, Leonid Sigal, Boyang Li[pdf] [supp] [arXiv] 


SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose EstimationVinkle Srivastav, Keqi Chen, Nicolas Padoy[pdf] [supp] [arXiv] 


Joint2Human: High-Quality 3D Human Generation via Compact Spherical Embedding of 3D JointsMuxin Zhang, Qiao Feng, Zhuo Su, Chao Wen, Zhou Xue, Kun Li[pdf] [supp] [arXiv] 


Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion ModelsXingqian Xu, Jiayi Guo, Zhangyang Wang, Gao Huang, Irfan Essa, Humphrey Shi[pdf] [supp] [arXiv] 


Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory ConditioningJaewoo Jeong, Daehee Park, Kuk-Jin Yoon[pdf] [supp] [arXiv] 


CLOAF: CoLlisiOn-Aware Human FlowAndrey Davydov, Martin Engilberge, Mathieu Salzmann, Pascal Fua[pdf] [supp] [arXiv] 


Hybrid Functional Maps for Crease-Aware Non-Isometric Shape MatchingLennart Bastian, Yizheng Xie, Nassir Navab, Zorah Lähner[pdf] [supp] [arXiv] 


Density-Guided Semi-Supervised 3D Semantic Segmentation with Dual-Space Hardness SamplingJianan Li, Qiulei Dong[pdf] [supp] 


ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content SeparationMoayed Haji-Ali, Guha Balakrishnan, Vicente Ordonez[pdf] [supp] [arXiv] 


Locally Adaptive Neural 3D Morphable ModelsMichail Tarasiou, Rolandos Alexandros Potamias, Eimear O'Sullivan, Stylianos Ploumpis, Stefanos Zafeiriou[pdf] [supp] [arXiv] 


ICON: Incremental CONfidence for Joint Pose and Radiance Field OptimizationWeiyao Wang, Pierre Gleize, Hao Tang, Xingyu Chen, Kevin J Liang, Matt Feiszli[pdf] [supp] [arXiv] 


Learned Scanpaths Aid Blind Panoramic Video Quality AssessmentKanglong Fan, Wen Wen, Mu Li, Yifan Peng, Kede Ma[pdf] [supp] [arXiv] 


TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion ModelsHaomiao Ni, Bernhard Egger, Suhas Lohit, Anoop Cherian, Ye Wang, Toshiaki Koike-Akino, Sharon X. Huang, Tim K. Marks[pdf] [supp] 


iToF-flow-based High Frame Rate Depth ImagingYu Meng, Zhou Xue, Xu Chang, Xuemei Hu, Tao Yue[pdf] 


Relightful Harmonization: Lighting-aware Portrait Background ReplacementMengwei Ren, Wei Xiong, Jae Shin Yoon, Zhixin Shu, Jianming Zhang, HyunJoon Jung, Guido Gerig, He Zhang[pdf] [supp] [arXiv] 


Mitigating Motion Blur in Neural Radiance Fields with Events and FramesMarco Cannici, Davide Scaramuzza[pdf] [supp] [arXiv] 


TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose RepresentationSai Kumar Dwivedi, Yu Sun, Priyanka Patel, Yao Feng, Michael J. Black[pdf] [supp] [arXiv] 


FaceCom: Towards High-fidelity 3D Facial Shape Completion via Optimization and Inpainting GuidanceYinglong Li, Hongyu Wu, Xiaogang Wang, Qingzhao Qin, Yijiao Zhao, Yong Wang, Aimin Hao[pdf] [supp] 


LightOctree: Lightweight 3D Spatially-Coherent Indoor Lighting EstimationXuecan Wang, Shibang Xiao, Xiaohui Liang[pdf] [supp] [arXiv] 


FaceLift: Semi-supervised 3D Facial Landmark LocalizationDavid Ferman, Pablo Garrido, Gaurav Bharaj[pdf] [supp] [arXiv] 


PSDPM: Prototype-based Secondary Discriminative Pixels Mining for Weakly Supervised Semantic SegmentationXinqiao Zhao, Ziqian Yang, Tianhong Dai, Bingfeng Zhang, Jimin Xiao[pdf] 


Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic SegmentationBingfeng Zhang, Siyue Yu, Yunchao Wei, Yao Zhao, Jimin Xiao[pdf] [supp] 


LAFS: Landmark-based Facial Self-supervised Learning for Face RecognitionZhonglin Sun, Chen Feng, Ioannis Patras, Georgios Tzimiropoulos[pdf] [supp] [arXiv] 


SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic SegmentationBin Xie, Jiale Cao, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang[pdf] [supp] [arXiv] 


GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical PriorsYuan Dong, Qi Zuo, Xiaodong Gu, Weihao Yuan, Zhengyi Zhao, Zilong Dong, Liefeng Bo, Qixing Huang[pdf] [supp] 


Self-correcting LLM-controlled Diffusion ModelsTsung-Han Wu, Long Lian, Joseph E. Gonzalez, Boyi Li, Trevor Darrell[pdf] [supp] [arXiv] 


PACER+: On-Demand Pedestrian Animation Controller in Driving ScenariosJingbo Wang, Zhengyi Luo, Ye Yuan, Yixuan Li, Bo Dai[pdf] [supp] 


LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-time RenderingJaehoon Choi, Rajvi Shah, Qinbo Li, Yipeng Wang, Ayush Saraf, Changil Kim, Jia-Bin Huang, Dinesh Manocha, Suhib Alsisan, Johannes Kopf[pdf] [supp] 


Don't Drop Your Samples! Coherence-Aware Training Benefits Conditional DiffusionNicolas Dufour, Victor Besnier, Vicky Kalogeiton, David Picard[pdf] [supp] 


What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze EstimationYihua Cheng, Yaning Zhu, Zongji Wang, Hongquan Hao, Yongwei Liu, Shiqing Cheng, Xi Wang, Hyung Jin Chang[pdf] [supp] [arXiv] 


UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and Unfavorable SetsYoungju Na, Woo Jae Kim, Kyu Beom Han, Suhyeon Ha, Sung-Eui Yoon[pdf] [supp] [arXiv] 


Breathing Life Into Sketches Using Text-to-Video PriorsRinon Gal, Yael Vinker, Yuval Alaluf, Amit Bermano, Daniel Cohen-Or, Ariel Shamir, Gal Chechik[pdf] [supp] [arXiv] 


Learning Diffusion Texture Priors for Image RestorationTian Ye, Sixiang Chen, Wenhao Chai, Zhaohu Xing, Jing Qin, Ge Lin, Lei Zhu[pdf] 


Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance FieldsZhiyuan Min, Yawei Luo, Wei Yang, Yuesong Wang, Yi Yang[pdf] [supp] [arXiv] 


YolOOD: Utilizing Object Detection Concepts for Multi-Label Out-of-Distribution DetectionAlon Zolfi, Guy Amit, Amit Baras, Satoru Koda, Ikuya Morikawa, Yuval Elovici, Asaf Shabtai[pdf] [supp] [arXiv] 


Collaborating Foundation Models for Domain Generalized Semantic SegmentationYasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière[pdf] [supp] [arXiv] 


Towards Variable and Coordinated Holistic Co-Speech Motion GenerationYifei Liu, Qiong Cao, Yandong Wen, Huaiguang Jiang, Changxing Ding[pdf] [supp] [arXiv] 


AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic SegmentationHaonan Wang, Qixiang Zhang, Yi Li, Xiaomeng Li[pdf] [supp] [arXiv] 


SIGNeRF: Scene Integrated Generation for Neural Radiance FieldsJan-Niklas Dihlmann, Andreas Engelhardt, Hendrik Lensch[pdf] [supp] [arXiv] 


Generating Illustrated InstructionsSachit Menon, Ishan Misra, Rohit Girdhar[pdf] [supp] [arXiv] 


Robust Image Denoising through Adversarial Frequency MixupDonghun Ryou, Inju Ha, Hyewon Yoo, Dongwan Kim, Bohyung Han[pdf] [supp] 


AnyScene: Customized Image Synthesis with Composited ForegroundRuidong Chen, Lanjun Wang, Weizhi Nie, Yongdong Zhang, An-An Liu[pdf] [supp] 


Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of ArtifactsCansu Korkmaz, A. Murat Tekalp, Zafer Dogan[pdf] [supp] [arXiv] 


Monocular Identity-Conditioned Facial Reflectance ReconstructionXingyu Ren, Jiankang Deng, Yuhao Cheng, Jia Guo, Chao Ma, Yichao Yan, Wenhan Zhu, Xiaokang Yang[pdf] [arXiv] 


C3: High-Performance and Low-Complexity Neural Compression from a Single Image or VideoHyunjik Kim, Matthias Bauer, Lucas Theis, Jonathan Richard Schwarz, Emilien Dupont[pdf] [supp] [arXiv] 


Revisiting Non-Autoregressive Transformers for Efficient Image SynthesisZanlin Ni, Yulin Wang, Renping Zhou, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Shiji Song, Yuan Yao, Gao Huang[pdf] [supp] 


ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D ImageMarco Pesavento, Yuanlu Xu, Nikolaos Sarafianos, Robert Maier, Ziyan Wang, Chun-Han Yao, Marco Volino, Edmond Boyer, Adrian Hilton, Tony Tung[pdf] [supp] 


Real-Time Simulated Avatar from Head-Mounted SensorsZhengyi Luo, Jinkun Cao, Rawal Khirodkar, Alexander Winkler, Kris Kitani, Weipeng Xu[pdf] [supp] [arXiv] 


Seamless Human Motion Composition with Blended Positional EncodingsGerman Barquero, Sergio Escalera, Cristina Palmero[pdf] [supp] [arXiv] 


FedUV: Uniformity and Variance for Heterogeneous Federated LearningHa Min Son, Moon-Hyun Kim, Tai-Myoung Chung, Chao Huang, Xin Liu[pdf] [supp] [arXiv] 


GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh LearningYe Yuan, Xueting Li, Yangyi Huang, Shalini De Mello, Koki Nagano, Jan Kautz, Umar Iqbal[pdf] [supp] [arXiv] 


Grounding Everything: Emerging Localization Properties in Vision-Language TransformersWalid Bousselham, Felix Petersen, Vittorio Ferrari, Hilde Kuehne[pdf] [supp] [arXiv] 


Mean-Shift Feature TransformerTakumi Kobayashi[pdf] [supp] 


Domain Separation Graph Neural Networks for Saliency Object RankingZijian Wu, Jun Lu, Jing Han, Lianfa Bai, Yi Zhang, Zhuang Zhao, Siyang Song[pdf] [supp] 


RAM-Avatar: Real-time Photo-Realistic Avatar from Monocular Videos with Full-body ControlXiang Deng, Zerong Zheng, Yuxiang Zhang, Jingxiang Sun, Chao Xu, Xiaodong Yang, Lizhen Wang, Yebin Liu[pdf] [supp] 


Video Prediction by Modeling Videos as Continuous Multi-Dimensional ProcessesGaurav Shrivastava, Abhinav Shrivastava[pdf] [supp] 


PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsignsShuliang Ning, Duomin Wang, Yipeng Qin, Zirong Jin, Baoyuan Wang, Xiaoguang Han[pdf] [supp] [arXiv] 


Towards Robust 3D Pose Transfer with Adversarial LearningHaoyu Chen, Hao Tang, Ehsan Adeli, Guoying Zhao[pdf] [supp] [arXiv] 


EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic SegmentationChanyoung Kim, Woojung Han, Dayun Ju, Seong Jae Hwang[pdf] [supp] [arXiv] 


AVID: Any-Length Video Inpainting with Diffusion ModelZhixing Zhang, Bichen Wu, Xiaoyan Wang, Yaqiao Luo, Luxin Zhang, Yinan Zhao, Peter Vajda, Dimitris Metaxas, Licheng Yu[pdf] [supp] [arXiv] 


NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and MergingTakahiro Shirakawa, Seiichi Uchida[pdf] [supp] [arXiv] 


Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion ModelWenfeng Song, Xingliang Jin, Shuai Li, Chenglizhao Chen, Aimin Hao, Xia Hou, Ning Li, Hong Qin[pdf] [supp] 


ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense PredictionsChunlong Xia, Xinliang Wang, Feng Lv, Xin Hao, Yifeng Shi[pdf] 


PromptCoT: Align Prompt Distribution via Adapted Chain-of-ThoughtJunyi Yao, Yijiang Liu, Zhen Dong, Mingfei Guo, Helan Hu, Kurt Keutzer, Li Du, Daquan Zhou, Shanghang Zhang[pdf] [supp] 


Anomaly Score: Evaluating Generative Models and Individual Generated Images based on Complexity and VulnerabilityJaehui Hwang, Junghyuk Lee, Jong-Seok Lee[pdf] [supp] [arXiv] 


GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single ImageChong Bao, Yinda Zhang, Yuan Li, Xiyu Zhang, Bangbang Yang, Hujun Bao, Marc Pollefeys, Guofeng Zhang, Zhaopeng Cui[pdf] [supp] [arXiv] 


Learn to Rectify the Bias of CLIP for Unsupervised Semantic SegmentationJingyun Wang, Guoliang Kang[pdf] [supp] 


Unlocking Pre-trained Image Backbones for Semantic Image SynthesisTariq Berrada Ifriqi, Jakob Verbeek, Camille Couprie, Karteek Alahari[pdf] [supp] 


TexTile: A Differentiable Metric for Texture TileabilityCarlos Rodriguez-Pardo, Dan Casas, Elena Garces, Jorge Lopez-Moreno[pdf] [supp] [arXiv] 


Improving Image Restoration through Removing Degradations in Textual RepresentationsJingbo Lin, Zhilu Zhang, Yuxiang Wei, Dongwei Ren, Dongsheng Jiang, Qi Tian, Wangmeng Zuo[pdf] [supp] [arXiv] 


ZONE: Zero-Shot Instruction-Guided Local EditingShanglin Li, Bohan Zeng, Yutang Feng, Sicheng Gao, Xiuhui Liu, Jiaming Liu, Lin Li, Xu Tang, Yao Hu, Jianzhuang Liu, Baochang Zhang[pdf] [supp] [arXiv] 


U-VAP: User-specified Visual Appearance Personalization via Decoupled Self AugmentationYou Wu, Kean Liu, Xiaoyue Mi, Fan Tang, Juan Cao, Jintao Li[pdf] [supp] 


HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion ModelsMengcheng Li, Hongwen Zhang, Yuxiang Zhang, Ruizhi Shao, Tao Yu, Yebin Liu[pdf] 


Robust Self-calibration of Focal Lengths from the Fundamental MatrixViktor Kocur, Daniel Kyselica, Zuzana Kukelova[pdf] [supp] [arXiv] 


PartDistill: 3D Shape Part Segmentation by Vision-Language Model DistillationArdian Umam, Cheng-Kun Yang, Min-Hung Chen, Jen-Hui Chuang, Yen-Yu Lin[pdf] [supp] [arXiv] 


DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image EditingYujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai[pdf] [supp] [arXiv] 


Addressing Background Context Bias in Few-Shot Segmentation through Iterative ModulationLanyun Zhu, Tianrun Chen, Jianxiong Yin, Simon See, Jun Liu[pdf] [supp] 


TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image EditingSherry X Chen, Yaron Vaxman, Elad Ben Baruch, David Asulin, Aviad Moreshet, Kuo-Chin Lien, Misha Sra, Pradeep Sen[pdf] [supp] 


AdaShift: Learning Discriminative Self-Gated Neural Feature Activation With an Adaptive Shift FactorSudong Cai[pdf] [supp] 


SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection EditingZeyinzi Jiang, Chaojie Mao, Yulin Pan, Zhen Han, Jingfeng Zhang[pdf] [supp] [arXiv] 


BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything ModelYiran Song, Qianyu Zhou, Xiangtai Li, Deng-Ping Fan, Xuequan Lu, Lizhuang Ma[pdf] [supp] 


Deciphering 'What' and 'Where' Visual Pathways from Spectral Clustering of Layer-Distributed Neural RepresentationsXiao Zhang, David Yunis, Michael Maire[pdf] [supp] [arXiv] 


Real-Time Exposure Correction via Collaborative Transformations and Adaptive SamplingZiwen Li, Feng Zhang, Meng Cao, Jinpu Zhang, Yuanjie Shao, Yuehuan Wang, Nong Sang[pdf] [supp] 


Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance PrimitivesRonghui Li, YuXiang Zhang, Yachao Zhang, Hongwen Zhang, Jie Guo, Yan Zhang, Yebin Liu, Xiu Li[pdf] [supp] [arXiv] 


Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake DetectionZhiyuan Yan, Yuhao Luo, Siwei Lyu, Qingshan Liu, Baoyuan Wu[pdf] [supp] [arXiv] 


Scaling Laws of Synthetic Images for Model Training ... for NowLijie Fan, Kaifeng Chen, Dilip Krishnan, Dina Katabi, Phillip Isola, Yonglong Tian[pdf] [supp] [arXiv] 


State Space Models for Event CamerasNikola Zubic, Mathias Gehrig, Davide Scaramuzza[pdf] [supp] [arXiv] 


TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint VideoMinye Wu, Zehao Wang, Georgios Kouros, Tinne Tuytelaars[pdf] [supp] [arXiv] 


Event-assisted Low-Light Video Object SegmentationHebei Li, Jin Wang, Jiahui Yuan, Yue Li, Wenming Weng, Yansong Peng, Yueyi Zhang, Zhiwei Xiong, Xiaoyan Sun[pdf] [supp] [arXiv] 


VidToMe: Video Token Merging for Zero-Shot Video EditingXirui Li, Chao Ma, Xiaokang Yang, Ming-Hsuan Yang[pdf] [supp] [arXiv] 


FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven GenerationPengchong Qiao, Lei Shang, Chang Liu, Baigui Sun, Xiangyang Ji, Jie Chen[pdf] [supp] 


StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-OnJeongho Kim, Guojung Gu, Minho Park, Sunghyun Park, Jaegul Choo[pdf] [supp] [arXiv] 


Make-Your-Anchor: A Diffusion-based 2D Avatar Generation FrameworkZiyao Huang, Fan Tang, Yong Zhang, Xiaodong Cun, Juan Cao, Jintao Li, Tong-Yee Lee[pdf] [supp] 


Learning Dynamic Tetrahedra for High-Quality Talking Head SynthesisZicheng Zhang, Ruobing Zheng, Bonan Li, Congying Han, Tianqi Li, Meng Wang, Tiande Guo, Jingdong Chen, Ziwen Liu, Ming Yang[pdf] [supp] [arXiv] 


3D Geometry-Aware Deformable Gaussian Splatting for Dynamic View SynthesisZhicheng Lu, Xiang Guo, Le Hui, Tianrui Chen, Min Yang, Xiao Tang, Feng Zhu, Yuchao Dai[pdf] [supp] [arXiv] 


Person-in-WiFi 3D: End-to-End Multi-Person 3D Pose Estimation with Wi-FiKangwei Yan, Fei Wang, Bo Qian, Han Ding, Jinsong Han, Xing Wei[pdf] 


Fairy: Fast Parallelized Instruction-Guided Video-to-Video SynthesisBichen Wu, Ching-Yao Chuang, Xiaoyan Wang, Yichen Jia, Kapil Krishnakumar, Tong Xiao, Feng Liang, Licheng Yu, Peter Vajda[pdf] [supp] [arXiv] 


SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language ModelsYuzhou Huang, Liangbin Xie, Xintao Wang, Ziyang Yuan, Xiaodong Cun, Yixiao Ge, Jiantao Zhou, Chao Dong, Rui Huang, Ruimao Zhang, Ying Shan[pdf] [supp] [arXiv] 


It's All About Your Sketch: Democratising Sketch Control in Diffusion ModelsSubhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song[pdf] [supp] 


When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image GenerationXiaoming Li, Xinyu Hou, Chen Change Loy[pdf] [supp] 


CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization PerspectiveShunsuke Yasuki, Masato Taki[pdf] [supp] [arXiv] 


Putting the Object Back into Video Object SegmentationHo Kei Cheng, Seoung Wug Oh, Brian Price, Joon-Young Lee, Alexander Schwing[pdf] [supp] [arXiv] 


Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image ModelsGihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron[pdf] [supp] [arXiv] 


Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence MiningJiahao Nie, Yun Xing, Gongjie Zhang, Pei Yan, Aoran Xiao, Yap-Peng Tan, Alex C. Kot, Shijian Lu[pdf] [supp] [arXiv] 


DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture GenerationJunming Chen, Yunfei Liu, Jianan Wang, Ailing Zeng, Yu Li, Qifeng Chen[pdf] [supp] [arXiv] 


Animating General Image with Large Visual Motion ModelDengsheng Chen, Xiaoming Wei, Xiaolin Wei[pdf] [supp] 


DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D DataQihao Liu, Yi Zhang, Song Bai, Adam Kortylewski, Alan Yuille[pdf] [supp] 


OHTA: One-shot Hand Avatar via Data-driven Implicit PriorsXiaozheng Zheng, Chao Wen, Zhuo Su, Zeran Xu, Zhaohu Li, Yang Zhao, Zhou Xue[pdf] [supp] [arXiv] 


Human Motion Prediction Under Unexpected PerturbationJiangbei Yue, Baiyi Li, Julien Pettré, Armin Seyfried, He Wang[pdf] [supp] 


Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priorsLihe Ding, Shaocong Dong, Zhanpeng Huang, Zibin Wang, Yiyuan Zhang, Kaixiong Gong, Dan Xu, Tianfan Xue[pdf] [supp] [arXiv] 


Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from TextJunshu Tang, Yanhong Zeng, Ke Fan, Xuheng Wang, Bo Dai, Kai Chen, Lizhuang Ma[pdf] 


Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from TextVasileios Baltatzis, Rolandos Alexandros Potamias, Evangelos Ververas, Guanxiong Sun, Jiankang Deng, Stefanos Zafeiriou[pdf] [supp] [arXiv] 


On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation ParadigmPeng Sun, Bei Shi, Daiwei Yu, Tao Lin[pdf] [supp] [arXiv] 


Semantics-aware Motion Retargeting with Vision-Language ModelsHaodong Zhang, Zhike Chen, Haocheng Xu, Lei Hao, Xiaofei Wu, Songcen Xu, Zhensong Zhang, Yue Wang, Rong Xiong[pdf] [supp] [arXiv] 


Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and SamplingLeon Sick, Dominik Engel, Pedro Hermosilla, Timo Ropinski[pdf] [supp] [arXiv] 


RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion ModelsOzgur Kara, Bariscan Kurtkaya, Hidir Yesiltepe, James M. Rehg, Pinar Yanardag[pdf] [supp] [arXiv] 


Video-Based Human Pose Regression via Decoupled Space-Time AggregationJijie He, Wenwu Yang[pdf] [supp] [arXiv] 


L-MAGIC: Language Model Assisted Generation of Images with CoherenceZhipeng Cai, Matthias Mueller, Reiner Birkl, Diana Wofk, Shao-Yen Tseng, Junda Cheng, Gabriela Ben-Melech Stan, Vasudev Lai, Michael Paulitsch[pdf] [supp] 


3D Face Tracking from 2D Video through Iterative Dense UV to Image FlowFelix Taubner, Prashant Raina, Mathieu Tuli, Eu Wern Teh, Chul Lee, Jinmiao Huang[pdf] [supp] [arXiv] 


Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL FinetuningDesai Xie, Jiahao Li, Hao Tan, Xin Sun, Zhixin Shu, Yi Zhou, Sai Bi, Sören Pirk, Arie E. Kaufman[pdf] [supp] [arXiv] 


Shadow Generation for Composite Image Using Diffusion ModelQingyang Liu, Junqi You, Jianting Wang, Xinhao Tao, Bo Zhang, Li Niu[pdf] [supp] [arXiv] 


DisCo: Disentangled Control for Realistic Human Dance GenerationTan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang[pdf] [supp] [arXiv] 


GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective SurfacesYingwenqi Jiang, Jiadong Tu, Yuan Liu, Xifeng Gao, Xiaoxiao Long, Wenping Wang, Yuexin Ma[pdf] [supp] [arXiv] 


pix2gestalt: Amodal Segmentation by Synthesizing WholesEge Ozguroglu, Ruoshi Liu, Dídac Surís, Dian Chen, Achal Dave, Pavel Tokmakov, Carl Vondrick[pdf] 


Weakly Supervised Point Cloud Semantic Segmentation via Artificial OracleHyeokjun Kweon, Jihun Kim, Kuk-Jin Yoon[pdf] [supp] 


Forecasting of 3D Whole-body Human Poses with Grasping ObjectsHaitao Yan, Qiongjie Cui, Jiexin Xie, Shijie Guo[pdf] 


Accelerating Diffusion Sampling with Optimized Time StepsShuchen Xue, Zhaoqiang Liu, Fei Chen, Shifeng Zhang, Tianyang Hu, Enze Xie, Zhenguo Li[pdf] [supp] [arXiv] 


Unsupervised Template-assisted Point Cloud Shape Correspondence NetworkJiacheng Deng, Jiahao Lu, Tianzhu Zhang[pdf] [arXiv] 


Finsler-Laplace-Beltrami Operators with Application to Shape AnalysisSimon Weber, Thomas Dagès, Maolin Gao, Daniel Cremers[pdf] [supp] 


Minimal Perspective AutocalibrationAndrea Porfiri Dal Cin, Timothy Duff, Luca Magri, Tomas Pajdla[pdf] [supp] [arXiv] 


Time- Memory- and Parameter-Efficient Visual AdaptationOtniel-Bogdan Mercea, Alexey Gritsenko, Cordelia Schmid, Anurag Arnab[pdf] [supp] 


Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-SpoofingXun Lin, Shuai Wang, Rizhao Cai, Yizhong Liu, Ying Fu, Wenzhong Tang, Zitong Yu, Alex Kot[pdf] [arXiv] 


Universal Segmentation at Arbitrary Granularity with Language InstructionYong Liu, Cairong Zhang, Yitong Wang, Jiahao Wang, Yujiu Yang, Yansong Tang[pdf] [arXiv] 


Layout-Agnostic Scene Text Image Synthesis with Diffusion ModelsQilong Zhangli, Jindong Jiang, Di Liu, Licheng Yu, Xiaoliang Dai, Ankit Ramchandani, Guan Pang, Dimitris N. Metaxas, Praveen Krishnan[pdf] 


SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout ControlJaskirat Singh, Jianming Zhang, Qing Liu, Cameron Smith, Zhe Lin, Liang Zheng[pdf] [supp] [arXiv] 


Customization Assistant for Text-to-Image GenerationYufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Tong Sun[pdf] [supp] [arXiv] 


GenHowTo: Learning to Generate Actions and State Transformations from Instructional VideosTomáš Souček, Dima Damen, Michael Wray, Ivan Laptev, Josef Sivic[pdf] [supp] 


Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based RenderingKim Youwang, Tae-Hyun Oh, Gerard Pons-Moll[pdf] [supp] 


Physics-Aware Hand-Object Interaction DenoisingHaowen Luo, Yunze Liu, Li Yi[pdf] [supp] [arXiv] 


VastGaussian: Vast 3D Gaussians for Large Scene ReconstructionJiaqi Lin, Zhihao Li, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Jiayue Liu, Yangdi Lu, Xiaofei Wu, Songcen Xu, Youliang Yan, Wenming Yang[pdf] [arXiv] 


Edit One for All: Interactive Batch Image EditingThao Nguyen, Utkarsh Ojha, Yuheng Li, Haotian Liu, Yong Jae Lee[pdf] [supp] [arXiv] 


Deformable One-shot Face Stylization via DINO Semantic GuidanceYang Zhou, Zichong Chen, Hui Huang[pdf] [supp] [arXiv] 


Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image SynthesisYanzuo Lu, Manlin Zhang, Andy J Ma, Xiaohua Xie, Jianhuang Lai[pdf] [supp] [arXiv] 


OMG: Towards Open-vocabulary Motion Generation via Mixture of ControllersHan Liang, Jiacheng Bao, Ruichi Zhang, Sihan Ren, Yuecheng Xu, Sibei Yang, Xin Chen, Jingyi Yu, Lan Xu[pdf] [supp] [arXiv] 


Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion ModelsHuan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, Karsten Kreis[pdf] [supp] [arXiv] 


PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic SegmentationJinfeng Xu, Siyuan Yang, Xianzhi Li, Yuan Tang, Yixue Hao, Long Hu, Min Chen[pdf] [arXiv] 


Test-Time Domain Generalization for Face Anti-SpoofingQianyu Zhou, Ke-Yue Zhang, Taiping Yao, Xuequan Lu, Shouhong Ding, Lizhuang Ma[pdf] [arXiv] 


Real-time 3D-aware Portrait Video RelightingZiqi Cai, Kaiwen Jiang, Shu-Yu Chen, Yu-Kun Lai, Hongbo Fu, Boxin Shi, Lin Gao[pdf] [supp] 


3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian SplattingZhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, Siyu Tang[pdf] [supp] 


Style Aligned Image Generation via Shared AttentionAmir Hertz, Andrey Voynov, Shlomi Fruchter, Daniel Cohen-Or[pdf] [supp] [arXiv] 


Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D FeaturesThomas Wimmer, Peter Wonka, Maks Ovsjanikov[pdf] [supp] [arXiv] 


Neural Markov Random Field for Stereo MatchingTongfan Guan, Chen Wang, Yun-Hui Liu[pdf] [supp] [arXiv] 


PoseIRM: Enhance 3D Human Pose Estimation on Unseen Camera Settings via Invariant Risk MinimizationYanlu Cai, Weizhong Zhang, Yuan Wu, Cheng Jin[pdf] [supp] 


CCEdit: Creative and Controllable Video Editing via Diffusion ModelsRuoyu Feng, Wenming Weng, Yanhui Wang, Yuhui Yuan, Jianmin Bao, Chong Luo, Zhibo Chen, Baining Guo[pdf] [supp] [arXiv] 


HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained ImagesXihe Yang, Xingyu Chen, Daiheng Gao, Shaohui Wang, Xiaoguang Han, Baoyuan Wang[pdf] [supp] 


DiffMorpher: Unleashing the Capability of Diffusion Models for Image MorphingKaiwen Zhang, Yifan Zhou, Xudong Xu, Bo Dai, Xingang Pan[pdf] [supp] [arXiv] 


Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment NetworkYong Shu, Liquan Shen, Xiangyu Hu, Mengyao Li, Zihao Zhou[pdf] [supp] [arXiv] 


Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table BlendshapesZiqian Bai, Feitong Tan, Sean Fanello, Rohit Pandey, Mingsong Dou, Shichen Liu, Ping Tan, Yinda Zhang[pdf] [supp] [arXiv] 


No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene SegmentationXiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao[pdf] [supp] [arXiv] 


PhysGaussian: Physics-Integrated 3D Gaussians for Generative DynamicsTianyi Xie, Zeshun Zong, Yuxing Qiu, Xuan Li, Yutao Feng, Yin Yang, Chenfanfu Jiang[pdf] [supp] [arXiv] 


Spatio-Temporal Turbulence Mitigation: A Translational PerspectiveXingguang Zhang, Nicholas Chimitt, Yiheng Chi, Zhiyuan Mao, Stanley H. Chan[pdf] [supp] [arXiv] 


Grounded Text-to-Image Synthesis with Attention RefocusingQuynh Phung, Songwei Ge, Jia-Bin Huang[pdf] [supp] [arXiv] 


IReNe: Instant Recoloring of Neural Radiance FieldsAlessio Mazzucchelli, Adrian Garcia-Garcia, Elena Garces, Fernando Rivas-Manzaneque, Francesc Moreno-Noguer, Adrian Penate-Sanchez[pdf] [supp] 


Class Tokens Infusion for Weakly Supervised Semantic SegmentationSung-Hoon Yoon, Hoyong Kwon, Hyeonseong Kim, Kuk-Jin Yoon[pdf] [supp] 


FedHCA2: Towards Hetero-Client Federated Multi-Task LearningYuxiang Lu, Suizhi Huang, Yuwen Yang, Shalayiding Sirejiding, Yue Ding, Hongtao Lu[pdf] [supp] 


Motion Diversification NetworksHee Jae Kim, Eshed Ohn-Bar[pdf] 


Telling Left from Right: Identifying Geometry-Aware Semantic CorrespondenceJunyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, Ming-Hsuan Yang[pdf] [supp] [arXiv] 


PAIR Diffusion: A Comprehensive Multimodal Object-Level Image EditorVidit Goel, Elia Peruzzo, Yifan Jiang, Dejia Xu, Xingqian Xu, Nicu Sebe, Trevor Darrell, Zhangyang Wang, Humphrey Shi[pdf] [supp] 


TokenCompose: Text-to-Image Diffusion with Token-level SupervisionZirui Wang, Zhizhou Sha, Zheng Ding, Yilin Wang, Zhuowen Tu[pdf] [supp] 


FINER: Flexible Spectral-bias Tuning in Implicit NEural Representation by Variable-periodic Activation FunctionsZhen Liu, Hao Zhu, Qi Zhang, Jingde Fu, Weibing Deng, Zhan Ma, Yanwen Guo, Xun Cao[pdf] [supp] [arXiv] 


TextCraftor: Your Text Encoder Can be Image Quality ControllerYanyu Li, Xian Liu, Anil Kag, Ju Hu, Yerlan Idelbayev, Dhritiman Sagar, Yanzhi Wang, Sergey Tulyakov, Jian Ren[pdf] [supp] [arXiv] 


IMPRINT: Generative Object Compositing by Learning Identity-Preserving RepresentationYizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian Price, Jianming Zhang, Soo Ye Kim, He Zhang, Wei Xiong, Daniel Aliaga[pdf] [supp] [arXiv] 


Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic DataYu Deng, Duomin Wang, Xiaohang Ren, Xingyu Chen, Baoyuan Wang[pdf] [supp] 


ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture SynthesisMuhammad Hamza Mughal, Rishabh Dabral, Ikhsanul Habibie, Lucia Donatelli, Marc Habermann, Christian Theobalt[pdf] [supp] [arXiv] 


Boosting Neural Representations for Videos with a Conditional DecoderXinjie Zhang, Ren Yang, Dailan He, Xingtong Ge, Tongda Xu, Yan Wang, Hongwei Qin, Jun Zhang[pdf] [supp] [arXiv] 


From Audio to Photoreal Embodiment: Synthesizing Humans in ConversationsEvonne Ng, Javier Romero, Timur Bagautdinov, Shaojie Bai, Trevor Darrell, Angjoo Kanazawa, Alexander Richard[pdf] [supp] [arXiv] 


Single-View Scene Point Cloud Human Grasp GenerationYan-Kang Wang, Chengyi Xing, Yi-Lin Wei, Xiao-Ming Wu, Wei-Shi Zheng[pdf] [supp] [arXiv] 


One-step Diffusion with Distribution Matching DistillationTianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Frédo Durand, William T. Freeman, Taesung Park[pdf] [arXiv] 


Rethinking Human Motion Prediction with Symplectic IntegralHaipeng Chen, Kedi Lyu, Zhenguang Liu, Yifang Yin, Xun Yang, Yingda Lyu[pdf] 


CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality EnhancementQiang Zhu, Jinhua Hao, Yukang Ding, Yu Liu, Qiao Mo, Ming Sun, Chao Zhou, Shuyuan Zhu[pdf] [arXiv] 


MicroCinema: A Divide-and-Conquer Approach for Text-to-Video GenerationYanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai, Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Xiaoyan Sun, Chong Luo, Baining Guo[pdf] [supp] [arXiv] 


Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image InpaintingHaipeng Liu, Yang Wang, Biao Qian, Meng Wang, Yong Rui[pdf] [supp] [arXiv] 


Makeup Prior Models for 3D Facial Makeup Estimation and ApplicationsXingchao Yang, Takafumi Taketomi, Yuki Endo, Yoshihiro Kanamori[pdf] [supp] [arXiv] 


I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object InteractionsChengfeng Zhao, Juze Zhang, Jiashen Du, Ziwei Shan, Junye Wang, Jingyi Yu, Jingya Wang, Lan Xu[pdf] [supp] 


Dynamic Policy-Driven Adaptive Multi-Instance Learning for Whole Slide Image ClassificationTingting Zheng, Kui Jiang, Hongxun Yao[pdf] [supp] [arXiv] 


LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR SynthesisZehan Zheng, Fan Lu, Weiyi Xue, Guang Chen, Changjun Jiang[pdf] [supp] [arXiv] 


Exploiting Diffusion Prior for Generalizable Dense PredictionHsin-Ying Lee, Hung-Yu Tseng, Hsin-Ying Lee, Ming-Hsuan Yang[pdf] [supp] [arXiv] 


Orthogonal Adaptation for Modular Customization of Diffusion ModelsRyan Po, Guandao Yang, Kfir Aberman, Gordon Wetzstein[pdf] [supp] [arXiv] 


Optimizing Diffusion Noise Can Serve As Universal Motion PriorsKorrawe Karunratanakul, Konpat Preechakul, Emre Aksan, Thabo Beeler, Supasorn Suwajanakorn, Siyu Tang[pdf] [supp] [arXiv] 


OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual RepresentationXiongwei Wu, Sicheng Yu, Ee-Peng Lim, Chong-Wah Ngo[pdf] [supp] [arXiv] 


XFeat: Accelerated Features for Lightweight Image MatchingGuilherme Potje, Felipe Cadar, André Araujo, Renato Martins, Erickson R. Nascimento[pdf] [supp] [arXiv] 


VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video StreamsLiao Wang, Kaixin Yao, Chengcheng Guo, Zhirui Zhang, Qiang Hu, Jingyi Yu, Lan Xu, Minye Wu[pdf] [supp] [arXiv] 


DPHMs: Diffusion Parametric Head Models for Depth-based TrackingJiapeng Tang, Angela Dai, Yinyu Nie, Lev Markhasin, Justus Thies, Matthias Nießner[pdf] [supp] [arXiv] 


DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and PerceptionYibo Wang, Ruiyuan Gao, Kai Chen, Kaiqiang Zhou, Yingjie Cai, Lanqing Hong, Zhenguo Li, Lihui Jiang, Dit-Yan Yeung, Qiang Xu, Kai Zhang[pdf] [supp] [arXiv] 


Perception-Oriented Video Frame Interpolation via Asymmetric BlendingGuangyang Wu, Xin Tao, Changlin Li, Wenyi Wang, Xiaohong Liu, Qingqing Zheng[pdf] [arXiv] 


DUDF: Differentiable Unsigned Distance Fields with Hyperbolic ScalingMiguel Fainstein, Viviana Siless, Emmanuel Iarussi[pdf] [supp] [arXiv] 


2S-UDF: A Novel Two-stage UDF Learning Method for Robust Non-watertight Model Reconstruction from Multi-view ImagesJunkai Deng, Fei Hou, Xuhui Chen, Wencheng Wang, Ying He[pdf] [supp] 


UniVS: Unified and Universal Video Segmentation with Prompts as QueriesMinghan Li, Shuai Li, Xindong Zhang, Lei Zhang[pdf] [supp] [arXiv] 


Efficiently Assemble Normalization Layers and Regularization for Federated Domain GeneralizationKhiem Le, Long Ho, Cuong Do, Danh Le-Phuoc, Kok-Seng Wong[pdf] [supp] [arXiv] 


Depth Information Assisted Collaborative Mutual Promotion Network for Single Image DehazingYafei Zhang, Shen Zhou, Huafeng Li[pdf] [supp] [arXiv] 


Unlocking the Potential of Pre-trained Vision Transformers for Few-Shot Semantic Segmentation through Relationship DescriptorsZiqin Zhou, Hai-Ming Xu, Yangyang Shu, Lingqiao Liu[pdf] [supp] 


CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head GenerationXi Liu, Ying Guo, Cheng Zhen, Tong Li, Yingying Ao, Pengfei Yan[pdf] [supp] [arXiv] 


Fun with Flags: Robust Principal Directions via Flag ManifoldsNathan Mankovich, Gustau Camps-Valls, Tolga Birdal[pdf] [supp] [arXiv] 


Generating Non-Stationary Textures using Self-RectificationYang Zhou, Rongjun Xiao, Dani Lischinski, Daniel Cohen-Or, Hui Huang[pdf] [supp] [arXiv] 


SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh DeformationYanzhe Liu, Rong Chen, Yushi Li, Yixi Li, Xuehou Tan[pdf] [supp] 


Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video SynthesisWilli Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, Sergey Tulyakov[pdf] [supp] [arXiv] 


JointSQ: Joint Sparsification-Quantization for Distributed LearningWeiying Xie, Haowei Li, Jitao Ma, Yunsong Li, Jie Lei, Donglai Liu, Leyuan Fang[pdf] [supp] 


A Unified Framework for Human-centric Point Cloud Video UnderstandingYiteng Xu, Kecheng Ye, Xiao Han, Yiming Ren, Xinge Zhu, Yuexin Ma[pdf] [supp] [arXiv] 


Shadow-Enlightened Image OutpaintingHang Yu, Ruilin Li, Shaorong Xie, Jiayan Qiu[pdf] [supp] 


BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body DynamicsWenqian Zhang, Molin Huang, Yuxuan Zhou, Juze Zhang, Jingyi Yu, Jingya Wang, Lan Xu[pdf] [supp] [arXiv] 


DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion ModelsYukang Cao, Yan-Pei Cao, Kai Han, Ying Shan, Kwan-Yee K. Wong[pdf] [supp] [arXiv] 


Bidirectional Autoregessive Diffusion Model for Dance GenerationCanyu Zhang, Youbao Tang, Ning Zhang, Ruei-Sung Lin, Mei Han, Jing Xiao, Song Wang[pdf] 


FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video TranslationShuai Yang, Yifan Zhou, Ziwei Liu, Chen Change Loy[pdf] [supp] [arXiv] 


SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian SplattingZhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, Zeyu Wang[pdf] [supp] [arXiv] 


MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable ShadingAbdallah Dib, Luiz Gustavo Hafemann, Emeline Got, Trevor Anderson, Amin Fadaeinejad, Rafael M. O. Cruz, Marc-André Carbonneau[pdf] [supp] [arXiv] 


RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based LossesBedrettin Cetinkaya, Sinan Kalkan, Emre Akbas[pdf] [supp] [arXiv] 


DiffHuman: Probabilistic Photorealistic 3D Reconstruction of HumansAkash Sengupta, Thiemo Alldieck, Nikos Kolotouros, Enric Corona, Andrei Zanfir, Cristian Sminchisescu[pdf] [supp] [arXiv] 


Permutation Equivariance of Transformers and Its ApplicationsHengyuan Xu, Liyao Xiang, Hangyu Ye, Dixi Yao, Pengzhi Chu, Baochun Li[pdf] [supp] [arXiv] 


SVDTree: Semantic Voxel Diffusion for Single Image Tree ReconstructionYuan Li, Zhihao Liu, Bedrich Benes, Xiaopeng Zhang, Jianwei Guo[pdf] [supp] 


Rethinking FID: Towards a Better Evaluation Metric for Image GenerationSadeep Jayasumana, Srikumar Ramalingam, Andreas Veit, Daniel Glasner, Ayan Chakrabarti, Sanjiv Kumar[pdf] [supp] [arXiv] 


SuperPrimitive: Scene Reconstruction at a Primitive LevelKirill Mazur, Gwangbin Bae, Andrew J. Davison[pdf] [supp] [arXiv] 


TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion ModelsYushi Huang, Ruihao Gong, Jing Liu, Tianlong Chen, Xianglong Liu[pdf] [supp] 


CONFORM: Contrast is All You Need for High-Fidelity Text-to-Image Diffusion ModelsTuna Han Salih Meral, Enis Simsar, Federico Tombari, Pinar Yanardag[pdf] [supp] [arXiv] 


Self-Supervised Facial Representation Learning with Facial Region AwarenessZheng Gao, Ioannis Patras[pdf] [supp] [arXiv] 


GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion ModelsTaoran Yi, Jiemin Fang, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, Xinggang Wang[pdf] [supp] [arXiv] 


Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion ModelsPablo Marcos-Manchón, Roberto Alcover-Couso, Juan C. SanMiguel, José M. Martínez[pdf] [supp] 


DreamComposer: Controllable 3D Object Generation via Multi-View ConditionsYunhan Yang, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Song-Hai Zhang, Hengshuang Zhao, Tong He, Xihui Liu[pdf] [supp] [arXiv] 


Self-Calibrating Vicinal Risk Minimisation for Model CalibrationJiawei Liu, Changkun Ye, Ruikai Cui, Nick Barnes[pdf] [supp] 


LPSNet: End-to-End Human Pose and Shape Estimation with Lensless ImagingHaoyang Ge, Qiao Feng, Hailong Jia, Xiongzheng Li, Xiangjun Yin, You Zhou, Jingyu Yang, Kun Li[pdf] [supp] [arXiv] 


Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face GenerationRenshuai Liu, Bowen Ma, Wei Zhang, Zhipeng Hu, Changjie Fan, Tangjie Lv, Yu Ding, Xuan Cheng[pdf] [supp] [arXiv] 


PEEKABOO: Interactive Video Generation via Masked-DiffusionYash Jain, Anshul Nasery, Vibhav Vineet, Harkirat Behl[pdf] [supp] [arXiv] 


High-fidelity Person-centric Subject-to-Image SynthesisYibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin[pdf] [supp] [arXiv] 


JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image GenerationYu Zeng, Vishal M. Patel, Haochen Wang, Xun Huang, Ting-Chun Wang, Ming-Yu Liu, Yogesh Balaji[pdf] [supp] 


HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point CloudWencan Cheng, Hao Tang, Luc Van Gool, Jong Hwan Ko[pdf] [arXiv] 


VP3D: Unleashing 2D Visual Prompt for Text-to-3D GenerationYang Chen, Yingwei Pan, Haibo Yang, Ting Yao, Tao Mei[pdf] [supp] [arXiv] 


Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground TruthZhaoyang Sun, Shengwu Xiong, Yaxiong Chen, Yi Rong[pdf] [supp] [arXiv] 


You Only Need Less Attention at Each Stage in Vision TransformersShuoxi Zhang, Hanpeng Liu, Stephen Lin, Kun He[pdf] 


Generalizable Novel-View Synthesis using a Stereo CameraHaechan Lee, Wonjoon Jin, Seung-Hwan Baek, Sunghyun Cho[pdf] [supp] [arXiv] 


Digital Life Project: Autonomous 3D Characters with Social IntelligenceZhongang Cai, Jianping Jiang, Zhongfei Qing, Xinying Guo, Mingyuan Zhang, Zhengyu Lin, Haiyi Mei, Chen Wei, Ruisi Wang, Wanqi Yin, Liang Pan, Xiangyu Fan, Han Du, Peng Gao, Zhitao Yang, Yang Gao, Jiaqi Li, Tianxiang Ren, Yukun Wei, Xiaogang Wang, Chen Change Loy, Lei Yang, Ziwei Liu[pdf] [supp] [arXiv] 


Rethinking Prior Information Generation with CLIP for Few-Shot SegmentationJin Wang, Bingfeng Zhang, Jian Pang, Honglong Chen, Weifeng Liu[pdf] [supp] [arXiv] 


Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion ModelsShengqu Cai, Duygu Ceylan, Matheus Gadelha, Chun-Hao Paul Huang, Tuanfeng Yang Wang, Gordon Wetzstein[pdf] [supp] [arXiv] 


Relightable Gaussian Codec AvatarsShunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, Giljoo Nam[pdf] [supp] [arXiv] 


Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose EstimationRuicong Liu, Takehiko Ohkawa, Mingfang Zhang, Yoichi Sato[pdf] [arXiv] 


Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character AnimationLi Hu[pdf] [supp] [arXiv] 


FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept CompositionGanggui Ding, Canyu Zhao, Wen Wang, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen[pdf] [supp] [arXiv] 


MaskINT: Video Editing via Interpolative Non-autoregressive Masked TransformersHaoyu Ma, Shahin Mahdizadehaghdam, Bichen Wu, Zhipeng Fan, Yuchao Gu, Wenliang Zhao, Lior Shapira, Xiaohui Xie[pdf] [supp] [arXiv] 


Learning Multi-Dimensional Human Preference for Text-to-Image GenerationSixian Zhang, Bohan Wang, Junqiang Wu, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang[pdf] [supp] [arXiv] 


ViVid-1-to-3: Novel View Synthesis with Video Diffusion ModelsJeong-gi Kwak, Erqun Dong, Yuhe Jin, Hanseok Ko, Shweta Mahajan, Kwang Moo Yi[pdf] [supp] 


Generating Human Motion in 3D Scenes from Text DescriptionsZhi Cen, Huaijin Pi, Sida Peng, Zehong Shen, Minghui Yang, Shuai Zhu, Hujun Bao, Xiaowei Zhou[pdf] [arXiv] 


QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic DecompositionXiang Li, Jinglu Wang, Xiaohao Xu, Xiulian Peng, Rita Singh, Yan Lu, Bhiksha Raj[pdf] [supp] [arXiv] 


Fast Adaptation for Human Pose Estimation via Meta-OptimizationShengxiang Hu, Huaijiang Sun, Bin Li, Dong Wei, Weiqing Li, Jianfeng Lu[pdf] [supp] 


WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion ModelsChanghoon Kim, Kyle Min, Maitreya Patel, Sheng Cheng, Yezhou Yang[pdf] [supp] [arXiv] 


Text-Conditioned Generative Model of 3D Strand-based Human HairstylesVanessa Sklyarova, Egor Zakharov, Otmar Hilliges, Michael J. Black, Justus Thies[pdf] [supp] [arXiv] 


Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context LearningXinshun Wang, Zhongbin Fang, Xia Li, Xiangtai Li, Chen Chen, Mengyuan Liu[pdf] [supp] 


DemoFusion: Democratising High-Resolution Image Generation With No $$$Ruoyi Du, Dongliang Chang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma[pdf] [supp] [arXiv] 


Total Selfie: Generating Full-Body SelfiesBowei Chen, Brian Curless, Ira Kemelmacher-Shlizerman, Steven M. Seitz[pdf] [supp] [arXiv] 


Learning Structure-from-Motion with Graph Attention NetworksLucas Brynte, José Pedro Iglesias, Carl Olsson, Fredrik Kahl[pdf] [supp] [arXiv] 


Geometry Transfer for Stylizing Radiance FieldsHyunyoung Jung, Seonghyeon Nam, Nikolaos Sarafianos, Sungjoo Yoo, Alexander Sorkine-Hornung, Rakesh Ranjan[pdf] [supp] [arXiv] 


Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB CamerasAshwath Shetty, Marc Habermann, Guoxing Sun, Diogo Luvizon, Vladislav Golyanik, Christian Theobalt[pdf] [supp] [arXiv] 


SEAS: ShapE-Aligned Supervision for Person Re-IdentificationHaidong Zhu, Pranav Budhwant, Zhaoheng Zheng, Ram Nevatia[pdf] [supp] 


Making Vision Transformers Truly Shift-EquivariantRenan A. Rojas-Gomez, Teck-Yian Lim, Minh N. Do, Raymond A. Yeh[pdf] [supp] [arXiv] 


SpikeNeRF: Learning Neural Radiance Fields from Continuous Spike StreamLin Zhu, Kangmin Jia, Yifan Zhao, Yunshan Qi, Lizhi Wang, Hua Huang[pdf] [supp] [arXiv] 


A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness ConstraintXiaofeng Cong, Jie Gui, Jing Zhang, Junming Hou, Hao Shen[pdf] [supp] [arXiv] 


Deep Equilibrium Diffusion Restoration with Parallel SamplingJiezhang Cao, Yue Shi, Kai Zhang, Yulun Zhang, Radu Timofte, Luc Van Gool[pdf] [supp] [arXiv] 


Gaussian Shell Maps for Efficient 3D Human GenerationRameen Abdal, Wang Yifan, Zifan Shi, Yinghao Xu, Ryan Po, Zhengfei Kuang, Qifeng Chen, Dit-Yan Yeung, Gordon Wetzstein[pdf] [supp] [arXiv] 


MoST: Motion Style Transformer Between Diverse Action ContentsBoeun Kim, Jungho Kim, Hyung Jin Chang, Jin Young Choi[pdf] [supp] [arXiv] 


Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion ModelsShweta Mahajan, Tanzila Rahman, Kwang Moo Yi, Leonid Sigal[pdf] [supp] [arXiv] 


Unmixing Before Fusion: A Generalized Paradigm for Multi-Source-based Hyperspectral Image SynthesisYang Yu, Erting Pan, Xinya Wang, Yuheng Wu, Xiaoguang Mei, Jiayi Ma[pdf] [supp] 


CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image GenerationKangfu Mei, Mauricio Delbracio, Hossein Talebi, Zhengzhong Tu, Vishal M. Patel, Peyman Milanfar[pdf] [supp] [arXiv] 


X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion ModelLingmin Ran, Xiaodong Cun, Jia-Wei Liu, Rui Zhao, Song Zijie, Xintao Wang, Jussi Keppo, Mike Zheng Shou[pdf] [supp] 


CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD ProgramsHaocheng Yuan, Jing Xu, Hao Pan, Adrien Bousseau, Niloy J. Mitra, Changjian Li[pdf] [supp] [arXiv] 


Inversion-Free Image Editing with Language-Guided Diffusion ModelsSihan Xu, Yidong Huang, Jiayi Pan, Ziqiao Ma, Joyce Chai[pdf] [supp] 


HumMUSS: Human Motion Understanding using State Space ModelsArnab Mondal, Stefano Alletto, Denis Tome[pdf] [supp] [arXiv] 


Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic PropagationHaofeng Liu, Chenshu Xu, Yifei Yang, Lihua Zeng, Shengfeng He[pdf] [supp] [arXiv] 


ContextSeg: Sketch Semantic Segmentation by Querying the Context with AttentionJiawei Wang, Changjian Li[pdf] [supp] [arXiv] 


Taming the Tail in Class-Conditional GANs: Knowledge Sharing via Unconditional Training at Lower ResolutionsSaeed Khorram, Mingqi Jiang, Mohamad Shahbazi, Mohamad H. Danesh, Li Fuxin[pdf] [supp] [arXiv] 


VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point CorrespondenceYuchao Gu, Yipin Zhou, Bichen Wu, Licheng Yu, Jia-Wei Liu, Rui Zhao, Jay Zhangjie Wu, David Junhao Zhang, Mike Zheng Shou, Kevin Tang[pdf] [supp] [arXiv] 


Hierarchical Histogram Threshold Segmentation - Auto-terminating High-detail OversegmentationThomas V. Chang, Simon Seibt, Bartosz von Rymon Lipinski[pdf] [supp] 


Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer CompressionHancheng Ye, Chong Yu, Peng Ye, Renqiu Xia, Yansong Tang, Jiwen Lu, Tao Chen, Bo Zhang[pdf] [supp] [arXiv] 


As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion PriorsSeungwoo Yoo, Kunho Kim, Vladimir G. Kim, Minhyuk Sung[pdf] [supp] 


ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt TuningBeomyoung Kim, Joonsang Yu, Sung Ju Hwang[pdf] [supp] [arXiv] 


MaGGIe: Masked Guided Gradual Human Instance MattingChuong Huynh, Seoung Wug Oh, Abhinav Shrivastava, Joon-Young Lee[pdf] [supp] [arXiv] 


Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic SegmentationXiaoyang Wang, Huihui Bai, Limin Yu, Yao Zhao, Jimin Xiao[pdf] [arXiv] 


RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose EstimationPeng Lu, Tao Jiang, Yining Li, Xiangtai Li, Kai Chen, Wenming Yang[pdf] [arXiv] 


WaveFace: Authentic Face Restoration with Efficient Frequency RecoveryYunqi Miao, Jiankang Deng, Jungong Han[pdf] [supp] [arXiv] 


UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided TexturesMingyuan Zhou, Rakib Hyder, Ziwei Xuan, Guojun Qi[pdf] [supp] [arXiv] 


Attention-Propagation Network for Egocentric Heatmap to 3D Pose LiftingTaeho Kang, Youngki Lee[pdf] [supp] [arXiv] 


OmniMotionGPT: Animal Motion Generation with Limited DataZhangsihao Yang, Mingyuan Zhou, Mengyi Shan, Bingbing Wen, Ziwei Xuan, Mitch Hill, Junjie Bai, Guo-Jun Qi, Yalin Wang[pdf] [supp] [arXiv] 


InstanceDiffusion: Instance-level Control for Image GenerationXudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra[pdf] [supp] [arXiv] 


Unifying Top-down and Bottom-up Scanpath Prediction Using TransformersZhibo Yang, Sounak Mondal, Seoyoung Ahn, Ruoyu Xue, Gregory Zelinsky, Minh Hoai, Dimitris Samaras[pdf] [supp] [arXiv] 


3D-Aware Face Editing via Warping-Guided Latent Direction LearningYuhao Cheng, Zhuo Chen, Xingyu Ren, Wenhan Zhu, Zhengqin Xu, Di Xu, Changpeng Yang, Yichao Yan[pdf] 


CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic SegmentationSeokju Cho, Heeseong Shin, Sunghwan Hong, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim[pdf] [supp] 


Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention ModulationQin Guo, Tianwei Lin[pdf] [supp] [arXiv] 


AvatarGPT: All-in-One Framework for Motion Understanding Planning Generation and BeyondZixiang Zhou, Yu Wan, Baoyuan Wang[pdf] [supp] [arXiv] 


Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion ModelXu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu[pdf] [supp] [arXiv] 


CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-ResolutionQingguo Liu, Chenyi Zhuang, Pan Gao, Jie Qin[pdf] [supp] [arXiv] 


HumanRef: Single Image to 3D Human Generation via Reference-Guided DiffusionJingbo Zhang, Xiaoyu Li, Qi Zhang, Yanpei Cao, Ying Shan, Jing Liao[pdf] [supp] [arXiv] 


Rethinking Interactive Image Segmentation with Low Latency High Quality and Diverse PromptsQin Liu, Jaemin Cho, Mohit Bansal, Marc Niethammer[pdf] [supp] [arXiv] 


DITTO: Dual and Integrated Latent Topologies for Implicit 3D ReconstructionJaehyeok Shim, Kyungdon Joo[pdf] [supp] [arXiv] 


HIT: Estimating Internal Human Implicit Tissues from the Body SurfaceMarilyn Keller, Vaibhav Arora, Abdelmouttaleb Dakri, Shivam Chandhok, Jürgen Machann, Andreas Fritsche, Michael J. Black, Sergi Pujades[pdf] [supp] 


DanceCamera3D: 3D Camera Movement Synthesis with Music and DanceZixuan Wang, Jia Jia, Shikun Sun, Haozhe Wu, Rong Han, Zhenyu Li, Di Tang, Jiaqing Zhou, Jiebo Luo[pdf] [supp] [arXiv] 


Cross Initialization for Face Personalization of Text-to-Image ModelsLianyu Pang, Jian Yin, Haoran Xie, Qiping Wang, Qing Li, Xudong Mao[pdf] [supp] 


LEDITS++: Limitless Image Editing using Text-to-Image ModelsManuel Brack, Felix Friedrich, Katharina Kornmeier, Linoy Tsaban, Patrick Schramowski, Kristian Kersting, Apolinario Passos[pdf] [supp] 


Video Interpolation with Diffusion ModelsSiddhant Jain, Daniel Watson, Eric Tabellion, Aleksander Hołyński, Ben Poole, Janne Kontkanen[pdf] [supp] [arXiv] 


Learning Adaptive Spatial Coherent Correlations for Speech-Preserving Facial Expression ManipulationTianshui Chen, Jianman Lin, Zhijing Yang, Chunmei Qing, Liang Lin[pdf] [supp] 


WHAM: Reconstructing World-grounded Humans with Accurate 3D MotionSoyong Shin, Juyong Kim, Eni Halilaj, Michael J. Black[pdf] [supp] [arXiv] 


DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video GenerationChenyang Wang, Zerong Zheng, Tao Yu, Xiaoqian Lv, Bineng Zhong, Shengping Zhang, Liqiang Nie[pdf] [supp] 


Category-Level Multi-Part Multi-Joint 3D Shape AssemblyYichen Li, Kaichun Mo, Yueqi Duan, He Wang, Jiequan Zhang, Lin Shao[pdf] [supp] [arXiv] 


One-Shot Open Affordance Learning with Foundation ModelsGen Li, Deqing Sun, Laura Sevilla-Lara, Varun Jampani[pdf] [supp] [arXiv] 


Don't Look into the Dark: Latent Codes for Pluralistic Image InpaintingHaiwei Chen, Yajie Zhao[pdf] [supp] 


DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image EditingChong Mou, Xintao Wang, Jiechong Song, Ying Shan, Jian Zhang[pdf] [supp] [arXiv] 


InstructVideo: Instructing Video Diffusion Models with Human FeedbackHangjie Yuan, Shiwei Zhang, Xiang Wang, Yujie Wei, Tao Feng, Yining Pan, Yingya Zhang, Ziwei Liu, Samuel Albanie, Dong Ni[pdf] [supp] [arXiv] 


On the Content Bias in Frechet Video DistanceSongwei Ge, Aniruddha Mahapatra, Gaurav Parmar, Jun-Yan Zhu, Jia-Bin Huang[pdf] [supp] [arXiv] 


Image Neural Field Diffusion ModelsYinbo Chen, Oliver Wang, Richard Zhang, Eli Shechtman, Xiaolong Wang, Michael Gharbi[pdf] [supp] 


Discriminative Probing and Tuning for Text-to-Image GenerationLeigang Qu, Wenjie Wang, Yongqi Li, Hanwang Zhang, Liqiang Nie, Tat-Seng Chua[pdf] [arXiv] 


Towards More Accurate Diffusion Model Acceleration with A Timestep TunerMengfei Xia, Yujun Shen, Changsong Lei, Yu Zhou, Deli Zhao, Ran Yi, Wenping Wang, Yong-Jin Liu[pdf] [supp] 


Rethinking Generalizable Face Anti-spoofing via Hierarchical Prototype-guided Distribution Refinement in Hyperbolic SpaceChengyang Hu, Ke-Yue Zhang, Taiping Yao, Shouhong Ding, Lizhuang Ma[pdf] [supp] 


GenesisTex: Adapting Image Denoising Diffusion to Texture Space
Chenjian Gao, Boyan Jiang, Xinghui Li, Yingpeng Zhang, Qian Yu [pdf] [supp] [arXiv]


Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation
Yuan Wang, Rui Sun, Naisong Luo, Yuwen Pan, Tianzhu Zhang [pdf] [supp] [arXiv]


BigGait: Learning Gait Representation You Want by Large Vision Models
Dingqiang Ye, Chao Fan, Jingzhe Ma, Xiaoming Liu, Shiqi Yu [pdf] [supp] [arXiv]


HOIST-Former: Hand-held Objects Identification Segmentation and Tracking in the Wild
Supreeth Narasimhaswamy, Huy Anh Nguyen, Lihan Huang, Minh Hoai [pdf]


Contextrast: Contextual Contrastive Learning for Semantic Segmentation
Changki Sung, Wanhee Kim, Jungho An, Wooju Lee, Hyungtae Lim, Hyun Myung [pdf] [supp] [arXiv]


AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement
Shiwei Jin, Zhen Wang, Lei Wang, Peng Liu, Ning Bi, Truong Nguyen [pdf] [supp] [arXiv]


BodyMAP - Jointly Predicting Body Mesh and 3D Applied Pressure Map for People in Bed
Abhishek Tandon, Anujraaj Goyal, Henry M. Clever, Zackory Erickson [pdf] [supp]


KPConvX: Modernizing Kernel Point Convolution with Kernel Attention
Hugues Thomas, Yao-Hung Hubert Tsai, Timothy D. Barfoot, Jian Zhang [pdf] [supp] [arXiv]


Clockwork Diffusion: Efficient Generation With Model-Step Distillation
Amirhossein Habibian, Amir Ghodrati, Noor Fathima, Guillaume Sautiere, Risheek Garrepalli, Fatih Porikli, Jens Petersen [pdf] [supp] [arXiv]


Pick-or-Mix: Dynamic Channel Sampling for ConvNets
Ashish Kumar, Daneul Kim, Jaesik Park, Laxmidhar Behera [pdf] [supp]


DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video
Huiqiang Sun, Xingyi Li, Liao Shen, Xinyi Ye, Ke Xian, Zhiguo Cao [pdf] [supp] [arXiv]


AAMDM: Accelerated Auto-regressive Motion Diffusion Model
Tianyu Li, Calvin Qiao, Guanqiao Ren, KangKang Yin, Sehoon Ha [pdf] [arXiv]


Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing
Bingyan Liu, Chengyu Wang, Tingfeng Cao, Kui Jia, Jun Huang [pdf] [supp] [arXiv]


DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data
Chengxiang Fan, Muzhi Zhu, Hao Chen, Yang Liu, Weijia Wu, Huaqi Zhang, Chunhua Shen [pdf] [supp] [arXiv]


Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation
Siteng Huang, Biao Gong, Yutong Feng, Xi Chen, Yuqian Fu, Yu Liu, Donglin Wang [pdf] [supp] [arXiv]


Automatic Controllable Colorization via Imagination
Xiaoyan Cong, Yue Wu, Qifeng Chen, Chenyang Lei [pdf] [arXiv]


EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars
Nikita Drobyshev, Antoni Bigata Casademunt, Konstantinos Vougioukas, Zoe Landgraf, Stavros Petridis, Maja Pantic [pdf] [supp] [arXiv]


Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
Phuc Nguyen, Tuan Duc Ngo, Evangelos Kalogerakis, Chuang Gan, Anh Tran, Cuong Pham, Khoi Nguyen [pdf] [supp] [arXiv]


ControlRoom3D: Room Generation using Semantic Proxy Rooms
Jonas Schult, Sam Tsai, Lukas Höllein, Bichen Wu, Jialiang Wang, Chih-Yao Ma, Kunpeng Li, Xiaofang Wang, Felix Wimbauer, Zijian He, Peizhao Zhang, Bastian Leibe, Peter Vajda, Ji Hou [pdf] [supp]


UniPTS: A Unified Framework for Proficient Post-Training Sparsity
Jingjing Xie, Yuxin Zhang, Mingbao Lin, Liujuan Cao, Rongrong Ji [pdf] [arXiv]


HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
Xin Huang, Ruizhi Shao, Qi Zhang, Hongwen Zhang, Ying Feng, Yebin Liu, Qing Wang [pdf] [supp] [arXiv]


Cross-view and Cross-pose Completion for 3D Human Understanding
Matthieu Armando, Salma Galaaoui, Fabien Baradel, Thomas Lucas, Vincent Leroy, Romain Brégier, Philippe Weinzaepfel, Grégory Rogez [pdf] [supp] [arXiv]


Efficient Scene Recovery Using Luminous Flux Prior
Zhongyu Li, Lei Zhang [pdf] [supp]


Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training
Runze He, Shaofei Huang, Xuecheng Nie, Tianrui Hui, Luoqi Liu, Jiao Dai, Jizhong Han, Guanbin Li, Si Liu [pdf] [supp] [arXiv]


Spherical Mask: Coarse-to-Fine 3D Point Cloud Instance Segmentation with Spherical Representation
Sangyun Shin, Kaichen Zhou, Madhu Vankadari, Andrew Markham, Niki Trigoni [pdf] [supp] [arXiv]


FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance Head-pose and Facial Expression Features
Andre Rochow, Max Schwarz, Sven Behnke [pdf] [supp] [arXiv]


TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis
Pavlo Melnyk, Andreas Robinson, Michael Felsberg, Mårten Wadenbäck [pdf] [supp]


WANDR: Intention-guided Human Motion Generation
Markos Diomataris, Nikos Athanasiou, Omid Taheri, Xi Wang, Otmar Hilliges, Michael J. Black [pdf] [supp] [arXiv]


GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding
Chengyao Wang, Li Jiang, Xiaoyang Wu, Zhuotao Tian, Bohao Peng, Hengshuang Zhao, Jiaya Jia [pdf] [supp] [arXiv]


Privacy-Preserving Face Recognition Using Trainable Feature Subtraction
Yuxi Mi, Zhizhou Zhong, Yuge Huang, Jiazhen Ji, Jianqing Xu, Jun Wang, Shaoming Wang, Shouhong Ding, Shuigeng Zhou [pdf] [supp] [arXiv]


Learning Visual Prompt for Gait Recognition
Kang Ma, Ying Fu, Chunshui Cao, Saihui Hou, Yongzhen Huang, Dezhi Zheng [pdf]


SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
Yi-Hua Huang, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi [pdf] [supp]


Tri-Modal Motion Retrieval by Learning a Joint Embedding Space
Kangning Yin, Shihao Zou, Yuxuan Ge, Zheng Tian [pdf] [supp] [arXiv]


Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields
Tianqi Liu, Xinyi Ye, Min Shi, Zihao Huang, Zhiyu Pan, Zhan Peng, Zhiguo Cao [pdf] [supp] [arXiv]


VideoBooth: Diffusion-based Video Generation with Image Prompts
Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu [pdf] [supp] [arXiv]


SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes
Soubhik Sanyal, Partha Ghosh, Jinlong Yang, Michael J. Black, Justus Thies, Timo Bolkart [pdf] [supp] [arXiv]


EasyDrag: Efficient Point-based Manipulation on Diffusion Models
Xingzhong Hou, Boxiao Liu, Yi Zhang, Jihao Liu, Yu Liu, Haihang You [pdf] [supp]


InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion
Jihyun Lee, Shunsuke Saito, Giljoo Nam, Minhyuk Sung, Tae-Kyun Kim [pdf] [supp] [arXiv]


Video2Game: Real-time Interactive Realistic and Browser-Compatible Environment from a Single Video
Hongchi Xia, Zhi-Hao Lin, Wei-Chiu Ma, Shenlong Wang [pdf] [supp] [arXiv]


Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models
Pengze Zhang, Hubery Yin, Chen Li, Xiaohua Xie [pdf] [supp] [arXiv]


CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization
Yao Ni, Piotr Koniusz [pdf] [supp] [arXiv]


High-Quality Facial Geometry and Appearance Capture at Home
Yuxuan Han, Junfeng Lyu, Feng Xu [pdf] [supp] [arXiv]


Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion
Sofia Casarin, Cynthia I. Ugwu, Sergio Escalera, Oswald Lanz [pdf] [supp] [arXiv]


SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks
Xinyu Shi, Zecheng Hao, Zhaofei Yu [pdf] [supp] [arXiv]


Self-Supervised Dual Contouring
Ramana Sundararaman, Roman Klokov, Maks Ovsjanikov [pdf] [supp] [arXiv]


GSVA: Generalized Segmentation via Multimodal Large Language Models
Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang [pdf] [supp] [arXiv]


AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution
Cheeun Hong, Kyoung Mu Lee [pdf] [supp] [arXiv]


SVGDreamer: Text Guided SVG Generation with Diffusion Model
Ximing Xing, Haitao Zhou, Chuang Wang, Jing Zhang, Dong Xu, Qian Yu [pdf] [supp] [arXiv]


BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition
Yuxuan Zhou, Xudong Yan, Zhi-Qi Cheng, Yan Yan, Qi Dai, Xian-Sheng Hua [pdf] [supp]


Structure-Guided Adversarial Training of Diffusion Models
Ling Yang, Haotian Qian, Zhilong Zhang, Jingwei Liu, Bin Cui [pdf] [supp] [arXiv]


NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis
Nilesh Kulkarni, Davis Rempe, Kyle Genova, Abhijit Kundu, Justin Johnson, David Fouhey, Leonidas Guibas [pdf] [supp] [arXiv]


Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction
Inhwan Bae, Junoh Lee, Hae-Gon Jeon [pdf] [arXiv]


Building Optimal Neural Architectures using Interpretable Knowledge
Keith G. Mills, Fred X. Han, Mohammad Salameh, Shengyao Lu, Chunhua Zhou, Jiao He, Fengyu Sun, Di Niu [pdf] [supp] [arXiv]


Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image
Yiqun Mei, Yu Zeng, He Zhang, Zhixin Shu, Xuaner Zhang, Sai Bi, Jianming Zhang, HyunJoon Jung, Vishal M. Patel [pdf] [supp]


Noisy One-point Homographies are Surprisingly Good
Yaqing Ding, Jonathan Astermark, Magnus Oskarsson, Viktor Larsson [pdf] [supp]


Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
Yuqing Wen, Yucheng Zhao, Yingfei Liu, Fan Jia, Yanhui Wang, Chong Luo, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang [pdf] [supp] [arXiv]


DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
Jisu Nam, Heesu Kim, DongJae Lee, Siyoon Jin, Seungryong Kim, Seunggyu Chang [pdf] [supp] [arXiv]


PolarMatte: Fully Computational Ground-Truth-Quality Alpha Matte Extraction for Images and Video using Polarized Screen Matting
Kenji Enomoto, TJ Rhodes, Brian Price, Gavin Miller [pdf] [supp]


HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
Mengqi Zhang, Yang Fu, Zheng Ding, Sifei Liu, Zhuowen Tu, Xiaolong Wang [pdf] [supp] [arXiv]


VecFusion: Vector Font Generation with Diffusion
Vikas Thamizharasan, Difan Liu, Shantanu Agarwal, Matthew Fisher, Michael Gharbi, Oliver Wang, Alec Jacobson, Evangelos Kalogerakis [pdf] [supp] [arXiv]


Towards Text-guided 3D Scene Composition
Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, Hsin-Ying Lee [pdf] [supp] [arXiv]


EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
Haiyang Liu, Zihao Zhu, Giorgio Becherini, Yichen Peng, Mingyang Su, You Zhou, Xuefei Zhe, Naoya Iwamoto, Bo Zheng, Michael J. Black [pdf] [supp] [arXiv]


Adversarial Text to Continuous Image Generation
Kilichbek Haydarov, Aashiq Muhamed, Xiaoqian Shen, Jovana Lazarevic, Ivan Skorokhodov, Chamuditha Jayanga Galappaththige, Mohamed Elhoseiny [pdf] [supp]


HumanNeRF-SE: A Simple yet Effective Approach to Animate HumanNeRF with Diverse Poses
Caoyuan Ma, Yu-Lun Liu, Zhixiang Wang, Wu Liu, Xinchen Liu, Zheng Wang [pdf] [supp]


HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video
Zicong Fan, Maria Parelli, Maria Eleni Kadoglou, Xu Chen, Muhammed Kocabas, Michael J. Black, Otmar Hilliges [pdf] [supp] [arXiv]


Continual Segmentation with Disentangled Objectness Learning and Class Recognition
Yizheng Gong, Siyue Yu, Xiaoyang Wang, Jimin Xiao [pdf] [supp] [arXiv]


ASAM: Boosting Segment Anything Model with Adversarial Tuning
Bo Li, Haoke Xiao, Lv Tang [pdf] [supp] [arXiv]


Dynamic Support Information Mining for Category-Agnostic Pose Estimation
Pengfei Ren, Yuanyuan Gao, Haifeng Sun, Qi Qi, Jingyu Wang, Jianxin Liao [pdf] [supp]


Taming Mode Collapse in Score Distillation for Text-to-3D Generation
Peihao Wang, Dejia Xu, Zhiwen Fan, Dilin Wang, Sreyas Mohan, Forrest Iandola, Rakesh Ranjan, Yilei Li, Qiang Liu, Zhangyang Wang, Vikas Chandra [pdf] [supp] [arXiv]


MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Hanshu Yan, Jia-Wei Liu, Chenxu Zhang, Jiashi Feng, Mike Zheng Shou [pdf] [supp] [arXiv]


From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation
Javier Tirado-Garín, Javier Civera [pdf] [supp]


Loose Inertial Poser: Motion Capture with IMU-attached Loose-Wear Jacket
Chengxu Zuo, Yiming Wang, Lishuang Zhan, Shihui Guo, Xinyu Yi, Feng Xu, Yipeng Qin [pdf] [supp]


Training-Free Pretrained Model Merging
Zhengqi Xu, Ke Yuan, Huiqiong Wang, Yong Wang, Mingli Song, Jie Song [pdf] [supp] [arXiv]


NC-SDF: Enhancing Indoor Scene Reconstruction Using Neural SDFs with View-Dependent Normal Compensation
Ziyi Chen, Xiaolong Wu, Yu Zhang [pdf] [supp]


Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing
ChangHee Yang, ChanHee Kang, Kyeongbo Kong, Hanni Oh, Suk-Ju Kang [pdf] [supp]


ChatPose: Chatting about 3D Human Pose
Yao Feng, Jing Lin, Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Michael J. Black [pdf] [supp] [arXiv]


Distilling ODE Solvers of Diffusion Models into Smaller Steps
Sanghwan Kim, Hao Tang, Fisher Yu [pdf] [supp] [arXiv]


LightIt: Illumination Modeling and Control for Diffusion Models
Peter Kocsis, Julien Philip, Kalyan Sunkavalli, Matthias Nießner, Yannick Hold-Geoffroy [pdf] [supp] [arXiv]


Neural Lineage
Runpeng Yu, Xinchao Wang [pdf] [supp]


Visual Layout Composer: Image-Vector Dual Diffusion Model for Design Layout Generation
Mohammad Amin Shabani, Zhaowen Wang, Difan Liu, Nanxuan Zhao, Jimei Yang, Yasutaka Furukawa [pdf] [supp]


3D Multi-frame Fusion for Video Stabilization
Zhan Peng, Xinyi Ye, Weiyue Zhao, Tianqi Liu, Huiqiang Sun, Baopu Li, Zhiguo Cao [pdf] [supp] [arXiv]


Local-consistent Transformation Learning for Rotation-invariant Point Cloud Analysis
Yiyang Chen, Lunhao Duan, Shanshan Zhao, Changxing Ding, Dacheng Tao [pdf] [supp] [arXiv]


Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting
Zijie Chen, Lichao Zhang, Fangsheng Weng, Lili Pan, Zhenzhong Lan [pdf] [supp] [arXiv]


Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, Jiapeng Luo, Wenhai Wang, Tong Lu, Hongsheng Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai [pdf] [supp] [arXiv]


CoDe: An Explicit Content Decoupling Framework for Image Restoration
Enxuan Gu, Hongwei Ge, Yong Guo [pdf] [supp]


DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Yujie Wei, Shiwei Zhang, Zhiwu Qing, Hangjie Yuan, Zhiheng Liu, Yu Liu, Yingya Zhang, Jingren Zhou, Hongming Shan [pdf] [supp] [arXiv]


Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Weihan Shen, Xiaolong Zhu, Xiu Li [pdf] [supp] [arXiv]


SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement
Tao Wang, Lei Jin, Zheng Wang, Jianshu Li, Liang Li, Fang Zhao, Yu Cheng, Li Yuan, Li Zhou, Junliang Xing, Jian Zhao [pdf] [supp]


Learned Representation-Guided Diffusion Models for Large-Image Generation
Alexandros Graikos, Srikar Yellapragada, Minh-Quan Le, Saarthak Kapse, Prateek Prasanna, Joel Saltz, Dimitris Samaras [pdf] [supp] [arXiv]


Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
Yutong Feng, Biao Gong, Di Chen, Yujun Shen, Yu Liu, Jingren Zhou [pdf] [supp] [arXiv]


Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion
Yuanxun Lu, Jingyang Zhang, Shiwei Li, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan, Xun Cao, Yao Yao [pdf] [supp]


MatFuse: Controllable Material Generation with Diffusion Models
Giuseppe Vecchio, Renato Sortino, Simone Palazzo, Concetto Spampinato [pdf] [supp] [arXiv]


Training Vision Transformers for Semi-Supervised Semantic Segmentation
Xinting Hu, Li Jiang, Bernt Schiele [pdf] [supp]


Quantifying Task Priority for Multi-Task Optimization
Wooseong Jeong, Kuk-Jin Yoon [pdf] [supp]


On the Scalability of Diffusion-based Text-to-Image Generation
Hao Li, Yang Zou, Ying Wang, Orchid Majumder, Yusheng Xie, R. Manmatha, Ashwin Swaminathan, Zhuowen Tu, Stefano Ermon, Stefano Soatto [pdf] [supp] [arXiv]


AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
Jieming Cui, Tengyu Liu, Nian Liu, Yaodong Yang, Yixin Zhu, Siyuan Huang [pdf] [supp] [arXiv]


Generative Unlearning for Any Identity
Juwon Seo, Sung-Hoon Lee, Tae-Young Lee, Seungjun Moon, Gyeong-Moon Park [pdf] [supp] [arXiv]


FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
Feng Liang, Bichen Wu, Jialiang Wang, Licheng Yu, Kunpeng Li, Yinan Zhao, Ishan Misra, Jia-Bin Huang, Peizhao Zhang, Peter Vajda, Diana Marculescu [pdf] [supp] [arXiv]


StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN
Jongwoo Choi, Kwanggyoon Seo, Amirsaman Ashtari, Junyong Noh [pdf] [supp] [arXiv]


Laplacian-guided Entropy Model in Neural Codec with Blur-dissipated Synthesis
Atefeh Khoshkhahtinat, Ali Zafari, Piyush M. Mehta, Nasser M. Nasrabadi [pdf] [supp] [arXiv]


RMT: Retentive Networks Meet Vision Transformers
Qihang Fan, Huaibo Huang, Mingrui Chen, Hongmin Liu, Ran He [pdf] [supp] [arXiv]


Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong, Yixiao Ge, Ying Shan, Xiangyu Yue [pdf] [arXiv]


FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio
Chao Xu, Yang Liu, Jiazheng Xing, Weida Wang, Mingze Sun, Jun Dan, Tianxin Huang, Siyuan Li, Zhi-Qi Cheng, Ying Tai, Baigui Sun [pdf] [supp]


SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
Yuxuan Zhang, Yiren Song, Jiaming Liu, Rui Wang, Jinpeng Yu, Hao Tang, Huaxia Li, Xu Tang, Yao Hu, Han Pan, Zhongliang Jing [pdf] [supp]


MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior
Honghua Chen, Chen Change Loy, Xingang Pan [pdf] [supp]


StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation
Sidi Wu, Yizi Chen, Samuel Mermet, Lorenz Hurni, Konrad Schindler, Nicolas Gonthier, Loic Landrieu [pdf] [supp] [arXiv]


M&M VTO: Multi-Garment Virtual Try-On and Editing
Luyang Zhu, Yingwei Li, Nan Liu, Hao Peng, Dawei Yang, Ira Kemelmacher-Shlizerman [pdf] [supp]


Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors
Yu Zhang, Songpengcheng Xia, Lei Chu, Jiarui Yang, Qi Wu, Ling Pei [pdf] [supp] [arXiv]


GraCo: Granularity-Controllable Interactive Segmentation
Yian Zhao, Kehan Li, Zesen Cheng, Pengchong Qiao, Xiawu Zheng, Rongrong Ji, Chang Liu, Li Yuan, Jie Chen [pdf] [supp] [arXiv]


G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis
Yufei Ye, Abhinav Gupta, Kris Kitani, Shubham Tulsiani [pdf] [supp]


Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing
Hyelin Nam, Gihyun Kwon, Geon Yeong Park, Jong Chul Ye [pdf] [supp] [arXiv]


Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation
Philipp Schröppel, Christopher Wewer, Jan Eric Lenssen, Eddy Ilg, Thomas Brox [pdf] [supp]


VAREN: Very Accurate and Realistic Equine Network
Silvia Zuffi, Ylva Mellbin, Ci Li, Markus Hoeschle, Hedvig Kjellström, Senya Polikovsky, Elin Hernlund, Michael J. Black [pdf] [supp]


SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer
Rui Zhu, Yingwei Pan, Yehao Li, Ting Yao, Zhenglong Sun, Tao Mei, Chang Wen Chen [pdf] [supp]


MedBN: Robust Test-Time Adaptation against Malicious Test Samples
Hyejin Park, Jeongyeon Hwang, Sunung Mun, Sangdon Park, Jungseul Ok [pdf] [supp] [arXiv]


Unsupervised Gaze Representation Learning from Multi-view Face Images
Yiwei Bao, Feng Lu [pdf]


AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
Jonas Ricker, Denis Lukovnikov, Asja Fischer [pdf] [supp] [arXiv]


Point2CAD: Reverse Engineering CAD Models from 3D Point Clouds
Yujia Liu, Anton Obukhov, Jan Dirk Wegner, Konrad Schindler [pdf] [supp] [arXiv]


LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model
Dongkai Wang, Shiyu Xuan, Shiliang Zhang [pdf] [supp]


MMA-Diffusion: MultiModal Attack on Diffusion Models
Yijun Yang, Ruiyuan Gao, Xiaosen Wang, Tsung-Yi Ho, Nan Xu, Qiang Xu [pdf] [supp]


HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
Supreeth Narasimhaswamy, Uttaran Bhattacharya, Xiang Chen, Ishita Dasgupta, Saayan Mitra, Minh Hoai [pdf] [supp] [arXiv]


Hierarchical Patch Diffusion Models for High-Resolution Video Generation
Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov [pdf] [supp]


Neural Implicit Morphing of Face Images
Guilherme Schardong, Tiago Novello, Hallison Paz, Iurii Medvedev, Vinícius da Silva, Luiz Velho, Nuno Gonçalves [pdf] [supp] [arXiv]


UniGS: Unified Representation for Image Generation and Segmentation
Lu Qi, Lehan Yang, Weidong Guo, Yu Xu, Bo Du, Varun Jampani, Ming-Hsuan Yang [pdf] [supp] [arXiv]


Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
Luca Barsellotti, Roberto Amoroso, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara [pdf] [supp] [arXiv]


HUGS: Human Gaussian Splats
Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan [pdf] [arXiv]


PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
Yufei Zhang, Jeffrey O. Kephart, Zijun Cui, Qiang Ji [pdf] [supp] [arXiv]


EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Priors
Zhipeng Hu, Minda Zhao, Chaoyi Zhao, Xinyue Liang, Lincheng Li, Zeng Zhao, Changjie Fan, Xiaowei Zhou, Xin Yu [pdf] [supp] [arXiv]


HOIAnimator: Generating Text-prompt Human-object Animations using Novel Perceptive Diffusion Models
Wenfeng Song, Xinyu Zhang, Shuai Li, Yang Gao, Aimin Hao, Xia Hou, Chenglizhao Chen, Ning Li, Hong Qin [pdf] [supp]


SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis
Ziqiao Peng, Wentao Hu, Yue Shi, Xiangyu Zhu, Xiaomei Zhang, Hao Zhao, Jun He, Hongyan Liu, Zhaoxin Fan [pdf] [arXiv]


DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
Haonan Lin [pdf] [supp] [arXiv]


Neural Super-Resolution for Real-time Rendering with Radiance Demodulation
Jia Li, Ziling Chen, Xiaolong Wu, Lu Wang, Beibei Wang, Lei Zhang [pdf] [supp] [arXiv]


MMM: Generative Masked Motion Model
Ekkasit Pinyoanuntapong, Pu Wang, Minwoo Lee, Chen Chen [pdf] [supp] [arXiv]


PEGASUS: Personalized Generative 3D Avatars with Composable Attributes
Hyunsoo Cha, Byungjun Kim, Hanbyul Joo [pdf] [supp] [arXiv]


Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks
Yuhao Liu, Zhanghan Ke, Fang Liu, Nanxuan Zhao, Rynson W.H. Lau [pdf]


Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
Chang Liu, Haoning Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie [pdf]


GenTron: Diffusion Transformers for Image and Video Generation
Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua [pdf] [supp]


TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
Zhongwei Zhang, Fuchen Long, Yingwei Pan, Zhaofan Qiu, Ting Yao, Yang Cao, Tao Mei [pdf] [arXiv]


TexVocab: Texture Vocabulary-conditioned Human Avatars
Yuxiao Liu, Zhe Li, Yebin Liu, Haoqian Wang [pdf] [supp] [arXiv]


KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree Rotation
Fengyuan Yang, Kerui Gu, Angela Yao [pdf] [supp] [arXiv]


SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
Antoine Guédon, Vincent Lepetit [pdf] [supp]


Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
Junyan Wang, Zhenhong Sun, Zhiyu Tan, Xuanbai Chen, Weihua Chen, Hao Li, Cheng Zhang, Yang Song [pdf] [supp] [arXiv]


A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
Maomao Li, Yu Li, Tianyu Yang, Yunfei Liu, Dongxu Yue, Zhihui Lin, Dong Xu [pdf] [supp] [arXiv]


URHand: Universal Relightable Hands
Zhaoxi Chen, Gyeongsik Moon, Kaiwen Guo, Chen Cao, Stanislav Pidhorskyi, Tomas Simon, Rohan Joshi, Yuan Dong, Yichen Xu, Bernardo Pires, He Wen, Lucas Evans, Bo Peng, Julia Buffalini, Autumn Trimble, Kevyn McPhail, Melissa Schoeller, Shoou-I Yu, Javier Romero, Michael Zollhofer, Yaser Sheikh, Ziwei Liu, Shunsuke Saito [pdf] [supp] [arXiv]


Named Entity Driven Zero-Shot Image Manipulation
Zhida Feng, Li Chen, Jing Tian, JiaXiang Liu, Shikun Feng [pdf] [supp]


ESR-NeRF: Emissive Source Reconstruction Using LDR Multi-view Images
Jinseo Jeong, Junseo Koo, Qimeng Zhang, Gunhee Kim [pdf] [supp]


Infer from What You Have Seen Before: Temporally-dependent Classifier for Semi-supervised Video Segmentation
Jiafan Zhuang, Zilei Wang, Yixin Zhang, Zhun Fan [pdf]


Video Frame Interpolation via Direct Synthesis with the Event-based Reference
Yuhan Liu, Yongjian Deng, Hao Chen, Zhen Yang [pdf] [supp]


DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer
Wei-Ting Chen, Gurunandan Krishnan, Qiang Gao, Sy-Yen Kuo, Sizhou Ma, Jian Wang [pdf] [supp]


FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring
Geunhyuk Youk, Jihyong Oh, Munchurl Kim [pdf] [supp]


Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
Wenhao Li, Mengyuan Liu, Hong Liu, Pichao Wang, Jialun Cai, Nicu Sebe [pdf] [supp] [arXiv]


Boosting Diffusion Models with Moving Average Sampling in Frequency Domain
Yurui Qian, Qi Cai, Yingwei Pan, Yehao Li, Ting Yao, Qibin Sun, Tao Mei [pdf] [supp] [arXiv]


Bi-Causal: Group Activity Recognition via Bidirectional Causality
Youliang Zhang, Wenxuan Liu, Danni Xu, Zhuo Zhou, Zheng Wang [pdf] [supp]


Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer
Danah Yatim, Rafail Fridman, Omer Bar-Tal, Yoni Kasten, Tali Dekel [pdf] [supp] [arXiv]


MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
Dewei Zhou, You Li, Fan Ma, Xiaoting Zhang, Yi Yang [pdf] [supp] [arXiv]


Distilling CLIP with Dual Guidance for Learning Discriminative Human Body Shape Representation
Feng Liu, Minchul Kim, Zhiyuan Ren, Xiaoming Liu [pdf] [supp]


LLaFS: When Large Language Models Meet Few-Shot Segmentation
Lanyun Zhu, Tianrun Chen, Deyi Ji, Jieping Ye, Jun Liu [pdf] [supp] [arXiv]


Kernel Adaptive Convolution for Scene Text Detection via Distance Map Prediction
Jinzhi Zheng, Heng Fan, Libo Zhang [pdf]


Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching
Peng Xu, Zhiyu Xiang, Chengyu Qiao, Jingyun Fu, Tianyu Pu [pdf] [arXiv]


Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning
Wenlong Deng, Christos Thrampoulidis, Xiaoxiao Li [pdf] [supp] [arXiv]


GALA: Generating Animatable Layered Assets from a Single Scan
Taeksoo Kim, Byungjun Kim, Shunsuke Saito, Hanbyul Joo [pdf] [supp] [arXiv]


LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example
Soyeon Yoon, Kwan Yun, Kwanggyoon Seo, Sihun Cha, Jung Eun Yoo, Junyong Noh [pdf] [supp] [arXiv]


Frequency-Adaptive Dilated Convolution for Semantic Segmentation
Linwei Chen, Lin Gu, Dezhi Zheng, Ying Fu [pdf] [supp]


Multiple View Geometry Transformers for 3D Human Pose Estimation
Ziwei Liao, Jialiang Zhu, Chunyu Wang, Han Hu, Steven L. Waslander [pdf] [supp] [arXiv]


SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion
Hsuan-I Ho, Jie Song, Otmar Hilliges [pdf] [supp] [arXiv]


DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu, Weijia Mao, Yuchao Gu, Rui Zhao, Jussi Keppo, Ying Shan, Mike Zheng Shou [pdf] [supp]


Real-Time Neural BRDF with Spherically Distributed Primitives
Yishun Dou, Zhong Zheng, Qiaoqiao Jin, Bingbing Ni, Yugang Chen, Junxiang Ke [pdf] [supp] [arXiv]


VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Weng, Ying Shan [pdf] [supp] [arXiv]


Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
Jiwoo Chung, Sangeek Hyun, Jae-Pil Heo [pdf] [supp] [arXiv]


OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning
Xinyu Geng, Jiaming Wang, Jiawei Gong, Yuerong Xue, Jun Xu, Fanglin Chen, Xiaolin Huang [pdf] [supp] [arXiv]


Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan [pdf] [supp]


NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild
Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, Songyou Peng [pdf] [supp]


3D Human Pose Perception from Egocentric Stereo Videos
Hiroyasu Akada, Jian Wang, Vladislav Golyanik, Christian Theobalt [pdf] [supp] [arXiv]


Grid Diffusion Models for Text-to-Video Generation
Taegyeong Lee, Soyeong Kwon, Taehwan Kim [pdf] [supp] [arXiv]


LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
Yixun Liang, Xin Yang, Jiantao Lin, Haodong Li, Xiaogang Xu, Yingcong Chen [pdf] [supp] [arXiv]


PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
Kun Yuan, Hongbo Liu, Mading Li, Muyi Sun, Ming Sun, Jiachao Gong, Jinhua Hao, Chao Zhou, Yansong Tang [pdf]


REACTO: Reconstructing Articulated Objects from a Single Video
Chaoyue Song, Jiacheng Wei, Chuan Sheng Foo, Guosheng Lin, Fayao Liu [pdf] [supp] [arXiv]


Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement
Jian Wang, Zhe Cao, Diogo Luvizon, Lingjie Liu, Kripasindhu Sarkar, Danhang Tang, Thabo Beeler, Christian Theobalt [pdf] [supp] [arXiv]


Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding
Jin-Chuan Shi, Miao Wang, Hao-Bin Duan, Shao-Hua Guan [pdf] [supp] [arXiv]


Towards Automated Movie Trailer Generation
Dawit Mureja Argaw, Mattia Soldan, Alejandro Pardo, Chen Zhao, Fabian Caba Heilbron, Joon Son Chung, Bernard Ghanem [pdf] [arXiv]


Sheared Backpropagation for Fine-tuning Foundation Models
Zhiyuan Yu, Li Shen, Liang Ding, Xinmei Tian, Yixin Chen, Dacheng Tao [pdf] [supp]


Misalignment-Robust Frequency Distribution Loss for Image Transformation
Zhangkai Ni, Juncheng Wu, Zian Wang, Wenhan Yang, Hanli Wang, Lin Ma [pdf] [supp] [arXiv]


Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories
Yan Zhang, Sergey Prokudin, Marko Mihajlovic, Qianli Ma, Siyu Tang [pdf] [supp]


Low-Latency Neural Stereo Streaming
Qiqi Hou, Farzad Farhadzadeh, Amir Said, Guillaume Sautiere, Hoang Le [pdf] [supp] [arXiv]


Intrinsic Image Diffusion for Indoor Single-view Material Estimation
Peter Kocsis, Vincent Sitzmann, Matthias Nießner [pdf] [supp]


Material Palette: Extraction of Materials from a Single Image
Ivan Lopes, Fabio Pizzati, Raoul de Charette [pdf] [supp] [arXiv]


RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
Mengqi Huang, Zhendong Mao, Mingcong Liu, Qian He, Yongdong Zhang [pdf] [arXiv]


Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation
Guangyang Wu, Xiaohong Liu, Jun Jia, Xuehao Cui, Guangtao Zhai [pdf] [arXiv]


ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
Maitreya Patel, Changhoon Kim, Sheng Cheng, Chitta Baral, Yezhou Yang [pdf] [supp] [arXiv]


Adaptive Bidirectional Displacement for Semi-Supervised Medical Image Segmentation
Hanyang Chi, Jian Pang, Bingfeng Zhang, Weifeng Liu [pdf] [supp] [arXiv]


Accurate Training Data for Occupancy Map Prediction in Automated Driving Using Evidence Theory
Jonas Kälble, Sascha Wirges, Maxim Tatarchenko, Eddy Ilg [pdf] [supp]


DiffusionLight: Light Probes for Free by Painting a Chrome Ball
Pakkapon Phongthawee, Worameth Chinchuthakun, Nontaphat Sinsunthithet, Varun Jampani, Amit Raj, Pramook Khungurn, Supasorn Suwajanakorn [pdf] [supp] [arXiv]


Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
Dazhong Shen, Guanglu Song, Zeyue Xue, Fu-Yun Wang, Yu Liu [pdf] [supp] [arXiv]


KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation
Jihua Peng, Yanghong Zhou, P. Y. Mok [pdf] [supp] [arXiv]


Differentiable Micro-Mesh Construction
Yishun Dou, Zhong Zheng, Qiaoqiao Jin, Rui Shi, Yuhan Li, Bingbing Ni [pdf] [supp]


SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model
Zhengang Li, Yan Kang, Yuchen Liu, Difan Liu, Tobias Hinz, Feng Liu, Yanzhi Wang [pdf]


LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model
Chenjie Cao, Yunuo Cai, Qiaole Dong, Yikai Wang, Yanwei Fu [pdf] [supp] [arXiv]


Personalized Residuals for Concept-Driven Text-to-Image Generation
Cusuh Ham, Matthew Fisher, James Hays, Nicholas Kolkin, Yuchen Liu, Richard Zhang, Tobias Hinz [pdf] [supp] [arXiv]


Condition-Aware Neural Network for Controlled Image Generation
Han Cai, Muyang Li, Qinsheng Zhang, Ming-Yu Liu, Song Han [pdf] [arXiv]


Prompt Augmentation for Self-supervised Text-guided Image Manipulation
Rumeysa Bodur, Binod Bhattarai, Tae-Kyun Kim [pdf] [supp]


Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses
Inhee Lee, Byungjun Kim, Hanbyul Joo [pdf] [supp] [arXiv]


HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Wei Wei, Tingbo Hou, Yael Pritch, Neal Wadhwa, Michael Rubinstein, Kfir Aberman [pdf] [supp] [arXiv]


HardMo: A Large-Scale Hardcase Dataset for Motion Capture
Jiaqi Liao, Chuanchen Luo, Yinuo Du, Yuxi Wang, Xucheng Yin, Man Zhang, Zhaoxiang Zhang, Junran Peng [pdf] [supp]


Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation
Zhiwei Yang, Kexue Fu, Minghong Duan, Linhao Qu, Shuo Wang, Zhijian Song [pdf] [supp] [arXiv]


BiPer: Binary Neural Networks using a Periodic Function
Edwin Vargas, Claudia V. Correa, Carlos Hinojosa, Henry Arguello [pdf] [supp] [arXiv]


Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens
Zhiwen Chen, Zhiyu Zhu, Yifan Zhang, Junhui Hou, Guangming Shi, Jinjian Wu [pdf] [supp]


AnyDoor: Zero-shot Object-level Image Customization
Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao [pdf] [arXiv]


Clustering Propagation for Universal Medical Image Segmentation
Yuhang Ding, Liulei Li, Wenguan Wang, Yi Yang [pdf] [supp] [arXiv]


Garment Recovery with Shape and Deformation Priors
Ren Li, Corentin Dumery, Benoît Guillard, Pascal Fua [pdf] [supp] [arXiv]


Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity
Ruijie Quan, Wenguan Wang, Zhibo Tian, Fan Ma, Yi Yang [pdf] [supp] [arXiv]


Exploring Regional Clues in CLIP for Zero-Shot Semantic Segmentation
Yi Zhang, Meng-Hao Guo, Miao Wang, Shi-Min Hu [pdf]


Move as You Say Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
Zan Wang, Yixin Chen, Baoxiong Jia, Puhao Li, Jinlu Zhang, Jingze Zhang, Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang [pdf] [supp] [arXiv]


Generalizable Face Landmarking Guided by Conditional Face Warping
Jiayi Liang, Haotian Liu, Hongteng Xu, Dixin Luo [pdf] [supp] [arXiv]


Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion
Zuoyue Li, Zhenqiang Li, Zhaopeng Cui, Marc Pollefeys, Martin R. Oswald [pdf] [supp] [arXiv]


Control4D: Efficient 4D Portrait Editing with Text
Ruizhi Shao, Jingxiang Sun, Cheng Peng, Zerong Zheng, Boyao Zhou, Hongwen Zhang, Yebin Liu [pdf] [supp] [arXiv]


CLIPtone: Unsupervised Learning for Text-based Image Tone Adjustment
Hyeongmin Lee, Kyoungkook Kang, Jungseul Ok, Sunghyun Cho [pdf] [supp] [arXiv]


Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling
Baoquan Zhang, Huaibin Wang, Chuyao Luo, Xutao Li, Guotao Liang, Yunming Ye, Xiaochen Qi, Yao He [pdf] [supp] [arXiv]


InceptionNeXt: When Inception Meets ConvNeXt
Weihao Yu, Pan Zhou, Shuicheng Yan, Xinchao Wang [pdf] [supp] [arXiv]


LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment
Yiming Ren, Xiao Han, Chengfeng Zhao, Jingya Wang, Lan Xu, Jingyi Yu, Yuexin Ma [pdf] [supp] [arXiv]


Segment Every Out-of-Distribution Object
Wenjie Zhao, Jia Li, Xin Dong, Yu Xiang, Yunhui Guo [pdf] [supp] [arXiv]


Wavelet-based Fourier Information Interaction with Frequency Diffusion Adjustment for Underwater Image Restoration
Chen Zhao, Weiling Cai, Chenyu Dong, Chengwei Hu [pdf] [supp] [arXiv]


PoNQ: a Neural QEM-based Mesh Representation
Nissim Maruani, Maks Ovsjanikov, Pierre Alliez, Mathieu Desbrun [pdf] [supp] [arXiv]


Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach
Beichen Zhang, Xiaoxing Wang, Xiaohan Qin, Junchi Yan [pdf] [supp] [arXiv]


Dr. Bokeh: DiffeRentiable Occlusion-aware Bokeh Rendering
Yichen Sheng, Zixun Yu, Lu Ling, Zhiwen Cao, Xuaner Zhang, Xin Lu, Ke Xian, Haiting Lin, Bedrich Benes [pdf] [supp]


LAENeRF: Local Appearance Editing for Neural Radiance Fields
Lukas Radl, Michael Steiner, Andreas Kurz, Markus Steinberger [pdf] [supp] [arXiv]


Adversarial Score Distillation: When score distillation meets GAN
Min Wei, Jingkai Zhou, Junyao Sun, Xuesong Zhang [pdf] [supp] [arXiv]


Vector Graphics Generation via Mutually Impulsed Dual-domain Diffusion
Zhongyin Zhao, Ye Chen, Zhangli Hu, Xuanhong Chen, Bingbing Ni [pdf] [supp]


ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring
Yuan Xu, Xiaoxuan Ma, Jiajun Su, Wentao Zhu, Yu Qiao, Yizhou Wang [pdf] [supp]


MeshPose: Unifying DensePose and 3D Body Mesh Reconstruction
Eric-Tuan Le, Antonis Kakolyris, Petros Koutras, Himmy Tam, Efstratios Skordos, George Papandreou, Riza Alp Güler, Iasonas Kokkinos [pdf] [supp]


Unsupervised Salient Instance Detection
Xin Tian, Ke Xu, Rynson Lau [pdf]


Move Anything with Layered Scene Diffusion
Jiawei Ren, Mengmeng Xu, Jui-Chieh Wu, Ziwei Liu, Tao Xiang, Antoine Toisoul [pdf] [supp] [arXiv]


Human Gaussian Splatting: Real-time Rendering of Animatable Avatars
Arthur Moreau, Jifei Song, Helisa Dhamo, Richard Shaw, Yiren Zhou, Eduardo Pérez-Pellitero [pdf] [supp] [arXiv]


The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing
Denis Bobkov, Vadim Titov, Aibek Alanov, Dmitry Vetrov [pdf] [supp]


Unbiased Estimator for Distorted Conics in Camera Calibration
Chaehyeon Song, Jaeho Shin, Myung-Hwan Jeon, Jongwoo Lim, Ayoung Kim [pdf] [supp] [arXiv]


MultiPhys: Multi-Person Physics-aware 3D Motion Estimation
Nicolas Ugrinovic, Boxiao Pan, Georgios Pavlakos, Despoina Paschalidou, Bokui Shen, Jordi Sanchez-Riera, Francesc Moreno-Noguer, Leonidas Guibas [pdf] [supp] [arXiv]


NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation
Vikas Thamizharasan, Difan Liu, Matthew Fisher, Nanxuan Zhao, Evangelos Kalogerakis, Michal Lukac [pdf] [supp] [arXiv]


OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
Xinyu Zhan, Lixin Yang, Yifei Zhao, Kangrui Mao, Hanlin Xu, Zenan Lin, Kailin Li, Cewu Lu [pdf] [supp] [arXiv]


Text-Guided 3D Face Synthesis - From Generation to Editing
Yunjie Wu, Yapeng Meng, Zhipeng Hu, Lincheng Li, Haoqian Wu, Kun Zhou, Weiwei Xu, Xin Yu [pdf] [supp]


Multiplane Prior Guided Few-Shot Aerial Scene Rendering
Zihan Gao, Licheng Jiao, Lingling Li, Xu Liu, Fang Liu, Puhua Chen, Yuwei Guo [pdf] [supp]


MAS: Multi-view Ancestral Sampling for 3D Motion Generation Using 2D Diffusion
Roy Kapon, Guy Tevet, Daniel Cohen-Or, Amit H. Bermano [pdf] [supp] [arXiv]


Bilateral Event Mining and Complementary for Event Stream Super-Resolution
Zhilin Huang, Quanmin Liang, Yijie Yu, Chujun Qin, Xiawu Zheng, Kai Huang, Zikun Zhou, Wenming Yang [pdf] [supp] [arXiv]


SANeRF-HQ: Segment Anything for NeRF in High Quality
Yichen Liu, Benran Hu, Chi-Keung Tang, Yu-Wing Tai [pdf] [supp]


Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary
Leheng Zhang, Yawei Li, Xingyu Zhou, Xiaorui Zhao, Shuhang Gu [pdf] [supp] [arXiv]


Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices
Huancheng Chen, Haris Vikalo [pdf] [supp] [arXiv]


Neural Fields as Distributions: Signal Processing Beyond Euclidean Space
Daniel Rebain, Soroosh Yazdani, Kwang Moo Yi, Andrea Tagliasacchi [pdf] [supp]


Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning
Woo-Jin Ahn, Geun-Yeong Yang, Hyun-Duck Choi, Myo-Taeg Lim [pdf] [supp] [arXiv]


X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition
Shuofeng Sun, Yongming Rao, Jiwen Lu, Haibin Yan [pdf] [supp]


One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls
Minghui Hu, Jianbin Zheng, Chuanxia Zheng, Chaoyue Wang, Dacheng Tao, Tat-Jen Cham [pdf] [supp] [arXiv]


HIVE: Harnessing Human Feedback for Instructional Visual Editing
Shu Zhang, Xinyi Yang, Yihao Feng, Can Qin, Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong, Ran Xu [pdf] [supp] [arXiv]


StrokeFaceNeRF: Stroke-based Facial Appearance Editing in Neural Radiance Field
Xiao-Juan Li, Dingxi Zhang, Shu-Yu Chen, Feng-Lin Liu [pdf] [supp]


ProxyCap: Real-time Monocular Full-body Capture in World Space via Human-Centric Proxy-to-Motion Learning
Yuxiang Zhang, Hongwen Zhang, Liangxiao Hu, Jiajun Zhang, Hongwei Yi, Shengping Zhang, Yebin Liu [pdf] [supp] [arXiv]


On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
Agneet Chatterjee, Tejas Gokhale, Chitta Baral, Yezhou Yang [pdf] [supp] [arXiv]


UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Yanwu Xu, Yang Zhao, Zhisheng Xiao, Tingbo Hou [pdf] [supp] [arXiv]


A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation
Qucheng Peng, Ce Zheng, Chen Chen [pdf] [supp] [arXiv]


ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models
Fei Kong, Jinhao Duan, Lichao Sun, Hao Cheng, Renjing Xu, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu [pdf] [supp]


Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation
Dongliang Cao, Marvin Eisenberger, Nafie El Amrani, Daniel Cremers, Florian Bernard [pdf] [supp] [arXiv]


Emu Edit: Precise Image Editing via Recognition and Generation Tasks
Shelly Sheynin, Adam Polyak, Uriel Singer, Yuval Kirstain, Amit Zohar, Oron Ashual, Devi Parikh, Yaniv Taigman [pdf] [supp] [arXiv]


Face2Diffusion for Fast and Editable Face Personalization
Kaede Shiohara, Toshihiko Yamasaki [pdf] [supp] [arXiv]


Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement
Ziyu Wang, Yue Xu, Cewu Lu, Yong-Lu Li [pdf] [supp] [arXiv]


UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio Video Point Cloud Time-Series and Image Recognition
Xiaohan Ding, Yiyuan Zhang, Yixiao Ge, Sijie Zhao, Lin Song, Xiangyu Yue, Ying Shan [pdf] [supp] [arXiv]


SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
Thuan Hoang Nguyen, Anh Tran [pdf] [supp] [arXiv]


DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
Tianhao Qi, Shancheng Fang, Yanze Wu, Hongtao Xie, Jiawei Liu, Lang Chen, Qian He, Yongdong Zhang [pdf] [supp] [arXiv]


Exact Fusion via Feature Distribution Matching for Few-shot Image Generation
Yingbo Zhou, Yutong Ye, Pengyu Zhang, Xian Wei, Mingsong Chen [pdf]


CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen [pdf] [supp] [arXiv]


QUADify: Extracting Meshes with Pixel-level Details and Materials from Images
Maximilian Frühauf, Hayko Riemenschneider, Markus Gross, Christopher Schroers [pdf] [supp]


RecDiffusion: Rectangling for Image Stitching with Diffusion Models
Tianhao Zhou, Haipeng Li, Ziyi Wang, Ao Luo, Chen-Lin Zhang, Jiajun Li, Bing Zeng, Shuaicheng Liu [pdf] [supp] [arXiv]


Eclipse: Disambiguating Illumination and Materials using Unintended Shadows
Dor Verbin, Ben Mildenhall, Peter Hedman, Jonathan T. Barron, Todd Zickler, Pratul P. Srinivasan [pdf] [supp] [arXiv]


Balancing Act: Distribution-Guided Debiasing in Diffusion Models
Rishubh Parihar, Abhijnya Bhat, Abhipsa Basu, Saswat Mallick, Jogendra Nath Kundu, R. Venkatesh Babu [pdf] [supp] [arXiv]


Differentiable Point-based Inverse Rendering
Hoon-Gyu Chung, Seokjun Choi, Seung-Hwan Baek [pdf] [supp] [arXiv]


A Unified and Interpretable Emotion Representation and Expression Generation
Reni Paskaleva, Mykyta Holubakha, Andela Ilic, Saman Motamed, Luc Van Gool, Danda Paudel [pdf] [supp] [arXiv]


Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
Shangchen Zhou, Peiqing Yang, Jianyi Wang, Yihang Luo, Chen Change Loy [pdf] [supp]


4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations
Wenbo Wang, Hsuan-I Ho, Chen Guo, Boxiang Rong, Artur Grigorev, Jie Song, Juan Jose Zarate, Otmar Hilliges [pdf] [supp]


Specularity Factorization for Low-Light Enhancement
Saurabh Saini, P J Narayanan [pdf] [supp] [arXiv]


Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models
Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, Gang Yu [pdf] [supp] [arXiv]


MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints
Pengfei Xie, Wenqiang Xu, Tutian Tang, Zhenjun Yu, Cewu Lu [pdf] [supp]


Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models
Bin Fu, Fanghua Yu, Anran Liu, Zixuan Wang, Jie Wen, Junjun He, Yu Qiao [pdf] [supp]


Diffuse Attend and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion
Junjiao Tian, Lavisha Aggarwal, Andrea Colaco, Zsolt Kira, Mar Gonzalez-Franco [pdf] [supp] [arXiv]


Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification
Kaijie Ren, Lei Zhang [pdf] [arXiv]


Gradient Alignment for Cross-Domain Face Anti-Spoofing
Binh M. Le, Simon S. Woo [pdf] [supp] [arXiv]


OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition
Yuchen Pan, Junjun Jiang, Kui Jiang, Zhihao Wu, Keyuan Yu, Xianming Liu [pdf] [supp] [arXiv]


Observation-Guided Diffusion Probabilistic Models
Junoh Kang, Jinyoung Choi, Sungik Choi, Bohyung Han [pdf] [supp] [arXiv]


Spatial-Aware Regression for Keypoint Localization
Dongkai Wang, Shiliang Zhang [pdf] [supp]


EFormer: Enhanced Transformer towards Semantic-Contour Features of Foreground for Portraits Matting
Zitao Wang, Qiguang Miao, Yue Xi, Peipei Zhao [pdf] [supp] [arXiv]


MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild
Zeren Jiang, Chen Guo, Manuel Kaufmann, Tianjian Jiang, Julien Valentin, Otmar Hilliges, Jie Song [pdf] [supp]


ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion
Jiayu Yang, Ziang Cheng, Yunfei Duan, Pan Ji, Hongdong Li [pdf] [arXiv]


GenN2N: Generative NeRF2NeRF Translation
Xiangyue Liu, Han Xue, Kunming Luo, Ping Tan, Li Yi [pdf] [supp] [arXiv]


Universal Robustness via Median Randomized Smoothing for Real-World Super-Resolution
Zakariya Chaouai, Mohamed Tamaazousti [pdf] [supp] [arXiv]


One-dimensional Adapter to Rule Them All: Concepts Diffusion Models and Erasing Applications
Mengyao Lyu, Yuhong Yang, Haiwen Hong, Hui Chen, Xuan Jin, Yuan He, Hui Xue, Jungong Han, Guiguang Ding [pdf] [supp] [arXiv]


Kandinsky Conformal Prediction: Efficient Calibration of Image Segmentation Algorithms
Joren Brunekreef, Eric Marcus, Ray Sheombarsing, Jan-Jakob Sonke, Jonas Teuwen [pdf] [supp] [arXiv]


Diversity-aware Channel Pruning for StyleGAN Compression
Jiwoo Chung, Sangeek Hyun, Sang-Heon Shim, Jae-Pil Heo [pdf] [supp] [arXiv]


Neural Clustering based Visual Representation Learning
Guikun Chen, Xia Li, Yi Yang, Wenguan Wang [pdf] [supp] [arXiv]


Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection
Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Muhammad Zeshan Afzal [pdf] [supp]


Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer
Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Lei Zhang, Ran He [pdf] [arXiv]


Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
Zhan Li, Zhang Chen, Zhong Li, Yi Xu [pdf] [supp] [arXiv]


Instruct-Imagen: Image Generation with Multi-modal Instruction
Hexiang Hu, Kelvin C.K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, Boqing Gong, William Cohen, Ming-Wei Chang, Xuhui Jia [pdf] [supp]


Rethinking Few-shot 3D Point Cloud Semantic Segmentation
Zhaochong An, Guolei Sun, Yun Liu, Fayao Liu, Zongwei Wu, Dan Wang, Luc Van Gool, Serge Belongie [pdf] [supp] [arXiv]


GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs
Mustafa Munir, William Avery, Md Mostafijur Rahman, Radu Marculescu [pdf] [supp] [arXiv]


Relightable and Animatable Neural Avatar from Sparse-View Video
Zhen Xu, Sida Peng, Chen Geng, Linzhan Mou, Zihan Yan, Jiaming Sun, Hujun Bao, Xiaowei Zhou [pdf] [arXiv]


Pose Adapted Shape Learning for Large-Pose Face Reenactment
Gee-Sern Jison Hsu, Jie-Ying Zhang, Huang Yu Hsiang, Wei-Jie Hong [pdf] [supp]


NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
Yannan He, Garvita Tiwari, Tolga Birdal, Jan Eric Lenssen, Gerard Pons-Moll [pdf] [supp] [arXiv]


RepAn: Enhanced Annealing through Re-parameterization
Xiang Fei, Xiawu Zheng, Yan Wang, Fei Chao, Chenglin Wu, Liujuan Cao [pdf] [supp]


DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior
Tianyu Huang, Yihan Zeng, Zhilu Zhang, Wan Xu, Hang Xu, Songcen Xu, Rynson W.H. Lau, Wangmeng Zuo [pdf] [supp] [arXiv]


ODIN: A Single Model for 2D and 3D Segmentation
Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios, Adam W. Harley, Gabriel Sarch, Kriti Aggarwal, Vishrav Chaudhary, Katerina Fragkiadaki [pdf] [supp] [arXiv]


InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
Xiefan Guo, Jinlin Liu, Miaomiao Cui, Jiankai Li, Hongyu Yang, Di Huang [pdf] [supp] [arXiv]


Multimodal Sense-Informed Forecasting of 3D Human Motions
Zhenyu Lou, Qiongjie Cui, Haofan Wang, Xu Tang, Hong Zhou [pdf] [arXiv]


FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer
Dongyeong Hwang, Hyunju Kim, Sunwoo Kim, Kijung Shin [pdf] [supp] [arXiv]


EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models
Jingyuan Yang, Jiawei Feng, Hui Huang [pdf] [supp] [arXiv]


Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects
Yijia Weng, Bowen Wen, Jonathan Tremblay, Valts Blukis, Dieter Fox, Leonidas Guibas, Stan Birchfield [pdf] [supp] [arXiv]


Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes
Diandian Guo, Deng-Ping Fan, Tongyu Lu, Christos Sakaridis, Luc Van Gool [pdf] [supp] [arXiv]


LAMP: Learn A Motion Pattern for Few-Shot Video Generation
Ruiqi Wu, Liangyu Chen, Tong Yang, Chunle Guo, Chongyi Li, Xiangyu Zhang [pdf] [supp]


Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates
Ka Chun Shum, Jaeyeon Kim, Binh-Son Hua, Duc Thanh Nguyen, Sai-Kit Yeung [pdf] [supp] [arXiv]


DREAM: Diffusion Rectification and Estimation-Adaptive Models
Jinxin Zhou, Tianyu Ding, Tianyi Chen, Jiachen Jiang, Ilya Zharkov, Zhihui Zhu, Luming Liang [pdf] [supp] [arXiv]

