Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, Date: 13-18 June 2010
hotdigi 2010. 12. 1. 19:34
http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?asf_arn=null&asf_iid=null&asf_pun=5521876&asf_in=null&asf_rpp=null&asf_iv=null&asf_sp=null&asf_pn=1
Searching for each of the 460-odd papers one by one was inconvenient, so I copied the links here.
Table of contents
Page(s): 1 - 40
Digital Object Identifier: 10.1109/CVPR.2010.5540236
Object-graphs for context-aware category discovery
Lee, Y.J.; Grauman, K.
Page(s): 1 - 8
Digital Object Identifier: 10.1109/CVPR.2010.5540237
How can knowing about some categories help us to discover new ones in unlabeled images? Unsupervised visual category discovery is useful to mine for recurring objects without human supervision, but existing methods assume no prior information and thus tend to perform poorly for cluttered scenes with multiple objects. We propose to leverage knowledge about previously learned categories to enable more accurate discovery. We introduce a novel object-graph descriptor to encode the layout of object-l...
Grouplet: A structured image representation for recognizing human and object interactions
Bangpeng Yao; Li Fei-Fei
Page(s): 9 - 16
Digital Object Identifier: 10.1109/CVPR.2010.5540234
Psychologists have proposed that many human-object interaction activities form unique classes of scenes. Recognizing these scenes is important for many social functions. To enable a computer to do this is however a challenging task. Take people-playing-musical-instrument (PPMI) as an example; to distinguish a person playing violin from a person just holding a violin requires subtle distinction of characteristic image features and feature arrangements that differentiate these two scenes. Most of ...
Modeling mutual context of object and human pose in human-object interaction activities
Bangpeng Yao; Li Fei-Fei
Page(s): 17 - 24
Digital Object Identifier: 10.1109/CVPR.2010.5540235
Detecting objects in cluttered scenes and estimating articulated human body parts are two challenging problems in computer vision. The difficulty is particularly pronounced in activities involving human-object interactions (e.g. playing tennis), where the relevant object tends to be small or only partially visible, and the human body parts are often self-occluded. We observe, however, that objects and human poses can serve as mutual context to each other - recognizing one facilitates the recogni...
The chains model for detecting parts by their context
Karlinsky, L.; Dinerstein, M.; Harari, D.; Ullman, S.
Page(s): 25 - 32
Digital Object Identifier: 10.1109/CVPR.2010.5540232
Detecting an object part relies on two sources of information - the appearance of the part itself and the context supplied by surrounding parts. In this paper we consider problems in which a target part cannot be recognized reliably using its own appearance, such as detecting low-resolution hands, and must be recognized using the context of surrounding parts. We develop the `chains model' which can locate parts of interest in a robust and precise manner, even when the surrounding context is high...
Detecting and sketching the common
Shai Bagon; Brostovski, O.; Galun, M.; Irani, M.
Page(s): 33 - 40
Digital Object Identifier: 10.1109/CVPR.2010.5540233
Given very few images containing a common object of interest under severe variations in appearance, we detect the common object and provide a compact visual representation of that object, depicted by a binary sketch. Our algorithm is composed of two stages: (i) Detect a mutually common (yet non-trivial) ensemble of `self-similarity descriptors' shared by all the input images. (ii) Having found such a mutually common ensemble, `invert' it to generate a compact sketch which best represents this en...
High performance object detection by collaborative learning of Joint Ranking of Granules features
Chang Huang; Nevatia, R.
Page(s): 41 - 48
Digital Object Identifier: 10.1109/CVPR.2010.5540230
Object detection remains an important but challenging task in computer vision. We present a method that combines high accuracy with high efficiency. We adopt simplified forms of APCF features, which we term Joint Ranking of Granules (JRoG) features; the features consist of discrete values by uniting binary ranking results of pair-wise granules in the image. We propose a novel collaborative learning method for JRoG features, which consists of a Simulated Annealing (SA) module and an incremental ...
P-N learning: Bootstrapping binary classifiers by structural constraints
Kalal, Z.; Matas, J.; Mikolajczyk, K.
Page(s): 49 - 56
Digital Object Identifier: 10.1109/CVPR.2010.5540231
This paper shows that the performance of a binary classifier can be significantly improved by the processing of structured unlabeled data, i.e. data are structured if knowing the label of one example restricts the labeling of the others. We propose a novel paradigm for training a binary classifier from labeled and unlabeled examples that we call P-N learning. The learning process is guided by positive (P) and negative (N) constraints which restrict the labeling of the unlabeled set. P-N learning...
3D Scene priors for road detection
Alvarez, J.M.; Gevers, T.; Lopez, A.M.
Page(s): 57 - 64
Digital Object Identifier: 10.1109/CVPR.2010.5540228
Vision-based road detection is important in different areas of computer vision such as autonomous driving, car collision warning and pedestrian crossing detection. However, current vision-based road detection methods are usually based on low-level features and they assume structured roads, road homogeneity, and uniform lighting conditions. Therefore, in this paper, contextual 3D information is used in addition to low-level cues. Low-level photometric invariant cues are derived from the appearanc...
Toward coherent object detection and scene layout understanding
Sid Ying-Ze Bao; Min Sun; Savarese, S.
Page(s): 65 - 72
Digital Object Identifier: 10.1109/CVPR.2010.5540229
What is an object?
Alexe, B.; Deselaers, T.; Ferrari, V.
Page(s): 73 - 80
Digital Object Identifier: 10.1109/CVPR.2010.5540226
We present a generic objectness measure, quantifying how likely it is for an image window to contain an object of any class. We explicitly train it to distinguish objects with a well-defined boundary in space, such as cows and telephones, from amorphous background elements, such as grass and road. The measure combines in a Bayesian framework several image cues measuring characteristics of objects, such as appearing different from their surroundings and having a closed boundary. This includes an ...
Fast globally optimal 2D human detection with loopy graph models
Tai-Peng Tian; Sclaroff, S.
Page(s): 81 - 88
Digital Object Identifier: 10.1109/CVPR.2010.5540227
This paper presents an algorithm for recovering the globally optimal 2D human figure detection using a loopy graph model. This is computationally challenging because the time complexity scales exponentially in the size of the largest clique in the graph. The proposed algorithm uses Branch and Bound (BB) to search for the globally optimal solution. The algorithm converges rapidly in practice and this is due to a novel method for quickly computing tree based lower bounds. The key idea is to recycl...
Cascaded L1-norm Minimization Learning (CLML) classifier for human detection
Ran Xu; Baochang Zhang; Qixiang Ye; Jianbin Jiao
Page(s): 89 - 96
Digital Object Identifier: 10.1109/CVPR.2010.5540224
This paper proposes a new learning method, which integrates feature selection with classifier construction for human detection via solving three optimization models. Firstly, the method trains a series of weak-classifiers by the proposed L1-norm Minimization Learning (LML) and min-max penalty function models. Secondly, the proposed method selects the weak-classifiers by using the integer optimization model to construct a strong classifier. The L1-norm minimization and integer optimization models...
Convex shape decomposition
Hairong Liu; Wenyu Liu; Latecki, L.J.
Page(s): 97 - 104
Digital Object Identifier: 10.1109/CVPR.2010.5540225
In this paper, we propose a new shape decomposition method, called convex shape decomposition. We formalize the convex decomposition problem as an integer linear programming problem, and obtain an approximate optimal solution by minimizing the total cost of decomposition under some concavity constraints. Our method is based on Morse theory and combines information from multiple Morse functions. The obtained decomposition provides a compact representation, both geometrical and topological, of origin...
A theory of phase-sensitive rotation invariance with spherical harmonic and moment-based representations
Kakarala, R.; Dansheng Mao
Page(s): 105 - 112
Digital Object Identifier: 10.1109/CVPR.2010.5540222
This paper describes how phase-sensitive rotation invariants for three-dimensional data may be obtained. A “bispectrum” is formulated for rotations, and its properties are derived for spherical harmonic coefficients as well as for moments. The bispectral invariants offer improved discrimination over previously published magnitude-only invariants. They are able to distinguish rotations from reflections, as well as rotations of an entire shape from component-wise rotations of element...
Multi-class object localization by combining local contextual interactions
Galleguillos, C.; McFee, B.; Belongie, S.; Lanckriet, G.
Page(s): 113 - 120
Digital Object Identifier: 10.1109/CVPR.2010.5540223
Recent work in object localization has shown that the use of contextual cues can greatly improve accuracy over models that use appearance features alone. Although many of these models have successfully explored different types of contextual sources, they only consider one type of contextual interaction (e.g., pixel, region or object level interactions), leaving open questions about the true potential contribution of context. Furthermore, contributions across object classes and over appearance fe...
Automatic discovery of meaningful object parts with latent CRFs
Schnitzspan, P.; Roth, S.; Schiele, B.
Page(s): 121 - 128
Digital Object Identifier: 10.1109/CVPR.2010.5540220
Object recognition is challenging due to high intra-class variability caused, e.g., by articulation, viewpoint changes, and partial occlusion. Successful methods need to strike a balance between being flexible enough to model such variation and discriminative enough to detect objects in cluttered, real world scenes. Motivated by these challenges we propose a latent conditional random field (CRF) based on a flexible assembly of parts. By modeling part labels as hidden nodes and developing an EM a...
Exploiting hierarchical context on a large database of object categories
Myung Jin Choi; Lim, J.J.; Torralba, A.; Willsky, A.S.
Page(s): 129 - 136
Digital Object Identifier: 10.1109/CVPR.2010.5540221
There has been a growing interest in exploiting contextual information in addition to local features to detect and localize multiple object categories in an image. Context models can efficiently rule out some unlikely combinations or locations of objects and guide detectors to produce a semantically coherent interpretation of a scene. However, the performance benefit from using context models has been limited because most of these methods were tested on datasets with only a few object categories...
Learning appearance in virtual scenarios for pedestrian detection
Marin, J.; Vazquez, D.; Geronimo, D.; Lopez, A.M.
Page(s): 137 - 144
Digital Object Identifier: 10.1109/CVPR.2010.5540218
Detecting pedestrians in images is a key functionality to avoid vehicle-to-pedestrian collisions. The most promising detectors rely on appearance-based pedestrian classifiers trained with labelled samples. This paper addresses the following question: can a pedestrian appearance model learnt in virtual scenarios work successfully for pedestrian detection in real images? (Fig. 1). Our experiments suggest a positive answer, which is a new and relevant conclusion for research in pedestrian detection...
Polynomial shape from shading
Ecker, A.; Jepson, A.D.
Page(s): 145 - 152
Digital Object Identifier: 10.1109/CVPR.2010.5540219
Analysis of light transport in scattering media
Mukaigawa, Y.; Yagi, Y.; Raskar, R.
Page(s): 153 - 160
Digital Object Identifier: 10.1109/CVPR.2010.5540216
We propose a new method to analyze light transport in homogeneous scattering media. The incident light undergoes multiple bounces in translucent objects, and produces a complex light field. Our method analyzes the light transport in two steps. First, single and multiple scattering are separated by projecting high-frequency stripe patterns. Then, multiple scattering is decomposed into each bounce component based on the light transport equation. The light field for each bounce is recursively estim...
A new texture descriptor using multifractal analysis in multi-orientation wavelet pyramid
Yong Xu; Xiong Yang; Haibin Ling; Hui Ji
Page(s): 161 - 168
Digital Object Identifier: 10.1109/CVPR.2010.5540217
Based on multifractal analysis in wavelet pyramids of texture images, a new texture descriptor is proposed in this paper that implicitly combines information from both spatial and frequency domains. Beyond the traditional wavelet transform, a multi-oriented wavelet leader pyramid is used in our approach that robustly encodes the multi-scale information of texture edgels. Moreover, the resulting texture model shows empirically a strong power law relationship for natural textures, which can be char...
A content-aware image prior
Taeg Sang Cho; Joshi, N.; Zitnick, C.L.; Sing Bing Kang; Szeliski, R.; Freeman, W.T.
Page(s): 169 - 176
Digital Object Identifier: 10.1109/CVPR.2010.5540214
In image restoration tasks, a heavy-tailed gradient distribution of natural images has been extensively exploited as an image prior. Most image restoration algorithms impose a sparse gradient prior on the whole image, reconstructing an image with piecewise smooth characteristics. While the sparse gradient prior removes ringing and noise artifacts, it also tends to remove mid-frequency textures, degrading the visual quality. We can attribute such degradations to imposing an incorrect image prior....
Learning from interpolated images using neural networks for digital forensics
Yizhen Huang; Na Fan
Page(s): 177 - 182
Digital Object Identifier: 10.1109/CVPR.2010.5540215
Interpolated images have data redundancy, and special correlation exists among neighboring pixels, which is a crucial clue in digital forensics. We design a neural network based framework to approximate the stylized computational rules of interpolation algorithms for learning statistical inter-pixel correlation of interpolated images. The interpolation process is cognized from the interpolation results. Experiments are carried out on camera built-in Color Filter Array interpolation and super res...
A probabilistic image jigsaw puzzle solver
Taeg Sang Cho; Avidan, S.; Freeman, W.T.
Page(s): 183 - 190
Digital Object Identifier: 10.1109/CVPR.2010.5540212
We explore the problem of reconstructing an image from a bag of square, non-overlapping image patches, the jigsaw puzzle problem. Completing jigsaw puzzles is challenging and requires expertise even for humans, and is known to be NP-complete. We depart from previous methods that treat the problem as a constraint satisfaction problem and develop a graphical model to solve it. Each patch location is a node and each patch is a label at nodes in the graph. A graphical model requires a pairwise compa...
Dynamic texture recognition based on distributions of spacetime oriented structure
Derpanis, K.G.; Wildes, R.P.
Page(s): 191 - 198
Digital Object Identifier: 10.1109/CVPR.2010.5540213
This paper addresses the challenge of recognizing dynamic textures based on their observed visual dynamics. Typically, the term dynamic texture is used with reference to image sequences of various natural processes that exhibit stochastic dynamics (e.g., smoke, water and windblown vegetation); although, it applies equally well to images of simpler dynamics when analyzed in terms of aggregate region properties (e.g., uniform motion of elements in traffic video). In this paper, a novel approach to...
Finding dots: Segmentation as popping out regions from boundaries
Bernardis, E.; Yu, S.X.
Page(s): 199 - 206
Digital Object Identifier: 10.1109/CVPR.2010.5540210
Many applications need to segment out all small round regions in an image. This task of finding dots can be viewed as a region segmentation problem where the dots form one region and the areas between dots form the other. We formulate it as a graph cuts problem with two types of grouping cues: short-range attraction based on feature similarity and long-range repulsion based on feature dissimilarity. The feature we use is a pixel-centric relational representation that encodes local convexity: Pix...
Estimating optical properties of layered surfaces using the spider model
Morimoto, T.; Tan, R.T.; Kawakami, R.; Ikeuchi, K.
Page(s): 207 - 214
Digital Object Identifier: 10.1109/CVPR.2010.5540211
Many object surfaces are composed of layers of different physical substances, known as layered surfaces. These surfaces, such as patinas, water colors, and wall paintings, have more complex optical properties than diffuse surfaces. Although the characteristics of layered surfaces, like layer opacity, mixture of colors, and color gradations, are significant, they are usually ignored in the analysis of many methods in computer vision, causing inaccurate or even erroneous results. Therefore, the ma...
Optimal HDR reconstruction with linear digital cameras
Granados, M.; Ajdin, B.; Wand, M.; Theobalt, C.; Seidel, H.-P.; Lensch, H.P.A.
Page(s): 215 - 222
Digital Object Identifier: 10.1109/CVPR.2010.5540208
Given a multi-exposure sequence of a scene, our aim is to recover the absolute irradiance falling onto a linear camera sensor. The established approach is to perform a weighted average of the scaled input exposures. However, there is no clear consensus on the appropriate weighting to use. We propose a weighting function that produces statistically optimal estimates under the assumption of compound-Gaussian noise. Our weighting is based on a calibrated camera model that accounts for all noise sou...
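The "weighted average of the scaled input exposures" mentioned in the abstract above can be sketched for a single pixel as follows. This is only an illustration of the generic established approach, not the paper's method: the function name `merge_exposures` and the weights passed in are assumptions, and the statistically optimal weighting the authors propose is not reproduced here because the abstract is truncated.

```python
def merge_exposures(pixel_values, exposure_times, weights):
    """Estimate irradiance at one pixel from several exposures.

    Each reading is scaled by its exposure time (linear sensor assumed),
    then the scaled readings are combined with a weighted average.
    The weights are caller-supplied placeholders (an assumption).
    """
    if not (len(pixel_values) == len(exposure_times) == len(weights)):
        raise ValueError("all inputs must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Scale each reading to a common irradiance unit.
    scaled = [v / t for v, t in zip(pixel_values, exposure_times)]
    # Weighted average of the scaled exposures.
    return sum(w * s for w, s in zip(weights, scaled)) / total_weight


# Three noise-free exposures of the same pixel, uniform weights.
estimate = merge_exposures([10.0, 20.0, 40.0], [1.0, 2.0, 4.0], [1.0, 1.0, 1.0])
print(estimate)
```

With noise-free readings every scaled exposure agrees, so any valid weighting returns the same estimate; the choice of weights only matters once the readings are noisy or saturated.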
Learning to recognize shadows in monochromatic natural images
Jiejie Zhu; Samuel, K.G.G.; Masood, S.Z.; Tappen, M.F.
Page(s): 223 - 230
Digital Object Identifier: 10.1109/CVPR.2010.5540209
Context-constrained hallucination for image super-resolution
Jian Sun; Jiejie Zhu; Tappen, M.F.
Page(s): 231 - 238
Digital Object Identifier: 10.1109/CVPR.2010.5540206
This paper proposes a context-constrained hallucination approach for image super-resolution. Through building a training set of high-resolution/low-resolution image segment pairs, the high-resolution pixel is hallucinated from its texturally similar segments which are retrieved from the training set by texture similarity. Given the discrete hallucinated examples, a continuous energy function is designed to enforce the fidelity of high-resolution image to low-resolution input and the constraints ...
Exploring features in a Bayesian framework for material recognition
Ce Liu; Sharan, L.; Adelson, E.H.; Rosenholtz, R.
Page(s): 239 - 246
Digital Object Identifier: 10.1109/CVPR.2010.5540207
We are interested in identifying the material category, e.g. glass, metal, fabric, plastic or wood, from a single image of a surface. Unlike other visual recognition tasks in computer vision, it is difficult to find good, reliable features that can tell material categories apart. Our strategy is to use a rich set of low and mid-level features that capture various aspects of material appearance. We propose an augmented Latent Dirichlet Allocation (aLDA) model to combine these features under a Bay...
Image restoration and disparity estimation from an uncalibrated multi-layered image
Yano, T.; Shimizu, M.; Okutomi, M.
Page(s): 247 - 254
Digital Object Identifier: 10.1109/CVPR.2010.5540204
Watching a reflection in a glass window, one can often observe a multi-layered image consisting of a front-surface reflection from the glass and a rear-surface reflection through the glass. That multi-layered image is a composition of dual aspects of the same image, resembling a sound reverberation. As described herein, we propose a method to estimate the original reflection image before the layering. First, we model the multi-layered image generation process; then we derive a restoration filter...
Estimation of image bias field with sparsity constraints
Yuanjie Zheng; Gee, J.C.
Page(s): 255 - 262
Digital Object Identifier: 10.1109/CVPR.2010.5540205
We propose a new scheme to estimate image bias field through introducing two sparsity constraints. One is that the bias-free image has concise representation with image gradients or coefficients of other image transformations. The other constraint is that model fit on the bias field should be as concise as possible. The new scheme enables adaptive specifications of the estimated bias field's smoothness, and results in extremely accurate solutions with more efficient optimization techniques, e.g....
Performance evaluation of color correction approaches for automatic multi-view image and video stitching
Wei Xu; Mulligan, J.
Page(s): 263 - 270
Digital Object Identifier: 10.1109/CVPR.2010.5540202
Many different automatic color correction approaches have been proposed by different research communities in the past decade. However, these approaches are seldom compared, so their relative performance and applicability are unclear. For multi-view image and video stitching applications, an ideal color correction approach should be effective at transferring the color palette of the source image to the target image, and meanwhile be able to extend the transferred color from the overlapped area to...
Surface color estimation based on inter- and intra-pixel relationships in outdoor scenes
Hirose, S.; Suenaga, T.; Takemura, K.; Kawakami, R.; Takamatsu, J.; Ogasawara, T.
Page(s): 271 - 278
Digital Object Identifier: 10.1109/CVPR.2010.5540203
We propose a method for estimating inherent surface color robustly against image noise from two registered images taken under different outdoor illuminations. We formulate the estimation in a maximum likelihood manner while considering both inter-pixel and intra-pixel relationships. We define inter-pixel relationship based on stochastic behavior of image noise and properties of outdoor illumination chromaticity. We rely on the spatial continuity of both surface color and illumination to de...
Estimating demosaicing algorithms using image noise variance
Takamatsu, J.; Matsushita, Y.; Ogasawara, T.; Ikeuchi, K.
Page(s): 279 - 286
Digital Object Identifier: 10.1109/CVPR.2010.5540200
We propose a method for estimating demosaicing algorithms from image noise variance. We show that the noise variance in interpolated pixels becomes smaller than that of directly observed pixels without interpolation. Our method capitalizes on the spatial variation of image noise variance in demosaiced images to estimate the color filter array patterns and demosaicing algorithms. We verify the effectiveness of the proposed method using various images demosaiced with different demosaicing algorith...
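The claim above that interpolated pixels carry less noise than directly observed ones can be checked with a small calculation. The sketch below assumes a simplified setting that is not taken from the paper: interpolation is a linear combination of neighbouring observed pixels, and the noise on those pixels is independent with equal variance. Under those assumptions the variance of the interpolated value is the observed variance times the sum of squared weights. The function name `interpolated_noise_variance` is hypothetical.

```python
def interpolated_noise_variance(weights, sigma2):
    """Noise variance of a linearly interpolated pixel.

    Assumes (for illustration only) independent noise of variance
    `sigma2` on each observed neighbour and interpolation weights
    `weights`. Then Var(sum_i w_i * n_i) = sigma2 * sum_i w_i**2.
    """
    return sigma2 * sum(w * w for w in weights)


# Averaging four neighbours with equal weights, as in a simple
# bilinear-style interpolation of a missing colour sample.
observed_variance = 1.0
print(interpolated_noise_variance([0.25] * 4, observed_variance))
```

Whenever the weights are non-negative, sum to one, and more than one of them is non-zero, the sum of squares is below one, so the interpolated pixel has strictly lower noise variance than an observed pixel.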
Object-to-object color transfer: Optimal flows and SMSP transformations
Freedman, D.; Kisilev, P.
Page(s): 287 - 294
Digital Object Identifier: 10.1109/CVPR.2010.5540201
Given a source object and a target object, we consider the problem of transferring the “color scheme” of the source to the target, while at the same time maintaining the target's original look and feel. This is a challenging problem due to the fact that the source and target may each consist of multiple colors, each of which comes in multiple shades. We propose a two stage solution to this problem. (1) A discrete color flow is computed from the target histogram to the source histog...
The phase only transform for unsupervised surface defect detection
Aiger, D.; Talbot, H.
Page(s): 295 - 302
Digital Object Identifier: 10.1109/CVPR.2010.5540198
We present a simple, fast, and effective method to detect defects on textured surfaces. Our method is unsupervised and contains no learning stage or information on the texture being inspected. The new method is based on the Phase Only Transform (PHOT) which corresponds to the Discrete Fourier Transform (DFT), normalized by the magnitude. The PHOT removes any regularities, at arbitrary scales, from the image while preserving only irregular patterns considered to represent defects. The localization...
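The description of the PHOT as a DFT normalized by its magnitude can be sketched with NumPy as below. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names, the small `eps` guard against division by zero, and the synthetic test texture are all hypothetical, and the localization step that the truncated abstract begins to describe is left out.

```python
import numpy as np


def phase_only_transform(image, eps=1e-12):
    """Inverse DFT of the magnitude-normalized spectrum of `image`."""
    spectrum = np.fft.fft2(image)
    # Keep only the phase: divide every coefficient by its magnitude.
    phase_only = spectrum / (np.abs(spectrum) + eps)
    return np.real(np.fft.ifft2(phase_only))


def make_test_texture(size=64, period=8, defect=(20, 30)):
    """Synthetic regular texture with a single-pixel defect (assumed data)."""
    ys, xs = np.mgrid[0:size, 0:size]
    texture = np.sin(2 * np.pi * xs / period) + np.sin(2 * np.pi * ys / period)
    texture[defect] += 5.0
    return texture


response = phase_only_transform(make_test_texture())
peak = np.unravel_index(np.argmax(np.abs(response)), response.shape)
print(tuple(int(i) for i in peak))
```

In this synthetic case the periodic texture occupies only a handful of frequency bins, so after normalization it contributes almost nothing, while the defect, whose energy is spread over all frequencies, stands out as the strongest response at its own location.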
Direct image alignment of projector-camera systems with planar surfaces
Audet, S.; Okutomi, M.; Tanaka, M.
Page(s): 303 - 310
Digital Object Identifier: 10.1109/CVPR.2010.5540199
Spatialized epitome and its applications
Xinqi Chu; Shuicheng Yan; Liyuan Li; Kap Luk Chan; Huang, T.S.
Page(s): 311 - 318
Digital Object Identifier: 10.1109/CVPR.2010.5540196
Due to the lack of explicit spatial consideration, the existing epitome model may fail for image recognition and target detection, which directly motivates us to propose the so-called spatialized epitome in this paper. Extended from the original graphical model of epitome, the spatialized epitome provides a general framework to integrate both appearance and spatial arrangement of patches in the image to achieve a more precise likelihood representation for image(s) and eliminate ambiguities in image ...
Global optimization for estimating a BRDF with multiple specular lobes
Chanki Yu; Yongduek Seo; Sang Wook Lee
Page(s): 319 - 326
Digital Object Identifier: 10.1109/CVPR.2010.5540197
This paper presents a global minimization framework for estimating analytical BRDF model parameters using the techniques of convex programming and branch and bound. Traditional local minimization suffers from local minima and requires a large number of initial conditions and supervision for successful results especially when a model is highly complex and nonlinear. We consider the Cook-Torrance model, a parametric model with the Gaussian-like Beckmann distributions for specular reflectances. Ins...
An approach to vectorial total variation based on geometric measure theory
Goldluecke, B.; Cremers, D.
Page(s): 327 - 333
Digital Object Identifier: 10.1109/CVPR.2010.5540194
We analyze a previously unexplored generalization of the scalar total variation to vector-valued functions, which is motivated by geometric measure theory. A complete mathematical characterization is given, which proves important invariance properties as well as existence of solutions of the vectorial ROF model. As an important feature, there exists a dual formulation for the proposed vectorial total variation, which leads to a fast and stable minimization algorithm. The main difference to previ...
Robust order-based methods for feature description
Gupta, R.; Patil, H.; Mittal, A.
Page(s): 334 - 341
Digital Object Identifier: 10.1109/CVPR.2010.5540195
Feature-based methods have found increasing use in many applications such as object recognition, 3D reconstruction and mosaicing. In this paper, we focus on the problem of matching such features. While histogram-of-gradients type methods such as SIFT, GLOH and Shape Context are currently popular, several papers have suggested using orders of pixels rather than raw intensities and shown improved results for some applications. The papers suggest two different techniques for doing so: (1) A Histo...
Rectilinear parsing of architecture in urban environment
Peng Zhao; Tian Fang; Jianxiong Xiao; Honghui Zhang; Qinping Zhao; Long Quan
Page(s): 342 - 349
Digital Object Identifier: 10.1109/CVPR.2010.5540192
We propose an approach that parses registered images captured at ground level into architectural units for large-scale city modeling. Each parsed unit has a regularized shape, which can be used for further modeling purposes. In our approach, we first parse the environment into buildings, the ground, and the sky using a joint 2D-3D segmentation method. Then, we partition buildings into individual façades. The partition problem is formulated as a dynamic programming optimization for a seque...
Hybrid multi-view reconstruction by Jump-Diffusion
Lafarge, F.; Keriven, R.; Brédif, M.; Vu Hoang Hiep
Page(s): 350 - 357
Digital Object Identifier: 10.1109/CVPR.2010.5540193
We propose a multi-view stereo reconstruction algorithm which recovers urban scenes as a combination of meshes and geometric primitives. It provides a compact model while preserving details: irregular elements such as statues and ornaments are described by meshes whereas regular structures such as columns and walls are described by primitives (planes, spheres, cylinders, cones and tori). A Jump-Diffusion process is designed to sample these two types of elements simultaneously. The quality of a r...
Building reconstruction using manhattan-world grammars
Vanegas, C.A.; Aliaga, D.G.; Beneš, B.
Page(s): 358 - 365
Digital Object Identifier: 10.1109/CVPR.2010.5540190
We present a passive computer vision method that exploits existing mapping and navigation databases in order to automatically create 3D building models. Our method defines a grammar for representing changes in building geometry that approximately follow the Manhattan-world assumption which states there is a predominance of three mutually orthogonal directions in the scene. By using multiple calibrated aerial images, we extend previous Manhattan-world methods to robustly produce a single, coheren...
Estimating camera pose from a single urban ground-view omnidirectional image and a 2D building outline map
Tat-Jen Cham; Ciptadi, A.; Wei-Chian Tan; Minh-Tri Pham; Liang-Tien Chia
Page(s): 366 - 373
Digital Object Identifier : 10.1109/CVPR.2010.5540191
AbstractPlus | Full Text: PDF (693KB) | Multimedia
A framework is presented for estimating the pose of a camera based on images extracted from a single omnidirectional image of an urban scene, given a 2D map with building outlines with no 3D geometric information nor appearance data. The framework attempts to identify vertical corner edges of buildings in the query image, which we term VCLH, as well as the neighboring plane normals, through vanishing point analysis. A bottom-up process further groups VCLH into elemental planes and subsequently i...
Posture invariant surface description and feature extraction
Wuhrer, S.; Azouz, Z.B.; Chang Shu
Page(s): 374 - 381
Digital Object Identifier : 10.1109/CVPR.2010.5540188
AbstractPlus | Full Text: PDF (2093KB)
We propose a posture invariant surface descriptor for triangular meshes. Using intrinsic geometry, the surface is first transformed into a representation that is independent of the posture. Spin image is then adapted to derive a descriptor for the representation. The descriptor is used for extracting surface features automatically. It is invariant with respect to rigid and isometric deformations, and robust to noise and changes in resolution. The result is demonstrated by using the automatically...
Dense non-rigid surface registration using high-order graph matching
Yun Zeng; Chaohui Wang; Yang Wang; Xianfeng Gu; Samaras, D.; Paragios, N.
Page(s): 382 - 389
Digital Object Identifier : 10.1109/CVPR.2010.5540189
AbstractPlus | Full Text: PDF (4423KB)
Line matching leveraged by point correspondences
Bin Fan; Fuchao Wu; Zhanyi Hu
Page(s): 390 - 397
Digital Object Identifier : 10.1109/CVPR.2010.5540186
AbstractPlus | Full Text: PDF (1788KB)
A novel method for line matching is proposed. The basic idea is to use tentative point correspondences, which can be easily obtained by keypoint matching methods, to significantly improve line matching performance, even when the point correspondences are severely contaminated by outliers. When matching a pair of image lines, a group of corresponding points that may be coplanar with these lines in 3D space is firstly obtained from all corresponding image points in the local neighborhoods of these...
Detecting and parsing architecture at city scale from range data
Toshev, A.; Mordohai, P.; Taskar, B.
Page(s): 398 - 405
Digital Object Identifier : 10.1109/CVPR.2010.5540187
AbstractPlus | Full Text: PDF (2740KB)
We present a method for detecting and parsing buildings from unorganized 3D point clouds into a compact, hierarchical representation that is useful for high-level tasks. The input is a set of range measurements that cover large-scale urban environment. The desired output is a set of parse trees, such that each tree represents a semantic decomposition of a building - the nodes are roof surfaces as well as volumetric parts inferred from the observable surfaces. We model the above problem using a s...
Dynamic and scalable large scale image reconstruction
Strecha, C.; Pylvänäinen, T.; Fua, P.
Page(s): 406 - 413
Digital Object Identifier : 10.1109/CVPR.2010.5540184
AbstractPlus | Full Text: PDF (1948KB) | Multimedia
Recent approaches to reconstructing city-sized areas from large image collections usually process them all at once and only produce disconnected descriptions of image subsets, which typically correspond to major landmarks. In contrast, we propose a framework that lets us take advantage of the available meta-data to build a single, consistent description from these potentially disconnected descriptions. Furthermore, this description can be incrementally updated and enriched as new images become a...
Learning 3D shape from a single facial image via non-linear manifold embedding and alignment
Xianwang Wang; Ruigang Yang
Page(s): 414 - 421
Digital Object Identifier : 10.1109/CVPR.2010.5540185
AbstractPlus | Full Text: PDF (600KB)
The 3D reconstruction of a face from a single frontal image is an ill-posed problem. This is further accentuated when the face image is captured under different poses and/or complex illumination conditions. In this paper, we aim to solve the shape recovery problem from a single facial image under these challenging conditions. The local image models for each patch of facial images and the local surface models for each patch of 3D shape are learned using a non-linear dimensionality reduction techn...
Adaptive pose priors for pictorial structures
Sapp, B.; Jordan, C.; Taskar, B.
Page(s): 422 - 429
Digital Object Identifier : 10.1109/CVPR.2010.5540182
AbstractPlus | Full Text: PDF (2384KB) | Multimedia
Pictorial structure (PS) models are extensively used for part-based recognition of scenes, people, animals and multi-part objects. To achieve tractability, the structure and parameterization of the model is often restricted, for example, by assuming tree dependency structure and unimodal, data-independent pairwise interactions. These expressivity restrictions fail to capture important patterns in the data. On the other hand, local methods such as nearest-neighbor classification and kernel densit...
A game-theoretic approach to fine surface registration without initial motion estimation
Albarelli, A.; Rodola, E.; Torsello, A.
Page(s): 430 - 437
Digital Object Identifier : 10.1109/CVPR.2010.5540183
AbstractPlus | Full Text: PDF (2507KB)
Surface registration is a fundamental step in the reconstruction of three-dimensional objects. This is typically a two step process where an initial coarse motion estimation is followed by a refinement. Most coarse registration algorithms exploit some local point descriptor that is intrinsic to the shape and does not depend on the relative position of the surfaces. By contrast, refinement techniques iteratively minimize a distance function measured between pairs of selected neighboring points an...
Global and local isometry-invariant descriptor for 3D shape comparison and partial matching
Huai-Yu Wu; Hongbin Zha; Tao Luo; Xu-Lei Wang; Songde Ma
Page(s): 438 - 445
Digital Object Identifier : 10.1109/CVPR.2010.5540180
AbstractPlus | Full Text: PDF (2405KB) | Multimedia
In this paper, based on manifold harmonics, we propose a novel framework for 3D shape similarity comparison and partial matching. First, we propose a novel symmetric mean-value representation to robustly construct high-quality manifold harmonic bases on nonuniform-sampling meshes. Then, based on the manifold harmonic bases constructed, a novel shape descriptor is presented to capture both of global and local features of 3D shape. This feature descriptor is isometry-invariant, i.e., invariant to ...
Point-based non-rigid surface registration with accuracy estimation
Hontani, H.; Watanabe, W.
Page(s): 446 - 452
Digital Object Identifier : 10.1109/CVPR.2010.5540181
AbstractPlus | Full Text: PDF (1760KB)
This article presents a new method for non-rigid surface registration between a surface model and a surface of an internal organ in a given 3D medical image. The surface is represented with a set of feature points, of which locations are represented by a graphical model. For constructing the representation, a set of corresponding points is distributed on each of training surfaces based on an entropy-based particle system. From these corresponding points, we estimate probability densities of the ...
3D Shape correspondence by isometry-driven greedy optimization
Sahillioglu, Y.; Yemez, Y.
Page(s): 453 - 458
Digital Object Identifier : 10.1109/CVPR.2010.5540178
AbstractPlus | Full Text: PDF (679KB)
We present an automatic method that establishes 3D correspondence between isometric shapes. Our goal is to find an optimal correspondence between two given (nearly) isometric shapes, that minimizes the amount of deviation from isometry. We cast the problem as a complete surface correspondence problem. Our method first divides the given shapes to be matched into surface patches of equal area and then seeks for a mapping between the patch centers which we refer to as base vertices. Hence the corre...
On growth and formlets: Sparse multi-scale coding of planar shape
Oleskiw, T.D.; Elder, J.H.; Peyré, G.
Page(s): 459 - 466
Digital Object Identifier : 10.1109/CVPR.2010.5540179
AbstractPlus | Full Text: PDF (1969KB)
Growing semantically meaningful models for visual SLAM
Flint, A.; Mei, C.; Reid, I.; Murray, D.
Page(s): 467 - 474
Digital Object Identifier : 10.1109/CVPR.2010.5540176
AbstractPlus | Full Text: PDF (397KB) | Multimedia
Though modern Visual Simultaneous Localisation and Mapping (vSLAM) systems are capable of localising robustly and efficiently even in the case of a monocular camera, the maps produced are typically sparse point-clouds that are difficult to interpret and of little use for higher-level reasoning tasks such as scene understanding or human-machine interaction. In this paper we begin to address this deficiency, presenting progress on expanding the competency of visual SLAM systems to build richer ma...
Diffeomorphic sulcal shape analysis for cortical surface registration
Joshi, S.H.; Cabeen, R.P.; Joshi, A.A.; Woods, R.P.; Narr, K.L.; Toga, A.W.
Page(s): 475 - 482
Digital Object Identifier : 10.1109/CVPR.2010.5540177
AbstractPlus | Full Text: PDF (1496KB)
We present an intrinsic framework for constructing sulcal shape atlases on the human cortex. We propose the analysis of sulcal and gyral patterns by representing them by continuous open curves in R^3. The space of such curves, also termed as the shape manifold is equipped with a Riemannian L^2 metric on the tangent space, and shows desirable properties while matching shapes of sulci. On account of the spherical nature of the shape space, geodesics between shapes can be comput...
A theory of plenoptic multiplexing
Ihrke, I.; Wetzstein, G.; Heidrich, W.
Page(s): 483 - 490
Digital Object Identifier : 10.1109/CVPR.2010.5540174
AbstractPlus | Full Text: PDF (2221KB) | Multimedia
Multiplexing is a common technique for encoding high-dimensional image data into a single, two-dimensional image. Examples of spatial multiplexing include Bayer patterns to capture color channels, and integral images to encode light fields. In the Fourier domain, optical heterodyning has been used to acquire light fields. In this paper, we develop a general theory of multiplexing the dimensions of the plenoptic function onto an image sensor. Our theory enables a principled comparison of plenopti...
Non-uniform deblurring for shaken images
Whyte, O.; Sivic, J.; Zisserman, A.; Ponce, J.
Page(s): 491 - 498
Digital Object Identifier : 10.1109/CVPR.2010.5540175
AbstractPlus | Full Text: PDF (3384KB)
Blur from camera shake is mostly due to the 3D rotation of the camera, resulting in a blur kernel that can be significantly non-uniform across the image. However, most current deblurring methods model the observed image as a convolution of a sharp image with a uniform blur kernel. We propose a new parametrized geometric model of the blurring process in terms of the rotational velocity of the camera during exposure. We apply this model to two different algorithms for camera shake removal: the fir...
Axial light field for curved mirrors: Reflect your perspective, widen your view
Taguchi, Y.; Agrawal, A.; Ramalingam, S.; Veeraraghavan, A.
Page(s): 499 - 506
Digital Object Identifier : 10.1109/CVPR.2010.5540172
AbstractPlus | Full Text: PDF (6011KB) | Multimedia
Mirrors have been used to enable wide field-of-view (FOV) catadioptric imaging. The mapping between the incoming and reflected light rays depends non-linearly on the mirror shape and has been well-studied using caustics. We analyze this mapping using two-plane light field parameterization, which provides valuable insight into the geometric structure of reflected rays. Using this analysis, we study the problem of generating a single-viewpoint virtual perspective image for catadioptric systems, wh...
Rectifying rolling shutter video from hand-held devices
Forssén, P.; Ringaby, E.
Page(s): 507 - 514
Digital Object Identifier : 10.1109/CVPR.2010.5540173
AbstractPlus | Full Text: PDF (1804KB) | Multimedia
This paper presents a method for rectifying video sequences from rolling shutter (RS) cameras. In contrast to previous RS rectification attempts we model distortions as being caused by the 3D motion of the camera. The camera motion is parametrised as a continuous curve, with knots at the last row of each frame. Curve parameters are solved for using non-linear least squares over inter-frame correspondences obtained from a KLT tracker. We have generated synthetic RS sequences with associated groun...
Correcting over-exposure in photographs
Dong Guo; Yuan Cheng; Shaojie Zhuo; Sim, T.
Page(s): 515 - 521
Digital Object Identifier : 10.1109/CVPR.2010.5540170
AbstractPlus | Full Text: PDF (3906KB)
This paper introduces a method to correct over-exposure in an existing photograph by recovering the color and lightness separately. First, the dynamic range of well exposed region is slightly compressed to make room for the recovered lightness of the over-exposed region. Then the lightness is recovered based on an over-exposure likelihood. The color of each pixel is corrected via neighborhood propagation and also based on the confidence of the original color. Previous methods make use of ratios ...
Denoising vs. deblurring: HDR imaging techniques using moving cameras
Li Zhang; Deshpande, A.; Xin Chen
Page(s): 522 - 529
Digital Object Identifier : 10.1109/CVPR.2010.5540171
AbstractPlus | Full Text: PDF (7713KB)
New cameras such as the Canon EOS 7D and Pointgrey Grasshopper have 14-bit sensors. We present a theoretical analysis and a practical approach that exploit these new cameras with high-resolution quantization for reliable HDR imaging from a moving camera. Specifically, we propose a unified probabilistic formulation that allows us to analytically compare two HDR imaging alternatives: (1) deblurring a single blurry but clean image and (2) denoising a sequence of sharp but noisy images. By analyzing...
Gradient-directed composition of multi-exposure images
Wei Zhang; Wai-Kuen Cham
Page(s): 530 - 536
Digital Object Identifier : 10.1109/CVPR.2010.5540168
AbstractPlus | Full Text: PDF (7344KB)
In this paper, we present a simple yet effective method that takes advantage of the gradient information to accomplish the multi-exposure image composition in both static and dynamic scenes. Given multiple images with different exposures, the proposed approach is capable of producing a pleasant tone mapped-like high dynamic range (HDR) image by compositing them seamlessly with the guidance of gradient-based quality assessment. Especially, two novel quality measures: visibility and consistency, a...
Warp propagation for video resizing
Yuzhen Niu; Feng Liu; Xueqing Li; Gleicher, M.
Page(s): 537 - 544
Digital Object Identifier : 10.1109/CVPR.2010.5540169
AbstractPlus | Full Text: PDF (3748KB)
Sensor saturation in Fourier multiplexed imaging
Wetzstein, G.; Ihrke, I.; Heidrich, W.
Page(s): 545 - 552
Digital Object Identifier : 10.1109/CVPR.2010.5540166
AbstractPlus | Full Text: PDF (4930KB) | Multimedia
Optically multiplexed image acquisition techniques have become increasingly popular for encoding different exposures, color channels, light fields, and other properties of light onto two-dimensional image sensors. Recently, Fourier-based multiplexing and reconstruction approaches have been introduced in order to achieve a superior light transmission of the employed modulators and better signal-to-noise characteristics of the reconstructed data. We show in this paper that Fourier-based reconstruc...
Noise-optimal capture for high dynamic range photography
Hasinoff, S.W.; Durand, F.; Freeman, W.T.
Page(s): 553 - 560
Digital Object Identifier : 10.1109/CVPR.2010.5540167
AbstractPlus | Full Text: PDF (1294KB) | Multimedia
Taking multiple exposures is a well-established approach both for capturing high dynamic range (HDR) scenes and for noise reduction. But what is the optimal set of photos to capture? The typical approach to HDR capture uses a set of photos with geometrically-spaced exposure times, at a fixed ISO setting (typically ISO 100 or 200). By contrast, we show that the capture sequence with optimal worst-case performance, in general, uses much higher and variable ISO settings, and spends longer capturing...
Using optical defocus to denoise
Qi Shan; Jiaya Jia; Sing Bing Kang; Zenglu Qin
Page(s): 561 - 568
Digital Object Identifier : 10.1109/CVPR.2010.5540164
AbstractPlus | Full Text: PDF (8341KB)
Effective reduction of noise is generally difficult because of the possible tight coupling of noise with high-frequency image structure. The problem is worse under low-light conditions. In this paper, we propose slightly optically defocusing the image in order to loosen this noise-image structure coupling. This allows us to more effectively reduce noise and subsequently restore the small defocus. We analytically show how this is possible, and demonstrate our technique on a number of examples tha...
Discontinuous seam-carving for video retargeting
Grundmann, M.; Kwatra, V.; Mei Han; Essa, I.
Page(s): 569 - 576
Digital Object Identifier : 10.1109/CVPR.2010.5540165
AbstractPlus | Full Text: PDF (8917KB) | Multimedia
We introduce a new algorithm for video retargeting that uses discontinuous seam-carving in both space and time for resizing videos. Our algorithm relies on a novel appearance-based temporal coherence formulation that allows for frame-by-frame processing and results in temporally discontinuous seams, as opposed to geometrically smooth and continuous seams. This formulation optimizes the difference in appearance of the resultant retargeted frame to the optimal temporally coherent one, and allows f...
Hybrid shift map for video retargeting
Yiqun Hu; Rajan, D.
Page(s): 577 - 584
Digital Object Identifier : 10.1109/CVPR.2010.5540162
AbstractPlus | Full Text: PDF (1620KB)
We propose a new method for video retargeting, which can generate spatial-temporal consistent video. The new measure called spatial-temporal naturality preserves the motion in the source video without any motion analysis in contrast to other methods that need motion estimation. This advantage prevents the retargeted video from degenerating due to the propagation of the errors in motion analysis. It allows the proposed method to be applied on challenging videos with complex camera and object moti...
Geo-location estimation from two shadow trajectories
Lin Wu; Xiaochun Cao
Page(s): 585 - 590
Digital Object Identifier : 10.1109/CVPR.2010.5540163
AbstractPlus | Full Text: PDF (448KB)
The position of a world point's solar shadow depends on its geographical location, the geometrical relationship between the orientation of the sunshine and the ground plane where the shadow casts. This paper investigates the property of solar shadow trajectories on a planar surface and shows that camera parameters, latitude, longitude can be estimated from two observed shadow trajectories. Our contribution is that we use the design of the analemmatic sundial to get the shadow conic and furthermo...
Estimating satellite attitude from pushbroom sensors
Perrier, R.; Arnaud, E.; Sturm, P.; Ortner, M.
Page(s): 591 - 598
Digital Object Identifier : 10.1109/CVPR.2010.5540160
AbstractPlus | Full Text: PDF (1177KB)
Linear pushbroom cameras are widely used in passive remote sensing from space as they provide high resolution images. In earth observation applications, where several pushbroom sensors are mounted in a single focal plane, small dynamic disturbances of the satellite's orientation lead to noticeable geometrical distortions in the images. In this paper, we present a global method to estimate those disturbances, which are effectively vibrations. We exploit the geometry of the focal plane and the sta...
Optimal coded sampling for temporal super-resolution
Agrawal, A.; Gupta, M.; Veeraraghavan, A.; Narasimhan, S.G.
Page(s): 599 - 606
Digital Object Identifier : 10.1109/CVPR.2010.5540161
AbstractPlus | Full Text: PDF (6016KB)
Conventional low frame rate cameras result in blur and/or aliasing in images while capturing fast dynamic events. Multiple low speed cameras have been used previously with staggered sampling to increase the temporal resolution. However, previous approaches are inefficient: they either use small integration time for each camera which does not provide light benefit, or use large integration time in a way that requires solving a big ill-posed linear system. We propose coded sampling that address th...
Efficient filter flow for space-variant multiframe blind deconvolution
Hirsch, M.; Sra, S.; Scholkopf, B.; Harmeling, S.
Page(s): 607 - 614
Digital Object Identifier : 10.1109/CVPR.2010.5540158
AbstractPlus | Full Text: PDF (1334KB)
Ultimately being motivated by facilitating space-variant blind deconvolution, we present a class of linear transformations, that are expressive enough for space-variant filters, but at the same time especially designed for efficient matrix-vector-multiplications. Successful results on astronomical imaging through atmospheric turbulences and on noisy magnetic resonance images of constantly moving objects demonstrate the practical significance of our approach.
Regenerative morphing
Shechtman, E.; Rav-Acha, A.; Irani, M.; Seitz, S.
Page(s): 615 - 622
Digital Object Identifier : 10.1109/CVPR.2010.5540159
AbstractPlus | Full Text: PDF (5143KB) | Multimedia
Monocular 3D pose estimation and tracking by detection
Andriluka, M.; Roth, S.; Schiele, B.
Page(s): 623 - 630
Digital Object Identifier : 10.1109/CVPR.2010.5540156
AbstractPlus | Full Text: PDF (1086KB)
Automatic recovery of 3D human pose from monocular image sequences is a challenging and important research topic with numerous applications. Although current methods are able to recover 3D pose for a single person in controlled environments, they are severely challenged by real-world scenarios, such as crowded street scenes. To address this problem, we propose a three-stage process building on a number of recent advances. The first stage obtains an initial estimate of the 2D articulation and vie...
Dynamical binary latent variable models for 3D human pose tracking
Taylor, G.W.; Sigal, L.; Fleet, D.J.; Hinton, G.E.
Page(s): 631 - 638
Digital Object Identifier : 10.1109/CVPR.2010.5540157
AbstractPlus | Full Text: PDF (1993KB) | Multimedia
We introduce a new class of probabilistic latent variable model called the Implicit Mixture of Conditional Restricted Boltzmann Machines (imCRBM) for use in human pose tracking. Key properties of the imCRBM are as follows: (1) learning is linear in the number of training exemplars so it can be learned from large datasets; (2) it learns coherent models of multiple activities; (3) it automatically discovers atomic “movemes” and (4) it can infer transitions between activities, even wh...
Contour people: A parameterized model of 2D articulated human shape
Freifeld, O.; Weiss, A.; Zuffi, S.; Black, M.J.
Page(s): 639 - 646
Digital Object Identifier : 10.1109/CVPR.2010.5540154
AbstractPlus | Full Text: PDF (3761KB)
We define a new “contour person” model of the human body that has the expressive power of a detailed 3D model and the computational benefits of a simple 2D part-based model. The contour person (CP) model is learned from a 3D SCAPE model of the human body that captures natural shape and pose variations; the projected contours of this model, along with their segmentation into parts forms the training set. The CP model factors deformations of the body into three components: shape vari...
Combining discriminative and generative methods for 3D deformable surface and articulated pose reconstruction
Salzmann, M.; Urtasun, R.
Page(s): 647 - 654
Digital Object Identifier : 10.1109/CVPR.2010.5540155
AbstractPlus | Full Text: PDF (1242KB) | Multimedia
Historically non-rigid shape recovery and articulated pose estimation have evolved as separate fields. Recent methods for non-rigid shape recovery have focused on improving the algorithmic formulation, but have only considered the case of reconstruction from point-to-point correspondences. In contrast, many techniques for pose estimation have followed a discriminative approach, which allows for the use of more general image cues. However, these techniques typically require large training sets an...
Efficient extraction of human motion volumes by tracking
Niebles, J.C.; Bohyung Han; Li Fei-Fei
Page(s): 655 - 662
Digital Object Identifier : 10.1109/CVPR.2010.5540152
AbstractPlus | Full Text: PDF (8998KB)
We present an automatic and efficient method to extract spatio-temporal human volumes from video, which combines top-down model-based and bottom-up appearance-based approaches. From the top-down perspective, our algorithm applies shape priors probabilistically to candidate image regions obtained by pedestrian detection, and provides accurate estimates of the human body areas which serve as important constraints for bottom-up processing. Temporal propagation of the identified region is performed ...
Multisensor-fusion for 3D full-body human motion capture
Pons-Moll, G.; Baak, A.; Helten, T.; Müller, M.; Seidel, H.-P.; Rosenhahn, B.
Page(s): 663 - 670
Digital Object Identifier : 10.1109/CVPR.2010.5540153
AbstractPlus | Full Text: PDF (850KB) | Multimedia
In this work, we present an approach to fuse video with orientation data obtained from extended inertial sensors to improve and stabilize full-body human motion capture. Even though video data is a strong cue for motion analysis, tracking artifacts occur frequently due to ambiguities in the images, rapid motions, occlusions or noise. As a complementary data source, inertial sensors allow for drift-free estimation of limb orientations even under fast motions. However, accurate position informatio...
An object-dependent hand pose prior from sparse training data
Hamer, H.; Gall, J.; Weise, T.; Van Gool, L.
Page(s): 671 - 678
Digital Object Identifier : 10.1109/CVPR.2010.5540150
AbstractPlus | Full Text: PDF (4890KB) | Multimedia
In this paper, we propose a prior for hand pose estimation that integrates the direct relation between a manipulating hand and a 3d object. This is of particular interest for a variety of applications since many tasks performed by humans require hand-object interaction. Inspired by the ability of humans to learn the handling of an object from a single example, our focus lies on very sparse training data. We express estimated hand poses in local object coordinates and extract for each individual ...
Vehicle detection and tracking in wide field-of-view aerial video
Jiangjian Xiao; Hui Cheng; Sawhney, H.; Feng Han
Page(s): 679 - 684
Digital Object Identifier : 10.1109/CVPR.2010.5540151
AbstractPlus | Full Text: PDF (704KB)
This paper presents a joint probabilistic relation graph approach to simultaneously detect and track a large number of vehicles in low frame rate aerial videos. Due to low frame rate, low spatial resolution and sheer number of moving objects, detection and tracking in wide area video poses unique challenges. In this paper, we explore vehicle behavior model from road structure and generate a set of constraints to regulate both object based vertex matching and pairwise edge matching schemes. The p...
Multi-target tracking by on-line learned discriminative appearance models
Cheng-Hao Kuo; Chang Huang; Nevatia, R.
Page(s): 685 - 692
Digital Object Identifier : 10.1109/CVPR.2010.5540148
AbstractPlus | Full Text: PDF (1817KB)
We present an approach for online learning of discriminative appearance models for robust multi-target tracking in a crowded scene from a single camera. Although much progress has been made in developing methods for optimal data association, there has been comparatively less work on the appearance models, which are key elements for good performance. Many previous methods either use simple features such as color histograms, or focus on the discriminability between a target and the background whic...
Tracking with local spatio-temporal motion patterns in extremely crowded scenes
Kratz, L.; Nishino, K.
Page(s): 693 - 700
Digital Object Identifier : 10.1109/CVPR.2010.5540149
AbstractPlus | Full Text: PDF (4961KB) | Multimedia
AAM based face tracking with temporal matching and face segmentation
Mingcai Zhou; Lin Liang; Jian Sun; Yangsheng Wang
Page(s): 701 - 708
Digital Object Identifier : 10.1109/CVPR.2010.5540146
AbstractPlus | Full Text: PDF (7648KB)
Active Appearance Model (AAM) based face tracking has advantages of accurate alignment, high efficiency, and effectiveness for handling face deformation. However, AAM suffers from the generalization problem and has difficulties in images with cluttered backgrounds. In this paper, we introduce two novel constraints into AAM fitting to address the above problems. We first introduce a temporal matching constraint in AAM fitting. In the proposed fitting scheme, the temporal matching enforces an inte...
Human identity recognition in aerial images
Oreifej, O.; Mehran, R.; Shah, M.
Page(s): 709 - 716
Digital Object Identifier : 10.1109/CVPR.2010.5540147
AbstractPlus | Full Text: PDF (3355KB)
Human identity recognition is an important yet under-addressed problem. Previous methods were strictly limited to high quality photographs, where the principal techniques heavily rely on body details such as face detection. In this paper, we propose an algorithm to address the novel problem of human identity recognition over a set of unordered low quality aerial images. Assuming a user was able to manually locate a target in some images of the set, we find the target in each other query image by...
Silhouette transformation based on walking speed for gait identification
Tsuji, A.; Makihara, Y.; Yagi, Y.
Page(s): 717 - 722
Digital Object Identifier : 10.1109/CVPR.2010.5540144
AbstractPlus | Full Text: PDF (602KB)
We propose a method of gait silhouette transformation from one speed to another to cope with walking speed changes in gait identification. When a person changes his/her walking speed, dynamic features (e.g. stride and joint angle) are changed while static features (e.g. thigh and shin lengths) are unchanged. Based on the fact, firstly, static and dynamic features are separated from gait silhouettes by fitting a human model. Secondly, a factorization-based speed transformation model for the dynam...
PROST: Parallel robust online simple tracking
Santner, J.; Leistner, C.; Saffari, A.; Pock, T.; Bischof, H.
Page(s): 723 - 730
Digital Object Identifier : 10.1109/CVPR.2010.5540145
AbstractPlus | Full Text: PDF (2942KB) | Multimedia
Tracking-by-detection is increasingly popular in order to tackle the visual tracking problem. Existing adaptive methods suffer from the drifting problem, since they rely on self-updates of an on-line learning method. In contrast to previous work that tackled this problem by employing semi-supervised or multiple-instance learning, we show that augmenting an on-line learning method with complementary tracking approaches can lead to more stable results. In particular, we use a simple template model...
Player localization using multiple static cameras for sports visualization
Hamid, R.; Kumar, R.K.; Grundmann, M.; Kihwan Kim; Essa, I.; Hodgins, J.
Page(s): 731 - 738
Digital Object Identifier : 10.1109/CVPR.2010.5540142
AbstractPlus | Full Text: PDF (5302KB)
We present a novel approach for robust localization of multiple people observed using multiple cameras. We use this location information to generate sports visualizations, which include displaying a virtual offside line in soccer games, and showing players' positions and motion patterns. Our main contribution is to model and analyze the problem of fusing corresponding players' positional information as one of finding minimum-weight K-length cycles in complete K-partite graphs. To this end, we ...
An online approach: Learning-Semantic-Scene-by-Tracking and Tracking-by-Learning-Semantic-Scene
Xuan Song; Xiaowei Shao; Huijing Zhao; Jinshi Cui; Shibasaki, R.; Hongbin Zha
Page(s): 739 - 746
Digital Object Identifier : 10.1109/CVPR.2010.5540143
AbstractPlus | Full Text: PDF (4302KB) | Multimedia
Learning the knowledge of scene structure and tracking a large number of targets have both been active topics in computer vision in recent years, and both play a crucial role in surveillance, activity analysis, object classification, etc. In this paper, we propose a novel system which simultaneously performs the Learning-Semantic-Scene and Tracking, and makes them supplement each other in one framework. The trajectories obtained by the tracking are utilized to continually learn and update the scene k...
Tracking people interacting with objects
Kjellström, H.; Kragić, D.; Black, M.J.
Page(s): 747 - 754
Digital Object Identifier : 10.1109/CVPR.2010.5540140
AbstractPlus | Full Text: PDF (837KB)
While the problem of tracking 3D human motion has been widely studied, most approaches have assumed that the person is isolated and not interacting with the environment. Environmental constraints, however, can greatly constrain and simplify the tracking problem. The most studied constraints involve gravity and contact with the ground plane. We go further to consider interaction with objects in the environment. In many cases, tracking rigid environmental objects is simpler than tracking high-dime...
Real time motion capture using a single time-of-flight camera
Ganapathi, V.; Plagemann, C.; Koller, D.; Thrun, S.
Page(s): 755 - 762
Digital Object Identifier : 10.1109/CVPR.2010.5540141
AbstractPlus | Full Text: PDF (674KB)
Markerless tracking of human pose is a hard yet relevant problem. In this paper, we derive an efficient filtering algorithm for tracking human pose using a stream of monocular depth images. The key idea is to combine an accurate generative model - which is achievable in this setting using programmable graphics hardware - with a discriminative model that provides data-driven evidence about body part locations. In each filter iteration, we apply a form of local model-based search that exploits the...
RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images
Yigang Peng; Ganesh, A.; Wright, J.; Wenli Xu; Yi Ma
Page(s): 763 - 770
Digital Object Identifier : 10.1109/CVPR.2010.5540138
AbstractPlus | Full Text: PDF (8102KB)
This paper studies the problem of simultaneously aligning a batch of linearly correlated images despite gross corruption (such as occlusion). Our method seeks an optimal set of image domain transformations such that the matrix of transformed images can be decomposed as the sum of a sparse matrix of errors and a low-rank matrix of recovered aligned images. We reduce this extremely challenging optimization problem to a sequence of convex programs that minimize the sum of ℓ1-norm ...
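The decomposition described in this abstract can be written compactly. The notation below is an assumption introduced here for illustration and is not taken from the listing: $D \circ \tau$ denotes the matrix whose columns are the input images after applying the transformations $\tau$, $A$ is the low-rank matrix of aligned images, and $E$ is the sparse matrix of errors.

```latex
D \circ \tau = A + E,
\qquad \text{with } A \text{ low-rank and } E \text{ sparse}
```

The abstract is truncated before it states the convex objective, so the objective is deliberately not reproduced here.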
Efficient computation of robust low-rank matrix approximations in the presence of missing data using the L1 norm
Eriksson, A.; van den Hengel, A.
Page(s): 771 - 778
Digital Object Identifier : 10.1109/CVPR.2010.5540139
AbstractPlus | Full Text: PDF (408KB)
On the design of robust classifiers for computer vision
Masnadi-Shirazi, H.; Mahadevan, V.; Vasconcelos, N.
Page(s): 779 - 786
Digital Object Identifier : 10.1109/CVPR.2010.5540136
AbstractPlus | Full Text: PDF (361KB)
The design of robust classifiers, which can contend with the noisy and outlier-ridden datasets typical of computer vision, is studied. It is argued that such robustness requires loss functions that penalize both large positive and negative margins. The probability elicitation view of classifier design is adopted, and a set of necessary conditions for the design of such losses is identified. These conditions are used to derive a novel robust Bayes-consistent loss, denoted Tangent loss, and an ass...
Online-batch strongly convex Multi Kernel Learning
Orabona, F.; Luo Jie; Caputo, B.
Page(s): 787 - 794
Digital Object Identifier : 10.1109/CVPR.2010.5540137
AbstractPlus | Full Text: PDF (276KB)
Several object categorization algorithms use kernel methods over multiple cues, as they offer a principled approach to combine multiple cues, and to obtain state-of-the-art performance. A general drawback of these strategies is the high computational cost during training, which prevents their application to large-scale problems. They also do not provide theoretical guarantees on their convergence rate. Here we present a Multiclass Multi Kernel Learning (MKL) algorithm that obtains state-of-the-ar...
Efficient joint 2D and 3D palmprint matching with alignment refinement
Wei Li; Lei Zhang; Zhang, D.; Guangming Lu; Jingqi Yan
Page(s): 795 - 801
Digital Object Identifier : 10.1109/CVPR.2010.5540134
AbstractPlus | Full Text: PDF (741KB)
Palmprint verification is a relatively new but promising personal authentication technique for its high accuracy and fast matching speed. Two dimensional (2D) palmprint recognition has been well studied in the past decade, and recently three dimensional (3D) palmprint recognition techniques were also proposed. The 2D and 3D palmprint data can be captured simultaneously and they provide different and complementary information. 3D palmprint contains the depth information of the palm surface, while...
Harvesting large-scale weakly-tagged image databases from the web
Jianping Fan; Yi Shen; Ning Zhou; Yuli Gao
Page(s): 802 - 809
Digital Object Identifier : 10.1109/CVPR.2010.5540135
AbstractPlus | Full Text: PDF (1580KB)
To leverage large-scale weakly-tagged images for computer vision tasks (such as object detection and scene recognition), a novel cross-modal tag cleansing and junk image filtering algorithm is developed for cleansing the weakly-tagged images and their social tags (i.e., removing irrelevant images and finding the most relevant tags for each image) by integrating both the visual similarity contexts between the images and the semantic similarity contexts between their tags. Our algorithm can addres...
Visual recognition using mappings that replicate margins
Wolf, L.; Manor, N.
Page(s): 810 - 816
Digital Object Identifier : 10.1109/CVPR.2010.5540132
AbstractPlus | Full Text: PDF (905KB)
We consider the problem of learning to map between two vector spaces given pairs of matching vectors, one from each space. This problem naturally arises in numerous vision problems, for example, when mapping between the images of two cameras, or when the annotations of each image are multidimensional. We focus on the common asymmetric case, where one vector space X is more informative than the other Y, and find a transformation from Y to X. We present a new optimization problem that aims to repli...
An eye for an eye: A single camera gaze-replacement method
Wolf, L.; Freund, Z.; Avidan, S.
Page(s): 817 - 824
Digital Object Identifier : 10.1109/CVPR.2010.5540133
AbstractPlus | Full Text: PDF (8431KB)
The camera in video conference systems is typically positioned above, or below, the screen, causing the gaze of the users to appear misplaced. We propose an effective solution to this problem that is based on replacing the eyes of the user. This replacement, when done accurately, is enough to achieve a natural looking video. At an initialization stage the user is asked to look straight at the camera. We store these frames, then track the eyes accurately in the video sequence and replace the eyes...
Ink-bleed reduction using functional minimization
Hanasusanto, G.A.; Zheng Wu; Brown, M.S.
Page(s): 825 - 832
Digital Object Identifier : 10.1109/CVPR.2010.5540130
AbstractPlus | Full Text: PDF (6786KB)
Ink-bleed interference is undesirable as it reduces the legibility and aesthetics of affected documents. We present a novel approach to reduce ink-bleed interference using functional minimization. In particular, we show how to modify the Chan-Vese active contour model to incorporate information from the front and back sides of the ink-bleed document. This contour model is particularly useful as it does not require edge extraction or explicit thresholding of the document. In addition, we show how...
Action classification on product manifolds
Yui Man Lui; Beveridge, J.R.; Kirby, M.
Page(s): 833 - 839
Digital Object Identifier : 10.1109/CVPR.2010.5540131
AbstractPlus | Full Text: PDF (794KB)
Videos can be naturally represented as multidimensional arrays known as tensors. However, the geometry of the tensor space is often ignored. In this paper, we argue that the underlying geometry of the tensor space is an important property for action classification. We characterize a tensor as a point on a product manifold and perform classification on this space. First, we factorize a tensor relating to each order using a modified High Order Singular Value Decomposition (HOSVD). We recognize eac...
Motion fields to predict play evolution in dynamic sport scenes
Kihwan Kim; Grundmann, M.; Shamir, A.; Matthews, I.; Hodgins, J.; Essa, I.
Page(s): 840 - 847
Digital Object Identifier : 10.1109/CVPR.2010.5540128
AbstractPlus | Full Text: PDF (9548KB) | Multimedia
Videos of multi-player team sports provide a challenging domain for dynamic scene analysis. Player actions and interactions are complex as they are driven by many factors, such as the short-term goals of the individual player, the overall team strategy, the rules of the sport, and the current context of the game. We show that constrained multi-agent events can be analyzed and even predicted from video. Such analysis requires estimating the global movements of all players in the scene at any time...
SPEC hashing: Similarity preserving algorithm for entropy-based coding
Ruei-Sung Lin; Ross, D.A.; Yagnik, J.
Page(s): 848 - 854
Digital Object Identifier : 10.1109/CVPR.2010.5540129
AbstractPlus | Full Text: PDF (183KB)
Shape-based similarity retrieval of Doppler images for clinical decision support
Syeda-Mahmood, T.; Turaga, P.; Beymer, D.; Wang, F.; Amir, A.; Greenspan, H.; Pohl, K.
Page(s): 855 - 862
Digital Object Identifier : 10.1109/CVPR.2010.5540126
AbstractPlus | Full Text: PDF (363KB)
Flow Doppler imaging has become an integral part of an echocardiographic exam. Automated interpretation of flow Doppler imaging has so far been restricted to obtaining hemodynamic information from velocity-time profiles depicted in these images. In this paper we exploit the shape patterns in Doppler images to infer the similarity in valvular disease labels for purposes of automated clinical decision support. Specifically, we model the similarity in appearance of Doppler images from the same dise...
Real-time vehicle global localisation with a single camera in dense urban areas: Exploitation of coarse 3D city models
Lothe, P.; Bourgeois, S.; Royer, E.; Dhome, M.; Naudet-Collette, S.
Page(s): 863 - 870
Digital Object Identifier : 10.1109/CVPR.2010.5540127
AbstractPlus | Full Text: PDF (4575KB)
In this system paper, we propose a real-time car localisation process in dense urban areas by using a single perspective camera and a priori knowledge of the environment. To tackle this problem, it is necessary to solve two well-known monocular SLAM limitations: scale factor drift and error accumulation. The proposed idea is to combine a monocular SLAM process based on bundle adjustment with simple knowledge, i.e. the position and orientation of the camera with regard to the road and a coarse 3D model of ...
Taxonomic classification for web-based videos
Yang Song; Ming Zhao; Yagnik, J.; Xiaoyun Wu
Page(s): 871 - 878
Digital Object Identifier : 10.1109/CVPR.2010.5540124
AbstractPlus | Full Text: PDF (328KB)
Categorizing web-based videos is an important yet challenging task. The difficulties arise from large data diversity within a category, lack of labeled data, and degradation of video quality. This paper presents a large scale video taxonomic classification scheme (with more than 1000 categories) tackling these issues. Taxonomic structure of categories is deployed in classifier training. To compensate for the lack of labeled video data, a novel method is proposed to adapt the web-text documents t...
YouTubeCat: Learning to categorize wild web videos
Zheshen Wang; Ming Zhao; Yang Song; Kumar, S.; Baoxin Li
Page(s): 879 - 886
Digital Object Identifier : 10.1109/CVPR.2010.5540125
AbstractPlus | Full Text: PDF (1307KB)
Automatic categorization of videos in a Web-scale unconstrained collection such as YouTube is a challenging task. A key issue is how to build an effective training set in the presence of missing, sparse or noisy labels. We propose to achieve this by first manually creating a small labeled set and then extending it using additional sources such as related videos, searched videos, and text-based webpages. The data from such disparate sources has different properties and labeling quality, and thus ...
Covering trees and lower-bounds on quadratic assignment
Yarkony, J.; Fowlkes, C.; Ihler, A.
Page(s): 887 - 894
Digital Object Identifier : 10.1109/CVPR.2010.5540122
AbstractPlus | Full Text: PDF (390KB) | Multimedia
Many computer vision problems involving feature correspondence among images can be formulated as an assignment problem with a quadratic cost function. Such problems are computationally infeasible in general but recent advances in discrete optimization such as tree-reweighted belief propagation (TRW) often provide high-quality solutions. In this paper, we improve upon these algorithms in two ways. First, we introduce covering trees, a variant of TRW which provides the same bounds on the MAP energy...
Efficient piecewise learning for conditional random fields
Alahari, K.; Russell, C.; Torr, P.H.S.
Page(s): 895 - 901
Digital Object Identifier : 10.1109/CVPR.2010.5540123
AbstractPlus | Full Text: PDF (285KB)
Conditional Random Field models have proved effective for several low-level computer vision problems. Inference in these models involves solving a combinatorial optimization problem, with methods such as graph cuts and belief propagation. Although several methods have been proposed to learn the model parameters from training data, they suffer from various drawbacks. Learning these parameters involves computing the partition function, which is intractable. To overcome this, state-of-the-art structur...
Multimodal semi-supervised learning for image classification
Guillaumin, M.; Verbeek, J.; Schmid, C.
Page(s): 902 - 909
Digital Object Identifier : 10.1109/CVPR.2010.5540120
AbstractPlus | Full Text: PDF (615KB)
In image categorization the goal is to decide if an image belongs to a certain category or not. A binary classifier can be learned from manually labeled images; while using more labeled examples improves performance, obtaining the image labels is a time consuming process. We are interested in how other sources of information can aid the learning process given a fixed amount of labeled images. In particular, we consider a scenario where keywords are associated with the training images, e.g. as fo...
What helps where – and why? Semantic relatedness for knowledge transfer
Rohrbach, M.; Stark, M.; Szarvas, G.; Gurevych, I.; Schiele, B.
Page(s): 910 - 917
Digital Object Identifier : 10.1109/CVPR.2010.5540121
AbstractPlus | Full Text: PDF (1008KB)
Remarkable performance has been reported for recognizing single object classes. Scalability to large numbers of classes, however, remains an important challenge for today's recognition methods. Several authors have promoted knowledge transfer between classes as a key ingredient to address this challenge. However, in previous work the decision of which knowledge to transfer has required either manual supervision or at least a few training examples, limiting the scalability of these approaches. In this wor...
Towards semantic embedding in visual vocabulary
Rongrong Ji; Hongxun Yao; Xiaoshuai Sun; Bineng Zhong; Wen Gao
Page(s): 918 - 925
Digital Object Identifier : 10.1109/CVPR.2010.5540118
AbstractPlus | Full Text: PDF (2855KB)
Visual vocabulary serves as a fundamental component in many computer vision tasks, such as object recognition, visual search, and scene modeling. While state-of-the-art approaches build visual vocabulary based solely on visual statistics of local image patches, the correlative image labels are left unexploited in generating visual words. In this work, we present a semantic embedding framework to integrate semantic information from Flickr labels for supervised vocabulary construction. Our main co...
Exploiting Monge structures in optimum subwindow search
Senjian An; Peursum, P.; Wanquan Liu; Venkatesh, S.; Xiaoming Chen
Page(s): 926 - 933
Digital Object Identifier : 10.1109/CVPR.2010.5540119
AbstractPlus | Full Text: PDF (158KB)
Unified Real-Time Tracking and Recognition with Rotation-Invariant Fast Features
Takacs, G.; Chandrasekhar, V.; Tsai, S.; Chen, D.; Grzeszczuk, R.; Girod, B.
Page(s): 934 - 941
Digital Object Identifier : 10.1109/CVPR.2010.5540116
AbstractPlus | Full Text: PDF (1464KB)
We present a method that unifies tracking and video content recognition with applications to Mobile Augmented Reality (MAR). We introduce the Radial Gradient Transform (RGT) and an approximate RGT, yielding the Rotation-Invariant, Fast Feature (RIFF) descriptor. We demonstrate that RIFF is fast enough for real-time tracking, while robust enough for large scale retrieval tasks. At 26× the speed, our tracking-scheme obtains a more accurate global affine motion-model than the Kanade Lucas To...
Fast polygonal integration and its application in extending haar-like features to improve object detection
Minh-Tri Pham; Yang Gao; Hoang, V.D.; Tat-Jen Cham
Page(s): 942 - 949
Digital Object Identifier : 10.1109/CVPR.2010.5540117
AbstractPlus | Full Text: PDF (602KB) | Multimedia
The integral image is typically used for fast integration of a function over a rectangular region in an image. We propose a method that extends the integral image to do fast integration over the interior of any polygon that is not necessarily rectilinear. The integration time of the method is fast, independent of the image resolution, and only linear in the polygon's number of vertices. We apply the method to Viola and Jones' object detection framework, in which we propose to improve classical Haar...
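For readers unfamiliar with the classical integral image mentioned in this abstract, the following is a minimal sketch of the standard rectangular case only (a summed-area table). It does not implement the paper's polygonal extension, and the function names `integral_image` and `rect_sum` are hypothetical names chosen for this illustration.

```python
def integral_image(img):
    """Build a summed-area table with an extra zero row and column.

    ii[y][x] holds the sum of all pixels img[r][c] with r < y and c < x.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows [top, bottom) and columns [left, right).

    Uses four table lookups, so the cost does not depend on the
    size of the rectangle.
    """
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Toy 3x3 image used only for illustration.
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
whole = rect_sum(ii, 0, 0, 3, 3)          # entire image
lower_right = rect_sum(ii, 1, 1, 3, 3)    # pixels 5, 6, 8, 9
```

Once the table is built, any axis-aligned rectangle sum costs a constant number of operations, which is the property the abstract says the authors extend to polygons.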
Object detection via boundary structure segmentation
Toshev, A.; Taskar, B.; Daniilidis, K.
Page(s): 950 - 957
Digital Object Identifier : 10.1109/CVPR.2010.5540114
AbstractPlus | Full Text: PDF (5236KB)
We address the problem of object detection and segmentation using holistic properties of object shape. Global shape representations are highly susceptible to clutter inevitably present in realistic images, and can be robustly recognized only using a precise segmentation of the object. To this end, we propose a figure/ground segmentation method for extraction of image regions that resemble the global properties of a model boundary structure and are perceptually salient. Our shape representation, ...
Implicit hierarchical boosting for multi-view object detection
Perrotton, X.; Sturzel, M.; Roux, M.
Page(s): 958 - 965
Digital Object Identifier : 10.1109/CVPR.2010.5540115
AbstractPlus | Full Text: PDF (1020KB)
Multi-view object detection is a fundamental problem in computer vision. Current approaches generally require an explicit partition between different views with or without sharing descriptors. We present a novel boosting based learning approach which automatically learns a multi-view detector without using intra-class sub-categorization based on prior knowledge. To avoid multiplying the false alarm rate by the number of classifiers, which happens in the classical approach where one classifier pe...
Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora
Socher, R.; Li Fei-Fei
Page(s): 966 - 973
Digital Object Identifier : 10.1109/CVPR.2010.5540112
AbstractPlus | Full Text: PDF (2838KB)
We propose a semi-supervised model which segments and annotates images using very few labeled images and a large unaligned text corpus to relate image regions to text labels. Given photos of a sports event, all that is necessary to provide a pixel-level labeling of objects and background is a set of newspaper articles about this sport and one to five labeled images. Our model is motivated by the observation that words in text corpora share certain context and feature similarities with visual obj...
Support vector regression for multi-view gait recognition based on local motion feature selection
Kusakunniran, W.; Qiang Wu; Jian Zhang; Hongdong Li
Page(s): 974 - 981
Digital Object Identifier : 10.1109/CVPR.2010.5540113
AbstractPlus | Full Text: PDF (491KB)
Gait is a well recognized biometric feature that is used to identify a human at a distance. However, in real environments, appearance changes of individuals due to viewing angle changes cause many difficulties for gait recognition. This paper re-formulates this problem as a regression problem. A novel solution is proposed to create a View Transformation Model (VTM) from different points of view using Support Vector Regression (SVR). To facilitate the process of regression, a new method is prop...
Integrated pedestrian classification and orientation estimation
Enzweiler, M.; Gavrila, D.M.
Page(s): 982 - 989
Digital Object Identifier : 10.1109/CVPR.2010.5540110
AbstractPlus | Full Text: PDF (482KB)
This paper presents a novel approach to single-frame pedestrian classification and orientation estimation. Unlike previous work which addressed classification and orientation separately with different models, our method involves a probabilistic framework to approach both in a unified fashion. We address both problems in terms of a set of view-related models which couple discriminative expert classifiers with sample-dependent priors, facilitating easy integration of other cues (e.g. motion, shape...
Multi-cue pedestrian classification with partial occlusion handling
Enzweiler, M.; Eigenstetter, A.; Schiele, B.; Gavrila, D.M.
Page(s): 990 - 997
Digital Object Identifier : 10.1109/CVPR.2010.5540111
AbstractPlus | Full Text: PDF (1555KB)
This paper presents a novel mixture-of-experts framework for pedestrian classification with partial occlusion handling. The framework involves a set of component-based expert classifiers trained on features derived from intensity, depth and motion. To handle partial occlusion, we compute expert weights that are related to the degree of visibility of the associated component. This degree of visibility is determined by examining occlusion boundaries, i.e. discontinuities in depth and motion. Occlu...
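The idea of weighting component experts by visibility can be illustrated with a small sketch. This is not the authors' implementation: the function name `combine_experts`, the three example components, and the choice of weights normalized in proportion to visibility are all assumptions made here for illustration. The abstract says only that the weights are related to the degree of visibility.

```python
def combine_experts(scores, visibilities):
    """Combine per-component expert scores into one decision value.

    scores:       one classifier score per body component
    visibilities: one value in [0, 1] per component (1 = fully visible)

    Assumption: weights are the visibilities normalized to sum to 1.
    """
    total = sum(visibilities)
    if total == 0:
        raise ValueError("no component is visible")
    weights = [v / total for v in visibilities]
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical example: head and torso visible, legs fully occluded.
scores = [0.9, 0.8, -0.5]        # head, torso, legs
visibilities = [1.0, 1.0, 0.0]
decision = combine_experts(scores, visibilities)
```

In this toy case the occluded legs expert, which would otherwise pull the score down, contributes nothing to the combined decision.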
Model globally, match locally: Efficient and robust 3D object recognition
Drost, B.; Ulrich, M.; Navab, N.; Ilic, S.
Page(s): 998 - 1005
Digital Object Identifier : 10.1109/CVPR.2010.5540108
AbstractPlus | Full Text: PDF (2000KB)
This paper addresses the problem of recognizing free-form 3D objects in point clouds. Compared to traditional approaches based on point descriptors, which depend on local information around points, we propose a novel method that creates a global model description based on oriented point pair features and matches that model locally using a fast voting scheme. The global model description consists of all model point pair features and represents a mapping from the point pair feature space to the mo...
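To make "oriented point pair feature" concrete, here is a minimal sketch of one commonly used four-component form: the distance between two points plus three angles involving their normals. Whether this matches the paper's exact definition is an assumption, since the abstract does not spell it out. All function names are hypothetical, and the points are assumed distinct with non-zero normals.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def angle(a, b):
    """Angle in radians between two non-zero vectors."""
    c = dot(a, b) / (norm(a) * norm(b))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding error


def point_pair_feature(p1, n1, p2, n2):
    """Feature of two oriented points (position p, normal n).

    Returns (distance, angle(n1, d), angle(n2, d), angle(n1, n2)),
    where d = p2 - p1.
    """
    d = sub(p2, p1)
    return (norm(d), angle(n1, d), angle(n2, d), angle(n1, n2))


# Two points one unit apart on a plane whose normal is the z axis.
feature = point_pair_feature((0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                             (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

Because the feature uses only a distance and angles, it is unchanged by rigid motion of the point pair, which is what makes it usable as a lookup key when matching a model against a scene.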
Visual recognition and detection under bounded computational resources
Vijayanarasimhan, S.; Kapoor, A.
Page(s): 1006 - 1013
Digital Object Identifier : 10.1109/CVPR.2010.5540109
AbstractPlus | Full Text: PDF (558KB)
Talking pictures: Temporal grouping and dialog-supervised person recognition
Cour, T.; Sapp, B.; Nagle, A.; Taskar, B.
Page(s): 1014 - 1021
Digital Object Identifier : 10.1109/CVPR.2010.5540106
AbstractPlus | Full Text: PDF (732KB) | Multimedia
We address the character identification problem in movies and television videos: assigning names to faces on the screen. Most prior work on person recognition in video assumes some supervised data such as screenplay or hand-labeled faces. In this paper, our only source of 'supervision' is dialog cues: first, second and third person references (such as “I'm Jack”, “Hey, Jack!” and “Jack left”). While this kind of supervision is sparse and indirect, we...
An efficient divide-and-conquer cascade for nonlinear object detection
Lampert, C.H.
Page(s): 1022 - 1029
Digital Object Identifier : 10.1109/CVPR.2010.5540107
AbstractPlus | Full Text: PDF (366KB)
We introduce a method to accelerate the evaluation of object detection cascades with the help of a divide-and-conquer procedure in the space of candidate regions. Compared to the exhaustive procedure that thus far is the state-of-the-art for cascade evaluation, the proposed method requires fewer evaluations of the classifier functions, thereby speeding up the search. Furthermore, we show how the recently developed efficient subwindow search (ESS) procedure can be integrated into the last stage o...
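For context, the exhaustive cascade evaluation that this abstract uses as its baseline can be sketched as follows. This illustrates the baseline only, not the paper's divide-and-conquer method. The stage functions, thresholds, and region format `(x, y, width, height)` are hypothetical choices made for this example.

```python
def passes_cascade(region, stages):
    """Return True if the region survives every (score_fn, threshold) stage."""
    for score_fn, threshold in stages:
        if score_fn(region) < threshold:
            return False  # early rejection: later stages are never evaluated
    return True


def exhaustive_search(regions, stages):
    """Run the cascade on every candidate region, one at a time."""
    return [r for r in regions if passes_cascade(r, stages)]


# Toy stage functions on regions given as (x, y, width, height).
def area(r):
    return r[2] * r[3]


def squareness(r):
    return min(r[2], r[3]) / max(r[2], r[3])


stages = [(area, 100), (squareness, 0.5)]
regions = [(0, 0, 5, 5), (0, 0, 20, 10), (0, 0, 40, 5), (3, 3, 12, 12)]
detections = exhaustive_search(regions, stages)
```

Every candidate region triggers at least one classifier evaluation here, which is the cost the abstract says the proposed method reduces.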
New features and insights for pedestrian detection
Walk, S.; Majer, N.; Schindler, K.; Schiele, B.
Page(s): 1030 - 1037
Digital Object Identifier : 10.1109/CVPR.2010.5540102
AbstractPlus | Full Text: PDF (2219KB)
Despite impressive progress in people detection, the performance on challenging datasets like Caltech Pedestrians or TUD-Brussels is still unsatisfactory. In this work we show that motion features derived from optic flow yield substantial improvements on image sequences, if implemented correctly - even in the case of low-quality video and consequently degraded flow fields. Furthermore, we introduce a new feature, self-similarity on color channels, which consistently improves detection performance...
Efficient rotation invariant object detection using boosted Random Ferns
Villamizar, M.; Moreno-Noguer, F.; Andrade-Cetto, J.; Sanfeliu, A.
Page(s): 1038 - 1045
Digital Object Identifier : 10.1109/CVPR.2010.5540104
AbstractPlus | Full Text: PDF (7617KB)
We present a new approach for building an efficient and robust classifier for the two-class problem that localizes objects that may appear in the image under different orientations. In contrast to other works that address this problem using multiple classifiers, each one specialized for a specific orientation, we propose a simple two-step approach with an estimation stage and a classification stage. The estimator yields an initial set of potential object poses that are then validated by the cla...
Fast and robust object segmentation with the Integral Linear Classifier
Aldavert, D.; Ramisa, A.; de Mantaras, R.L.; Toledo, R.
Page(s): 1046 - 1053
Digital Object Identifier : 10.1109/CVPR.2010.5540098
AbstractPlus | Full Text: PDF (2661KB)
We propose an efficient method, built on the popular Bag of Features approach, that obtains robust multiclass pixel-level object segmentation of an image in less than 500ms, with results comparable to or better than most state of the art methods. We introduce the Integral Linear Classifier (ILC), which can readily obtain the classification score for any image sub-window with only 6 additions and 1 product by fusing the accumulation and classification steps in a single operation. In order to design a...
Segmenting video into classes of algorithm-suitability
Mac Aodha, O.; Brostow, G.J.; Pollefeys, M.
Page(s): 1054 - 1061
Digital Object Identifier : 10.1109/CVPR.2010.5540099
AbstractPlus | Full Text: PDF (1940KB) | Multimedia
Given a set of algorithms, which one(s) should you apply to i) compute optical flow, or ii) perform feature matching? Would looking at the sequence in question help you decide? It is unclear if even a person with intimate knowledge of all the different algorithms and access to the sequence itself could predict which one to apply. Our hypothesis is that the most suitable algorithm can be chosen for each video automatically, through supervised training of a classifier. The classifier treats the d...
Latent hierarchical structural learning for object detection
Long Zhu; Yuanhao Chen; Yuille, A.; Freeman, W.
Page(s): 1062 - 1069
Digital Object Identifier : 10.1109/CVPR.2010.5540096
AbstractPlus | Full Text: PDF (1727KB)
We present a latent hierarchical structural learning method for object detection. An object is represented by a mixture of hierarchical tree models where the nodes represent object parts. The nodes can move spatially to allow both local and global shape deformations. The models can be trained discriminatively using latent structural SVM learning, where the latent variables are the node positions and the mixture component. But current learning methods are slow, due to the large number of paramete...
A Steiner tree approach to efficient object detection
Russakovsky, O.; Ng, A.Y.
Page(s): 1070 - 1077
Digital Object Identifier : 10.1109/CVPR.2010.5540097
AbstractPlus | Full Text: PDF (735KB)
We propose an approach to speeding up object detection, with an emphasis on settings where multiple object classes are being detected. Our method uses a segmentation algorithm to select a small number of image regions on which to run a classifier. Compared to the classical sliding window approach, this results in a significantly smaller number of rectangles examined, and thus significantly faster object detection. Further, in the multiple object class setting, we show that the computational cost...
Cascaded pose regression
Dollár, P.; Welinder, P.; Perona, P.
Page(s): 1078 - 1085
Digital Object Identifier : 10.1109/CVPR.2010.5540094
AbstractPlus | Full Text: PDF (4501KB)
We present a fast and accurate algorithm for computing the 2D pose of objects in images called cascaded pose regression (CPR). CPR progressively refines a loosely specified initial guess, where each refinement is carried out by a different regressor. Each regressor performs simple image measurements that are dependent on the output of the previous regressors; the entire system is automatically learned from human annotated training examples. CPR is not restricted to rigid transformations: 'pose' ...
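The refinement loop described in this abstract can be sketched in a few lines. This is a toy under stated assumptions, not the CPR algorithm itself: the pose is a single number, the update is additive, and the regressors are hand-set rather than learned from annotated data. The names `cascaded_regression`, `make_regressor`, and `measure` are hypothetical.

```python
def cascaded_regression(image, initial_pose, regressors, measure):
    """Refine a pose estimate by applying a sequence of regressors.

    Each regressor sees measurements taken relative to the current
    pose, so its input depends on the output of all earlier stages.
    """
    pose = initial_pose
    for regress in regressors:
        features = measure(image, pose)   # pose-dependent measurement
        pose = pose + regress(features)   # additive update (an assumption)
    return pose


def make_regressor(gain):
    """Hand-set stand-in for a learned regressor."""
    def regress(offset):
        return gain * offset
    return regress


def measure(image, pose):
    """Toy measurement: the 'image' is just the true pose value."""
    return image - pose


regressors = [make_regressor(0.5) for _ in range(3)]
result = cascaded_regression(10.0, 0.0, regressors, measure)
```

With a true value of 10 and a starting guess of 0, each stage halves the remaining error, so the estimate moves through 5, 7.5, and 8.75.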
Free-shape subwindow search for object localization
Zhiqi Zhang; Yu Cao; Salvi, D.; Oliver, K.; Waggoner, J.; Song Wang
Page(s): 1086 - 1093
Digital Object Identifier : 10.1109/CVPR.2010.5540095
AbstractPlus | Full Text: PDF (2304KB)
Improving web image search results using query-relative classifiers
Krapac, J.; Allan, M.; Verbeek, J.; Jurie, F.
Page(s): 1094 - 1101
Digital Object Identifier : 10.1109/CVPR.2010.5540092
AbstractPlus | Full Text: PDF (2649KB) | Multimedia
Web image search using text queries has received considerable attention. However, current state-of-the-art approaches require training models for every new query, and are therefore unsuitable for real-world web search applications. The key contribution of this paper is to introduce generic classifiers that are based on query-relative features which can be used for new queries without additional training. They combine textual features, based on the occurrence of query terms in web pages and image ...
Using cloud shadows to infer scene structure and camera calibration
Jacobs, N. Bies, B. Pless, R.Page(s): 1102 - 1109
Digital Object Identifier : 10.1109/CVPR.2010.5540093
AbstractPlus | Full Text: PDF (2055KB)
We explore the use of clouds as a form of structured lighting to capture the 3D structure of outdoor scenes observed over time from a static camera. We derive two cues that relate 3D distances to changes in pixel intensity due to cloud shadows. The first cue is primarily spatial, works with low frame-rate time lapses, and supports estimating focal length and scene structure, up to a scale ambiguity. The second cue depends on cloud motion and has a more complex, but still linear, ambiguity. We d... Read More »
Depth from Diffusion
Changyin Zhou Cossairt, O. Nayar, S.Page(s): 1110 - 1117
Digital Object Identifier : 10.1109/CVPR.2010.5540090
AbstractPlus | Full Text: PDF (1728KB)
An optical diffuser is an element that scatters light and is commonly used to soften or shape illumination. In this paper, we propose a novel depth estimation method that places a diffuser in the scene prior to image capture. We call this approach depth-from-diffusion (DFDiff). We show that DFDiff is analogous to conventional depth-from-defocus (DFD), where the scatter angle of the diffuser determines the effective aperture of the system. The main benefit of DFDiff is that while DFD requires ver... Read More »
Self-calibrating photometric stereo
Boxin Shi Matsushita, Y. Yichen Wei Chao Xu Ping TanPage(s): 1118 - 1125
Digital Object Identifier : 10.1109/CVPR.2010.5540091
AbstractPlus | Full Text: PDF (2972KB)
We present a self-calibrating photometric stereo method. From a set of images taken from a fixed viewpoint under different and unknown lighting conditions, our method automatically determines a radiometric response function and resolves the generalized bas-relief ambiguity for estimating accurate surface normals and albedos. We show that color and intensity profiles, which are obtained from registered pixels across images, serve as effective cues for addressing these two calibration problems. As... Read More »
Geometric properties of multiple reflections in catadioptric camera with two planar mirrors
Xianghua Ying Kun Peng Ren Ren Hongbin ZhaPage(s): 1126 - 1132
Digital Object Identifier : 10.1109/CVPR.2010.5540088
AbstractPlus | Full Text: PDF (591KB)
A catadioptric system consisting of a pinhole camera and two planar mirrors is investigated in depth in this paper. The two mirrors form a corner that faces the pinhole. Their relative pose is unknown. An object will be reflected in the mirror corner once or multiple times. Using the pinhole, we may take an image containing the object and its reflections, i.e., simultaneously imaging multiple views of an object by a single camera. We discovered that each 3D point and its ... Read More »
Recovering thin structures via nonlocal-means regularization with application to depth from defocus
Favaro, P.Page(s): 1133 - 1140
Digital Object Identifier : 10.1109/CVPR.2010.5540089
AbstractPlus | Full Text: PDF (4026KB)
We propose a novel scheme to recover depth maps containing thin structures based on nonlocal-means filtering regularization. The scheme imposes a distributed smoothness constraint by relying on the assumption that pixels with similar colors are likely to belong to the same surface, and therefore can be used jointly to obtain a robust estimate of their depth. This scheme can be used to solve shape-from-X problems and we demonstrate its use in the case of depth from defocus. We cast the problem in... Read More »
Upsampling range data in dynamic environments
Dolson, J. Jongmin Baek Plagemann, C. Thrun, S.Page(s): 1141 - 1148
Digital Object Identifier : 10.1109/CVPR.2010.5540086
AbstractPlus | Full Text: PDF (1411KB)
We present a flexible method for fusing information from optical and range sensors based on an accelerated high-dimensional filtering approach. Our system takes as input a sequence of monocular camera images as well as a stream of sparse range measurements as obtained from a laser or other sensor system. In contrast with existing approaches, we do not assume that the depth and color data streams have the same data rates or that the observed scene is fully static. Our method produces a dense, hig... Read More »
Object cut: Complex 3D object reconstruction through line drawing separation
Tianfan Xue Jianzhuang Liu Xiaoou TangPage(s): 1149 - 1156
Digital Object Identifier : 10.1109/CVPR.2010.5540087
AbstractPlus | Full Text: PDF (425KB)
This paper proposes an approach called object cut to tackle an important problem in computer vision, 3D object reconstruction from single line drawings. Given a complex line drawing representing a solid object, our algorithm finds the places, called cuts, to separate the line drawing into much simpler ones. The complex 3D object is obtained by first reconstructing the 3D objects from these simpler line drawings and then combining them together. Several propositions and criteria are presented for... Read More »
Consensus photometric stereo
Higo, T. Matsushita, Y. Ikeuchi, K.Page(s): 1157 - 1164
Digital Object Identifier : 10.1109/CVPR.2010.5540084
AbstractPlus | Full Text: PDF (1309KB)
This paper describes a photometric stereo method that works with a wide range of surface reflectances. Unlike previous approaches that assume simple parametric models such as Lambertian reflectance, the only assumption that we make is that the reflectance has three properties; monotonicity, visibility, and isotropy with respect to the cosine of light direction and surface orientation. In fact, these properties are observed in many non-Lambertian diffuse reflectances. We also show that the monoto... Read More »
Model evolution: An incremental approach to non-rigid structure from motion
Shengqi Zhu Li Zhang Smith, B.M.Page(s): 1165 - 1172
Digital Object Identifier : 10.1109/CVPR.2010.5540085
AbstractPlus | Full Text: PDF (1375KB)
3D shape scanning with a time-of-flight camera
Yan Cui Schuon, S. Chan, D. Thrun, S. Theobalt, C.Page(s): 1173 - 1180
Digital Object Identifier : 10.1109/CVPR.2010.5540082
AbstractPlus | Full Text: PDF (1734KB) | Multimedia
We describe a method for 3D object scanning by aligning depth scans that were taken from around an object with a time-of-flight camera. These ToF cameras can measure depth scans at video rate. Due to comparably simple technology they bear potential for low cost production in big volumes. Our easy-to-use, cost-effective scanning solution based on such a sensor could make 3D scanning technology more accessible to everyday users. The algorithmic challenge we face is that the sensor's level of rando... Read More »
Refinement of digital elevation models from shadowing cues
Hogan, J. Smith, W.A.P.Page(s): 1181 - 1188
Digital Object Identifier : 10.1109/CVPR.2010.5540083
AbstractPlus | Full Text: PDF (1084KB)
In this paper we derive formal constraints relating terrain elevation and observed cast shadows. We show how an optimisation framework can be used to refine surface estimates using shadowing constraints from one or more images. The method is particularly applicable to the digital elevation models produced by the Shuttle Radar Topography Mission (SRTM), which have an abundance of voids in mountainous areas where elevation data is missing. Cast shadow maps are detected automatically from multi-spe... Read More »
Simultaneous pose, correspondence and non-rigid shape
Sánchez-Riera, J. Östlund, J. Fua, P. Moreno-Noguer, F.Page(s): 1189 - 1196
Digital Object Identifier : 10.1109/CVPR.2010.5539831
AbstractPlus | Full Text: PDF (3335KB)
Recent works have shown that 3D shape of non-rigid surfaces can be accurately retrieved from a single image given a set of 3D-to-2D correspondences between that image and another one for which the shape is known. However, existing approaches assume that such correspondences can be readily established, which is not necessarily true when large deformations produce significant appearance changes between the input and the reference images. Furthermore, it is either assumed that the pose of the camer... Read More »
Surface extraction from binary volumes with higher-order smoothness
Lempitsky, V.Page(s): 1197 - 1204
Digital Object Identifier : 10.1109/CVPR.2010.5539832
AbstractPlus | Full Text: PDF (4381KB)
A number of 3D shape reconstruction algorithms, in particular 3D image segmentation methods, produce their results in the form of binary volumes, where a binary value indicates whether a voxel is associated with the interior or the exterior. For visualization purposes, it is often desirable to convert a binary volume into a surface representation. Straightforward extraction of the median isosurfaces for binary volumes using the marching cubes algorithm, however, produces jaggy, visually unrealist... Read More »
A framework for ultra high resolution 3D imaging
Zheng Lu Yu-Wing Tai Ben-Ezra, M. Brown, M.S.Page(s): 1205 - 1212
Digital Object Identifier : 10.1109/CVPR.2010.5539829
AbstractPlus | Full Text: PDF (1575KB)
We present an imaging framework to acquire 3D surface scans at ultra-high resolutions (exceeding 600 samples per mm²). Our approach couples a standard structured-light setup and photometric stereo using a large-format ultra-high-resolution camera. While previous approaches have employed similar hybrid imaging systems to fuse positional data with surface normals, what is unique to our approach is the significant asymmetry in the resolution between the low-resolution geometry and the ul... Read More »
3D reconstruction of glossy surfaces using stereo cameras and projector-display
Yamazaki, M. Gang XuPage(s): 1213 - 1220
Digital Object Identifier : 10.1109/CVPR.2010.5539830
AbstractPlus | Full Text: PDF (1570KB)
In this paper, we first describe our approach to measuring the shape of diffuse and specular surfaces, and then we extend the method to measuring the shape of glossy surfaces by using stereo cameras and projector-display. Existing methods using a projector or display usually assume a perfectly diffuse or specular surface for estimating the 3D shape of the objects. We develop a hybrid method for estimating the 3D shape of glossy surfaces that incorporates both diffuse and specular components by c... Read More »
Simultaneous point matching and 3D deformable surface reconstruction
Shaji, A. Varol, A. Torresani, L. Fua, P.Page(s): 1221 - 1228
Digital Object Identifier : 10.1109/CVPR.2010.5539827
AbstractPlus | Full Text: PDF (9470KB) | Multimedia
It has been shown that the 3D shape of a deformable surface in an image can be recovered by establishing correspondences between that image and a reference one in which the shape is known. These matches can then be used to set-up a convex optimization problem in terms of the shape parameters, which is easily solved. However, in many cases, the correspondences are hard to establish reliably. In this paper, we show that we can solve simultaneously for both 3D shape and correspondences, thereby usi... Read More »
Shape and refractive index recovery from single-view polarisation images
Cong Phuoc Huynh Robles-Kelly, A. Hancock, E.Page(s): 1229 - 1236
Digital Object Identifier : 10.1109/CVPR.2010.5539828
AbstractPlus | Full Text: PDF (3429KB)
In this paper, we propose an approach to the problem of simultaneous shape and refractive index recovery from multispectral polarisation imagery captured from a single viewpoint. The focus of this paper is on dielectric surfaces which diffusely polarise light transmitted from the dielectric body into the air. The diffuse polarisation of the reflection process is modelled using a Transmitted Radiance Sinusoid curve and the Fresnel transmission theory. We provide a method of estimating the azimuth... Read More »
High-resolution modeling of moving and deforming objects using sparse geometric and dense photometric measurements
Yi Xu Aliaga, D.G.Page(s): 1237 - 1244
Digital Object Identifier : 10.1109/CVPR.2010.5539825
AbstractPlus | Full Text: PDF (2461KB) | Multimedia
Modeling moving and deforming objects requires capturing as much information as possible during a very short time. When using off-the-shelf hardware, this often hinders the resolution and accuracy of the acquired model. Our key observation is that in as little as four frames both sparse surface-positional measurements and dense surface-orientation measurements can be acquired using a combination of structured light and photometric stereo, resulting in high-resolution models of moving and deformi... Read More »
Specular surface reconstruction from sparse reflection correspondences
Sankaranarayanan, A.C. Veeraraghavan, A. Tuzel, O. Agrawal, A.Page(s): 1245 - 1252
Digital Object Identifier : 10.1109/CVPR.2010.5539826
AbstractPlus | Full Text: PDF (3155KB)
Single image depth estimation from predicted semantic labels
Beyang Liu Gould, S. Koller, D.Page(s): 1253 - 1260
Digital Object Identifier : 10.1109/CVPR.2010.5539823
AbstractPlus | Full Text: PDF (2157KB)
We consider the problem of estimating the depth of each pixel in a scene from a single monocular image. Unlike traditional approaches, which attempt to map from appearance features to depth directly, we first perform a semantic segmentation of the scene and use the semantic labels to guide the 3D reconstruction. This approach provides several advantages: By knowing the semantic class of a pixel or region, depth and geometry constraints can be easily enforced (e.g., “sky” is far awa... Read More »
Robust piecewise-planar 3D reconstruction and completion from large-scale unstructured point data
Chauve, A.-L. Labatut, P. Pons, J.-P.Page(s): 1261 - 1268
Digital Object Identifier : 10.1109/CVPR.2010.5539824
AbstractPlus | Full Text: PDF (2496KB) | Multimedia
In this paper, we present a novel method, the first to date to our knowledge, which is capable of directly and automatically producing a concise and idealized 3D representation from unstructured point data of complex cluttered real-world scenes, with a high level of noise and a significant proportion of outliers, such as those obtained from passive stereo. Our algorithm can digest millions of input points into an optimized lightweight watertight polygonal mesh free of self-intersection, that pre... Read More »
Visual tracking decomposition
Junseok Kwon Kyoung Mu LeePage(s): 1269 - 1276
Digital Object Identifier : 10.1109/CVPR.2010.5539821
AbstractPlus | Full Text: PDF (2639KB) | Multimedia
We propose a novel tracking algorithm that can work robustly in a challenging scenario such that several kinds of appearance and motion changes of an object occur at the same time. Our algorithm is based on a visual tracking decomposition scheme for the efficient design of observation and motion models as well as trackers. In our scheme, the observation model is decomposed into multiple basic observation models that are constructed by sparse principal component analysis (SPCA) of a set of featur... Read More »
A globally optimal data-driven approach for image distortion estimation
Yuandong Tian Narasimhan, S.G.Page(s): 1277 - 1284
Digital Object Identifier : 10.1109/CVPR.2010.5539822
AbstractPlus | Full Text: PDF (6048KB)
Image alignment in the presence of non-rigid distortions is a challenging task. Typically, this involves estimating the parameters of a dense deformation field that warps a distorted image back to its undistorted template. Generative approaches based on parameter optimization such as Lucas-Kanade can get trapped within local minima. On the other hand, discriminative approaches like Nearest-Neighbor require a large number of training samples that grows exponentially with the desired accuracy. In ... Read More »
Tracking the invisible: Learning where the object might be
Grabner, H. Matas, J. Van Gool, L. Cattin, P.Page(s): 1285 - 1292
Digital Object Identifier : 10.1109/CVPR.2010.5539819
AbstractPlus | Full Text: PDF (3712KB) | Multimedia
Objects are usually embedded into context. Visual context has been successfully used in object detection tasks; however, it is often ignored in object tracking. We propose a method to learn supporters which are, be it only temporally, useful for determining the position of the object of interest. Our approach exploits the Generalized Hough Transform strategy. It couples the supporters with the target and naturally distinguishes between strongly and weakly coupled motions. By this, the position of an... Read More »
Motion detail preserving optical flow estimation
Li Xu Jiaya Jia Matsushita, Y.Page(s): 1293 - 1300
Digital Object Identifier : 10.1109/CVPR.2010.5539820
AbstractPlus | Full Text: PDF (3095KB)
We discuss the cause of a severe optical flow estimation problem that fine motion structures cannot always be correctly reconstructed in the commonly employed multi-scale variational framework. Our major finding is that significant and abrupt displacement transition wrecks small-scale motion structures in the coarse-to-fine refinement. A novel optical flow estimation method is proposed in this paper to address this issue, which reduces the reliance of the flow estimates on their initial values p... Read More »
Modeling pixel process with scale invariant local patterns for background subtraction in complex scenes
Shengcai Liao Guoying Zhao Kellokumpu, V. Pietikäinen, M. Li, S.Z.Page(s): 1301 - 1306
Digital Object Identifier : 10.1109/CVPR.2010.5539817
AbstractPlus | Full Text: PDF (996KB) | Multimedia
Background modeling plays an important role in video surveillance, yet in complex scenes it is still a challenging problem. Among many difficulties, problems caused by illumination variations and dynamic backgrounds are the key aspects. In this work, we develop an efficient background subtraction framework to tackle these problems. First, we propose a scale invariant local ternary pattern operator, and show that it is effective for handling illumination variations, especially for moving soft sha... Read More »
Real-time tracking of multiple occluding objects using level sets
Bibby, C. Reid, I.Page(s): 1307 - 1314
Digital Object Identifier : 10.1109/CVPR.2010.5539818
AbstractPlus | Full Text: PDF (2808KB) | Multimedia
We derive a probabilistic framework for robust, real-time, visual tracking of multiple previously unseen objects from a moving camera. This framework models the discrete depth ordering of the objects being tracked in the scene. The method uses the observed image data to compute a posterior over the objects' poses, shapes and relative depths. The poses are group transformations, the shapes are implicit contours represented using level-sets and the relative depths give the discrete depth ordering o... Read More »
Visual tracking via incremental self-tuning particle filtering on the affine group
Min Li Wei Chen Kaiqi Huang Tieniu TanPage(s): 1315 - 1322
Digital Object Identifier : 10.1109/CVPR.2010.5539815
AbstractPlus | Full Text: PDF (837KB) | Multimedia
We propose an incremental self-tuning particle filtering (ISPF) framework for visual tracking on the affine group. SIFT (Scale Invariant Feature Transform) like descriptors are used as basic features, and IPCA (Incremental Principal Component Analysis) is utilized to learn an adaptive appearance subspace for similarity measurement. ISPF tries to find the optimal target position in a step-by-step way: particles are incrementally drawn and intelligently tuned to their best states by an online LWPR... Read More »
Visual tracking via weakly supervised learning from multiple imperfect oracles
Bineng Zhong Hongxun Yao Sheng Chen Rongrong Ji Xiaotong Yuan Shaohui Liu Wen GaoPage(s): 1323 - 1330
Digital Object Identifier : 10.1109/CVPR.2010.5539816
AbstractPlus | Full Text: PDF (1680KB) | Multimedia
Warping background subtraction
Ko, T. Soatto, S. Estrin, D.Page(s): 1331 - 1338
Digital Object Identifier : 10.1109/CVPR.2010.5539813
AbstractPlus | Full Text: PDF (2597KB)
We present a background model that differentiates between background motion and foreground objects. Unlike most models that represent the variability of pixel intensity at a particular location in the image, we model the underlying warping of pixel locations arising from background motion. The background is modeled as a set of warping layers, where at any given time, different layers may be visible due to the motion of an occluding layer. Foreground regions are thus defined as those that cannot ... Read More »
Free-form mesh tracking: A patch-based approach
Cagniart, C. Boyer, E. Ilic, S.Page(s): 1339 - 1346
Digital Object Identifier : 10.1109/CVPR.2010.5539814
AbstractPlus | Full Text: PDF (4837KB) | Multimedia
In this paper, we consider the problem of tracking nonrigid surfaces and propose a generic data-driven mesh deformation framework. In contrast to methods using strong prior models, this framework assumes little on the observed surface and hence easily generalizes to most free-form surfaces while effectively handling large deformations. To this aim, the reference surface is divided into elementary surface cells or patches. This strategy ensures robustness by providing natural integration domains ... Read More »
Trajectory matching from unsynchronized videos
Han Hu Jie ZhouPage(s): 1347 - 1354
Digital Object Identifier : 10.1109/CVPR.2010.5539811
AbstractPlus | Full Text: PDF (4728KB)
This paper studies the problem of spatio-temporal matching between trajectories from two videos of the same scene. In real applications, trajectories are usually extracted independently in different videos. So possibly a lot of trajectories stay “alone” (have no corresponding trajectory in the other video). In this paper, we propose a novel matching algorithm which can not only find the existing correspondences between trajectories, but also recover the corresponding trajectories o... Read More »
Rapid selection of reliable templates for visual tracking
Alt, N. Hinterstoisser, S. Navab, N.Page(s): 1355 - 1362
Digital Object Identifier : 10.1109/CVPR.2010.5539812
AbstractPlus | Full Text: PDF (625KB) | Multimedia
We propose a method that rates the suitability of given templates for template-based tracking in real-time. This is important for applications with online template selection, such as SLAM, where it is essential to track a low number of preferably reliable templates. Our approach is based on simple image features specifically designed to identify texture properties which are problematic for tracking. During a training step, a support vector regressor is learned. It uses a tracking quality... Read More »
Generalized simultaneous registration and segmentation
Ghosh, P. Sargin, E. Manjunath, B.S.Page(s): 1363 - 1370
Digital Object Identifier : 10.1109/CVPR.2010.5539809
AbstractPlus | Full Text: PDF (2282KB) | Multimedia
Simultaneous registration and segmentation (SRS) provides a powerful framework for tracking an object of interest in an image sequence. The state-of-the-art SRS-based tracking methods assume that the illumination is maintained constant across consecutive frames. However, this assumption does not hold in many natural image sequences due to dynamic light source and shadows. We propose a generalized model for SRS-based tracking in this paper to account for non-uniform additive illumination changes.... Read More »
A probabilistic framework for joint segmentation and tracking
Aeschliman, C. Park, J. Kak, A.C.Page(s): 1371 - 1378
Digital Object Identifier : 10.1109/CVPR.2010.5539810
AbstractPlus | Full Text: PDF (2168KB)
Most tracking algorithms implicitly apply a coarse segmentation of each target object using a simple mask such as a rectangle or an ellipse. Although convenient, such coarse segmentation results in several problems in tracking - drift, switching of targets, poor target localization, to name a few - since it inherently includes extra non-target pixels if the mask is larger than the target or excludes some portion of target pixels if the mask is smaller than the target. In this paper, we propose a... Read More »
Probabilistic 3D occupancy flow with latent silhouette cues
Li Guan Franco, J.-S. Boyer, E. Pollefeys, M.Page(s): 1379 - 1386
Digital Object Identifier : 10.1109/CVPR.2010.5539807
AbstractPlus | Full Text: PDF (2168KB) | Multimedia
In this paper we investigate shape and motion retrieval in the context of multi-camera systems. We propose a new low-level analysis based on latent silhouette cues, particularly suited for low-texture and outdoor datasets. Our analysis does not rely on explicit surface representations, instead using an EM framework to simultaneously update a set of volumetric voxel occupancy probabilities and retrieve a best estimate of the dense 3D motion field from the last consecutively observed multi-view fr... Read More »
Novel observation model for probabilistic object tracking
Dawei Liang Qingming Huang Hongxun Yao Shuqiang Jiang Rongrong Ji Wen GaoPage(s): 1387 - 1394
Digital Object Identifier : 10.1109/CVPR.2010.5539808
AbstractPlus | Full Text: PDF (599KB)
Treating visual object tracking as a foreground and background classification problem has attracted much attention in the past decade. Most methods adopt mean shift or brute force search to perform object tracking on the generated probability map, which is obtained from the classification results; however, performing probabilistic object tracking on the probability map is almost unexplored. This paper proposes a novel observation model which is suitable to perform this task. The observation model ... Read More »
Online multiple instance learning with no regret
Mu Li Kwok, J.T. Bao-Liang LuPage(s): 1395 - 1401
Digital Object Identifier : 10.1109/CVPR.2010.5539805
AbstractPlus | Full Text: PDF (2153KB)
Multiple instance (MI) learning is a recent learning paradigm that is more flexible than standard supervised learning algorithms in the handling of label ambiguity. It has been used in a wide range of applications including image classification, object detection and object tracking. Typically, MI algorithms are trained in a batch setting in which the whole training set has to be available before training starts. However, in applications such as tracking, the classifier needs to be trained contin... Read More »
Dynamic surface matching by geodesic mapping for 3D animation transfer
Tung, T. Matsuyama, T.Page(s): 1402 - 1409
Digital Object Identifier : 10.1109/CVPR.2010.5539806
AbstractPlus | Full Text: PDF (4746KB)
Probabilistic temporal inference on reconstructed 3D scenes
Schindler, G. Dellaert, F.Page(s): 1410 - 1417
Digital Object Identifier : 10.1109/CVPR.2010.5539803
AbstractPlus | Full Text: PDF (1730KB)
Modern structure from motion techniques are capable of building city-scale 3D reconstructions from large image collections, but have mostly ignored the problem of large-scale structural changes over time. We present a general framework for estimating temporal variables in structure from motion problems, including an unknown date for each camera and an unknown time interval for each structural element. Given a collection of images with mostly unknown or uncertain dates, we use this framework to a... Read More »
Piecewise planar and non-planar stereo for urban scene reconstruction
Gallup, D. Frahm, J.-M. Pollefeys, M.Page(s): 1418 - 1425
Digital Object Identifier : 10.1109/CVPR.2010.5539804
AbstractPlus | Full Text: PDF (7532KB) | Multimedia
Piecewise planar models for stereo have recently become popular for modeling indoor and urban outdoor scenes. The strong planarity assumption overcomes the challenges presented by poorly textured surfaces, and results in low complexity 3D models for rendering, storage, and transmission. However, such a model performs poorly in the presence of non-planar objects, for example, bushes, trees, and other clutter present in many scenes. We present a stereo method capable of handling more general scene... Read More »
Disambiguating visual relations using loop constraints
Zach, C. Klopschitz, M. Pollefeys, M.Page(s): 1426 - 1433
Digital Object Identifier : 10.1109/CVPR.2010.5539801
AbstractPlus | Full Text: PDF (3029KB)
Repetitive and ambiguous visual structures in general pose a severe problem in many computer vision applications. Identification of incorrect geometric relations between images solely based on low level features is not always possible, and a more global reasoning approach about the consistency of the estimated relations is required. We propose to utilize the typically observed redundancy in the hypothesized relations for such reasoning, and focus on the graph structure induced by those relations... Read More »
Towards Internet-scale multi-view stereo
Furukawa, Y. Curless, B. Seitz, S.M. Szeliski, R.Page(s): 1434 - 1441
Digital Object Identifier : 10.1109/CVPR.2010.5539802
AbstractPlus | Full Text: PDF (6110KB) | Multimedia
This paper introduces an approach for enabling existing multi-view stereo methods to operate on extremely large unstructured photo collections. The main idea is to decompose the collection into a set of overlapping sets of photos that can be processed in parallel, and to merge the resulting reconstructions. This overlapping clustering problem is formulated as a constrained optimization and solved iteratively. The merging algorithm, designed to be parallel and out-of-core, incorporates robust fil... Read More »
Reconstruction of display and eyes from a single image
Schnieders, D. Xingdou Fu Wong, K.-Y.K.Page(s): 1442 - 1449
Digital Object Identifier : 10.1109/CVPR.2010.5539799
AbstractPlus | Full Text: PDF (3027KB)
This paper introduces a novel method for reconstructing human eyes and visual display from reflections on the cornea. This problem is difficult because the camera is not directly facing the display, but instead captures the eyes of a person in front of the display. Reconstruction of eyes and display is useful for point-of-gaze estimation, which can be approximated from the 3D positions of the iris and display. It is shown that iris boundaries (limbus) and display reflections in a single intrinsi... Read More »
Outlier removal using duality
Olsson, C. Eriksson, A. Hartley, R.Page(s): 1450 - 1457
Digital Object Identifier : 10.1109/CVPR.2010.5539800
AbstractPlus | Full Text: PDF (3388KB)
In this paper we consider the problem of outlier removal for large scale multiview reconstruction problems. An efficient and very popular method for this task is RANSAC. However, as RANSAC only works on a subset of the images, mismatches in longer point tracks may go undetected. To deal with this problem we would like to have, as a post processing step to RANSAC, a method that works on the entire (or a larger) part of the sequence. In this paper we consider two algorithms for doing this. The fir... Read More »
A constant-space belief propagation algorithm for stereo matching
Qingxiong Yang Liang Wang Ahuja, N.Page(s): 1458 - 1465
Digital Object Identifier : 10.1109/CVPR.2010.5539797
AbstractPlus | Full Text: PDF (813KB)
In this paper, we consider the problem of stereo matching using loopy belief propagation. Unlike previous methods which focus on the original spatial resolution, we hierarchically reduce the disparity search range. By fixing the number of disparity levels on the original resolution, our method solves the message updating problem in a time linear in the number of pixels contained in the image and requires only constant memory space. Specifically, for an 800 × 600 image with 300 disparities,... Read More »
Evaluation of stereo confidence indoors and outdoors
Xiaoyan Hu Mordohai, P.Page(s): 1466 - 1473
Digital Object Identifier : 10.1109/CVPR.2010.5539798
AbstractPlus | Full Text: PDF (2389KB)
We present an extensive evaluation of 13 confidence metrics for stereo matching that compares the most widely used metrics as well as four novel techniques proposed here. We begin by categorizing the methods according to which aspects of stereo computation they take into account and, then, assess their strengths and weaknesses. The evaluation is conducted on indoor and outdoor datasets with ground truth and measures the capability of each confidence metric to rank depth estimates according to th... Read More »
Pushing the envelope of modern methods for bundle adjustment
Yekeun Jeong; Nister, D.; Steedly, D.; Szeliski, R.; In-So Kweon
Page(s): 1474 - 1481
Digital Object Identifier : 10.1109/CVPR.2010.5539795
AbstractPlus | Full Text: PDF (785KB) | Multimedia
In this paper, we present results and experiments with several methods for bundle adjustment, producing the fastest bundle adjuster ever published. The fastest methods work with the well known reduced camera system and handle the block-sparse pattern arising in the reduced camera system in a natural way. Adapting to the naturally arising block-sparsity allows the use of BLAS3, efficient memory handling, fast variable ordering, and customized sparse solving all at the same time. We present two me...
Quasi-dense 3D reconstruction using tensor-based multiview stereo
Tai-Pang Wu; Sai-Kit Yeung; Jiaya Jia; Chi-Keung Tang
Page(s): 1482 - 1489
Digital Object Identifier : 10.1109/CVPR.2010.5539796
AbstractPlus | Full Text: PDF (5536KB) | Multimedia
Accurate 3D face reconstruction from weakly calibrated wide baseline images with profile contours
Yuping Lin; Medioni, G.; Jongmoo Choi
Page(s): 1490 - 1497
Digital Object Identifier : 10.1109/CVPR.2010.5539793
AbstractPlus | Full Text: PDF (766KB)
We propose a method to generate a highly accurate 3D face model from a set of wide-baseline images in a weakly calibrated setup. Our approach is purely data driven, and produces faithful 3D models without any pre-defined models, unlike other statistical model-based approaches. Our results do not rely upon a critical initialization step nor parameters for optimization steps. We process 5 images (including profile views), infer the accurate poses of cameras in all views, and then infer a dense 3D ...
Live dense reconstruction with a single moving camera
Newcombe, R.A.; Davison, A.J.
Page(s): 1498 - 1505
Digital Object Identifier : 10.1109/CVPR.2010.5539794
AbstractPlus | Full Text: PDF (9016KB)
We present a method which enables rapid and dense reconstruction of scenes browsed by a single live camera. We take point-based real-time structure from motion (SFM) as our starting point, generating accurate 3D camera pose estimates and a sparse point cloud. Our main novel contribution is to use an approximate but smooth base mesh generated from the SFM to predict the view at a bundle of poses around automatically selected reference frames spanning the scene, and then warp the base mesh into hi...
Multi-view scene flow estimation: A view centered variational approach
Basha, T.; Moses, Y.; Kiryati, N.
Page(s): 1506 - 1513
Digital Object Identifier : 10.1109/CVPR.2010.5539791
AbstractPlus | Full Text: PDF (515KB)
We present a novel method for recovering the 3D structure and scene flow from calibrated multi-view sequences. We propose a 3D point cloud parametrization of the 3D structure and scene flow that allows us to directly estimate the desired unknowns. A unified global energy functional is proposed to incorporate the information from the available sequences and simultaneously recover both depth and scene flow. The functional enforces multi-view geometric consistency and imposes brightness constancy a...
Egomotion using assorted features
Pradeep, V.; Jongwoo Lim
Page(s): 1514 - 1521
Digital Object Identifier : 10.1109/CVPR.2010.5539792
AbstractPlus | Full Text: PDF (3157KB)
We describe a novel and robust minimal solver for performing online visual odometry with a stereo rig. The proposed method can compute the underlying camera motion given any arbitrary, mixed combination of point and line correspondences across two stereo views. This facilitates a hybrid visual odometry pipeline that is enhanced by well-localized and reliably-tracked line features while retaining the well-known advantages of point features. Utilizing trifocal tensor geometry and quaternion repres...
Monocular SLAM with locally planar landmarks via geometric Rao-Blackwellized particle filtering on Lie groups
Junghyun Kwon; Kyoung Mu Lee
Page(s): 1522 - 1529
Digital Object Identifier : 10.1109/CVPR.2010.5539789
AbstractPlus | Full Text: PDF (1405KB) | Multimedia
We propose a novel geometric Rao-Blackwellized particle filtering framework for monocular SLAM with locally planar landmarks. We represent the states for the camera pose and the landmark plane normal as SE(3) and SO(3), respectively, which are both Lie groups. The measurement error is also represented as another Lie group SL(3) corresponding to the space of homography matrices. We then formulate the unscented transformation on Lie groups for optimal importance sampling and landmark estimation vi...
Ray Markov Random Fields for image-based 3D modeling: Model and efficient inference
Shubao Liu; Cooper, D.B.
Page(s): 1530 - 1537
Digital Object Identifier : 10.1109/CVPR.2010.5539790
AbstractPlus | Full Text: PDF (6107KB)
In this paper, we present an approach to multi-view image-based 3D reconstruction by statistically inverting the ray-tracing based image generation process. The proposed algorithm is fast, accurate and does not need any initialization. The geometric representation is a discrete volume divided into voxels, with each voxel associated with two properties: opacity (shape) and color (appearance). The problem is then formulated as inferring each voxel's most probable opacity and color through MAP esti...
3D curve sketch: Flexible curve-based stereo reconstruction and calibration
Fabbri, R.; Kimia, B.
Page(s): 1538 - 1545
Digital Object Identifier : 10.1109/CVPR.2010.5539787
AbstractPlus | Full Text: PDF (716KB) | Multimedia
Interest point-based multiview 3D reconstruction and calibration methods have been very successful in select applications but are not applicable when an abundance of feature points is not available. They also lead to an unorganized point cloud reconstruction where the geometry of the scene is not explicit. The multiview stereo methods on the other hand yield dense surface geometry but require a highly controlled or calibrated setting. We propose and develop a novel framework for 3D reconstructi...
Scalable active matching
Handa, A.; Chli, M.; Strasdat, H.; Davison, A.J.
Page(s): 1546 - 1553
Digital Object Identifier : 10.1109/CVPR.2010.5539788
AbstractPlus | Full Text: PDF (504KB) | Multimedia
In matching tasks in computer vision, and particularly in real-time tracking from video, there are generally strong priors available on absolute and relative correspondence locations thanks to motion and scene models. While these priors are often partially used post-hoc to resolve matching consensus in algorithms like RANSAC, it was recently shown that fully integrating them in an `Active Matching' (AM) approach permits efficient guided image processing with rigorous decisions guided by Informat...
Triangulation made easy
Lindstrom, P.
Page(s): 1554 - 1561
Digital Object Identifier : 10.1109/CVPR.2010.5539785
AbstractPlus | Full Text: PDF (1940KB)
We describe a simple and efficient algorithm for two-view triangulation of 3D points from approximate 2D matches based on minimizing the L2 reprojection error. Our iterative algorithm improves on the one by Kanatani et al. by ensuring that in each iteration the epipolar constraint is satisfied. In the case where the two cameras are pointed in the same direction, the method provably converges to an optimal solution in exactly two iterations. For more general camera poses, two iteration...
Simultaneous surveillance camera calibration and foot-head homology estimation from human detections
Micusik, B.; Pajdla, T.
Page(s): 1562 - 1569
Digital Object Identifier : 10.1109/CVPR.2010.5539786
AbstractPlus | Full Text: PDF (333KB)
Surface stereo with soft segmentation
Bleyer, M.; Rother, C.; Kohli, P.
Page(s): 1570 - 1577
Digital Object Identifier : 10.1109/CVPR.2010.5539783
AbstractPlus | Full Text: PDF (828KB) | Multimedia
This paper proposes a new stereo model which encodes the simple assumption that the scene is composed of a few, smooth surfaces. A key feature of our model is the surface-based representation, where each pixel is assigned to a 3D surface (planes or B-splines). This representation enables several important contributions: Firstly, we formulate a higher-order prior which states that pixels of similar appearance are likely to belong to the same 3D surface. This makes it possible to incorporate the very popula...
Admissible linear map models of linear cameras
Batog, G.; Goaoc, X.; Ponce, J.
Page(s): 1578 - 1585
Digital Object Identifier : 10.1109/CVPR.2010.5539784
AbstractPlus | Full Text: PDF (200KB) | Multimedia
This paper presents a complete analytical characterization of a large class of central and non-central imaging devices dubbed linear cameras by Ponce. Pajdla has shown that a subset of these, the oblique cameras, can be modelled by a certain type of linear map. We give here a full tabulation of all admissible maps that induce cameras in the general sense of Grossberg and Nayar, and show that these cameras are exactly the linear ones. Combining these two models with a new notion of intrinsic para...
Exploiting global connectivity constraints for reconstruction of 3D line segments from images
Jain, A.; Kurz, C.; Thormählen, T.; Seidel, H.
Page(s): 1586 - 1593
Digital Object Identifier : 10.1109/CVPR.2010.5539781
AbstractPlus | Full Text: PDF (3634KB)
Given a set of 2D images, we propose a novel approach for the reconstruction of straight 3D line segments that represent the underlying geometry of static 3D objects in the scene. Such an algorithm is especially useful for the automatic 3D reconstruction of man-made environments. The main contribution of our approach is the generation of an improved reconstruction by imposing global topological constraints given by connections between neighbouring lines. Additionally, our approach does not emplo...
Improving the efficiency of hierarchical structure-and-motion
Gherardi, R.; Farenzena, M.; Fusiello, A.
Page(s): 1594 - 1600
Digital Object Identifier : 10.1109/CVPR.2010.5539782
AbstractPlus | Full Text: PDF (1072KB)
We present a completely automated Structure and Motion pipeline capable of working with uncalibrated images with varying internal parameters and no ancillary information. The system is based on a novel hierarchical scheme which reduces the total complexity by one order of magnitude. We assess the quality of our approach analytically by comparing the recovered point clouds with laser scans, which serve as ground truth data.
Multiview constraints in frequency space and camera calibration from unsynchronized images
Matsumoto, H.; Sato, J.; Sakaue, F.
Page(s): 1601 - 1608
Digital Object Identifier : 10.1109/CVPR.2010.5539779
AbstractPlus | Full Text: PDF (589KB)
In this paper, we propose a method for calibrating relative position and orientation of multiple unsynchronized cameras from general moving points. If the sampling times of multiple cameras are different from each other, there is no corresponding point in the images of moving points and we cannot calibrate these cameras. In this paper we analyze geometric relationships of multiple cameras in the frequency space, and show that they enable us to calibrate multiple unsynchronized cameras accurately...
Common visual pattern discovery via spatially coherent correspondences
Hairong Liu; Shuicheng Yan
Page(s): 1609 - 1616
Digital Object Identifier : 10.1109/CVPR.2010.5539780
AbstractPlus | Full Text: PDF (2137KB)
We investigate how to discover all common visual patterns within two sets of feature points. Common visual patterns generally share similar local features as well as similar spatial layout. In this paper these two types of information are integrated and encoded into the edges of a graph whose nodes represent potential correspondences, and the common visual patterns then correspond to those strongly connected subgraphs. All such strongly connected subgraphs correspond to large local maxima of a q...
Unsupervised detection and segmentation of identical objects
Minsu Cho; Young Min Shin; Kyoung Mu Lee
Page(s): 1617 - 1624
Digital Object Identifier : 10.1109/CVPR.2010.5539777
AbstractPlus | Full Text: PDF (17587KB)
We address an unsupervised object detection and segmentation problem that goes beyond the conventional assumptions of one-to-one object correspondences or model-test settings between images. Our method can detect and segment identical objects directly from a single image or a handful of images without any supervision. To detect and segment all the object-level correspondences from the given images, a novel multi-layer match-growing method is proposed that starts from initial local feature matche...
A novel Riemannian framework for shape analysis of 3D objects
Kurtek, S.; Klassen, E.; Zhaohua Ding; Srivastava, A.
Page(s): 1625 - 1632
Digital Object Identifier : 10.1109/CVPR.2010.5539778
AbstractPlus | Full Text: PDF (1930KB)
In this paper we introduce a novel Riemannian framework for shape analysis of parameterized surfaces. We derive a distance function between any two surfaces that is invariant to rigid motion, global scaling, and re-parametrization. It is the last part that presents the main difficulty. Our solution to this problem is twofold: (1) we define a special representation, called a q-map, to represent each surface, and (2) we develop a gradient-based algorithm to optimize over different re-parameterizat...
Global and efficient self-similarity for object classification and detection
Deselaers, T.; Ferrari, V.
Page(s): 1633 - 1640
Digital Object Identifier : 10.1109/CVPR.2010.5539775
AbstractPlus | Full Text: PDF (1326KB)
Self-similarity is an attractive image property which has recently found its way into object recognition in the form of local self-similarity descriptors. In this paper we explore global self-similarity (GSS) and its advantages over local self-similarity (LSS). We make three contributions: (a) we propose computationally efficient algorithms to extract GSS descriptors for classification. These capture the spatial arrangements of self-similarities within the entire image; (b) we show how to use th...
Object matching with a locally affine-invariant constraint
Hongsheng Li; Kim, E.; Xiaolei Huang; Lei He
Page(s): 1641 - 1648
Digital Object Identifier : 10.1109/CVPR.2010.5539776
AbstractPlus | Full Text: PDF (4106KB)
Unsupervised learning of invariant features using video
Stavens, D.; Thrun, S.
Page(s): 1649 - 1656
Digital Object Identifier : 10.1109/CVPR.2010.5539773
AbstractPlus | Full Text: PDF (6934KB)
We present an algorithm that learns invariant features from real data in an entirely unsupervised fashion. The principal benefit of our method is that it can be applied without human intervention to a particular application or data set, learning the specific invariances necessary for excellent feature performance on that data. Our algorithm relies on the ability to track image patches over time using optical flow. With the wide availability of high frame rate video (e.g., on the web, from a robot)...
Scale-hierarchical 3D object recognition in cluttered scenes
Bariya, P.; Nishino, K.
Page(s): 1657 - 1664
Digital Object Identifier : 10.1109/CVPR.2010.5539774
AbstractPlus | Full Text: PDF (4982KB)
3D object recognition in scenes with occlusion and clutter is a difficult task. In this paper, we introduce a method that exploits the geometric scale-variability to aid in this task. Our key insight is to leverage the rich discriminative information provided by the scale variation of local geometric structures to constrain the massive search space of potential correspondences between model and scene points. In particular, we exploit the geometric scale variability in the form of the intrinsic g...
Linked edges as stable region boundaries
Donoser, M.; Riemenschneider, H.; Bischof, H.
Page(s): 1665 - 1672
Digital Object Identifier : 10.1109/CVPR.2010.5539833
AbstractPlus | Full Text: PDF (5889KB)
Many of the recently popular shape based category recognition methods require stable, connected and labeled edges as input. This paper introduces a novel method to find the most stable region boundaries in grayscale images for this purpose. In contrast to common edge detection algorithms such as Canny, which only analyze local discontinuities in image brightness, our method integrates mid-level information by analyzing regions that support the local gradient magnitudes. We use a component tree where ...
Many-to-one contour matching for describing and discriminating object shape
Srinivasan, P.; Qihui Zhu; Jianbo Shi
Page(s): 1673 - 1680
Digital Object Identifier : 10.1109/CVPR.2010.5539834
AbstractPlus | Full Text: PDF (4884KB)
We present an object recognition system that locates an object, identifies its parts, and segments out its contours. A key distinction of our approach is that we use long, salient, bottom-up image contours to learn object shape, and to achieve object detection with the learned shape. Most learning methods rely on one-to-one matching of contours to a model. However, bottom-up image contours often fragment unpredictably. We resolve this difficulty by using many-to-one matching of image contours to...
3D model based vehicle classification in aerial imagery
Khan, S.M.; Hui Cheng; Matthies, D.; Sawhney, H.
Page(s): 1681 - 1687
Digital Object Identifier : 10.1109/CVPR.2010.5539835
AbstractPlus | Full Text: PDF (800KB)
We present an approach that uses detailed 3D models to detect and classify objects into fine levels of vehicle categories. Unlike other approaches that use silhouette information to fit a 3D model, our approach uses complete appearance from the image. Each 3D model has a set of salient location markers that are determined a priori. These salient locations represent a sub-sampling of 3D locations that make up the model. Scene conditions are simulated in the rendering of 3D models and the salient ...
Multi-view object class detection with a 3D geometric model
Liebelt, J.; Schmid, C.
Page(s): 1688 - 1695
Digital Object Identifier : 10.1109/CVPR.2010.5539836
AbstractPlus | Full Text: PDF (2415KB)
This paper presents a new approach for multi-view object class detection. Appearance and geometry are treated as separate learning tasks with different training data. Our approach uses a part model which discriminatively learns the object appearance with spatial pyramids from a database of real images, and encodes the 3D geometry of the object class with a generative representation built from a database of synthetic models. The geometric information is linked to the 2D training data and allows t...
Fast directional chamfer matching
Ming-Yu Liu; Tuzel, O.; Veeraraghavan, A.; Chellappa, R.
Page(s): 1696 - 1703
Digital Object Identifier : 10.1109/CVPR.2010.5539837
AbstractPlus | Full Text: PDF (2794KB)
We study the object localization problem in images given a single hand-drawn example or a gallery of shapes as the object model. Although many shape matching algorithms have been proposed for the problem over the decades, chamfer matching remains the preferred method when speed and robustness are considered. In this paper, we significantly improve the accuracy of chamfer matching while reducing the computational time from linear to sublinear (shown empirically). Specifically, we incorporat...
Scale-invariant heat kernel signatures for non-rigid shape recognition
Bronstein, M.M.; Kokkinos, I.
Page(s): 1704 - 1711
Digital Object Identifier : 10.1109/CVPR.2010.5539838
AbstractPlus | Full Text: PDF (2117KB)
One of the biggest challenges in non-rigid shape retrieval and comparison is the design of a shape descriptor that would maintain invariance under a wide class of transformations the shape can undergo. Recently, heat kernel signature was introduced as an intrinsic local shape descriptor based on diffusion scale-space analysis. In this paper, we develop a scale-invariant version of the heat kernel descriptor. Our construction is based on a logarithmically sampled scale-space in which shape scalin...
Object recognition as ranking holistic figure-ground hypotheses
Fuxin Li; Carreira, J.; Sminchisescu, C.
Page(s): 1712 - 1719
Digital Object Identifier : 10.1109/CVPR.2010.5539839
AbstractPlus | Full Text: PDF (771KB)
We present an approach to visual object-class recognition and segmentation based on a pipeline that combines multiple, holistic figure-ground hypotheses generated in a bottom-up, object independent process. Decisions are performed based on continuous estimates of the spatial overlap between image segment hypotheses and each putative class. We differ from existing approaches not only in our seemingly unreasonable assumption that good object-level segments can be obtained in a feed-forward fashion...
Finding Nemo: Deformable object class modelling using curve matching
Prasad, M.; Fitzgibbon, A.; Zisserman, A.; Van Gool, L.
Page(s): 1720 - 1727
Digital Object Identifier : 10.1109/CVPR.2010.5539840
AbstractPlus | Full Text: PDF (5973KB)
Automatic attribution of ancient Roman imperial coins
Arandjelović, O.
Page(s): 1728 - 1734
Digital Object Identifier : 10.1109/CVPR.2010.5539841
AbstractPlus | Full Text: PDF (1583KB)
Classification of coins is an important but laborious aspect of numismatics - the field that studies coins and currency. It is particularly challenging in the case of ancient coins. Due to the way they were manufactured, as well as wear from use and exposure to chemicals in the soil, the same ancient coin type can exhibit great variability in appearance. We demonstrate that geometry-free models of appearance do not perform better than chance on this task and that only a small improvement is gain...
Multiple object detection by sequential Monte Carlo and Hierarchical Detection Network
Sofka, M.; Jingdan Zhang; Zhou, S.K.; Comaniciu, D.
Page(s): 1735 - 1742
Digital Object Identifier : 10.1109/CVPR.2010.5539842
AbstractPlus | Full Text: PDF (2317KB)
In this paper, we propose a novel framework for detecting multiple objects in 2D and 3D images. Since a joint multi-object model is difficult to obtain in most practical situations, we focus here on detecting the objects sequentially, one-by-one. The interdependence of object poses and strong prior information embedded in our domain of medical images results in better performance than detecting the objects individually. Our approach is based on Sequential Estimation techniques, frequently applie...
Putting local features on a manifold
Torki, M.; Elgammal, A.
Page(s): 1743 - 1750
Digital Object Identifier : 10.1109/CVPR.2010.5539843
AbstractPlus | Full Text: PDF (1358KB) | Multimedia
Local features have proven very useful for recognition. Manifold learning has proven to be a very powerful tool in data analysis. However, applications of manifold learning to images are mainly based on holistic vectorized representations of images. The challenging question that we address in this paper is how we can learn image manifolds from a bunch of local features in a smooth way that captures the feature similarity and spatial arrangement variability between images. We introduce a novel frame...
A generative perspective on MRFs in low-level vision
Schmidt, U.; Qi Gao; Roth, S.
Page(s): 1751 - 1758
Digital Object Identifier : 10.1109/CVPR.2010.5539844
AbstractPlus | Full Text: PDF (1628KB) | Multimedia
Markov random fields (MRFs) are popular and generic probabilistic models of prior knowledge in low-level vision. Yet their generative properties are rarely examined, while application-specific models and non-probabilistic learning are gaining increased attention. In this paper we revisit the generative aspects of MRFs, and analyze the quality of common image priors in a fully application-neutral setting. Enabled by a general class of MRFs with flexible potentials and an efficient Gibbs sampler, ...
Manifold blurring mean shift algorithms for manifold denoising
Weiran Wang; Carreira-Perpiñán, M.A.
Page(s): 1759 - 1766
Digital Object Identifier : 10.1109/CVPR.2010.5539845
AbstractPlus | Full Text: PDF (8904KB) | Multimedia
We propose a new family of algorithms for denoising data assumed to lie on a low-dimensional manifold. The algorithms are based on the blurring mean-shift update, which moves each data point towards its neighbors, but constrain the motion to be orthogonal to the manifold. The resulting algorithms are nonparametric, simple to implement and very effective at removing noise while preserving the curvature of the manifold and limiting shrinkage. They deal well with extreme outliers and with variation...
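As a rough illustration of the update this abstract builds on, here is a minimal sketch of one plain Gaussian blurring mean-shift step. It is not the paper's algorithm: the constraint that motion be orthogonal to the manifold is left out, and the function name, the Gaussian kernel, the bandwidth `sigma` and the toy data are all assumptions made for the example.

```python
import math


def blurring_mean_shift_step(points, sigma):
    """One plain Gaussian blurring mean-shift update (illustrative only).

    Every point is replaced by the kernel-weighted mean of the whole data
    set, itself included. The manifold-orthogonal constraint described in
    the abstract is NOT implemented here.
    """
    updated = []
    for p in points:
        # Gaussian weight of every data point q relative to p.
        weights = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2.0 * sigma ** 2))
            for q in points
        ]
        total = sum(weights)  # at least 1, because p weighs itself with 1
        updated.append(tuple(
            sum(w * q[d] for w, q in zip(weights, points)) / total
            for d in range(len(p))
        ))
    return updated


# Hypothetical toy data: points on the line y = 0 with small vertical noise.
noisy = [(0.0, 0.1), (1.0, -0.1), (2.0, 0.1), (3.0, -0.1)]
smoothed = blurring_mean_shift_step(noisy, sigma=1.0)
```

In this toy run the vertical noise shrinks, but the end points also slide inward along the line. That inward drift is the shrinkage which, according to the abstract, the constrained variant is meant to limit.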
Increasing depth resolution of electron microscopy of neural circuits using sparse tomographic reconstruction
Veeraraghavan, A.; Genkin, A.V.; Vitaladevuni, S.; Scheffer, L.; Shan Xu; Hess, H.; Fetter, R.; Cantoni, M.; Knott, G.; Chklovskii, D.
Page(s): 1767 - 1774
Digital Object Identifier : 10.1109/CVPR.2010.5539846
AbstractPlus | Full Text: PDF (3619KB)
Future progress in neuroscience hinges on reconstruction of neuronal circuits to the level of individual synapses. Because of the specifics of neuronal architecture, imaging must be done with very high resolution and throughput. While Electron Microscopy (EM) achieves the required resolution in the transverse directions, its depth resolution is a severe limitation. Computed tomography (CT) may be used in conjunction with electron microscopy to improve the depth resolution, but this severely limi...
SVM for edge-preserving filtering
Qingxiong Yang; Shengnan Wang; Ahuja, N.
Page(s): 1775 - 1782
Digital Object Identifier : 10.1109/CVPR.2010.5539847
AbstractPlus | Full Text: PDF (1773KB)
In this paper, we propose a new method to construct an edge-preserving filter which has very similar response to the bilateral filter. The bilateral filter is a normalized convolution in which the weighting for each pixel is determined by the spatial distance from the center pixel and its relative difference in intensity range. The spatial and range weighting functions are typically Gaussian in the literature. In this paper, we cast the filtering problem as a vector-mapping approximation and sol...
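The bilateral filter described in this abstract can be sketched in a few lines. The version below is a plain brute-force filter on a 1D signal with Gaussian spatial and range weights, written only to illustrate the definition; it is not the SVM-based approximation the paper proposes, and the function name, parameters and test signal are assumptions made for the example.

```python
import math


def bilateral_filter_1d(signal, radius, sigma_s, sigma_r):
    """Brute-force bilateral filter on a 1D signal (illustrative only).

    Each output sample is a normalized weighted average of its neighbours.
    The weight is the product of a spatial Gaussian (distance from the
    center sample) and a range Gaussian (difference in intensity).
    """
    out = []
    for i, center in enumerate(signal):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w_s = math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2))
            w_r = math.exp(-((signal[j] - center) ** 2) / (2.0 * sigma_r ** 2))
            num += w_s * w_r * signal[j]
            den += w_s * w_r
        out.append(num / den)  # den >= 1: the center sample has weight 1
    return out


# Hypothetical noisy step edge between samples 3 and 4.
step = [0.0, 0.1, 0.0, 0.1, 1.0, 0.9, 1.0, 0.9]
filtered = bilateral_filter_1d(step, radius=2, sigma_s=1.0, sigma_r=0.2)
```

With these assumed settings the small ripples on each side of the step are averaged out, while samples across the step receive almost no weight from the range term, so the edge stays sharp.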
Modeling and estimating persistent motion with geometric flows
Dahua Lin; Grimson, E.; Fisher, J.
Page(s): 1 - 8
Digital Object Identifier : 10.1109/CVPR.2010.5539848
AbstractPlus | Full Text: PDF (1745KB)
We propose a principled framework to model persistent motion in dynamic scenes. In contrast to previous efforts on object tracking and optical flow estimation that focus on local motion, we primarily aim at inferring a global model of persistent and collective dynamics. With this in mind, we first introduce the concept of geometric flow that describes motion simultaneously over space and time, and derive a vector space representation based on Lie algebra. We then extend it to model complex motio...
Robust video denoising using low rank matrix completion
Hui Ji; Chaoqiang Liu; Zuowei Shen; Yuhong Xu
Page(s): 1791 - 1798
Digital Object Identifier : 10.1109/CVPR.2010.5539849
AbstractPlus | Full Text: PDF (4533KB)
Most existing video denoising algorithms assume a single statistical model of image noise, e.g. additive Gaussian white noise, which often is violated in practice. In this paper, we present a new patch-based video denoising algorithm capable of removing serious mixed noise from the video data. By grouping similar patches in both spatial and temporal domains, we formulate the problem of removing mixed noise as a low-rank matrix completion problem, which leads to a denoising scheme without strong a...
Personalization of image enhancement
Sing Bing Kang; Kapoor, A.; Lischinski, D.
Page(s): 1799 - 1806
Digital Object Identifier : 10.1109/CVPR.2010.5539850
AbstractPlus | Full Text: PDF (4800KB)
Adaptive linear predictors for real-time tracking
Holzer, S.; Ilic, S.; Navab, N.
Page(s): 1807 - 1814
Digital Object Identifier : 10.1109/CVPR.2010.5539851
AbstractPlus | Full Text: PDF (1703KB) | Multimedia
Enlarging or reducing the template size by adding new parts, or removing parts of the template, according to their suitability for tracking, requires the ability to deal with the variation of the template size. For instance, real-time template tracking using linear predictors, although fast and reliable, requires using templates of fixed size and does not allow on-line modification of the predictor. To solve this problem we propose the Adaptive Linear Predictors (ALPs) which enable fast online m...
Transform coding for fast approximate nearest neighbor search in high dimensions
Brandt, J.
Page(s): 1815 - 1822
Digital Object Identifier : 10.1109/CVPR.2010.5539852
AbstractPlus | Full Text: PDF (1341KB)
We examine the problem of large scale nearest neighbor search in high dimensional spaces and propose a new approach based on the close relationship between nearest neighbor search and that of signal representation and quantization. Our contribution is a very simple and efficient quantization technique using transform coding and product quantization. We demonstrate its effectiveness in several settings, including large-scale retrieval, nearest neighbor classification, feature matching, and simila...
Multilinear pose and body shape estimation of dressed subjects from image sets
Hasler, N.; Ackermann, H.; Rosenhahn, B.; Thormählen, T.; Seidel, H.
Page(s): 1823 - 1830
Digital Object Identifier : 10.1109/CVPR.2010.5539853
AbstractPlus | Full Text: PDF (7207KB)
In this paper we propose a multilinear model of human pose and body shape which is estimated from a database of registered 3D body scans in different poses. The model is generated by factorizing the measurements into pose and shape dependent components. By combining it with an ICP based registration method, we are able to estimate pose and body shape of dressed subjects from single images. If several images of the subject are available, shape and poses can be optimized simultaneously for all inp...
Linear view synthesis using a dimensionality gap light field prior
Levin, A.; Durand, F.
Page(s): 1831 - 1838
Digital Object Identifier : 10.1109/CVPR.2010.5539854
AbstractPlus | Full Text: PDF (598KB)
Acquiring and representing the 4D space of rays in the world (the light field) is important for many computer vision and graphics applications. Yet, light field acquisition is costly due to its high dimensionality. Existing approaches either capture the 4D space explicitly, or involve an error-sensitive depth estimation process. This paper argues that the fundamental difference between different acquisition and rendering techniques is a difference between prior assumptions on the light field. ...
Multi-target tracking of time-varying spatial patterns
Jingchen Liu; Yanxi Liu
Page(s): 1839 - 1846
Digital Object Identifier : 10.1109/CVPR.2010.5539855
AbstractPlus | Full Text: PDF (3702KB) | Multimedia
Time-varying spatial patterns are common, but few computational tools exist for discovering and tracking multiple, sometimes overlapping, spatial structures of targets. We propose a multi-target tracking framework that takes advantage of spatial patterns inside the targets even though the number, the form and the regularity of such patterns vary with time. RANSAC-based model fitting algorithms are developed to automatically recognize (or dismiss) (il)legitimate patterns. Patterns are represented...
Abrupt motion tracking via adaptive stochastic approximation Monte Carlo sampling
Xiuzhuang Zhou; Yao Lu
Page(s): 1847 - 1854
Digital Object Identifier : 10.1109/CVPR.2010.5539856
AbstractPlus | Full Text: PDF (807KB) | Multimedia
Robust tracking of abrupt motion is a challenging task in computer vision due to the large motion uncertainty. In this paper, we propose a stochastic approximation Monte Carlo (SAMC) based tracking scheme for the abrupt motion problem in a Bayesian filtering framework. In our tracking scheme, the particle weight is dynamically estimated by learning the density of states in simulations, and thus the local-trap problem suffered by the conventional MCMC sampling-based methods could be essentially avoided...
Boosting for transfer learning with multiple sources
Yi Yao Doretto, G.Page(s): 1855 - 1862
Digital Object Identifier : 10.1109/CVPR.2010.5539857
AbstractPlus | Full Text: PDF (762KB)
Transfer learning allows leveraging the knowledge of source domains, available a priori, to help train a classifier for a target domain, where the available data is scarce. The effectiveness of the transfer is affected by the relationship between source and target. Rather than improving the learning, brute force leveraging of a source poorly related to the target may decrease the classifier performance. One strategy to reduce this negative transfer is to import knowledge from multiple sources... Read More »
Energy minimization for linear envelope MRFs
Kohli, P. Kumar, M.P.Page(s): 1863 - 1870
Digital Object Identifier : 10.1109/CVPR.2010.5539858
AbstractPlus | Full Text: PDF (417KB)
Markov random fields with higher order potentials have emerged as a powerful model for several problems in computer vision. In order to facilitate their use, we propose a new representation for higher order potentials as upper and lower envelopes of linear functions. Our representation concisely models several commonly used higher order potentials, thereby providing a unified framework for minimizing the corresponding Gibbs energy functions. We exploit this framework by converting lower envelope... Read More »
Unified graph matching in Euclidean spaces
McAuley, J.J. de Campos, T. Caetano, T.S.Page(s): 1871 - 1878
Digital Object Identifier : 10.1109/CVPR.2010.5539859
AbstractPlus | Full Text: PDF (792KB)
Graph matching is a classical problem in pattern recognition with many applications, particularly when the graphs are embedded in Euclidean spaces, as is often the case for computer vision. There are several variants of the matching problem, concerned with isometries, isomorphisms, homeomorphisms, and node attributes; different approaches exist for each variant. We show how structured estimation methods from machine learning can be used to combine such variants into a single version of graph mat... Read More »
On-line semi-supervised multiple-instance boosting
Zeisl, B. Leistner, C. Saffari, A. Bischof, H.Page(s): 1879 - 1886
Digital Object Identifier : 10.1109/CVPR.2010.5539860
AbstractPlus | Full Text: PDF (893KB)
Robust RVM regression using sparse outlier model
Mitra, K. Veeraraghavan, A. Chellappa, R.Page(s): 1887 - 1894
Digital Object Identifier : 10.1109/CVPR.2010.5539861
AbstractPlus | Full Text: PDF (480KB)
Kernel regression techniques such as Relevance Vector Machine (RVM) regression, Support Vector Regression and Gaussian processes are widely used for solving many computer vision problems such as age, head pose, 3D human pose and lighting estimation. However, the presence of outliers in the training dataset makes the estimates from these regression techniques unreliable. In this paper, we propose robust versions of the RVM regression that can handle outliers in the training dataset. We decompose ... Read More »
Parametric dimensionality reduction by unsupervised regression
Carreira-Perpiñán, M.A. Zhengdong LuPage(s): 1895 - 1902
Digital Object Identifier : 10.1109/CVPR.2010.5539862
AbstractPlus | Full Text: PDF (7650KB) | Multimedia
We introduce a parametric version (pDRUR) of the recently proposed Dimensionality Reduction by Unsupervised Regression algorithm. pDRUR alternately minimizes reconstruction error by fitting parametric functions given latent coordinates and data, and by updating latent coordinates given functions (with a Gauss-Newton method decoupled over coordinates). Both the fit and the update become much faster while attaining results of similar quality, and afford dealing with far larger datasets (10... Read More »
Spherical embeddings for non-Euclidean dissimilarities
Wilson, R.C. Hancock, E.R. Pekalska, E. Duin, R.P.W.Page(s): 1903 - 1910
Digital Object Identifier : 10.1109/CVPR.2010.5539863
AbstractPlus | Full Text: PDF (328KB)
Many computer vision and pattern recognition problems may be posed by defining a way of measuring dissimilarities between patterns. For many types of data, these dissimilarities are not Euclidean, and may not be metric. In this paper, we provide a means of embedding such data. We aim to embed the data on a hypersphere whose radius of curvature is determined by the dissimilarity data. The hypersphere can be either of positive curvature (elliptic) or of negative curvature (hyperbolic). We give an ... Read More »
Moving vistas: Exploiting motion for describing scenes
Shroff, N. Turaga, P. Chellappa, R.Page(s): 1911 - 1918
Digital Object Identifier : 10.1109/CVPR.2010.5539864
AbstractPlus | Full Text: PDF (1006KB)
Scene recognition in an unconstrained setting is an open and challenging problem with wide applications. In this paper, we study the role of scene dynamics for improved representation of scenes. We subsequently propose dynamic attributes which can be augmented with spatial attributes of a scene for semantically meaningful categorization of dynamic scenes. We further explore accurate and generalizable computational models for characterizing the dynamics of unconstrained scenes. The large intra-cl... Read More »
Part and appearance sharing: Recursive Compositional Models for multi-view
Long Zhu Yuanhao Chen Torralba, A. Freeman, W. Yuille, A.Page(s): 1919 - 1926
Digital Object Identifier : 10.1109/CVPR.2010.5539865
AbstractPlus | Full Text: PDF (1644KB)
We propose Recursive Compositional Models (RCMs) for simultaneous multi-view multi-object detection and parsing (e.g. view estimation and determining the positions of the object subparts). We represent the set of objects by a family of RCMs where each RCM is a probability distribution defined over a hierarchical graph which corresponds to a specific object and viewpoint. An RCM is constructed from a hierarchy of subparts/subgraphs which are learnt from training data. Part-sharing is used so that... Read More »
Randomized hybrid linear modeling by local best-fit flats
Teng Zhang Szlam, A. Yi Wang Lerman, G.Page(s): 1927 - 1934
Digital Object Identifier : 10.1109/CVPR.2010.5539866
AbstractPlus | Full Text: PDF (272KB)
The hybrid linear modeling problem is to identify a set of d-dimensional affine sets in R^D. It arises, for example, in object tracking and structure from motion. The hybrid linear model can be considered as the second simplest (behind linear) manifold model of data. In this paper we will present a very simple geometric method for hybrid linear modeling based on selecting a set of local best fit flats that minimize a global ℓ1 error measure. The size of the local neig... Read More »
Improving state-of-the-art OCR through high-precision document-specific modeling
Kae, A. Huang, G. Doersch, C. Learned-Miller, E.Page(s): 1935 - 1942
Digital Object Identifier : 10.1109/CVPR.2010.5539867
AbstractPlus | Full Text: PDF (460KB)
Optical character recognition (OCR) remains a difficult problem for noisy documents or documents not scanned at high resolution. Many current approaches rely on stored font models that are vulnerable to cases in which the document is noisy or is written in a font dissimilar to the stored fonts. We address these problems by learning character models directly from the document itself, rather than using pre-stored font models. This method has had some success in the past, but we are able to achieve... Read More »
Discriminative clustering for image co-segmentation
Joulin, A. Bach, F. Ponce, J.Page(s): 1943 - 1950
Digital Object Identifier : 10.1109/CVPR.2010.5539868
AbstractPlus | Full Text: PDF (2659KB)
Purely bottom-up, unsupervised segmentation of a single image into foreground and background regions remains a challenging task for computer vision. Co-segmentation is the problem of simultaneously dividing multiple images into regions (segments) corresponding to different object classes. In this paper, we combine existing tools for bottom-up image segmentation such as normalized cuts, with kernel methods commonly used in object recognition. These two sets of techniques are used within a discrim... Read More »
What's going on? Discovering spatio-temporal dependencies in dynamic scenes
Kuettel, D. Breitenstein, M.D. Van Gool, L. Ferrari, V.Page(s): 1951 - 1958
Digital Object Identifier : 10.1109/CVPR.2010.5539869
AbstractPlus | Full Text: PDF (2771KB)
We present two novel methods to automatically learn spatio-temporal dependencies of moving agents in complex dynamic scenes. They allow the discovery of temporal rules, such as the right of way between different lanes or typical traffic light sequences. To extract them, sequences of activities need to be learned. While the first method extracts rules based on a learned topic model, the second model called DDP-HMM jointly learns co-occurring activities and their time dependencies. To this end we emplo... Read More »
Visual event recognition in videos by learning from web data
Lixin Duan Dong Xu Tsang, I.W. Jiebo LuoPage(s): 1959 - 1966
Digital Object Identifier : 10.1109/CVPR.2010.5539870
AbstractPlus | Full Text: PDF (1166KB)
Temporal causality for the analysis of visual events
Prabhakar, K. Sangmin Oh Ping Wang Abowd, G.D. Rehg, J.M.Page(s): 1967 - 1974
Digital Object Identifier : 10.1109/CVPR.2010.5539871
AbstractPlus | Full Text: PDF (3512KB) | Multimedia
We present a novel approach to the causal temporal analysis of event data from video content. Our key observation is that the sequence of visual words produced by a space-time dictionary representation of a video sequence can be interpreted as a multivariate point-process. By using a spectral version of the pairwise test for Granger causality, we can identify patterns of interactions between words and group them into independent causal sets. We demonstrate qualitatively that this produces semant... Read More »
Anomaly detection in crowded scenes
Mahadevan, V. Weixin Li Bhalodia, V. Vasconcelos, N.Page(s): 1975 - 1981
Digital Object Identifier : 10.1109/CVPR.2010.5539872
AbstractPlus | Full Text: PDF (660KB)
A novel framework for anomaly detection in crowded scenes is presented. Three properties are identified as important for the design of a localized video representation suitable for anomaly detection in such scenes: (1) joint modeling of appearance and dynamics of the scene, and the abilities to detect (2) temporal, and (3) spatial abnormalities. The model for normal crowd behavior is based on mixtures of dynamic textures and outliers under this model are labeled as anomalies. Temporal anomalies ... Read More »
Illumination compensation based change detection using order consistency
Parameswaran, V. Singh, M. Ramesh, V.Page(s): 1982 - 1989
Digital Object Identifier : 10.1109/CVPR.2010.5539873
AbstractPlus | Full Text: PDF (560KB)
We present a change detection method resistant to global and local illumination variations for use in visual surveillance scenarios. Approaches designed thus far for robustness to illumination change are generally based either on color normalization, texture (e.g. edges, rank order statistics, etc.), or illumination compensation. Normalization based methods sacrifice discriminability while texture based methods cannot operate on texture-less regions. Both types of method can produce large missin... Read More »
Efficient action spotting based on a spacetime oriented structure representation
Derpanis, K.G. Sizintsev, M. Cannons, K. Wildes, R.P.Page(s): 1990 - 1997
Digital Object Identifier : 10.1109/CVPR.2010.5539874
AbstractPlus | Full Text: PDF (638KB) | Multimedia
This paper addresses action spotting, the spatiotemporal detection and localization of human actions in video. A novel compact local descriptor of video dynamics in the context of action spotting is introduced based on visual spacetime oriented energy measurements. This descriptor is efficiently computed directly from raw image intensity data and thereby forgoes the problems typically associated with flow-based features. An important aspect of the descriptor is that it allows for the comparison ... Read More »
Cross-dataset action detection
Liangliang Cao Zicheng Liu Huang, T.S.Page(s): 1998 - 2005
Digital Object Identifier : 10.1109/CVPR.2010.5539875
AbstractPlus | Full Text: PDF (551KB)
In recent years, many research works have been carried out to recognize human actions from video clips. To learn an effective action classifier, most of the previous approaches rely on enough training labels. When being required to recognize the action in a different dataset, these approaches have to re-train the model using new labels. However, labeling video sequences is a very tedious and time-consuming task, especially when detailed spatial locations and time durations are required. In this ... Read More »
Learning 3D action models from a few 2D videos for view invariant action recognition
Natarajan, P. Singh, V.K. Nevatia, R.Page(s): 2006 - 2013
Digital Object Identifier : 10.1109/CVPR.2010.5539876
AbstractPlus | Full Text: PDF (2007KB)
Most existing approaches for learning action models work by extracting suitable low-level features and then training appropriate classifiers. Such approaches require large amounts of training data and do not generalize well to variations in viewpoint, scale and across datasets. Some work has been done recently to learn multi-view action models from Mocap data, but obtaining such data is time consuming and requires costly infrastructure. We present a method that addresses both these issues by lea... Read More »
Exploiting simple hierarchies for unsupervised human behavior analysis
Nater, F. Grabner, H. Van Gool, L.Page(s): 2014 - 2021
Digital Object Identifier : 10.1109/CVPR.2010.5539877
AbstractPlus | Full Text: PDF (1901KB) | Multimedia
We propose a data-driven, hierarchical approach for the analysis of human actions in visual scenes. In particular, we focus on the task of in-house assisted living. In such scenarios the environment and the setting may vary considerably which limits the performance of methods with pre-trained models. Therefore our model of normality is established in a completely unsupervised manner and is updated automatically for scene-specific adaptation. The hierarchical representation on both an appearance ... Read More »
Clustering dynamic textures with the hierarchical EM algorithm
Chan, A.B. Coviello, E. Lanckriet, G.R.G.Page(s): 2022 - 2029
Digital Object Identifier : 10.1109/CVPR.2010.5539878
AbstractPlus | Full Text: PDF (244KB) | Multimedia
The dynamic texture (DT) is a probabilistic generative model, defined over space and time, that represents a video as the output of a linear dynamical system (LDS). The DT model has been applied to a wide variety of computer vision problems, such as motion segmentation, motion classification, and video registration. In this paper, we derive a new algorithm for clustering DT models that is based on the hierarchical EM algorithm. The proposed clustering algorithm is capable of both clustering DTs ... Read More »
Recognizing human actions from still images with latent poses
Weilong Yang Yang Wang Mori, G.Page(s): 2030 - 2037
Digital Object Identifier : 10.1109/CVPR.2010.5539879
AbstractPlus | Full Text: PDF (6276KB)
We consider the problem of recognizing human actions from still images. We propose a novel approach that treats the pose of the person in the image as latent variables that will help with recognition. Different from other work that learns separate systems for pose estimation and action recognition, then combines them in an ad-hoc fashion, our system is trained in an integrated fashion that jointly considers poses and actions. Our learning objective is designed to directly exploit the pose inform... Read More »
Group motion segmentation using a Spatio-Temporal Driving Force Model
Ruonan Li Chellappa, R.Page(s): 2038 - 2045
Digital Object Identifier : 10.1109/CVPR.2010.5539880
AbstractPlus | Full Text: PDF (1132KB)
Learning a hierarchy of discriminative space-time neighborhood features for human action recognition
Kovashka, A. Grauman, K.Page(s): 2046 - 2053
Digital Object Identifier : 10.1109/CVPR.2010.5539881
AbstractPlus | Full Text: PDF (855KB)
Recent work shows how to use local spatio-temporal features to learn models of realistic human actions from video. However, existing methods typically rely on a predefined spatial binning of the local descriptors to impose spatial information beyond a pure “bag-of-words” model, and thus may fail to capture the most informative space-time relationships. We propose to learn the shapes of space-time feature neighborhoods that are most discriminative for a given action category. Given ... Read More »
Chaotic invariants of Lagrangian particle trajectories for anomaly detection in crowded scenes
Shandong Wu Moore, B.E. Shah, M.Page(s): 2054 - 2060
Digital Object Identifier : 10.1109/CVPR.2010.5539882
AbstractPlus | Full Text: PDF (852KB)
A novel method for crowd flow modeling and anomaly detection is proposed for both coherent and incoherent scenes. The novelty is revealed in three aspects. First, it is a unique utilization of particle trajectories for modeling crowded scenes, in which we propose new and efficient representative trajectories for modeling arbitrarily complicated crowd flows. Second, chaotic dynamics are introduced into the crowd context to characterize complicated crowd motions by regulating a set of chaotic inva... Read More »
A Hough transform-based voting framework for action recognition
Yao, A. Gall, J. Van Gool, L.Page(s): 2061 - 2068
Digital Object Identifier : 10.1109/CVPR.2010.5539883
AbstractPlus | Full Text: PDF (874KB)
We present a method to classify and localize human actions in video using a Hough transform voting framework. Random trees are trained to learn a mapping between densely-sampled feature patches and their corresponding votes in a spatio-temporal-action Hough space. The leaves of the trees form a discriminative multi-class codebook that shares features between the action classes and votes for action centers in a probabilistic manner. Using low-level features such as gradients and optical flow, we de... Read More »
Scene understanding by statistical modeling of motion patterns
Saleemi, I. Hartung, L. Shah, M.Page(s): 2069 - 2076
Digital Object Identifier : 10.1109/CVPR.2010.5539884
AbstractPlus | Full Text: PDF (3334KB)
We present a novel method for the discovery and statistical representation of motion patterns in a scene observed by a static camera. Related methods involving learning of patterns of activity rely on trajectories obtained from object detection and tracking systems, which are unreliable in complex scenes of crowded motion. We propose a mixture model representation of salient patterns of optical flow, and present an algorithm for learning these patterns from dense optical flow in a hierarchical, ... Read More »
Spike train driven dynamical models for human actions
Raptis, M. Wnuk, K. Soatto, S.Page(s): 2077 - 2084
Digital Object Identifier : 10.1109/CVPR.2010.5539885
AbstractPlus | Full Text: PDF (385KB)
We investigate dynamical models of human motion that can support both synthesis and analysis tasks. Unlike coarser discriminative models that work well when action classes are nicely separated, we seek models that have fine-scale representational power and can therefore model subtle differences in the way an action is performed. To this end, we model an observed action as an (unknown) linear time-invariant dynamical model of relatively small order, driven by a sparse bounded input signal. Our mo... Read More »
Parallel and distributed graph cuts by dual decomposition
Strandmark, P. Kahl, F.Page(s): 2085 - 2092
Digital Object Identifier : 10.1109/CVPR.2010.5539886
AbstractPlus | Full Text: PDF (1715KB)
Graph cuts methods are at the core of many state-of-the-art algorithms in computer vision due to their efficiency in computing globally optimal solutions. In this paper, we solve the maximum flow/minimum cut problem in parallel by splitting the graph into multiple parts and hence, further increase the computational efficacy of graph cuts. Optimality of the solution is guaranteed by dual decomposition, or more specifically, the solutions to the subproblems are constrained to be equal on the overl... Read More »
Object separation in x-ray image sets
Heitz, G. Chechik, G.Page(s): 2093 - 2100
Digital Object Identifier : 10.1109/CVPR.2010.5539887
AbstractPlus | Full Text: PDF (2735KB)
In the segmentation of natural images, most algorithms rely on the concept of occlusion. In x-ray images, however, this assumption is violated, since x-ray photons penetrate most materials. In this paper, we introduce SATISφ, a method for separating objects in a set of x-ray images using the property of additivity in log space, where the log-attenuation at a pixel is the sum of the log-attenuations of all objects that the corresponding x-ray passes through. Our method leverages multiple p... Read More »
Learning full pairwise affinities for spectral segmentation
Tae Hoon Kim Kyoung Mu Lee Sang Uk LeePage(s): 2101 - 2108
Digital Object Identifier : 10.1109/CVPR.2010.5539888
AbstractPlus | Full Text: PDF (1806KB) | Multimedia
This paper studies the problem of learning a full range of pairwise affinities gained by integrating local grouping cues for spectral segmentation. The overall quality of the spectral segmentation depends mainly on the pairwise pixel affinities. By employing a semi-supervised learning technique, optimal affinities are learnt from the test image without iteration. We first construct a multi-layer graph with pixels and regions, generated by the mean shift algorithm, as nodes. By applying the semi-... Read More »
Isoperimetric cut on a directed graph
Mo Chen Ming Liu Jianzhuang Liu Xiaoou TangPage(s): 2109 - 2116
Digital Object Identifier : 10.1109/CVPR.2010.5539889
AbstractPlus | Full Text: PDF (719KB)
In this paper, we propose a novel probabilistic view of the spectral clustering algorithm. In our framework, the spectral clustering algorithm can be viewed as assigning class labels to samples to minimize the Bayes classification error rate by using a kernel density estimator (KDE). From this perspective, we propose to construct directed graphs using variable bandwidth KDEs. Such a variable bandwidth KDE based directed graph has the advantage that it encodes the local density information of the... Read More »
“Lattice Cut” - Constructing superpixels using layer constraints
Moore, A.P. Prince, S.J.D. Warrell, J.Page(s): 2117 - 2124
Digital Object Identifier : 10.1109/CVPR.2010.5539890
AbstractPlus | Full Text: PDF (3715KB)
A diffusion approach to seeded image segmentation
Juyong Zhang Jianmin Zheng Jianfei CaiPage(s): 2125 - 2132
Digital Object Identifier : 10.1109/CVPR.2010.5539891
AbstractPlus | Full Text: PDF (764KB)
Seeded image segmentation is a popular type of supervised image segmentation in computer vision and image processing. Previous methods of seeded image segmentation treat the image as a weighted graph and minimize an energy function on the graph to produce a segmentation. In this paper, we propose to conduct the seeded image segmentation according to the result of a heat diffusion process in which the seeded pixels are considered to be the heat sources and the heat diffuses on the image starting ... Read More »
Discrete minimum ratio curves and surfaces
Nicolls, F. Torr, P.H.S.Page(s): 2133 - 2140
Digital Object Identifier : 10.1109/CVPR.2010.5539892
AbstractPlus | Full Text: PDF (375KB)
Graph cuts have proven useful for image segmentation and for volumetric reconstruction in multiple view stereo. However, solutions are biased: the cost function tends to favour either a short boundary (in 2D) or a boundary with a small area (in 3D). This bias can be avoided by instead minimising the cut ratio, which normalises the cost by a measure of the boundary size. This paper uses ideas from discrete differential geometry to develop a linear programming formulation for finding a minimum rat... Read More »
Efficient hierarchical graph-based video segmentation
Grundmann, M. Kwatra, V. Mei Han Essa, I.Page(s): 2141 - 2148
Digital Object Identifier : 10.1109/CVPR.2010.5539893
AbstractPlus | Full Text: PDF (9159KB) | Multimedia
We present an efficient and scalable technique for spatiotemporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high quality segmentations, ... Read More »
A spatially varying PSF-based prior for alpha matting
Rhemann, C. Rother, C. Kohli, P. Gelautz, M.Page(s): 2149 - 2156
Digital Object Identifier : 10.1109/CVPR.2010.5539894
AbstractPlus | Full Text: PDF (2406KB) | Multimedia
In this paper we considerably improve on a state-of-the-art alpha matting approach by incorporating a new prior which is based on the image formation process. In particular, we model the prior probability of an alpha matte as the convolution of a high-resolution binary segmentation with the spatially varying point spread function (PSF) of the camera. Our main contribution is a new and efficient de-convolution approach that recovers the prior model, given an approximate alpha matte. By assuming t... Read More »
Simultaneous foreground, background, and alpha estimation for image matting
Price, B.L. Morse, B.S. Cohen, S.Page(s): 2157 - 2164
Digital Object Identifier : 10.1109/CVPR.2010.5539895
AbstractPlus | Full Text: PDF (3884KB)
Image matting is the process of extracting a soft segmentation of an object in an image as defined by the matting equation. Most current techniques focus largely on computing the alpha values of unknown pixels and treat computation of the foreground and background colors as an afterthought, if at all. However, for many applications, such as compositing an object into a new scene or deleting an object from the scene, the foreground and background colors are vital for an acceptable answer. We prop... Read More »
Fast matting using large kernel matting Laplacian matrices
Kaiming He Jian Sun Xiaoou TangPage(s): 2165 - 2172
Digital Object Identifier : 10.1109/CVPR.2010.5539896
AbstractPlus | Full Text: PDF (3361KB)
Image matting is of great importance in both computer vision and graphics applications. Most existing state-of-the-art techniques rely on large sparse matrices such as the matting Laplacian. However, solving these linear systems is often time-consuming, which is unfavorable for user interaction. In this paper, we propose a fast method for high quality matting. We first derive an efficient algorithm to solve a large kernel matting Laplacian. A large kernel propagates information more quickly an... Read More »
Fast approximate energy minimization with label costs
Delong, A. Osokin, A. Isack, H.N. Boykov, Y.Page(s): 2173 - 2180
Digital Object Identifier : 10.1109/CVPR.2010.5539897
AbstractPlus | Full Text: PDF (6263KB)
The α-expansion algorithm has had a significant impact in computer vision due to its generality, effectiveness, and speed. Thus far it can only minimize energies that involve unary, pairwise, and specialized higher-order terms. Our main contribution is to extend α-expansion so that it can simultaneously optimize “label costs” as well. An energy with label costs can penalize a solution based on the set of labels that appear in it. The simplest special case is to penali... Read More »
Parallel graph-cuts by adaptive bottom-up merging
Jiangyu Liu Jian SunPage(s): 2181 - 2188
Digital Object Identifier : 10.1109/CVPR.2010.5539898
AbstractPlus | Full Text: PDF (466KB)
Graph-cuts optimization is prevalent in vision and graphics problems. It is thus of great practical importance to parallelize the graph-cuts optimization using today's ubiquitous multi-core machines. However, the current best serial algorithm by Boykov and Kolmogorov (called the BK algorithm) still has the superior empirical performance. It is non-trivial to parallelize as expensive synchronization overhead easily offsets the advantage of parallelism. In this paper, we propose a novel adaptive b... Read More »
Transductive segmentation of live video with non-stationary background
Fan Zhong Xueying Qin Qunsheng PengPage(s): 2189 - 2196
Digital Object Identifier : 10.1109/CVPR.2010.5539899
AbstractPlus | Full Text: PDF (997KB) | Multimedia
Online foreground extraction is very difficult due to the complexity of real scenes. Almost all the previous methods assume that the background is stationary, which not only produces unreliable results due to background activities like dynamic shadows, moving background objects etc., but also makes them hard to extend to the case of a non-stationary background. In this paper we assume that the background is continuous instead of stationary, and present a transductive video segmentation method that... Read More »
Morphological snakes
Álvarez, L. Baumela, L. Henríquez, P. Márquez-Neila, P.Page(s): 2197 - 2202
Digital Object Identifier : 10.1109/CVPR.2010.5539900
AbstractPlus | Full Text: PDF (271KB)
Co-clustering of image segments using convex optimization applied to EM neuronal reconstruction
Vitaladevuni, S.N. Basri, R.Page(s): 2203 - 2210
Digital Object Identifier : 10.1109/CVPR.2010.5539901
AbstractPlus | Full Text: PDF (1199KB)
This paper addresses the problem of jointly clustering two segmentations of closely correlated images. We focus in particular on the application of reconstructing neuronal structures in over-segmented electron microscopy images. We formulate the problem of co-clustering as a quadratic semi-assignment problem and investigate convex relaxations using semidefinite and linear programming. We further introduce a linear programming method with a manageable number of constraints and present an approach f... Read More »
Multi-domain, higher order level set scheme for 3D image segmentation on the GPU
Sharma, O. Qin Zhang Anton, F. Bajaj, C.Page(s): 2211 - 2216
Digital Object Identifier : 10.1109/CVPR.2010.5539902
AbstractPlus | Full Text: PDF (5521KB)
Level set method based segmentation provides an efficient tool for topological and geometrical shape handling. Conventional level set surfaces are only C^0 continuous since the level set evolution involves linear interpolation to compute derivatives. Bajaj et al. present a higher order method to evaluate level set surfaces that are C^2 continuous, but are slow due to high computational burden. In this paper, we provide a higher order GPU based solver for fast and efficient se... Read More »
A study on continuous max-flow and min-cut approaches
Jing Yuan Egil Bae Xue-Cheng TaiPage(s): 2217 - 2224
Digital Object Identifier : 10.1109/CVPR.2010.5539903
AbstractPlus | Full Text: PDF (2098KB) | Multimedia
We propose and study novel max-flow models in the continuous setting, which directly map the discrete graph-based max-flow problem to its continuous optimization formulation. We show such a continuous max-flow model leads to an equivalent min-cut problem in a natural way, as the corresponding dual model. In this regard, we revisit basic conceptions used in discrete max-flow / min-cut models and give their new explanations from a variational perspective. We also propose corresponding continuous m... Read More »
Object recognition by discriminative combinations of line segments and ellipses
Chia, A.Y.-S. Rahardja, S. Rajan, D. Leung, M.K.Page(s): 2225 - 2232
Digital Object Identifier : 10.1109/CVPR.2010.5539904
AbstractPlus | Full Text: PDF (7506KB)
We present a contour based approach to object recognition in real-world images. Contours are represented by generic shape primitives of line segments and ellipses. These primitives offer substantial flexibility to model complex shapes. We pair connected primitives as shape tokens, and learn category specific combinations of shape tokens. We do not restrict combinations to have a fixed number of tokens, but allow each combination to flexibly evolve to best represent a category. This, coupled with... Read More »
On detection of multiple object instances using Hough transforms
Barinova, O. Lempitsky, V. Kohli, P.
Page(s): 2233 - 2240
Digital Object Identifier : 10.1109/CVPR.2010.5539905
To detect multiple objects of interest, the methods based on Hough transform use non-maxima suppression or mode seeking in order to locate and to distinguish peaks in Hough images. Such postprocessing requires tuning of extra parameters and is often fragile, especially when objects of interest tend to be closely located. In the paper, we develop a new probabilistic framework that is in many ways related to Hough transform, sharing its simplicity and wide applicability. At the same time, the frame...
Cascade object detection with deformable part models
Felzenszwalb, P.F. Girshick, R.B. McAllester, D.
Page(s): 2241 - 2248
Digital Object Identifier : 10.1109/CVPR.2010.5539906
We describe a general method for building cascade classifiers from part-based deformable models such as pictorial structures. We focus primarily on the case of star-structured models and show how a simple algorithm based on partial hypothesis pruning can speed up object detection by more than one order of magnitude without sacrificing detection accuracy. In our algorithm, partial hypotheses are pruned with a sequence of thresholds. In analogy to probably approximately correct (PAC) learning, we ...
Food recognition using statistics of pairwise local features
Shulin Yang Mei Chen Pomerleau, D. Sukthankar, R.
Page(s): 2249 - 2256
Digital Object Identifier : 10.1109/CVPR.2010.5539907
Food recognition is difficult because food items are deformable objects that exhibit significant variations in appearance. We believe the key to recognizing food is to exploit the spatial relationships between different ingredients (such as meat and bread in a sandwich). We propose a new representation for food items that calculates pairwise statistics between local features computed over a soft pixel-level segmentation of the image into eight ingredient types. We accumulate these statistics in...
Dominant orientation templates for real-time detection of texture-less objects
Hinterstoisser, S. Lepetit, V. Ilic, S. Fua, P. Navab, N.
Page(s): 2257 - 2264
Digital Object Identifier : 10.1109/CVPR.2010.5539908
We present a method for real-time 3D object detection that does not require a time-consuming training stage, and can handle untextured objects. At its core is a novel template representation that is designed to be robust to small image transformations. This robustness based on dominant gradient orientations lets us test only a small subset of all possible pixel locations when parsing the image, and to represent a 3D object with a limited set of templates. We show that together with a binary rep...
The multiscale competitive code via sparse representation for palmprint verification
Wangmeng Zuo Zhouchen Lin Zhenhua Guo Zhang, D.
Page(s): 2265 - 2272
Digital Object Identifier : 10.1109/CVPR.2010.5539909
Palm lines are the most important features for palmprint recognition. They are best considered as typical multiscale features, where the principal lines can be represented at a larger scale while the wrinkles at a smaller scale. Motivated by the success of coding-based palmprint recognition methods, this paper investigates a compact representation of multiscale palm line orientation features, and proposes a novel method called the sparse multiscale competitive code (SMCC). The SMCC method first ...
Learning a probabilistic model mixing 3D and 2D primitives for view invariant object recognition
Wenze Hu Song-Chun Zhu
Page(s): 2273 - 2280
Digital Object Identifier : 10.1109/CVPR.2010.5539910
Dense interest points
Tuytelaars, T.
Page(s): 2281 - 2288
Digital Object Identifier : 10.1109/CVPR.2010.5539911
Local features or image patches have become a standard tool in computer vision, with numerous application domains. Roughly speaking, two different types of patch-based image representations can be distinguished: interest points, such as corners or blobs, whose position, scale and shape are computed by a feature detector algorithm, and dense sampling, where patches of fixed size and shape are placed on a regular grid (possibly repeated over multiple scales). Interest points focus on 'interesting'...
Two perceptually motivated strategies for shape classification
Temlyakov, A. Munsell, B.C. Waggoner, J.W. Song Wang
Page(s): 2289 - 2296
Digital Object Identifier : 10.1109/CVPR.2010.5539912
In this paper, we propose two new, perceptually motivated strategies to better measure the similarity of 2D shape instances that are in the form of closed contours. The first strategy handles shapes that can be decomposed into a base structure and a set of inward or outward pointing “strand” structures, where a strand structure represents a very thin, elongated shape part attached to the base structure. The similarity of two such shape contours can be better described by measuring ...
Large-scale image categorization with explicit data embedding
Perronnin, F. Sánchez, J. Yan Liu
Page(s): 2297 - 2304
Digital Object Identifier : 10.1109/CVPR.2010.5539914
Kernel machines rely on an implicit mapping of the data such that non-linear classification in the original space corresponds to linear classification in the new space. As kernel machines are difficult to scale to large training sets, it has been proposed to perform an explicit mapping of the data and to learn directly linear classifiers in the new space. In this paper, we consider the problem of learning image categorizers on large image sets (e.g. > 100k images) using bag-of-visual-words (B...
Probabilistic models for supervised dictionary learning
Xiao-Chen Lian Zhiwei Li Changhu Wang Bao-Liang Lu Lei Zhang
Page(s): 2305 - 2312
Digital Object Identifier : 10.1109/CVPR.2010.5539915
Dictionary generation is a core technique of the bag-of-visual-words (BOV) models when applied to image categorization. Most previous approaches generate dictionaries by unsupervised clustering techniques, e.g. k-means. However, the features obtained from such dictionaries may not be optimal for image classification. In this paper, we propose a probabilistic model for supervised dictionary learning (SDLM) which seamlessly combines an unsupervised model (a Gaussian Mixture Model) and a s...
Use bin-ratio information for category and scene classification
Nianhua Xie Haibin Ling Weiming Hu Xiaoqin Zhang
Page(s): 2313 - 2319
Digital Object Identifier : 10.1109/CVPR.2010.5539917
In this paper we propose using bin-ratio information, which is collected from the ratios between bin values of histograms, for scene and category classification. To use such information, a new histogram dissimilarity, bin-ratio dissimilarity (BRD), is designed. We show that BRD provides several attractive advantages for category and scene classification tasks: First, BRD is robust to cluttering, partial occlusion and histogram normalization; Second, BRD captures rich co-occurrence information wh...
Learning weights for codebook in image classification and retrieval
Hongping Cai Fei Yan Mikolajczyk, K.
Page(s): 2320 - 2327
Digital Object Identifier : 10.1109/CVPR.2010.5539918
This paper presents a codebook learning approach for image classification and retrieval. It corresponds to learning a weighted similarity metric such that the weighted similarity between images with the same label exceeds that between images with different labels by the largest margin. We formulate the learning problem as a convex quadratic program and adopt alternating optimization to solve it efficiently. Experiments on both synthetic and real datasets validate the approach. The ...
The role of features, algorithms and data in visual recognition
Parikh, D. Zitnick, C.L.
Page(s): 2328 - 2335
Digital Object Identifier : 10.1109/CVPR.2010.5539920
There are many computer vision algorithms developed for visual (scene and object) recognition. Some systems focus on involved learning algorithms, some leverage millions of training images, and some systems focus on modeling relevant information (features) with the goal of effective recognition. However, none of these systems come close to human capabilities. If we study human responses on similar problems we could gain insight into which of the three factors (1) learning algorithm (2) amount of...
Global Gaussian approach for scene categorization using information geometry
Nakayama, H. Harada, T. Kuniyoshi, Y.
Page(s): 2336 - 2343
Digital Object Identifier : 10.1109/CVPR.2010.5539921
Local features provide powerful cues for generic image recognition. An image is represented by a “bag” of local features, which form a probabilistic distribution in the feature space. The problem is how to exploit the distributions efficiently. One of the most successful approaches is the bag-of-keypoints scheme, which can be interpreted as sparse sampling of high-level statistics, in the sense that it describes a complex structure of a local feature distribution using a relatively...
Asymmetric region-to-image matching for comparing images with generic object categories
Jaechul Kim Grauman, K.
Page(s): 2344 - 2351
Digital Object Identifier : 10.1109/CVPR.2010.5539923
We present a feature matching algorithm that leverages bottom-up segmentation. Unlike conventional image-to-image or region-to-region matching algorithms, our method finds corresponding points in an “asymmetric” manner, matching features within each region of a segmented image to a second unsegmented image. We develop a dynamic programming solution to efficiently identify corresponding points for each region, so as to maximize both geometric consistency and appearance similarity. T...
Attribute-centric recognition for cross-category generalization
Farhadi, A. Endres, I. Hoiem, D.
Page(s): 2352 - 2359
Digital Object Identifier : 10.1109/CVPR.2010.5539924
Person re-identification by symmetry-driven accumulation of local features
Farenzena, M. Bazzani, L. Perina, A. Murino, V. Cristani, M.
Page(s): 2360 - 2367
Digital Object Identifier : 10.1109/CVPR.2010.5539926
In this paper, we present an appearance-based method for person re-identification. It consists of extracting features that model three complementary aspects of the human appearance: the overall chromatic content, the spatial arrangement of colors into stable regions, and the presence of recurrent local motifs with high entropy. All this information is derived from different body parts, and weighted appropriately by exploiting symmetry and asymmetry perceptual principles. In this way, robust...
Measuring visual saliency by Site Entropy Rate
Wei Wang Yizhou Wang Qingming Huang Wen Gao
Page(s): 2368 - 2375
Digital Object Identifier : 10.1109/CVPR.2010.5539927
In this paper, we propose a new computational model for visual saliency derived from the information maximization principle. The model is inspired by a few well acknowledged biological facts. To compute the saliency spots of an image, the model first extracts a number of sub-band feature maps using learned sparse codes. It adopts a fully-connected graph representation for each feature map, and runs random walks on the graphs to simulate the signal/information transmission among the interconnecte...
Context-aware saliency detection
Goferman, S. Zelnik-Manor, L. Tal, A.
Page(s): 2376 - 2383
Digital Object Identifier : 10.1109/CVPR.2010.5539929
We propose a new type of saliency - context-aware saliency - which aims at detecting the image regions that represent the scene. This definition differs from previous definitions whose goal is to either identify fixation points or detect the dominant object. In accordance with our saliency definition, we present a detection algorithm which is based on four principles observed in the psychological literature. The benefits of the proposed approach are evaluated in two applications where the contex...
Minimum length in the tangent bundle as a model for curve completion
Ben-Yosef, G. Ben-Shahar, O.
Page(s): 2384 - 2391
Digital Object Identifier : 10.1109/CVPR.2010.5539930
The phenomenon of visual curve completion, where the visual system completes the missing part (e.g., due to occlusion) between two contour fragments, is a major problem in perceptual organization research. Previous computational approaches for the shape of the completed curve typically follow formal descriptions of desired, image-based perceptual properties (e.g., minimum total curvature, roundedness, etc.). Unfortunately, however, it is difficult to determine such desired properties psychophysic...
Removing rolling shutter wobble
Baker, S. Bennett, E. Sing Bing Kang Szeliski, R.
Page(s): 2392 - 2399
Digital Object Identifier : 10.1109/CVPR.2010.5539932
We present an algorithm to remove wobble artifacts from a video captured with a rolling shutter camera undergoing large accelerations or jitter. We show how estimating the rapid motion of the camera can be posed as a temporal super-resolution problem. The low-frequency measurements are the motions of pixels from one frame to the next. These measurements are modeled as temporal integrals of the underlying high-frequency jitter of the camera. The estimated high-frequency motion of the camera is th...
Super resolution using edge prior and single image detail synthesis
Yu-Wing Tai Shuaicheng Liu Brown, M.S. Lin, S.
Page(s): 2400 - 2407
Digital Object Identifier : 10.1109/CVPR.2010.5539933
Edge-directed image super resolution (SR) focuses on ways to remove edge artifacts in upsampled images. Under large magnification, however, textured regions become blurred and appear homogeneous, resulting in a super-resolution image that looks unnatural. Alternatively, learning-based SR approaches use a large database of exemplar images for “hallucinating” detail. The quality of the upsampled image, especially about edges, is dependent on the suitability of the training images. Thi...
Coded exposure imaging for projective motion deblurring
Yu-Wing Tai Naejin Kong Lin, S. Sung Yong Shin
Page(s): 2408 - 2415
Digital Object Identifier : 10.1109/CVPR.2010.5539935
We propose a method for deblurring of spatially variant object motion. A principal challenge of this problem is how to estimate the point spread function (PSF) of the spatially variant blur. Based on the projective motion blur model of, we present a blur estimation technique that jointly utilizes a coded exposure camera and simple user interactions to recover the PSF. With this spatially variant PSF, objects that exhibit projective motion can be effectively deblurred. We validate this method wi...
DARTs: Efficient scale-space extraction of DAISY keypoints
Marimon, D. Bonnin, A. Adamek, T. Gimeno, R.
Page(s): 2416 - 2423
Digital Object Identifier : 10.1109/CVPR.2010.5539936
Winder et al. have recently shown the superiority of the DAISY descriptor in comparison to other widely used descriptors such as SIFT and SURF. Motivated by those results, we present a novel algorithm that extracts viewpoint and illumination invariant keypoints and describes them with a particular implementation of a DAISY-like layout. We demonstrate how to efficiently compute the scale-space and re-use this information for the descriptor. Comparison to similar approaches such as SIFT and SU...
Generating sharp panoramas from motion-blurred videos
Yunpeng Li Sing Bing Kang Joshi, N. Seitz, S.M. Huttenlocher, D.P.
Page(s): 2424 - 2431
Digital Object Identifier : 10.1109/CVPR.2010.5539938
In this paper, we show how to generate a sharp panorama from a set of motion-blurred video frames. Our technique is based on joint global motion estimation and multi-frame deblurring. It also automatically computes the duty cycle of the video, namely the percentage of time between frames that is actually exposure time. The duty cycle is necessary for allowing the blur kernels to be accurately extracted and then removed. We demonstrate our technique on a number of videos.
Secrets of optical flow estimation and their principles
Deqing Sun Roth, S. Black, M.J.
Page(s): 2432 - 2439
Digital Object Identifier : 10.1109/CVPR.2010.5539939
Robust flash deblurring
Shaojie Zhuo Dong Guo Sim, T.
Page(s): 2440 - 2447
Digital Object Identifier : 10.1109/CVPR.2010.5539941
Motion blur due to camera shake is an annoying yet common problem in low-light photography. In this paper, we propose a novel method to recover a sharp image from a pair of motion blurred and flash images, consecutively captured using a hand-held camera. We first introduce a robust flash gradient constraint by exploiting the correlation between a sharp image and its corresponding flash image. Then we formulate our flash deblurring as solving a maximum-a-posteriori problem under the flash gradien...
Recovering fluid-type motions using Navier-Stokes potential flow
Feng Li Liwei Xu Guyenne, P. Jingyi Yu
Page(s): 2448 - 2455
Digital Object Identifier : 10.1109/CVPR.2010.5539942
The classical optical flow assumes that a feature point maintains constant brightness across the frames. For fluid-type motions such as smoke or clouds, the constant brightness assumption does not hold, and accurately estimating the motion flow from their images is difficult. In this paper, we introduce a simple but effective Navier-Stokes (NS) potential flow model for recovering fluid-type motions. Our method treats the image as a wavefront surface and models the 3D potential flow beneath the s...
Sparsity model for robust optical flow estimation at motion discontinuities
Xiaohui Shen Ying Wu
Page(s): 2456 - 2463
Digital Object Identifier : 10.1109/CVPR.2010.5539944
This paper introduces a new sparsity prior to the estimation of dense flow fields. Based on this new prior, a complex flow field with motion discontinuities can be accurately estimated by finding the sparsest representation of the flow field in certain domains. In addition, a stronger additional sparsity constraint on the flow gradients is incorporated into the model to cope with the measurement noise. Robust estimation techniques are also employed to identify the outliers and to refine the res...
Motion estimation with non-local total variation regularization
Werlberger, M. Pock, T. Bischof, H.
Page(s): 2464 - 2471
Digital Object Identifier : 10.1109/CVPR.2010.5539945
State-of-the-art motion estimation algorithms suffer from three major problems: poorly textured regions, occlusions and small-scale image structures. Based on the Gestalt principles of grouping we propose to incorporate a low level image segmentation process in order to tackle these problems. Our new motion estimation algorithm is based on non-local total variation regularization which allows us to integrate the low level image segmentation process in a unified variational framework. Numerical r...
Robust classification of objects, faces, and flowers using natural image statistics
Kanan, C. Cottrell, G.
Page(s): 2472 - 2479
Digital Object Identifier : 10.1109/CVPR.2010.5539947
Classification of images in many-category datasets has rapidly improved in recent years. However, systems that perform well on particular datasets typically have one or more limitations such as a failure to generalize across visual tasks (e.g., requiring a face detector or extensive retuning of parameters), insufficient translation invariance, inability to cope with partial views and occlusion, or significant performance degradation as the number of classes is increased. Here we attempt to over...
Fast image alignment in the Fourier domain
Ashraf, A.B. Lucey, S. Tsuhan Chen
Page(s): 2480 - 2487
Digital Object Identifier : 10.1109/CVPR.2010.5539948
In this paper we propose a framework for gradient descent image alignment in the Fourier domain. Specifically, we propose an extension to the classical Lucas & Kanade (LK) algorithm where we represent the source and template image's intensity pixels in the complex 2D Fourier domain rather than in the 2D spatial domain. We refer to this approach as the Fourier LK (FLK) algorithm. The FLK formulation is especially advantageous, over traditional LK, when it comes to pre-processing the source an...
Boundary Learning by Optimization with Topological Constraints
Jain, V. Bollmann, B. Richardson, M. Berger, D.R. Helmstaedter, M.N. Briggman, K.L. Denk, W. Bowden, J.B. Mendenhall, J.M. Abraham, W.C. Harris, K.M. Kasthuri, N. Hayworth, K.J. Schalek, R. Tapia, J.C. Lichtman, J.W. Seung, H.S.
Page(s): 2488 - 2495
Digital Object Identifier : 10.1109/CVPR.2010.5539950
Recent studies have shown that machine learning can improve the accuracy of detecting object boundaries in images. In the standard approach, a boundary detector is trained by minimizing its pixel-level disagreement with human boundary tracings. This naive metric is problematic because it is overly sensitive to boundary locations. This problem is solved by metrics provided with the Berkeley Segmentation Dataset, but these can be insensitive to topological differences, such as gaps in boundaries. ...
Beyond trees: MRF inference via outer-planar decomposition
Batra, D. Gallagher, A.C. Parikh, D. Tsuhan Chen
Page(s): 2496 - 2503
Digital Object Identifier : 10.1109/CVPR.2010.5539951
Maximum a posteriori (MAP) inference in Markov Random Fields (MRFs) is an NP-hard problem, and thus research has focussed on either finding efficiently solvable subclasses (e.g. trees), or approximate algorithms (e.g. Loopy Belief Propagation (BP) and Tree-reweighted (TRW) methods). This paper presents a unifying perspective of these approximate techniques called "Decomposition Methods". These are methods that decompose the given problem over a graph into tractable subproblems over subgraphs and...
Optical flow estimation with adaptive convolution kernel prior on discrete framework
Kyong Joon Lee Dongjin Kwon Il Dong Yun Sang Uk Lee
Page(s): 2504 - 2511
Digital Object Identifier : 10.1109/CVPR.2010.5539953
We present a new energy model for optical flow estimation on a discrete MRF framework. The proposed model yields a discrete analog to the prevailing model with diffusion tensor-based regularizer, which has been optimized by a variational approach. Inspired by the fact that the regularization process works as a convolution kernel filtering, we formulate the difference between original flow and filtered flow as a smoothness prior. Then the discrete framework enables us to employ a robust penalizer les...
Analyzing spatially-varying blur
Chakrabarti, A. Zickler, T. Freeman, W.T.
Page(s): 2512 - 2519
Digital Object Identifier : 10.1109/CVPR.2010.5539954
Highly accurate boundary detection and grouping
Kokkinos, I.
Page(s): 2520 - 2527
Digital Object Identifier : 10.1109/CVPR.2010.5539956
In this work we address boundary detection and boundary grouping. We first pursue a learning-based approach to boundary detection. For this (i) we leverage appearance and context information by extracting descriptors around edgels and use them as features for classification, (ii) we use discriminative dimensionality reduction for efficiency and (iii) we use outlier-resilient boosting to deal with noise in the training set. We then introduce fractional-linear programming to optimize a grouping cr...
Deconvolutional networks
Zeiler, M.D. Krishnan, D. Taylor, G.W. Fergus, R.
Page(s): 2528 - 2535
Digital Object Identifier : 10.1109/CVPR.2010.5539957
Building robust low and mid-level image representations, beyond edge primitives, is a long-standing goal in vision. Many existing feature detectors spatially pool edge information which destroys cues such as edge intersections, parallelism and symmetry. We present a learning framework where features that capture these mid-level cues spontaneously emerge from image data. Our approach is based on the convolutional decomposition of images under a sparsity constraint and is totally unsupervised. By...
Diffusion filtering without parameter tuning: Models and inference tools
Krajsek, K. Scharr, H.
Page(s): 2536 - 2543
Digital Object Identifier : 10.1109/CVPR.2010.5539959
Relations between deterministic (e.g. variational or PDE based methods) and Bayesian inference have been known for a long time. However, a classification of deterministic approaches into those methods which can be handled within a Bayesian framework and those with no such statistical counterpart is still missing in the literature. After providing such a taxonomy, we present a Bayesian framework for embedding the former ones into a statistical context, which allows us to equip them with advantages of probabili...
Visual object tracking using adaptive correlation filters
Bolme, D.S. Beveridge, J.R. Draper, B.A. Yui Man Lui
Page(s): 2544 - 2550
Digital Object Identifier : 10.1109/CVPR.2010.5539960
Although not commonly used, correlation filters can track complex objects through rotations, occlusions and other distractions at over 20 times the rate of current state-of-the-art techniques. The oldest and simplest correlation filters use simple templates and generally fail when applied to tracking. More modern approaches such as ASEF and UMACE perform better, but their training needs are poorly suited to tracking. Visual tracking requires robust filters to be trained from a single frame and d...
Modeling pixel means and covariances using factorized third-order Boltzmann machines
Ranzato, M.A. Hinton, G.E.
Page(s): 2551 - 2558
Digital Object Identifier : 10.1109/CVPR.2010.5539962
Learning a generative model of natural images is a useful way of extracting features that capture interesting regularities. Previous work on learning such models has focused on methods in which the latent features are used to determine the mean and variance of each pixel independently, or on methods in which the hidden units determine the covariance matrix of a zero-mean Gaussian distribution. In this work, we propose a probabilistic model that combines these two approaches into a single framewo...
Learning mid-level features for recognition
Boureau, Y.-L. Bach, F. LeCun, Y. Ponce, J.
Page(s): 2559 - 2566
Digital Object Identifier : 10.1109/CVPR.2010.5539963
Many successful models for scene or object recognition transform low-level descriptors (such as Gabor filter responses, or SIFT descriptors) into richer representations of intermediate complexity. This process can often be broken down into two steps: (1) a coding step, which performs a pointwise transformation of the descriptors into a representation better adapted to the task, and (2) a pooling step, which summarizes the coded features over larger neighborhoods. Several combinations of coding a...
Face recognition based on image sets
Cevikalp, H. Triggs, B.
Page(s): 2567 - 2573
Digital Object Identifier : 10.1109/CVPR.2010.5539965
We introduce a novel method for face recognition from image sets. In our setting each test and training example is a set of images of an individual's face, not just a single image, so recognition decisions need to be based on comparisons of image sets. Methods for this have two main aspects: the models used to represent the individual image sets; and the similarity metric used to compare the models. Here, we represent images as points in a linear or affine feature space and characterize each ima...
Unsupervised discovery of facial events
Feng Zhou De la Torre, F. Cohn, J.F.
Page(s): 2574 - 2581
Digital Object Identifier : 10.1109/CVPR.2010.5539966
Automatic facial image analysis has been a long-standing research problem in computer vision. A key component in facial image analysis, largely conditioning the success of subsequent algorithms (e.g. facial expression recognition), is to define a vocabulary of possible dynamic facial events. To date, that vocabulary has come from the anatomically-based Facial Action Coding System (FACS) or more subjective approaches (i.e. emotion-specified expressions). The aim of this paper is to discover facia...
3D morphable model construction for robust ear and face recognition
Bustard, J.D. Nixon, M.S.
Page(s): 2582 - 2589
Digital Object Identifier : 10.1109/CVPR.2010.5539968
Recent work suggests that the human ear varies significantly between different subjects and can be used for identification. In principle, therefore, using ears in addition to the face within a recognition system could improve accuracy and robustness, particularly for non-frontal views. The paper describes work that investigates this hypothesis using an approach based on the construction of a 3D morphable model of the head and ear. One issue with creating a model that includes the ear is that exi...
Bimodal gender recognition from face and fingerprint
Xiong Li Xu Zhao Yun Fu Yuncai Liu
Page(s): 2590 - 2597
Digital Object Identifier : 10.1109/CVPR.2010.5539969
Towards general motion-based face recognition
Ning Ye Sim, T.
Page(s): 2598 - 2605
Digital Object Identifier : 10.1109/CVPR.2010.5539971
Motion-based face recognition is a young research topic, inspired mainly by psychological studies on motion-based perception of human faces. Unlike its close relative, appearance-based face recognition, motion-based face recognition extracts personal characteristics from facial motion (e.g. smile) and uses the information to recognize human identity. However, existing studies in this field are limited to fixed motion, that is, a subject must perform a specific type of facial motion in order to ...
Morphable Reflectance Fields for enhancing face recognition
Kumar, R. Jones, M. Marks, T.K.
Page(s): 2606 - 2613
Digital Object Identifier : 10.1109/CVPR.2010.5539972
In this paper, we present a novel framework to address the confounding effects of illumination variation in face recognition. By augmenting the gallery set with realistically relit images, we enhance recognition performance in a classifier-independent way. We describe a novel method for single-image relighting, Morphable Reflectance Fields (MoRF), which does not require manual intervention and provides relighting superior to that of existing automatic methods. We test our framework through face ...
Removal of 3D facial expressions: A learning-based approach
Gang Pan; Song Han; Zhaohui Wu; Yuting Zhang
Page(s): 2614 - 2621
Digital Object Identifier : 10.1109/CVPR.2010.5539974
AbstractPlus | Full Text: PDF (425KB)
This paper focuses on the task of recovering the neutral 3D face of a person when given his/her 3D face model with facial expression. We propose a learning-based expression removal framework to tackle this task. Our basic idea is to model expression residue from samples, and then use the inferred expression residue from the input expressional face model to recover the neutral one. A two-step non-rigid alignment method is introduced to make all the face models topologically share a common structu... Read More »
Multi-task warped Gaussian process for personalized age estimation
Yu Zhang; Dit-Yan Yeung
Page(s): 2622 - 2629
Digital Object Identifier : 10.1109/CVPR.2010.5539975
AbstractPlus | Full Text: PDF (382KB)
Automatic age estimation from facial images has aroused research interests in recent years due to its promising potential for some computer vision applications. Among the methods proposed to date, personalized age estimation methods generally outperform global age estimation methods by learning a separate age estimator for each person in the training data set. However, since typical age databases only contain very limited training data for each person, training a separate age estimator using onl... Read More »
Learning shift-invariant sparse representation of actions
Yi Li; Fermuller, C.; Aloimonos, Y.; Hui Ji
Page(s): 2630 - 2637
Digital Object Identifier : 10.1109/CVPR.2010.5539977
AbstractPlus | Full Text: PDF (2394KB)
A central problem in the analysis of motion capture (MoCap) data is how to decompose motion sequences into primitives. Ideally, a description in terms of primitives should facilitate the recognition, synthesis, and characterization of actions. We propose an unsupervised learning algorithm for automatically decomposing joint movements in human motion capture (MoCap) sequences into shift-invariant basis functions. Our formulation models the time series data of joint movements in actions as a spars... Read More »
Exploring facial expressions with compositional features
Peng Yang; Qingshan Liu; Metaxas, D.N.
Page(s): 2638 - 2644
Digital Object Identifier : 10.1109/CVPR.2010.5539978
AbstractPlus | Full Text: PDF (313KB)
Most previous work focuses on how to learn discriminating appearance features over all the face without considering the fact that each facial expression is physically composed of some relative action units (AU). However, the definition of AU is an ambiguous semantic description in Facial Action Coding System (FACS), so it makes accurate AU detection very difficult. In this paper, we adopt a scheme of compromise to avoid AU detection, and try to interpret facial expression by learning some compos... Read More »
An extension of multifactor analysis for face recognition based on submanifold learning
Sung Won Park; Savvides, M.
Page(s): 2645 - 2652
Digital Object Identifier : 10.1109/CVPR.2010.5539980
AbstractPlus | Full Text: PDF (1171KB)
Lately, Multilinear Principal Component Analysis (MPCA) has been successfully applied to face recognition since MPCA provides analysis of multiple factors of face images such as people's identities, viewpoints, and lighting conditions. MPCA employs multiple linear subspaces constructed by varying factors. In this paper, we propose nonlinear submanifold analysis, which can represent the variation of each factor more accurately than the conventional multilinear subspace analysis. Based on subman... Read More »
Making specific features less discriminative to improve point-based 3D object recognition
Hsiao, E.; Collet, A.; Hebert, M.
Page(s): 2653 - 2660
Digital Object Identifier : 10.1109/CVPR.2010.5539981
AbstractPlus | Full Text: PDF (2684KB)
We present a framework that retains ambiguity in feature matching to increase the performance of 3D object recognition systems. Whereas previous systems removed ambiguous correspondences during matching, we show that ambiguity should be resolved during hypothesis testing and not at the matching phase. To preserve ambiguity during matching, we vector quantize and match model features in a hierarchical manner. This matching technique allows our system to be more robust to the distribution of model... Read More »
Cost-sensitive subspace learning for face recognition
Jiwen Lu; Yap-Peng Tan
Page(s): 2661 - 2666
Digital Object Identifier : 10.1109/CVPR.2010.5539983
AbstractPlus | Full Text: PDF (462KB)
Conventional subspace learning-based face recognition aims to attain low recognition errors and assumes the same loss from all misclassifications. In many real-world face recognition applications, however, this assumption may not hold as different misclassifications could lead to different losses. For example, it may cause inconvenience to a gallery person who is mis-recognized as an impostor and not allowed to enter the room by a face recognition-based door-locker, but it could result in a serious ... Read More »
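The idea of unequal misclassification losses can be illustrated with a generic minimum-expected-loss decision rule. This is only a sketch of the general cost-sensitive principle, not the subspace learning method of the paper; the function name, the two classes, and every number in the cost matrix are hypothetical values chosen for illustration.

```python
def min_expected_loss_decision(posteriors, cost):
    """Pick the decision with the lowest expected loss.

    posteriors[j] = P(true class is j | input)
    cost[i][j]    = loss incurred when deciding i while the truth is j
    """
    expected = [
        sum(c_ij * p_j for c_ij, p_j in zip(row, posteriors))
        for row in cost
    ]
    best = min(range(len(expected)), key=expected.__getitem__)
    return best, expected


# Hypothetical door-lock setting: class 0 = gallery person, class 1 = impostor.
# The cost values are assumptions for this sketch, not figures from the paper.
COST = [
    [0.0, 10.0],  # decide "accept": expensive if the person is an impostor
    [1.0, 0.0],   # decide "reject": mild cost if the person is in the gallery
]

decision, losses = min_expected_loss_decision([0.8, 0.2], COST)
print(decision, losses)
```

With these assumed costs the rule rejects the person even though the gallery class has the higher posterior, which is the kind of behaviour an equal-loss classifier cannot express.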
Calibration-free gaze sensing using saliency maps
Sugano, Y.; Matsushita, Y.; Sato, Y.
Page(s): 2667 - 2674
Digital Object Identifier : 10.1109/CVPR.2010.5539984
AbstractPlus | Full Text: PDF (5851KB)
A novel Markov random field based deformable model for face recognition
Shu Liao; Chung, A.C.S.
Page(s): 2675 - 2682
Digital Object Identifier : 10.1109/CVPR.2010.5539986
AbstractPlus | Full Text: PDF (1493KB)
In this paper, a new scheme to address the face recognition problem is proposed. Different from traditional face recognition approaches which represent each facial image by a single feature vector as the classification problem, the proposed method establishes a new way to formulate the face recognition problem as a deformable image registration problem. The main contributions of the paper lie in the following aspects: (i) Each pixel is represented by an anatomical feature signature calculated fr... Read More »
Pose-robust albedo estimation from a single image
Biswas, S.; Chellappa, R.
Page(s): 2683 - 2690
Digital Object Identifier : 10.1109/CVPR.2010.5539987
AbstractPlus | Full Text: PDF (1794KB) | Multimedia
We present a stochastic filtering approach to perform albedo estimation from a single non-frontal face image. Albedo estimation has far reaching applications in various computer vision tasks like illumination-insensitive matching, shape recovery, etc. We extend the formulation proposed in that assumes face in known pose and present an algorithm that can perform albedo estimation from a single image even when pose information is inaccurate. 3D pose of the input face image is obtained as a byprodu... Read More »
Discriminative K-SVD for dictionary learning in face recognition
Qiang Zhang; Baoxin Li
Page(s): 2691 - 2698
Digital Object Identifier : 10.1109/CVPR.2010.5539989
AbstractPlus | Full Text: PDF (171KB)
In a sparse-representation-based face recognition scheme, the desired dictionary should have good representational power (i.e., being able to span the subspace of all faces) while supporting optimal discrimination of the classes (i.e., different human subjects). We propose a method to learn an over-complete dictionary that attempts to simultaneously achieve the above two goals. The proposed method, discriminative K-SVD (D-KSVD), is based on extending the K-SVD algorithm by incorporating the clas... Read More »
Adaptive generic learning for face recognition from a single sample per person
Yu Su; Shiguang Shan; Xilin Chen; Wen Gao
Page(s): 2699 - 2706
Digital Object Identifier : 10.1109/CVPR.2010.5539990
AbstractPlus | Full Text: PDF (413KB)
Real-world face recognition systems often have to face the single sample per person (SSPP) problem, that is, only a single training sample for each person is enrolled in the database. In this case, many of the popular face recognition methods fail to work well due to the inability to learn the discriminatory information specific to the persons to be identified. To address this problem, in this paper, we propose an Adaptive Generic Learning (AGL) method, which adapts a generic discriminant model ... Read More »
Face recognition with learning-based descriptor
Zhimin Cao; Qi Yin; Xiaoou Tang; Jian Sun
Page(s): 2707 - 2714
Digital Object Identifier : 10.1109/CVPR.2010.5539992
AbstractPlus | Full Text: PDF (646KB)
We present a novel approach to address the representation issue and the matching issue in face recognition (verification). Firstly, our approach encodes the micro-structures of the face by a new learning-based encoding method. Unlike many previous manually designed encoding methods (e.g., LBP or SIFT), we use unsupervised learning techniques to learn an encoder from the training examples, which can automatically achieve very good tradeoff between discriminative power and invariance. Then we appl... Read More »
Automatic point-based facial trait judgments evaluation
Rojas Q, M.; Masip, D.; Todorov, A.; Vitrià, J.
Page(s): 2715 - 2720
Digital Object Identifier : 10.1109/CVPR.2010.5539993
AbstractPlus | Full Text: PDF (586KB)
Humans constantly evaluate the personalities of other people using their faces. Facial trait judgments have been studied in the psychological field, and have been determined to influence important social outcomes of our lives, such as elections outcomes and social relationships. Recent work on textual descriptions of faces has shown that trait judgments are highly correlated. Further, behavioral studies suggest that two orthogonal dimensions, valence and dominance, can describe the basis of the ... Read More »
Bidirectional relighting for 3D-aided 2D face recognition
Toderici, G.; Passalis, G.; Zafeiriou, S.; Tzimiropoulos, G.; Petrou, M.; Theoharis, T.; Kakadiaris, I.A.
Page(s): 2721 - 2728
Digital Object Identifier : 10.1109/CVPR.2010.5539995
AbstractPlus | Full Text: PDF (2018KB)
In this paper, we present a new method for bidirectional relighting for 3D-aided 2D face recognition under large pose and illumination changes. During subject enrollment, we build subject-specific 3D annotated models by using the subjects' raw 3D data and 2D texture. During authentication, the probe 2D images are projected onto a normalized image space using the subject-specific 3D model in the gallery. Then, a bidirectional relighting algorithm and two similarity metrics (a view-dependent compl... Read More »
Facial point detection using boosted regression and graph models
Valstar, M.; Martinez, B.; Binefa, X.; Pantic, M.
Page(s): 2729 - 2736
Digital Object Identifier : 10.1109/CVPR.2010.5539996
AbstractPlus | Full Text: PDF (597KB)
Finding fiducial facial points in any frame of a video showing rich naturalistic facial behaviour is an unsolved problem. Yet this is a crucial step for geometric-feature-based facial expression analysis, and methods that use appearance-based features extracted at fiducial facial point locations. In this paper we present a method based on a combination of Support Vector Regression and Markov Random Fields to drastically reduce the time needed to search for a point's location and increase the acc... Read More »
Action unit detection with segment-based SVMs
Simon, T.; Minh Hoai Nguyen; De La Torre, F.; Cohn, J.F.
Page(s): 2737 - 2744
Digital Object Identifier : 10.1109/CVPR.2010.5539998
AbstractPlus | Full Text: PDF (829KB)
Automatic facial action unit (AU) detection from video is a long-standing problem in computer vision. Two main approaches have been pursued: (1) static modeling - typically posed as a discriminative classification problem in which each video frame is evaluated independently; (2) temporal modeling - frames are segmented into sequences and typically modeled with a variant of dynamic Bayesian networks. We propose a segment-based approach, kSeg-SVM, that incorporates benefits of both approaches and ... Read More »
Gesture recognition by learning local motion signatures
Kaâniche, M.B.; Brémond, F.
Page(s): 2745 - 2752
Digital Object Identifier : 10.1109/CVPR.2010.5539999
AbstractPlus | Full Text: PDF (188KB)
Rapid face recognition using hashing
Qinfeng Shi; Hanxi Li; Chunhua Shen
Page(s): 2753 - 2760
Digital Object Identifier : 10.1109/CVPR.2010.5540001
AbstractPlus | Full Text: PDF (548KB)
We propose a face recognition approach based on hashing. The approach yields comparable recognition rates with the random ℓ1 approach, which is considered the state-of-the-art. But our method is much faster: it is up to 150 times faster than on the YaleB dataset. We show that with hashing, the sparse representation can be recovered with a high probability because hashing preserves the restricted isometry property. Moreover, we present a theoretical analysis on the recognition ... Read More »
Non-rigid structure from locally-rigid motion
Taylor, J.; Jepson, A.D.; Kutulakos, K.N.
Page(s): 2761 - 2768
Digital Object Identifier : 10.1109/CVPR.2010.5540002
AbstractPlus | Full Text: PDF (4941KB) | Multimedia
We introduce locally-rigid motion, a general framework for solving the M-point, N-view structure-from-motion problem for unknown bodies deforming under orthography. The key idea is to first solve many local 3-point, N-view rigid problems independently, providing a “soup” of specific, plausibly rigid, 3D triangles. The main advantage here is that the extraction of 3D triangles requires only very weak assumptions: (1) deformations can be locally approximated by near-rigid motion of t... Read More »
Bundled depth-map merging for multi-view stereo
Jianguo Li; Li, E.; Yurong Chen; Lin Xu; Yimin Zhang
Page(s): 2769 - 2776
Digital Object Identifier : 10.1109/CVPR.2010.5540004
AbstractPlus | Full Text: PDF (759KB) | Multimedia
Depth-map merging is one typical technique category for multi-view stereo (MVS) reconstruction. To guarantee accuracy, existing algorithms usually require either sub-pixel level stereo matching precision or continuous depth-map estimation. The merging of inaccurate depth-maps remains a challenging problem. This paper introduces a bundle optimization method for robust and accurate depth-map merging. In the method, depth-maps are generated using DAISY feature, followed by two stages of bundle opti... Read More »
Multi-view structure computation without explicitly estimating motion
Hongdong Li
Page(s): 2777 - 2784
Digital Object Identifier : 10.1109/CVPR.2010.5540005
AbstractPlus | Full Text: PDF (254KB)
Most existing structure-from-motion methods follow a common two-step scheme, where relative camera motions are estimated in the first step and 3D structure is computed afterward in the second step. This paper presents a novel scheme which bypasses the motion-estimation step, and goes directly to structure computation step. By introducing graph rigidity theory to Sfm problems, we demonstrate that such a scheme is not only theoretically possible, but also technically feasible and effective. We als... Read More »
ABSORB: Atlas building by Self-Organized Registration and Bundling
Hongjun Jia; Guorong Wu; Qian Wang; Dinggang Shen
Page(s): 2785 - 2790
Digital Object Identifier : 10.1109/CVPR.2010.5540007
AbstractPlus | Full Text: PDF (484KB)
A novel groupwise registration framework, called Atlas Building by Self-Organized Registration and Bundling (ABSORB), is proposed in this paper. In this framework, the global structure of relative subject image distribution is preserved during the registration by constraining each subject to deform locally within the learned manifold. A self-organized registration is employed to deform each subject towards a subset of its neighbors that are closer to the global center. Some subjects close enough... Read More »
Stratified learning of local anatomical context for lung nodules in CT images
Wu, D.; Le Lu; Jinbo Bi; Shinagawa, Y.; Boyer, K.; Krishnan, A.; Salganicoff, M.
Page(s): 2791 - 2798
Digital Object Identifier : 10.1109/CVPR.2010.5540008
AbstractPlus | Full Text: PDF (1875KB)
The automatic detection of lung nodules attached to other pulmonary structures is a useful yet challenging task in lung CAD systems. In this paper, we propose a stratified statistical learning approach to recognize whether a candidate nodule detected in CT images connects to any of three other major lung anatomies, namely vessel, fissure and lung wall, or is solitary with background parenchyma. First, we develop a fully automated voxel-by-voxel labeling/segmentation method of nodule, vessel, fis... Read More »
Delineating trees in noisy 2D images and 3D image-stacks
González, G.; Türetken, E.; Fleuret, F.; Fua, P.
Page(s): 2799 - 2806
Digital Object Identifier : 10.1109/CVPR.2010.5540010
AbstractPlus | Full Text: PDF (5881KB)
We present a novel approach to fully automated delineation of tree structures in noisy 2D images and 3D image stacks. Unlike earlier methods that rely mostly on local evidence, our method builds a set of candidate trees over many different subsets of points likely to belong to the final one and then chooses the best one according to a global objective function. Since we are not systematically trying to span all nodes, our algorithm is able to eliminate noise while retaining the right tree struct... Read More »
Sign ambiguity resolution for phase demodulation in interferometry with application to prelens tear film analysis
Dijia Wu; Boyer, K.L.
Page(s): 2807 - 2814
Digital Object Identifier : 10.1109/CVPR.2010.5540011
AbstractPlus | Full Text: PDF (8109KB)
We present a novel method to solve sign ambiguity for phase demodulation from a single interferometric image that possibly contains closed fringes. The problem is formulated in a binary pairwise energy minimization framework based on phase gradient orientation continuity. The objective function is non-submodular and therefore its minimization is an NP-hard problem, for which we devise a multigrid hierarchy of quadratic pseudoboolean optimization problems that can be improved iteratively to appro... Read More »
Multiple dynamic models for tracking the left ventricle of the heart from ultrasound data using particle filters and deep learning architectures
Carneiro, G.; Nascimento, J.C.
Page(s): 2815 - 2822
Digital Object Identifier : 10.1109/CVPR.2010.5540013
AbstractPlus | Full Text: PDF (455KB)
The problem of automatic tracking and segmentation of the left ventricle (LV) of the heart from ultrasound images can be formulated with an algorithm that computes the expected segmentation value in the current time step given all previous and current observations using a filtering distribution. This filtering distribution depends on the observation and transition models, and since it is hard to compute the expected value using the whole parameter space of segmentations, one has to resort to Mon... Read More »
Multilinear feature extraction and classification of multi-focal images, with applications in nematode taxonomy
Min Liu; Roy-Chowdhury, A.K.
Page(s): 2823 - 2830
Digital Object Identifier : 10.1109/CVPR.2010.5540014
AbstractPlus | Full Text: PDF (973KB)
Search strategies for multiple landmark detection by submodular maximization
Liu, D.; Zhou, K.S.; Bernhardt, D.; Comaniciu, D.
Page(s): 2831 - 2838
Digital Object Identifier : 10.1109/CVPR.2010.5540016
AbstractPlus | Full Text: PDF (1414KB)
A fundamental issue in multiple landmark detection is the reduction of computational cost. This problem has previously been addressed mainly by reducing the complexity of each individual landmark detector. We address the problem by optimizing the search strategy of multiple landmarks. When the relative positions of landmarks are constrained, the search space can be reduced, thereby reducing the computation. The proposed method leverages the theory of submodular functions to provide a constant fa... Read More »
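For readers unfamiliar with submodular maximization, the standard greedy scheme for a monotone submodular set function can be sketched as follows. The example uses set coverage as the objective, which is a textbook submodular function; it is not the search-strategy objective of the paper, and the function name and the toy candidate sets are hypothetical.

```python
def greedy_max_coverage(candidates, budget):
    """Greedily pick up to `budget` sets, each time taking the largest marginal gain.

    candidates: dict mapping a name to the set of items it covers.
    Coverage is monotone and submodular, so greedy selection is the usual choice.
    """
    chosen, covered = [], set()
    for _ in range(budget):
        remaining = [name for name in candidates if name not in chosen]
        if not remaining:
            break
        # Marginal gain = number of items not yet covered.
        best = max(remaining, key=lambda name: len(candidates[name] - covered))
        if not candidates[best] - covered:
            break  # no candidate adds anything new
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered


# Toy data, assumed for illustration only.
CANDIDATES = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 7},
}

print(greedy_max_coverage(CANDIDATES, 2))
```

The diminishing-returns property is visible in the toy data: once "c" is chosen, "b" and "d" each add only one new item, so "a" is the better second pick.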
Compression of surface registrations using Beltrami coefficients
Lok Ming Lui; Tsz Wai Wong; Thompson, P.; Chan, T.; Xianfeng Gu; Shing-Tung Yau
Page(s): 2839 - 2846
Digital Object Identifier : 10.1109/CVPR.2010.5540017
AbstractPlus | Full Text: PDF (4710KB)
Surface registration is widely used in machine vision and medical imaging, where 1-1 correspondences between surfaces are computed to study their variations. Surface maps are usually stored as the 3D coordinates each vertex is mapped to, which often requires lots of storage memory. This causes inconvenience in data transmission and data storage, especially when a large set of surfaces are analyzed. To tackle this problem, we propose a novel representation of surface diffeomorphisms using Beltram... Read More »
Natural gradients for deformable registration
Zikic, D.; Kamen, A.; Navab, N.
Page(s): 2847 - 2854
Digital Object Identifier : 10.1109/CVPR.2010.5540019
AbstractPlus | Full Text: PDF (668KB)
We apply the concept of natural gradients to deformable registration. The motivation stems from the lack of physical interpretation for gradients of image-based difference measures. The main idea is to endow the space of deformations with a distance metric which reflects the variation of the difference measure between two deformations. This is in contrast to standard approaches which assume the Euclidean frame. The modification of the distance metric is realized by treating the deformations as a... Read More »
Curious snakes: A minimum latency solution to the cluttered background problem in active contours
Sundaramoorthi, G.; Soatto, S.; Yezzi, A.J.
Page(s): 2855 - 2862
Digital Object Identifier : 10.1109/CVPR.2010.5540020
AbstractPlus | Full Text: PDF (1906KB)
We present a region-based active contour detection algorithm for objects that exhibit relatively homogeneous photometric characteristics (e.g. smooth color or gray levels), embedded in complex background clutter. Current methods either frame this problem in Bayesian classification terms, where precious modeling resources are expended representing the complex background away from decision boundaries, or use heuristics to limit the search to local regions around the object of interest. We propose ... Read More »
Anatomical parts-based regression using non-negative matrix factorization
Joshi, S.; Karthikeyan, S.; Manjunath, B.S.; Grafton, S.; Kiehl, K.A.
Page(s): 2863 - 2870
Digital Object Identifier : 10.1109/CVPR.2010.5540022
AbstractPlus | Full Text: PDF (524KB)
Non-negative matrix factorization (NMF) is an excellent tool for unsupervised parts-based learning, but proves to be ineffective when parts of a whole follow a specific pattern. Analyzing such local changes is particularly important when studying anatomical transformations. We propose a supervised method that incorporates a regression constraint into the NMF framework and learns maximally changing parts in the basis images, called Regression based NMF (RNMF). The algorithm is made robust against... Read More »
Metric-induced optimal embedding for intrinsic 3D shape analysis
Rongjie Lai; Yonggang Shi; Scheibel, K.; Fears, S.; Woods, R.; Toga, A.W.; Chan, T.F.
Page(s): 2871 - 2878
Digital Object Identifier : 10.1109/CVPR.2010.5540023
AbstractPlus | Full Text: PDF (1692KB)
For various 3D shape analysis tasks, the Laplace-Beltrami (LB) embedding has become increasingly popular as it enables the efficient comparison of shapes based on intrinsic geometry. One fundamental difficulty in using the LB embedding, however, is the ambiguity in the eigen-system, and it is conventionally only handled in a heuristic way. In this work, we propose a novel and intrinsic metric, the spectral l2-distance, to overcome this difficulty. We prove mathematically that this new ... Read More »
Simultaneous searching of globally optimal interacting surfaces with shape priors
Qi Song; Xiaodong Wu; Yunlong Liu; Sonka, M.; Garvin, M.
Page(s): 2879 - 2886
Digital Object Identifier : 10.1109/CVPR.2010.5540025
AbstractPlus | Full Text: PDF (736KB)
Multiple surface searching with only image intensity information is a difficult job in the presence of high noise and weak edges. We present in this paper a novel method for globally optimal multi-surface searching with a shape prior represented by convex pairwise energies. A 3-D graph-theoretic framework is employed. An arc-weighted graph is constructed based on a shape model built from training datasets. A wide spectrum of constraints is then incorporated. The shape prior term penalizes the lo... Read More »
Group MRF for fMRI activation detection
Ng, B.; Abugharbieh, R.; Hamarneh, G.
Page(s): 2887 - 2894
Digital Object Identifier : 10.1109/CVPR.2010.5540026
AbstractPlus | Full Text: PDF (2467KB)
Noise confounds present serious complications to accurate data analysis in functional magnetic resonance imaging (fMRI). Simply relying on contextual image information often results in unsatisfactory segmentation of active brain regions. To remedy this, we propose a novel Group Markov Random Field (Group MRF) that extends the neighborhood system to other subjects to incorporate group information in modeling each subject's brain activation. Our approach has the distinct advantage of being able to... Read More »
Localizing non-overlapping surveillance cameras under the L-Infinity norm
Micusik, B.; Pflugfelder, R.
Page(s): 2895 - 2901
Digital Object Identifier : 10.1109/CVPR.2010.5540028
AbstractPlus | Full Text: PDF (812KB)
This paper presents a new approach to the problem of camera localization with non-overlapping camera views, particularly relevant for video surveillance systems. We show how to recast localization as quasi-convex optimization under the L-Infinity norm. Thereby we add the problem of reconstructing camera centers and 3D points for non-overlapping cameras with known internal parameters and known rotations to the class of known geometric problems solvable with Second Order Cone Programming. The 3D p... Read More »
Neuron geometry extraction by perceptual grouping in ssTEM images
Kaynig, V.; Fuchs, T.; Buhmann, J.M.
Page(s): 2902 - 2909
Digital Object Identifier : 10.1109/CVPR.2010.5540029
AbstractPlus | Full Text: PDF (2366KB)
An automatic unsupervised classification of MR images in Alzheimer's disease
Xiaojing Long; Wyatt, C.
Page(s): 2910 - 2917
Digital Object Identifier : 10.1109/CVPR.2010.5540031
AbstractPlus | Full Text: PDF (165KB)
Image-analysis methods play an important role in helping detect brain changes in and diagnosis of Alzheimer's Disease (AD). In this paper, we propose an automatic unsupervised classification approach to distinguish brain magnetic resonance (MR) images of AD patients from those of elderly normal controls. The symmetric log-domain diffeomorphic demons algorithm, with the properties of symmetry and invertibility, is used to compute the pair-wise registration, whose deformation field is then used to... Read More »
Masked FFT registration
Padfield, D.
Page(s): 2918 - 2925
Digital Object Identifier : 10.1109/CVPR.2010.5540032
AbstractPlus | Full Text: PDF (3346KB) | Multimedia
Registration is a ubiquitous task for image analysis applications. Generally, the requirements of registration algorithms include fast computation and large capture range. For these purposes, registration in the Fourier domain using normalized cross correlation is well suited and has been extensively studied in the literature. Another common requirement is masking, which is necessary for applications where certain regions of the image that would adversely affect the registration result should be... Read More »
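As background for Fourier-domain registration, the sketch below estimates a circular shift between two 1-D signals by multiplying their discrete Fourier transforms and locating the correlation peak. It is deliberately simplified: the correlation is neither normalized nor masked, so it does not reproduce the method of the paper, and the naive O(n^2) transform, the function names, and the test signals are all assumptions made to keep the example dependency-free.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform (fine for short demo signals)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out


def estimate_circular_shift(fixed, moving):
    """Return s such that moving[j] is approximately fixed[(j - s) % n].

    Uses the correlation theorem: the DFT of the circular cross-correlation
    of two real signals is DFT(moving) * conj(DFT(fixed)).
    """
    spectrum_fixed = dft(fixed)
    spectrum_moving = dft(moving)
    product = [m * f.conjugate() for f, m in zip(spectrum_fixed, spectrum_moving)]
    correlation = [c.real for c in dft(product, inverse=True)]
    return max(range(len(correlation)), key=correlation.__getitem__)


# Hypothetical signals: MOVING is FIXED circularly shifted right by 3 samples.
FIXED = [0, 1, 3, 1, 0, 0, 0, 0]
MOVING = [0, 0, 0, 0, 1, 3, 1, 0]

print(estimate_circular_shift(FIXED, MOVING))
```

In practice a fast transform from a numerical library would replace the naive `dft`, and the same peak-finding idea extends to 2-D images.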
Lymph node detection in 3-D chest CT using a spatial prior probability
Feulner, J.; Zhou, S.K.; Huber, M.; Hornegger, J.; Comaniciu, D.; Cavallaro, A.
Page(s): 2926 - 2932
Digital Object Identifier : 10.1109/CVPR.2010.5540034
AbstractPlus | Full Text: PDF (2687KB)
Lymph nodes have high clinical relevance but detection is challenging as they are hard to see due to low contrast and irregular shape. In this paper, a method for fully automatic mediastinal lymph node detection in 3-D computed tomography (CT) images of the chest area is proposed. Discriminative learning is used to detect lymph nodes based on their appearance. Because lymph nodes can easily be confused with other structures, it is vital to incorporate as much anatomical knowledge as possible to ... Read More »
Image atlas construction via intrinsic averaging on the manifold of images
Yuchen Xie; Ho, J.; Vemuri, B.C.
Page(s): 2933 - 2939
Digital Object Identifier : 10.1109/CVPR.2010.5540035
AbstractPlus | Full Text: PDF (1217KB)
In this paper, we propose a novel algorithm for computing an atlas from a collection of images. In the literature, atlases have almost always been computed as some types of means such as the straightforward Euclidean means or the more general Karcher means on Riemannian manifolds. In the context of images, the paper's main contribution is a geometric framework for computing image atlases through a two-step process: the localization of mean and the realization of it as an image. In the localizati... Read More »
Heterogeneous Conditional Random Field: Realizing joint detection and segmentation of cell regions in microscopic images
Pan, J.; Kanade, T.; Mei Chen
Page(s): 2940 - 2947
Digital Object Identifier : 10.1109/CVPR.2010.5540037
AbstractPlus | Full Text: PDF (4264KB)
Detecting and segmenting cell regions in microscopic images is a challenging task, because cells typically do not have rich features, and their shapes and appearances are highly irregular and flexible. Furthermore, cells often form clusters, rendering the existing joint detection and segmentation algorithms unable to segment out individual cells. We address these difficulties by proposing a Heterogeneous Conditional Random Field (HCRF), in which different nodes have different state sets. The sta... Read More »
Model-based respiratory motion compensation for image-guided cardiac interventions
Schneider, M.; Sundar, H.; Rui Liao; Hornegger, J.; Chenyang Xu
Page(s): 2948 - 2954
Digital Object Identifier : 10.1109/CVPR.2010.5540038
AbstractPlus | Full Text: PDF (6834KB) | Multimedia
In this paper we propose and validate a PCA-based respiratory motion model for motion compensation during image-guided cardiac interventions. In a preparatory training phase, a preoperative 3-D segmentation of the coronary arteries is automatically registered with a cardiac gated biplane cineangiogram, and used to build a respiratory motion model. This motion model is subsequently used as a prior within the intraoperative registration process for motion compensation to restrict the search space.... Read More »
Proximate sensing: Inferring what-is-where from georeferenced photo collections
Leung, D.; Newsam, S.
Page(s): 2955 - 2962
Digital Object Identifier : 10.1109/CVPR.2010.5540040
AbstractPlus | Full Text: PDF (2207KB)
The primary and novel contribution of this work is the conjecture that large georeferenced photo collections can be used to derive maps of what-is-where on the surface of the earth. We investigate the application of what we term “proximate sensing” to the problem of land cover classification for a large geographic region. We show that our approach is able to achieve almost 75% classification accuracy in a binary land cover labelling problem using images from a photo ... Read More »
Detecting text in natural scenes with stroke width transform
Epshtein, B.; Ofek, E.; Wexler, Y.
Page(s): 2963 - 2970
Digital Object Identifier : 10.1109/CVPR.2010.5540041
AbstractPlus | Full Text: PDF (1737KB)
We present a novel image operator that seeks to find the value of stroke width for each image pixel, and demonstrate its use on the task of text detection in natural images. The suggested operator is local and data dependent, which makes it fast and robust enough to eliminate the need for multi-scale computation or scanning windows. Extensive testing shows that the suggested scheme outperforms the latest published algorithms. Its simplicity allows the algorithm to detect texts in many fonts and ... Read More »
Reading between the lines: Object localization using implicit cues from image tags
Sung Ju Hwang; Grauman, K.
Page(s): 2971 - 2978
Digital Object Identifier : 10.1109/CVPR.2010.5540043
AbstractPlus | Full Text: PDF (1655KB)
Current uses of tagged images typically exploit only the most explicit information: the link between the nouns named and the objects present somewhere in the image. We propose to leverage “unspoken” cues that rest within an ordered list of image tags so as to improve object localization. We define three novel implicit features from an image's tags - the relative prominence of each object as signified by its order of mention, the scale constraints implied by unnamed objects, and the... Read More »
Beyond active noun tagging: Modeling contextual interactions for multi-class active learning
Siddiquie, B.; Gupta, A.
Page(s): 2979 - 2986
Digital Object Identifier : 10.1109/CVPR.2010.5540044
AbstractPlus | Full Text: PDF (2310KB)
ARISTA - image search to annotation on billions of web photos
Xin-Jing Wang; Lei Zhang; Ming Liu; Yi Li; Wei-Ying Ma
Page(s): 2987 - 2994
Digital Object Identifier : 10.1109/CVPR.2010.5540046
AbstractPlus | Full Text: PDF (405KB)
Though it has cost great research efforts for decades, object recognition is still a challenging problem. Traditional methods based on machine learning or computer vision are still in the stage of tackling hundreds of object categories. In recent years, non-parametric approaches have demonstrated great success, which understand the content of an image by propagating labels of its similar images in a large-scale dataset. However, due to the limited dataset size and imperfect image crawling strate... Read More »
Breaking the interactive bottleneck in multi-class classification with active selection and binary feedback
Joshi, A.J.; Porikli, F.; Papanikolopoulos, N.
Page(s): 2995 - 3002
Digital Object Identifier : 10.1109/CVPR.2010.5540047
AbstractPlus | Full Text: PDF (1088KB) | Multimedia
Multi-class classification schemes typically require human input in the form of precise category names or numbers for each example to be annotated - providing this can be impractical for the user when a large (and possibly unknown) number of categories are present. In this paper, we propose a multi-class active learning model that requires only binary (yes/no type) feedback from the user. For instance, given two images the user only has to say whether they belong to the same class or not. We fir... Read More »
Efficient histogram-based sliding window
Yichen Wei; Litian Tao
Page(s): 3003 - 3010
Digital Object Identifier : 10.1109/CVPR.2010.5540049
AbstractPlus | Full Text: PDF (800KB)
Many computer vision problems rely on computing histogram-based objective functions with a sliding window. A main limiting factor is the high computational cost. Existing computational methods have a complexity linear in the histogram dimension. In this paper, we propose an efficient method that has a constant complexity in the histogram dimension and therefore scales well with high dimensional histograms. This is achieved by harnessing the spatial coherence of natural images and computing the o... Read More »
Pareto-optimal dictionaries for signatures
Calonder, M.; Lepetit, V.; Fua, P.
Page(s): 3011 - 3018
Digital Object Identifier : 10.1109/CVPR.2010.5540050
AbstractPlus | Full Text: PDF (442KB)
We present an effective method to optimize over the parameters of an image patch descriptor to obtain one that is computationally more efficient while maintaining a high recognition rate. We formulate the optimization problem in a multi-objective manner, which balances two conflicting goals while removing the need for traditional weighting coefficients. To this end we introduce the Pareto efficiency criterion, which helps finding solutions that increase one objective without decreasing the other... Read More »
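The Pareto efficiency criterion named in this abstract can be illustrated with a short sketch. It keeps only the candidates that no other candidate beats on both objectives at once, so no weighting coefficient between the objectives is needed. The function name `pareto_front` and the (cost, error) pairs are hypothetical, both objectives are assumed to be minimized, and this is not the authors' optimization procedure.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    Each point is a (cost, error) pair; both values are minimized.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (compute cost, error rate) pairs for descriptor settings.
candidates = [(1.0, 0.30), (2.0, 0.20), (3.0, 0.25), (4.0, 0.10), (2.5, 0.20)]
front = pareto_front(candidates)
```

With these made-up numbers, the settings (3.0, 0.25) and (2.5, 0.20) are dropped because (2.0, 0.20) is at least as good on both objectives.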
Region moments: Fast invariant descriptors for detecting small image structures
Doretto, G.; Yi Yao
Page(s): 3019 - 3026
Digital Object Identifier : 10.1109/CVPR.2010.5540052
AbstractPlus | Full Text: PDF (1204KB)
This paper presents region moments, a class of appearance descriptors based on image moments applied to a pool of image features. A careful design of the moments and the image features makes the descriptors scale and rotation invariant, and therefore suitable for vehicle detection from aerial video, where targets appear at different scales and orientations. Region moments are linearly related to the image features. Thus, comparing descriptors by computing costly geodesic distances and non-linea... Read More »
Optimizing one-shot recognition with micro-set learning
Tang, K.D.; Tappen, M.F.; Sukthankar, R.; Lampert, C.H.
Page(s): 3027 - 3034
Digital Object Identifier : 10.1109/CVPR.2010.5540053
AbstractPlus | Full Text: PDF (1554KB)
For object category recognition to scale beyond a small number of classes, it is important that algorithms be able to learn from a small amount of labeled data per additional class. One-shot recognition aims to apply the knowledge gained from a set of categories with plentiful data to categories for which only a single exemplar is available for each. As with earlier efforts motivated by transfer learning, we seek an internal representation for the domain that generalizes across classes. However,... Read More »
Far-sighted active learning on a budget for image and video recognition
Vijayanarasimhan, S.; Jain, P.; Grauman, K.
Page(s): 3035 - 3042
Digital Object Identifier : 10.1109/CVPR.2010.5540055
AbstractPlus | Full Text: PDF (317KB)
Active learning methods aim to select the most informative unlabeled instances to label first, and can help to focus image or video annotations on the examples that will most improve a recognition system. However, most existing methods only make myopic queries for a single label at a time, retraining at each iteration. We consider the problem where at each iteration the active learner must select a set of examples meeting a given budget of supervision, where the budget is determined by the funds... Read More »
A square-root sampling approach to fast histogram-based search
Huang-Wei Chang; Hwann-Tzong Chen
Page(s): 3043 - 3049
Digital Object Identifier : 10.1109/CVPR.2010.5540056
AbstractPlus | Full Text: PDF (426KB)
We present an efficient pixel-sampling technique for histogram-based search. Given a template image as a query, a typical histogram-based algorithm aims to find the location of the target in another large test image, by evaluating a similarity measure for comparing the feature histogram of the template with that of each possible subwindow in the test image. The computational cost would be high if each subwindow needs to compute its histogram and evaluate the similarity measure. In this paper, we... Read More »
Fast pattern matching using orthogonal Haar transform
Wanli Ouyang; Renqi Zhang; Wai-Kuen Cham
Page(s): 3050 - 3057
Digital Object Identifier : 10.1109/CVPR.2010.5540058
AbstractPlus | Full Text: PDF (488KB) | Multimedia
Pattern matching is a widely used procedure in signal processing, computer vision, image and video processing. Recently, methods using Walsh Hadamard Transform (WHT) and Gray-Code kernels (GCK) have been successfully applied for fast transform domain pattern matching. This paper introduces strip sum on the image. The sum of pixels in a rectangle can be computed by one addition using the strip sum. Then we propose to use the orthogonal Haar transform (OHT) for pattern matching. Applied for pattern matc... Read More »
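The claim that a rectangle sum needs only one addition can be made concrete with a small sketch. The abstract does not define the strip sum, so the definition used here is an assumption: for a fixed rectangle height `h`, `strip[y][x]` holds the sum of the pixels in rows `y` to `y + h - 1` and columns `0` to `x - 1`. The function names are hypothetical and the code is not taken from the paper.

```python
def horizontal_strip_sum(image, h):
    """Build strip[y][x] = sum of image rows y..y+h-1 over columns 0..x-1.

    The definition is an assumed reading of the strip sum, valid for one
    fixed rectangle height h.
    """
    rows, cols = len(image), len(image[0])
    strip = []
    for y in range(rows - h + 1):
        running = [0]
        for x in range(cols):
            column = sum(image[y + dy][x] for dy in range(h))
            running.append(running[-1] + column)
        strip.append(running)
    return strip


def rect_sum(strip, x, y, w):
    """Sum of the w-wide, h-high rectangle with top-left corner (x, y)."""
    # A single subtraction of two precomputed strip sums.
    return strip[y][x + w] - strip[y][x]


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
strip = horizontal_strip_sum(image, h=2)
total = rect_sum(strip, x=1, y=1, w=2)  # rows 1-2, columns 1-2
```

For comparison, the classic integral image needs three additions or subtractions per rectangle, but works for any rectangle height from a single table.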
One-shot multi-set non-rigid feature-spatial matching
Torki, M.; Elgammal, A.
Page(s): 3058 - 3065
Digital Object Identifier : 10.1109/CVPR.2010.5540059
AbstractPlus | Full Text: PDF (810KB) | Multimedia
Relaxing the 3L algorithm for an accurate implicit polynomial fitting
Rouhani, M.; Sappa, A.D.
Page(s): 3066 - 3072
Digital Object Identifier : 10.1109/CVPR.2010.5540061
AbstractPlus | Full Text: PDF (4825KB)
This paper presents a novel method to increase the accuracy of linear fitting of implicit polynomials. The proposed method is based on the 3L algorithm philosophy. The novelty lies in the relaxation of the additional constraints, already imposed by the 3L algorithm. Hence, the accuracy of the final solution is increased due to the proper adjustment of the expected values in the aforementioned additional constraints. Although iterative, the proposed approach solves the fitting problem within a li... Read More »
Online visual vocabulary pruning using pairwise constraints
Mallapragada, P.K.; Rong Jin; Jain, A.K.
Page(s): 3073 - 3080
Digital Object Identifier : 10.1109/CVPR.2010.5540062
AbstractPlus | Full Text: PDF (1317KB)
Given a pair of images represented using bag-of-visual-words and a label corresponding to whether the images are “related” (must-link constraint) or “unrelated” (cannot-link constraint), we address the problem of selecting a subset of visual words that are salient in explaining the relation between the image pair. In particular, a subset of features is selected such that the distance computed using these features satisfies the given pairwise constraints. An efficient o... Read More »
Safety in numbers: Learning categories from few examples with multi model knowledge transfer
Tommasi, T.; Orabona, F.; Caputo, B.
Page(s): 3081 - 3088
Digital Object Identifier : 10.1109/CVPR.2010.5540064
AbstractPlus | Full Text: PDF (290KB)
Learning object categories from small samples is a challenging problem, where machine learning tools can in general provide very few guarantees. Exploiting prior knowledge may be useful to reproduce the human capability of recognizing objects even from only one single view. This paper presents an SVM-based model adaptation algorithm able to select and weight appropriately prior knowledge coming from different categories. The method relies on the solution of a convex optimization problem which en... Read More »
Rapid and accurate developmental stage recognition of C. elegans from high-throughput image data
White, A.G.; Cipriani, P.G.; Huey-Ling Kao; Lees, B.; Geiger, D.; Sontag, E.; Gunsalus, K.C.; Piano, F.
Page(s): 3089 - 3096
Digital Object Identifier : 10.1109/CVPR.2010.5540065
AbstractPlus | Full Text: PDF (626KB)
We present a hierarchical principle for object recognition and its application to automatically classify developmental stages of C. elegans animals from a population of mixed stages. The object recognition machine consists of four hierarchical layers, each composed of units upon which evaluation functions output a label score, followed by a grouping mechanism that resolves ambiguities in the score by imposing local consistency constraints. Each layer then outputs groups of units, from which the ... Read More »
Tiered scene labeling with dynamic programming
Felzenszwalb, P.F.; Veksler, O.
Page(s): 3097 - 3104
Digital Object Identifier : 10.1109/CVPR.2010.5540067
AbstractPlus | Full Text: PDF (865KB)
Dynamic programming (DP) has been a useful tool for a variety of computer vision problems. However its application is usually limited to problems with a one dimensional or low treewidth structure, whereas most domains in vision are at least 2D. In this paper we show how to apply DP for pixel labeling of 2D scenes with simple “tiered” structure. While there are many variations possible, for the applications we consider the following tiered structure is appropriate. An image is first... Read More »
Segmentation of building facades using procedural shape priors
Teboul, O.; Simon, L.; Koutsourakis, P.; Paragios, N.
Page(s): 3105 - 3112
Digital Object Identifier : 10.1109/CVPR.2010.5540068
AbstractPlus | Full Text: PDF (2322KB)
In this paper we propose a novel approach to the perceptual interpretation of building facades that combines shape grammars, supervised classification and random walks. Procedural modeling is used to model the geometric and the photometric variation of buildings. This is fused with visual classification techniques (randomized forests) that provide a crude probabilistic interpretation of the observation space in order to measure the appropriateness of a procedural generation with respect to the i... Read More »
Layered object detection for multi-class segmentation
Yi Yang; Hallman, S.; Ramanan, D.; Fowlkes, C.
Page(s): 3113 - 3120
Digital Object Identifier : 10.1109/CVPR.2010.5540070
AbstractPlus | Full Text: PDF (7041KB)
We formulate a layered model for object detection and multi-class segmentation. Our system uses the output of a bank of object detectors in order to define shape priors for support masks and then estimates appearance, depth ordering and labeling of pixels in the image. We train our system on the PASCAL segmentation challenge dataset and show good test results with state of the art performance in several categories including segmenting humans.
Rectification of figures and photos in document images using bounding box interface
Hyung Il Koo; Nam Ik Cho
Page(s): 3121 - 3128
Digital Object Identifier : 10.1109/CVPR.2010.5540071
AbstractPlus | Full Text: PDF (3599KB)
This paper proposes an algorithm for the segmentation and rectification of figures and photos in document images. The algorithm requires just a rough user-provided bounding box for the objects in a single-view image. On receiving the user's bounding box, it takes about 1-2 seconds to segment and rectify mega-pixel sized figures. The main feature of the algorithm is a novel segmentation method that exploits the properties of printed figures. Specifically, a set of boundary candidates is generated... Read More »
Geodesic star convexity for interactive image segmentation
Gulshan, V.; Rother, C.; Criminisi, A.; Blake, A.; Zisserman, A.
Page(s): 3129 - 3136
Digital Object Identifier : 10.1109/CVPR.2010.5540073
AbstractPlus | Full Text: PDF (1269KB)
In this paper we introduce a new shape constraint for interactive image segmentation. It is an extension of Veksler's star-convexity prior, in two ways: from a single star to multiple stars and from Euclidean rays to Geodesic paths. Global minima of the energy function are obtained subject to these new constraints. We also introduce Geodesic Forests, which exploit the structure of shortest paths in implementing the extended constraints. The star-convexity prior is used here in an interactive set... Read More »
Figure-ground segmentation improves handled object recognition in egocentric video
Xiaofeng Ren; Chunhui Gu
Page(s): 3137 - 3144
Digital Object Identifier : 10.1109/CVPR.2010.5540074
AbstractPlus | Full Text: PDF (1343KB) | Multimedia
Learning kernels for variants of normalized cuts: Convex relaxations and applications
Mukherjee, L.; Singh, V.; Peng, J.; Hinrichs, C.
Page(s): 3145 - 3152
Digital Object Identifier : 10.1109/CVPR.2010.5540076
AbstractPlus | Full Text: PDF (347KB)
We propose a new algorithm for learning kernels for variants of the Normalized Cuts (NCuts) objective - i.e., given a set of training examples with known partitions, how should a basis set of similarity functions be combined to induce NCuts favorable distributions. Such a procedure facilitates design of good affinity matrices. It also helps assess the importance of different feature types for discrimination. Rather than formulating the learning problem in terms of the spectral relaxation, the al... Read More »
Globally optimal pixel labeling algorithms for tree metrics
Felzenszwalb, P.F.; Pap, G.; Tardos, E.; Zabih, R.
Page(s): 3153 - 3160
Digital Object Identifier : 10.1109/CVPR.2010.5540077
AbstractPlus | Full Text: PDF (669KB)
We consider pixel labeling problems where the label set forms a tree, and where the observations are also labels. Such problems arise in feature-space analysis with a very large label set, for instance in color image segmentation. In this case a tree of labels can be constructed via hierarchical clustering of the observations. This leads to an obvious distance function between two labels, namely their distance within the tree; such tree metrics have been extensively studied outside of computer v... Read More »
Geodesic graph cut for interactive image segmentation
Price, B.L.; Morse, B.; Cohen, S.
Page(s): 3161 - 3168
Digital Object Identifier : 10.1109/CVPR.2010.5540079
AbstractPlus | Full Text: PDF (3313KB) | Multimedia
Interactive segmentation is useful for selecting objects of interest in images and continues to be a topic of much study. Methods that grow regions from foreground/background seeds, such as the recent geodesic segmentation approach, avoid the boundary-length bias of graph-cut methods but have their own bias towards minimizing paths to the seeds, resulting in increased sensitivity to seed placement. The lack of edge modeling in geodesic or similar approaches limits their ability to precisely loca... Read More »
iCoseg: Interactive co-segmentation with intelligent scribble guidance
Batra, D.; Kowdle, A.; Parikh, D.; Jiebo Luo; Tsuhan Chen
Page(s): 3169 - 3176
Digital Object Identifier : 10.1109/CVPR.2010.5540080
AbstractPlus | Full Text: PDF (5908KB) | Multimedia
This paper presents an algorithm for Interactive Co-segmentation of a foreground object from a group of related images. While previous approaches focus on unsupervised co-segmentation, we use successful ideas from the interactive object-cutout literature. We develop an algorithm that allows users to decide what foreground is, and then guide the output of the co-segmentation algorithm towards it via scribbles. Interestingly, keeping a user in the loop leads to simpler and highly parallelizable en... Read More »
Variational segmentation of elongated volumetric structures
Reinbacher, C.; Pock, T.; Bauer, C.; Bischof, H.
Page(s): 3177 - 3184
Digital Object Identifier : 10.1109/CVPR.2010.5539771
AbstractPlus | Full Text: PDF (2268KB)
We present an interactive approach for segmenting thin volumetric structures. The proposed segmentation model is based on an anisotropic weighted Total Variation energy with a global volumetric constraint and is minimized using an efficient numerical approach and a convex relaxation. The algorithm is globally optimal w.r.t. the relaxed problem for any volumetric constraint. The binary solution of the relaxed problem equals the globally optimal solution of the original problem. Implemented on tod... Read More »
Collect-cut: Segmentation with top-down cues discovered in multi-object images
Yong Jae Lee; Grauman, K.
Page(s): 3185 - 3192
Digital Object Identifier : 10.1109/CVPR.2010.5539772
AbstractPlus | Full Text: PDF (817KB) | Multimedia
We present a method to segment a collection of unlabeled images while exploiting automatically discovered appearance patterns shared between them. Given an unlabeled pool of multi-object images, we first detect any visual clusters present among their sub-regions, where inter-region similarity is measured according to both appearance and contextual layout. Then, using each initial segment as a seed, we solve a graph cuts problem to refine its boundary - enforcing preferences to include nearby reg... Read More »
Authority-shift clustering: Hierarchical clustering by authority seeking on graphs
Minsu Cho; Kyoung Mu Lee
Page(s): 3193 - 3200
Digital Object Identifier : 10.1109/CVPR.2010.5540081
AbstractPlus | Full Text: PDF (3975KB)
In this paper, a novel hierarchical clustering method using link analysis techniques is introduced. The algorithm is formulated as an authority seeking procedure on graphs, which computes the shifts toward nodes with high authority scores. For the authority shift, we adopted the personalized PageRank score of the graph. Based on the concept of authority seeking, we achieve hierarchical clustering by iteratively propagating the authority scores to other nodes and shifting authority nodes. This sc... Read More »
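The personalized PageRank score mentioned in this abstract can be computed by plain power iteration, sketched below. Only the standard score is shown; how the paper turns it into authority shifts is not visible in the truncated abstract, so that part is left out. The function name, the damping value 0.85, the dangling-node handling and the toy graph are all assumptions made for illustration.

```python
def personalized_pagerank(adjacency, seed, damping=0.85, iterations=100):
    """Power iteration for PageRank with all restart mass on one seed node.

    adjacency[i][j] is the (non-negative) weight of the edge i -> j.
    """
    n = len(adjacency)
    out_weight = [sum(row) for row in adjacency]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Restart term: with probability (1 - damping) jump back to the seed.
        nxt = [(1.0 - damping) * (1.0 if i == seed else 0.0) for i in range(n)]
        for i in range(n):
            if out_weight[i] == 0:
                # Dangling node: assumed to return its mass to the seed.
                nxt[seed] += damping * scores[i]
                continue
            for j in range(n):
                if adjacency[i][j]:
                    nxt[j] += damping * scores[i] * adjacency[i][j] / out_weight[i]
        scores = nxt
    return scores


# Toy undirected graph: nodes 0, 1, 2 form a triangle, node 3 hangs off node 2.
graph = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = personalized_pagerank(graph, seed=0)
```

The scores sum to one and are biased toward the seed's neighborhood, which is what distinguishes the personalized variant from global PageRank.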
Nonparametric higher-order learning for interactive segmentation
Tae Hoon Kim; Kyoung Mu Lee; Sang Uk Lee
Page(s): 3201 - 3208
Digital Object Identifier : 10.1109/CVPR.2010.5540078
AbstractPlus | Full Text: PDF (2748KB) | Multimedia
In this paper, we deal with a generative model for multilabel, interactive segmentation. To estimate the pixel likelihoods for each label, we propose a new higher-order formulation additionally imposing the soft label consistency constraint whereby the pixels in the regions, generated by unsupervised image segmentation algorithms, tend to have the same label. In contrast with previous works which focus on the parametric model of the higher-order cliques for adding this soft constraint, we addres... Read More »
GPCA with denoising: A moments-based convex approach
Ozay, N.; Sznaier, M.; Lagoa, C.; Camps, O.
Page(s): 3209 - 3216
Digital Object Identifier : 10.1109/CVPR.2010.5540075
AbstractPlus | Full Text: PDF (1670KB) | Multimedia
This paper addresses the problem of segmenting a combination of linear subspaces and quadratic surfaces from sample data points corrupted by (not necessarily small) noise. Our main result shows that this problem can be reduced to minimizing the rank of a matrix whose entries are affine in the optimization variables, subject to a convex constraint imposing that these variables are the moments of an (unknown) probability distribution function with finite support. Exploiting the linear matrix inequ... Read More »
Efficiently selecting regions for scene understanding
Kumar, M.P.; Koller, D.
Page(s): 3217 - 3224
Digital Object Identifier : 10.1109/CVPR.2010.5540072
AbstractPlus | Full Text: PDF (827KB)
Finding image distributions on active curves
Ayed, I.B.; Mitiche, A.; Salah, M.B.; Shuo Li
Page(s): 3225 - 3232
Digital Object Identifier : 10.1109/CVPR.2010.5540069
AbstractPlus | Full Text: PDF (334KB)
This study investigates an active curve functional which measures a similarity between the distribution of an image feature on the curve and a model distribution learned a priori. The curve evolution equation resulting from the minimization of this contour-based functional can be viewed as a geodesic active contour with a variable stopping function. The variable stopping function depends on the distribution of image feature on the curve and, therefore, can deal with difficult cases where the des... Read More »
A shape-driven MRF model for the segmentation of organs in medical images
Chittajallu, D.R.; Shah, S.K.; Kakadiaris, I.A.
Page(s): 3233 - 3240
Digital Object Identifier : 10.1109/CVPR.2010.5540066
AbstractPlus | Full Text: PDF (733KB)
In this paper, we present a knowledge-driven Markov Random Field (MRF) model for the segmentation of organs in medical images with particular emphasis on the incorporation of shape constraints into the segmentation problem. We cast the problem of image segmentation as the Maximum A Posteriori (MAP) estimation of a Markov Random Field which, in essence, is equivalent to the minimization of the corresponding Gibbs energy function. We then incorporate a set of constraints into the Gibbs energy func... Read More »
Constrained parametric min-cuts for automatic object segmentation
Carreira, J.; Sminchisescu, C.
Page(s): 3241 - 3248
Digital Object Identifier : 10.1109/CVPR.2010.5540063
AbstractPlus | Full Text: PDF (2500KB)
We present a novel framework for generating and ranking plausible object hypotheses in an image using bottom-up processes and mid-level cues. The object hypotheses are represented as figure-ground segmentations, and are extracted automatically, without prior knowledge about properties of individual object classes, by solving a sequence of constrained parametric min-cut problems (CPMC) on a regular image grid. We then learn to rank the object hypotheses by training a continuous model to predict ... Read More »
Towards weakly supervised semantic segmentation by means of multiple instance and multitask learning
Vezhnevets, A.; Buhmann, J.M.
Page(s): 3249 - 3256
Digital Object Identifier : 10.1109/CVPR.2010.5540060
AbstractPlus | Full Text: PDF (895KB)
We address the task of learning a semantic segmentation from weakly supervised data. Our aim is to devise a system that predicts an object label for each pixel by making use of only image level labels during training - the information whether a certain object is present or not in the image. Such coarse tagging of images is faster and easier to obtain as opposed to the tedious task of pixelwise labeling required in state of the art systems. We cast this task naturally as a multiple instance learn... Read More »
Fast global optimization of curvature
El-Zehiry, N.Y.; Grady, L.
Page(s): 3257 - 3264
Digital Object Identifier : 10.1109/CVPR.2010.5540057
AbstractPlus | Full Text: PDF (509KB)
Two challenges in computer vision are to accommodate noisy data and missing data. Many problems in computer vision, such as segmentation, filtering, stereo, reconstruction, inpainting and optical flow seek solutions that match the data while satisfying an additional regularization, such as total variation or boundary length. A regularization which has received less attention is to minimize the curvature of the solution. One reason why this regularization has received less attention is due to the... Read More »
Label propagation in video sequences
Badrinarayanan, V.; Galasso, F.; Cipolla, R.
Page(s): 3265 - 3272
Digital Object Identifier : 10.1109/CVPR.2010.5540054
AbstractPlus | Full Text: PDF (1428KB) | Multimedia
This paper proposes a probabilistic graphical model for the problem of propagating labels in video sequences, also termed the label propagation problem. Given a limited amount of hand labelled pixels, typically the start and end frames of a chunk of video, an EM based algorithm propagates labels through the rest of the frames of the video sequence. As a result, the user obtains pixelwise labelled video sequences along with the class probabilities at each pixel. Our novel algorithm provides an es... Read More »
Vessel scale-selection using MRF optimization
Mirzaalian, H.; Hamarneh, G.
Page(s): 3273 - 3279
Digital Object Identifier : 10.1109/CVPR.2010.5540051
AbstractPlus | Full Text: PDF (350KB)
Many feature detection algorithms rely on the choice of scale. In this paper, we complement standard scale-selection algorithms with spatial regularization. To this end, we formulate scale-selection as a graph labeling problem and employ Markov random field multi-label optimization. We focus on detecting the scales of vascular structures in medical images. We compare the detected vessel scales using our method to those obtained using the selection approach of the well-known vesselness filter (Fr... Read More »
Harmony potentials for joint classification and segmentation
Gonfaus, J.M.; Boix, X.; van de Weijer, J.; Bagdanov, A.D.; Serrat, J.; Gonzàlez, J.
Page(s): 3280 - 3287
Digital Object Identifier : 10.1109/CVPR.2010.5540048
AbstractPlus | Full Text: PDF (5716KB)
Hierarchical conditional random fields have been successfully applied to object segmentation. One reason is their ability to incorporate contextual information at different scales. However, these models do not allow multiple labels to be assigned to a single node. At higher scales in the image, this yields an oversimplified model, since multiple classes can be reasonably expected to appear within one region. This simplified model especially limits the impact that observations at larger scales ma... Read More »
Graph cut segmentation with a global constraint: Recovering region distribution via a bound of the Bhattacharyya measure
Ayed, I.B.; Hua-mei Chen; Punithakumar, K.; Ross, I.; Shuo Li
Page(s): 3288 - 3295
Digital Object Identifier : 10.1109/CVPR.2010.5540045
AbstractPlus | Full Text: PDF (1489KB) | Multimedia
This study investigates an efficient algorithm for image segmentation with a global constraint based on the Bhattacharyya measure. The problem consists of finding a region consistent with an image distribution learned a priori. We derive an original upper bound of the Bhattacharyya measure by introducing an auxiliary labeling. From this upper bound, we reformulate the problem as an optimization of an auxiliary function by graph cuts. Then, we demonstrate that the proposed procedure converges and... Read More »
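For readers unfamiliar with the Bhattacharyya measure used in this abstract, the sketch below computes it for two discrete distributions as the sum of sqrt(p_i * q_i). It reaches 1 for identical distributions and 0 for distributions with no overlap. The histograms and function names are hypothetical, and the paper's upper bound and graph-cut optimization are not reproduced here.

```python
import math


def normalize(hist):
    """Scale non-negative bin counts so that they sum to one."""
    total = float(sum(hist))
    return [v / total for v in hist]


def bhattacharyya_coefficient(p, q):
    """Similarity of two discrete distributions over the same bins."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


# Hypothetical 4-bin intensity histograms for a learned model and two regions.
model = normalize([4, 3, 2, 1])
close_region = normalize([5, 3, 1, 1])
far_region = normalize([0, 1, 2, 7])

same = bhattacharyya_coefficient(model, model)
close = bhattacharyya_coefficient(model, close_region)
far = bhattacharyya_coefficient(model, far_region)
disjoint = bhattacharyya_coefficient([1.0, 0.0], [0.0, 1.0])
```

A segmentation method of this kind would prefer the region whose distribution gives the higher coefficient against the model, here `close_region` over `far_region`.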
Interest seam image
Xiao Zhang; Gang Hua; Lei Zhang; Heung-Yeung Shum
Page(s): 3296 - 3303
Digital Object Identifier : 10.1109/CVPR.2010.5540042
AbstractPlus | Full Text: PDF (5178KB)
Aggregating local descriptors into a compact image representation
Jégou, H.; Douze, M.; Schmid, C.; Pérez, P.
Page(s): 3304 - 3311
Digital Object Identifier : 10.1109/CVPR.2010.5540039
AbstractPlus | Full Text: PDF (1689KB)
We address the problem of image search on a very large scale, where three constraints have to be considered jointly: the accuracy of the search, its efficiency, and the memory usage of the representation. We first propose a simple yet efficient way of aggregating local image descriptors into a vector of limited dimension, which can be viewed as a simplification of the Fisher kernel representation. We then show how to jointly optimize the dimension reduction and the indexing algorithm, so that it... Read More »
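The aggregation step described in this abstract can be sketched as follows, assuming it refers to the residual-accumulation scheme usually called VLAD (the truncated abstract does not name it). Each local descriptor is assigned to its nearest codebook centroid, the differences between descriptors and centroids are summed per centroid, and the concatenated result is L2-normalized. The function name, the tiny codebook and the 2-D descriptors are hypothetical, and the dimension reduction and indexing steps are omitted.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one fixed-length vector.

    The output has len(centroids) * dim entries, regardless of how many
    descriptors the image contains.
    """
    k, dim = len(centroids), len(centroids[0])
    acc = [[0.0] * dim for _ in range(k)]
    for x in descriptors:
        # Index of the nearest centroid by squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda c: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[c])),
        )
        for j in range(dim):
            acc[nearest][j] += x[j] - centroids[nearest][j]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Hypothetical 2-D descriptors and a 2-word codebook.
codebook = [[0.0, 0.0], [10.0, 10.0]]
local_descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
signature = vlad(local_descriptors, codebook)
```

The fixed output length is what makes the representation convenient for large-scale indexing, since every image maps to a vector of the same size.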
Automatic image annotation using group sparsity
Shaoting Zhang; Junzhou Huang; Yuchi Huang; Yang Yu; Hongsheng Li; Metaxas, D.N.
Page(s): 3312 - 3319
Digital Object Identifier : 10.1109/CVPR.2010.5540036
AbstractPlus | Full Text: PDF (344KB)
Automatically assigning relevant text keywords to images is an important problem. Many algorithms have been proposed in the past decade and achieved good performance. Efforts have focused upon model representations of keywords, but properties of features have not been well investigated. In most cases, a group of features is preselected, yet important feature properties are not well used to select features. In this paper, we introduce a regularization based feature selection algorithm to leverage... Read More »
Nonparametric Label-to-Region by search
Xiaobai Liu; Shuicheng Yan; Jiebo Luo; Jinhui Tang; Zhongyang Huang; Hai Jin
Page(s): 3320 - 3327
Digital Object Identifier : 10.1109/CVPR.2010.5540033
AbstractPlus | Full Text: PDF (2413KB)
In this work, we investigate how to propagate annotated labels for a given single image from the image-level to their corresponding semantic regions, namely Label-to-Region (L2R), by utilizing the auxiliary knowledge from Internet image search with the annotated image labels as queries. A nonparametric solution is proposed to perform L2R for single image with complete labels. First, each label of the image is used as query for online image search engines to obtain a set of semantically related a... Read More »
CRAM: Compact representation of actions in movies
Rodriguez, M.
Page(s): 3328 - 3335
Digital Object Identifier : 10.1109/CVPR.2010.5540030
AbstractPlus | Full Text: PDF (6878KB)
Thousands of hours of video are recorded every second across the world. Due to the fact that searching for a particular event of interest within hours of video is time consuming, most captured videos are never examined, and are only used in a post-factum manner. In this work, we introduce activity-specific video summaries, which provide an effective means of browsing and indexing video based on a set of events of interest. Our method automatically generates a compact video representation of a lo... Read More »
Building and using a semantivisual image hierarchy
Li-Jia Li; Chong Wang; Yongwhan Lim; Blei, D.M.; Li Fei-Fei
Page(s): 3336 - 3343
Digital Object Identifier : 10.1109/CVPR.2010.5540027
AbstractPlus | Full Text: PDF (9358KB)
A semantically meaningful image hierarchy can ease the human effort in organizing thousands and millions of pictures (e.g., personal albums), and help to improve performance of end tasks such as image annotation and classification. Previous work has focused on using either low-level image features or textual tags to build image hierarchies, resulting in limited success in their general usage. In this paper, we propose a method to automatically discover the “semantivisual” image hie... Read More »
Weakly-supervised hashing in kernel space
Yadong Mu; Jialie Shen; Shuicheng Yan
Page(s): 3344 - 3351
Digital Object Identifier : 10.1109/CVPR.2010.5540024
AbstractPlus | Full Text: PDF (281KB)
The explosive growth of vision data motivates recent studies on efficient data indexing methods such as locality-sensitive hashing (LSH). Most existing approaches perform hashing in an unsupervised way. In this paper we move one step forward and propose a supervised hashing method, i.e., the LAbel-regularized Max-margin Partition (LAMP) algorithm. The proposed method generates hash functions in a weakly-supervised setting, where a small portion of sample pairs are manually labeled to be ... Read More »
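As background for the unsupervised hashing that this abstract contrasts itself with, the sketch below shows one common LSH family, random-hyperplane hashing for cosine similarity. It is not the LAMP algorithm, whose details are cut off in the abstract. The function name `make_hash`, the seed, the bit count and the sample vector are assumptions chosen for illustration.

```python
import random


def make_hash(dim, bits, seed=0):
    """Return a function mapping a dim-dimensional vector to a tuple of bits.

    Each bit records on which side of a random hyperplane the vector lies,
    so vectors separated by a small angle tend to share most bits.
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

    def hash_fn(x):
        return tuple(
            1 if sum(p_i * x_i for p_i, x_i in zip(plane, x)) >= 0 else 0
            for plane in planes
        )

    return hash_fn


hash_fn = make_hash(dim=4, bits=8)
x = [1.0, 2.0, -0.5, 3.0]
code = hash_fn(x)
scaled_code = hash_fn([2.0 * v for v in x])   # same direction as x
opposite_code = hash_fn([-v for v in x])      # opposite direction
```

Because the planes are drawn without looking at any labels, the codes reflect only geometric similarity. Supervised and weakly-supervised methods differ in that they choose the partitions using labeled pairs.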
Spatial-bag-of-features
Yang Cao Changhu Wang Zhiwei Li Liqing Zhang Lei Zhang
Page(s): 3352 - 3359
Digital Object Identifier : 10.1109/CVPR.2010.5540021
AbstractPlus | Full Text: PDF (8919KB)
In this paper, we study the problem of large scale image retrieval by developing a new class of bag-of-features to encode geometric information of objects within an image. Beyond existing orderless bag-of-features, local features of an image are first projected to different directions or points to generate a series of ordered bag-of-features, based on which different families of spatial bag-of-features are designed to capture the invariance of object translation, rotation, and scaling. Then the ... Read More »
Locality-constrained Linear Coding for image classification
Jinjun Wang Jianchao Yang Kai Yu Fengjun Lv Huang, T. Yihong Gong
Page(s): 3360 - 3367
Digital Object Identifier : 10.1109/CVPR.2010.5540018
AbstractPlus | Full Text: PDF (3566KB)
The traditional SPM approach based on bag-of-features (BoF) requires nonlinear classifiers to achieve good image classification performance. This paper presents a simple but effective coding scheme called Locality-constrained Linear Coding (LLC) in place of the VQ coding in traditional SPM. LLC utilizes the locality constraints to project each descriptor into its local-coordinate system, and the projected coordinates are integrated by max pooling to generate the final representation. With linear... Read More »
Semantic context modeling with maximal margin Conditional Random Fields for automatic image annotation
Yu Xiang Xiangdong Zhou Zuotao Liu Tat-Seng Chua Chong-Wah Ngo
Page(s): 3368 - 3375
Digital Object Identifier : 10.1109/CVPR.2010.5540015
AbstractPlus | Full Text: PDF (1205KB) | Multimedia
Context modeling for Vision Recognition and Automatic Image Annotation (AIA) has attracted increasing attentions in recent years. For various contextual information and resources, semantic context has been exploited in AIA and brings promising results. However, previous works either casted the problem into structural classification or adopted multi-layer modeling, which suffer from the problems of scalability or model efficiency. In this paper, we propose a novel discriminative Conditional Rando... Read More »
Image retrieval via probabilistic hypergraph ranking
Yuchi Huang Qingshan Liu Shaoting Zhang Metaxas, D.N.
Page(s): 3376 - 3383
Digital Object Identifier : 10.1109/CVPR.2010.5540012
AbstractPlus | Full Text: PDF (447KB)
Large-scale image retrieval with compressed Fisher vectors
Perronnin, F. Yan Liu Sánchez, J. Poirier, H.
Page(s): 3384 - 3391
Digital Object Identifier : 10.1109/CVPR.2010.5540009
AbstractPlus | Full Text: PDF (2109KB)
The problem of large-scale image search has been traditionally addressed with the bag-of-visual-words (BOV). In this article, we propose to use as an alternative the Fisher kernel framework. We first show why the Fisher representation is well-suited to the retrieval problem: it describes an image by what makes it different from other images. One drawback of the Fisher vector is that it is high-dimensional and, as opposed to the BOV, it is dense. The resulting memory and computational costs do no... Read More »
Optimizing kd-trees for scalable visual descriptor indexing
You Jia Jingdong Wang Gang Zeng Hongbin Zha Xian-Sheng Hua
Page(s): 3392 - 3399
Digital Object Identifier : 10.1109/CVPR.2010.5540006
AbstractPlus | Full Text: PDF (168KB)
In this paper, we attempt to scale up the kd-tree indexing methods for large-scale vision applications, e.g., indexing a large number of SIFT features and other types of visual descriptors. To this end, we propose an effective approach to generate near-optimal binary space partitioning and need low time cost to access the nodes in the query stage. First, we relax the coordinate-axis-alignment constraint in partition axis selection used in conventional kd-trees, and form a partition axis with the... Read More »
Content-aware Ranking for visual search
Bo Geng Linjun Yang Chao Xu Xian-Sheng Hua
Page(s): 3400 - 3407
Digital Object Identifier : 10.1109/CVPR.2010.5540003
AbstractPlus | Full Text: PDF (411KB)
The ranking models of existing image/video search engines are generally based on associated text while the visual content is actually neglected. Imperfect search results frequently appear due to the mismatch between the textual features and the actual visual content. Visual reranking, in which visual information is applied to refine text based search results, has been proven to be effective. However, the improvement brought by visual reranking is limited, and the main reason is that the errors i... Read More »
Topic regression multi-modal Latent Dirichlet Allocation for image annotation
Putthividhy, D. Attias, H.T. Nagarajan, S.S.
Page(s): 3408 - 3415
Digital Object Identifier : 10.1109/CVPR.2010.5540000
AbstractPlus | Full Text: PDF (1591KB)
We present topic-regression multi-modal Latent Dirichlet Allocation (tr-mmLDA), a novel statistical topic model for the task of image and video annotation. At the heart of our new annotation model lies a novel latent variable regression approach to capture correlations between image or video features and annotation texts. Instead of sharing a set of latent topics between the 2 data modalities as in the formulation of correspondence LDA, our approach introduces a regression module to correlat... Read More »
Unsupervised discovery of co-occurrence in sparse high dimensional data
Chum, O. Matas, J.
Page(s): 3416 - 3423
Digital Object Identifier : 10.1109/CVPR.2010.5539997
AbstractPlus | Full Text: PDF (2259KB)
An efficient min-Hash based algorithm for discovery of dependencies in sparse high-dimensional data is presented. The dependencies are represented by sets of features co-occurring with high probability and are called co-ocsets. Sparse high dimensional descriptors, such as bag of words, have been proven very effective in the domain of image retrieval. To maintain high efficiency even for very large data collection, features are assumed independent. We show experimentally that co-ocsets are not ra... Read More »
Semi-supervised hashing for scalable image retrieval
Jun Wang Kumar, S. Shih-Fu Chang
Page(s): 3424 - 3431
Digital Object Identifier : 10.1109/CVPR.2010.5539994
AbstractPlus | Full Text: PDF (512KB)
Large scale image search has recently attracted considerable attention due to easy availability of huge amounts of data. Several hashing methods have been proposed to allow approximate but highly efficient search. Unsupervised hashing methods show good performance with metric distances but, in image search, semantic similarity is usually given in terms of labeled pairs of images. There exist supervised hashing methods that can handle such semantic similarity but they are prone to overfitting whe... Read More »
Image webs: Computing and exploiting connectivity in image collections
Heath, K. Gelfand, N. Ovsjanikov, M. Aanjaneya, M. Guibas, L.J.
Page(s): 3432 - 3439
Digital Object Identifier : 10.1109/CVPR.2010.5539991
AbstractPlus | Full Text: PDF (3464KB) | Multimedia
The widespread availability of digital cameras and ubiquitous Internet access have facilitated the creation of massive image collections. These collections can be highly interconnected through implicit links between image pairs viewing the same or similar objects. We propose building graphs called Image Webs to represent such connections. While earlier efforts studied local neighborhoods of such graphs, we are interested in understanding global structure and exploiting connectivity at larger sca... Read More »
Tag-based web photo retrieval improved by batch mode re-tagging
Lin Chen Dong Xu Tsang, I.W. Jiebo Luo
Page(s): 3440 - 3446
Digital Object Identifier : 10.1109/CVPR.2010.5539988
AbstractPlus | Full Text: PDF (1738KB)
Web photos in social media sharing websites such as Flickr are generally accompanied by rich but noisy textual descriptions (tags, captions, categories, etc.). In this paper, we proposed a tag-based photo retrieval framework to improve the retrieval performance for Flickr photos by employing a novel batch mode re-tagging method. The proposed batch mode re-tagging method can automatically refine noisy tags of a group of Flickr photos uploaded by the same user within a short period by leveraging m... Read More »
Finding meaning on YouTube: Tag recommendation and category discovery
Toderici, G. Aradhye, H. Pasca, M. Sbaiz, L. Yagnik, J.
Page(s): 3447 - 3454
Digital Object Identifier : 10.1109/CVPR.2010.5539985
AbstractPlus | Full Text: PDF (613KB)
We present a system that automatically recommends tags for YouTube videos solely based on their audiovisual content. We also propose a novel framework for unsupervised discovery of video categories that exploits knowledge mined from the World-Wide Web text documents/searches. First, video content to tag association is learned by training classifiers that map audiovisual content-based features from millions of videos on YouTube.com to existing uploader-supplied tags for these videos. When a new v... Read More »
Discovering scene categories by information projection and cluster sampling
Dengxin Dai Tianfu Wu Song-Chun Zhu
Page(s): 3455 - 3462
Digital Object Identifier : 10.1109/CVPR.2010.5539982
AbstractPlus | Full Text: PDF (938KB)
Total Bregman divergence and its applications to shape retrieval
Meizhu Liu Vemuri, B.C. Amari, S.-I. Nielsen, F.
Page(s): 3463 - 3468
Digital Object Identifier : 10.1109/CVPR.2010.5539979
AbstractPlus | Full Text: PDF (1101KB)
Shape database search is ubiquitous in the world of biometric systems, CAD systems etc. Shape data in these domains is experiencing an explosive growth and usually requires search of whole shape databases to retrieve the best matches with accuracy and efficiency for a variety of tasks. In this paper, we present a novel divergence measure between any two given points in R^n or two distribution functions. This divergence measures the orthogonal distance between the tangent to the convex... Read More »
Scalable face image retrieval with identity-based quantization and multi-reference re-ranking
Zhong Wu Qifa Ke Jian Sun Heung-Yeung Shum
Page(s): 3469 - 3476
Digital Object Identifier : 10.1109/CVPR.2010.5539976
AbstractPlus | Full Text: PDF (4772KB)
State-of-the-art image retrieval systems achieve scalability by using bag-of-words representation and textual retrieval methods, but their performance degrades quickly in the face image domain, mainly because they 1) produce visual words with low discriminative power for face images, and 2) ignore the special properties of the faces. The leading features for face recognition can achieve good retrieval performance, but these features are not suitable for inverted indexing as they are high-dimensi... Read More »
Compact projection: Simple and efficient near neighbor search with practical memory requirements
Kerui Min Linjun Yang Wright, J. Lei Wu Xian-Sheng Hua Yi Ma
Page(s): 3477 - 3484
Digital Object Identifier : 10.1109/CVPR.2010.5539973
AbstractPlus | Full Text: PDF (388KB)
Image similarity search is a fundamental problem in computer vision. Efficient similarity search across large image databases depends critically on the availability of compact image representations and good data structures for indexing them. Numerous approaches to the problem of generating and indexing image codes have been presented in the literature, but existing schemes generally lack explicit estimates of the number of bits needed to effectively index a given large image database. We present... Read More »
SUN database: Large-scale scene recognition from abbey to zoo
Jianxiong Xiao Hays, J. Ehinger, K.A. Oliva, A. Torralba, A.
Page(s): 3485 - 3492
Digital Object Identifier : 10.1109/CVPR.2010.5539970
AbstractPlus | Full Text: PDF (3065KB)
Scene categorization is a fundamental problem in computer vision. However, scene understanding research has been constrained by the limited scope of currently-used databases which do not capture the full variety of scene categories. Whereas standard databases for object categorization contain hundreds of different classes of objects, the largest available dataset of scene categories contains only 15 classes. In this paper we propose the extensive Scene UNderstanding (SUN) database that contains ... Read More »
Visual classification with multi-task joint sparse representation
Xiao-Tong Yuan Shuicheng Yan
Page(s): 3493 - 3500
Digital Object Identifier : 10.1109/CVPR.2010.5539967
AbstractPlus | Full Text: PDF (1049KB)
We address the problem of computing joint sparse representation of visual signal across multiple kernel-based representations. Such a problem arises naturally in supervised visual recognition applications where one aims to reconstruct a test sample with multiple features from as few training subjects as possible. We cast the linear version of this problem into a multi-task joint covariate selection model, which can be very efficiently optimized via kernelizable accelerated proximal gradient met... Read More »
Classification and clustering via dictionary learning with structured incoherence and shared features
Ramirez, I. Sprechmann, P. Sapiro, G.
Page(s): 3501 - 3508
Digital Object Identifier : 10.1109/CVPR.2010.5539964
AbstractPlus | Full Text: PDF (2727KB)
A clustering framework within the sparse modeling and dictionary learning setting is introduced in this work. Instead of searching for the set of centroid that best fit the data, as in k-means type of approaches that model the data as distributions around discrete points, we optimize for a set of dictionaries, one for each cluster, for which the signals are best reconstructed in a sparse coding manner. Thereby, we are modeling the data as a union of learned low dimensional subspaces, and data po... Read More »
The automatic design of feature spaces for local image descriptors using an ensemble of non-linear feature extractors
Carneiro, G.
Page(s): 3509 - 3516
Digital Object Identifier : 10.1109/CVPR.2010.5539961
AbstractPlus | Full Text: PDF (1647KB)
The design of feature spaces for local image descriptors is an important research subject in computer vision due to its applicability in several problems, such as visual classification and image matching. In order to be useful, these descriptors have to present a good trade off between discriminating power and robustness to typical image deformations. The feature spaces of the most useful local descriptors have been manually designed based on the goal above, but this design often limits the use ... Read More »
Supervised translation-invariant sparse coding
Jianchao Yang Kai Yu Huang, T.
Page(s): 3517 - 3524
Digital Object Identifier : 10.1109/CVPR.2010.5539958
AbstractPlus | Full Text: PDF (235KB)
In this paper, we propose a novel supervised hierarchical sparse coding model based on local image descriptors for classification tasks. The supervised dictionary training is performed via back-projection, by minimizing the training error of classifying the image level features, which are extracted by max pooling over the sparse codes within a spatial pyramid. Such a max pooling procedure across multiple spatial scales offer the model translation invariant properties, similar to the Convolutiona... Read More »
Comparative object similarity for improved recognition with few or no examples
Gang Wang Forsyth, D. Hoiem, D.
Page(s): 3525 - 3532
Digital Object Identifier : 10.1109/CVPR.2010.5539955
AbstractPlus | Full Text: PDF (1325KB)
Learning models for recognizing objects with few or no training examples is important, due to the intrinsic long-tailed distribution of objects in the real world. In this paper, we propose an approach to use comparative object similarity. The key insight is that: given a set of object categories which are similar and a set of categories which are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. We develo... Read More »
Bayes optimal kernel discriminant analysis
Di You Martinez, A.M.
Page(s): 3533 - 3538
Digital Object Identifier : 10.1109/CVPR.2010.5539952
AbstractPlus | Full Text: PDF (188KB)
Efficient additive kernels via explicit feature maps
Vedaldi, A. Zisserman, A.
Page(s): 3539 - 3546
Digital Object Identifier : 10.1109/CVPR.2010.5539949
AbstractPlus | Full Text: PDF (226KB)
Maji and Berg have recently introduced an explicit feature map approximating the intersection kernel. This enables efficient learning methods for linear kernels to be applied to the non-linear intersection kernel, expanding the applicability of this model to much larger problems. In this paper we generalize this idea, and analyse a large family of additive kernels, called homogeneous, in a unified framework. The family includes the intersection, Hellinger's, and χ² kernels comm... Read More »
Data driven mean-shift belief propagation for non-gaussian MRFs
Minwoo Park Kashyap, S. Collins, R.T. Yanxi Liu
Page(s): 3547 - 3554
Digital Object Identifier : 10.1109/CVPR.2010.5539946
AbstractPlus | Full Text: PDF (691KB) | Multimedia
We introduce a novel data-driven mean-shift belief propagation (DDMSBP) method for non-Gaussian MRFs, which often arise in computer vision applications. With the aid of scale space theory, optimization of non-Gaussian, multimodal MRF models using DDMSBP becomes less sensitive to local maxima. This is a significant improvement over standard BP inference, and extends the range of methods that are computationally tractable. In particular, when pair-wise potentials are Gaussians, the time complexity... Read More »
Local features are not lonely – Laplacian sparse coding for image classification
Shenghua Gao Tsang, I.W. Liang-Tien Chia Peilin Zhao
Page(s): 3555 - 3561
Digital Object Identifier : 10.1109/CVPR.2010.5539943
AbstractPlus | Full Text: PDF (884KB)
Sparse coding which encodes the original signal in a sparse signal space, has shown its state-of-the-art performance in the visual codebook generation and feature quantization process of BoW based image representation. However, in the feature quantization process of sparse coding, some similar local features may be quantized into different visual words of the codebook due to the sensitiveness of quantization. In this paper, to alleviate the impact of this problem, we propose a Laplacian sparse c... Read More »
Factorization towards a classifier
Qiang Chen Shuicheng Yan Tian-Tsong Ng
Page(s): 3562 - 3569
Digital Object Identifier : 10.1109/CVPR.2010.5539940
AbstractPlus | Full Text: PDF (314KB)
In practice, nonnegative data factorization is often performed for data dimensionality reduction prior to a classification task using a classifier which is effective in low dimensional space such as nearest neighbor classifier. In this work, we propose a novel formulation to learn a multi-class classifier directly through a supervised nonnegative data factorization. This new formulation has the following properties: 1) the nonnegative data matrix is approximated as the product of a nonnegative b... Read More »
Online multi-class LPBoost
Saffari, A. Godec, M. Pock, T. Leistner, C. Bischof, H.
Page(s): 3570 - 3577
Digital Object Identifier : 10.1109/CVPR.2010.5539937
AbstractPlus | Full Text: PDF (559KB) | Multimedia
Online boosting is one of the most successful online learning algorithms in computer vision. While many challenging online learning problems are inherently multi-class, online boosting and its variants are only able to solve binary tasks. In this paper, we present Online Multi-Class LPBoost (OMCLP) which is directly applicable to multi-class problems. From a theoretical point of view, our algorithm tries to maximize the multi-class soft-margin of the samples. In order to solve the LP problem in ... Read More »
Sparse representation using nonnegative curds and whey
Yanan Liu Fei Wu Zhihua Zhang Yueting Zhuang Shuicheng Yan
Page(s): 3578 - 3585
Digital Object Identifier : 10.1109/CVPR.2010.5539934
AbstractPlus | Full Text: PDF (1593KB)
It has been of great interest to find sparse and/or nonnegative representations in computer vision literature. In this paper we propose a novel method to such a purpose and refer to it as nonnegative curds and whey (NNCW). The NNCW procedure consists of two stages. In the first stage we consider a set of sparse and nonnegative representations of a test image, each of which is a linear combination of the images within a certain class, by solving a set of regression-type nonnegative matrix factori... Read More »
Multi-structure model selection via kernel optimisation
Tat-Jun Chin Suter, D. Hanzi Wang
Page(s): 3586 - 3593
Digital Object Identifier : 10.1109/CVPR.2010.5539931
AbstractPlus | Full Text: PDF (844KB)
Our goal is to fit the multiple instances (or structures) of a generic model existing in data. Here we propose a novel model selection scheme to estimate the number of genuine structures present. In contrast to conventional model selection approaches, our method is driven by kernel-based learning. The input data is first clustered based on their potential to have emerged from the same structure. However the number of clusters is deliberately overestimated to obtain a set of initial model fits on... Read More »
Data fusion through cross-modality metric learning using similarity-sensitive hashing
Bronstein, M.M. Bronstein, A.M. Michel, F. Paragios, N.
Page(s): 3594 - 3601
Digital Object Identifier : 10.1109/CVPR.2010.5539928
AbstractPlus | Full Text: PDF (2079KB)
Visual understanding is often based on measuring similarity between observations. Learning similarities specific to a certain perception task from a set of examples has been shown advantageous in various computer vision and pattern recognition problems. In many important applications, the data that one needs to compare come from different representations or modalities, and the similarity between such data operates on objects that may have different and often incommensurable structure and dimensi... Read More »
Pareto discriminant analysis
Abou-Moustafa, K.T. de la Torre, F. Ferrie, F.P.
Page(s): 3602 - 3609
Digital Object Identifier : 10.1109/CVPR.2010.5539925
AbstractPlus | Full Text: PDF (242KB)
Linear Discriminant Analysis (LDA) is a popular tool for multiclass discriminative dimensionality reduction. However, LDA suffers from two major problems: (1) It only optimizes the Bayes error for the case of unimodal Gaussian classes with equal covariances (assuming full rank matrices) and, (2) The multiclass extension maximizes the sum of pairwise distances between the classes, and does not “simultaneously” maximize each pairwise distance between the classes. This typically resul... Read More »
Sufficient dimension reduction for visual sequence classification
Shyr, A. Urtasun, R. Jordan, M.I.
Page(s): 3610 - 3617
Digital Object Identifier : 10.1109/CVPR.2010.5539922
AbstractPlus | Full Text: PDF (658KB)
Fast sparse representation with prototypes
Jia-Bin Huang Ming-Hsuan Yang
Page(s): 3618 - 3625
Digital Object Identifier : 10.1109/CVPR.2010.5539919
AbstractPlus | Full Text: PDF (1957KB)
Sparse representation has found applications in numerous domains and recent developments have been focused on the convex relaxation of the ℓ0-norm minimization for sparse coding (i.e., the ℓ1-norm minimization). Nevertheless, the time and space complexities of these algorithms remain significantly high for large-scale problems. As signals in most problems can be modeled by a small set of prototypes, we propose an algorithm that exploits this property and show that the ℓ... Read More »
ℓp norm multiple kernel Fisher discriminant analysis for object and image categorisation
Fei Yan Mikolajczyk, K. Barnard, M. Hongping Cai Kittler, J.
Page(s): 3626 - 3632
Digital Object Identifier : 10.1109/CVPR.2010.5539916
AbstractPlus | Full Text: PDF (253KB)
In this paper, we generalise multiple kernel Fisher discriminant analysis (MK-FDA) such that the kernel weights can be regularised with an ℓp norm for any p ≥ 1, in contrast to existing MK-FDA that uses either ℓ1 or ℓ2 norm. We present formulations for both binary and multiclass cases and solve the associated optimisation problems efficiently with semi-infinite programming. We show on three object and image categorisation benchmarks that by learning the intrinsic sparsit... Read More »
Author Index
Page(s): 1 - 120
Digital Object Identifier : 10.1109/CVPR.2010.5539913
AbstractPlus | Full Text: PDF (551KB)