Journal of Vision
A parametric texture model based on deep convolutional features closely matches texture appearance for humans
Felix A. Wichmann [1], Leon A. Gatys [2], Matthias Bethge [3], Alexander S. Ecker [4], Christina M. Funke [5], Thomas S. A. Wallis [5]
[1] Neural Information Processing Group, Faculty of Science, Eberhard Karls Universität Tübingen; Bernstein Center for Computational Neuroscience; and Max Planck Institute for Intelligent Systems, Empirical Inference Department, Tübingen, Germany
[2] Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, and Bernstein Center for Computational Neuroscience, Tübingen, Germany
[3] Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen; Bernstein Center for Computational Neuroscience; Institute for Theoretical Physics, Eberhard Karls Universität Tübingen; and Max Planck Institute for Biological Cybernetics, Tübingen, Germany
[4] Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, and Bernstein Center for Computational Neuroscience, Tübingen, Germany; Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
[5] Werner Reichardt Center for Integrative Neuroscience, Eberhard Karls Universität Tübingen, and Bernstein Center for Computational Neuroscience, Tübingen, Germany
Keywords: psychophysics; texture perception; inspection; parafoveal region; avian crop; metals; pixel; perception
DOI: 10.1167/17.12.5
Subject area: Ophthalmology
Source: Association for Research in Vision and Ophthalmology
Abstract
Our visual environment is full of texture ("stuff" like cloth, bark, or gravel, as distinct from "things" like dresses, trees, or paths), and humans are adept at perceiving subtle variations in material properties. To investigate image features important for texture perception, we psychophysically compare a recent parametric model of texture appearance (convolutional neural network [CNN] model) that uses the features encoded by a deep CNN (VGG-19) with two other models: the venerable Portilla and Simoncelli model and an extension of the CNN model in which the power spectrum is additionally matched. Observers discriminated model-generated textures from original natural textures in a spatial three-alternative oddity paradigm under two viewing conditions: when test patches were briefly presented to the near-periphery ("parafoveal") and when observers were able to make eye movements to all three patches ("inspection"). Under parafoveal viewing, observers were unable to discriminate 10 of 12 original images from CNN model images, and remarkably, the simpler Portilla and Simoncelli model performed slightly better than the CNN model (11 textures). Under foveal inspection, matching CNN features captured appearance substantially better than the Portilla and Simoncelli model (nine compared to four textures), and including the power spectrum improved appearance matching for two of the three remaining textures. None of the models we test here could produce indiscriminable images for one of the 12 textures under the inspection condition. While deep CNN (VGG-19) features can often be used to synthesize textures that humans cannot discriminate from natural textures, there is currently no uniformly best model for all textures and viewing conditions.
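For readers who want a concrete picture of the CNN model compared in the abstract: it synthesizes a texture by matching the Gram matrices of VGG-19 feature maps between the original image and the synthesized image. Below is a minimal sketch of that technique in PyTorch; the layer set, image size, optimizer settings, and the file name `texture.png` are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of CNN texture synthesis by Gram-matrix matching.
# Layer indices, image size, and optimizer settings are assumptions
# chosen for illustration, not the paper's exact recipe.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained VGG-19 feature stack, frozen.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

# Conv layers whose Gram matrices we match (conv1_1 ... conv5_1).
LAYERS = [0, 5, 10, 19, 28]

def gram_matrices(x):
    """Run x through VGG-19 and collect normalized Gram matrices at LAYERS."""
    grams = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in LAYERS:
            _, c, h, w = x.shape
            f = x.reshape(c, h * w)
            grams.append(f @ f.t() / (c * h * w))
        if i == LAYERS[-1]:
            break  # deeper layers are not needed
    return grams

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics,
                         std=[0.229, 0.224, 0.225]),   # as VGG expects
])

# "texture.png" is a placeholder for any original texture image.
target = preprocess(Image.open("texture.png").convert("RGB")).unsqueeze(0).to(device)
target_grams = [g.detach() for g in gram_matrices(target)]

# Start from noise and minimize the summed Gram-matrix mismatch.
synth = torch.randn_like(target).mul_(0.1).requires_grad_(True)
opt = torch.optim.LBFGS([synth], max_iter=500)

def closure():
    opt.zero_grad()
    loss = sum(F.mse_loss(g, t) for g, t in zip(gram_matrices(synth), target_grams))
    loss.backward()
    return loss

opt.step(closure)
```

The "CNN plus power spectrum" variant discussed in the abstract additionally constrains the Fourier amplitude spectrum of the synthesized image; in a sketch like this, that would amount to an extra loss term comparing the magnitudes of `torch.fft.rfft2` of the two images.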
License: CC BY