The Capsule Network (CapsNet), introduced in 2017 by Sabour, Hinton, and Frost, has sparked great interest in the computer vision and deep learning communities and offers a paradigm shift in neural computation. In CapsNet, Sabour et al. replace the classical scalar neuron with a vector-valued unit, the capsule. This allows CapsNet to describe an input image not only by the presence of its constituent features but also by the pose of each detected feature, thus imparting viewpoint and pose invariance. Hinton's group and the research community at large have applied CapsNets to a number of specific problems and achieved state-of-the-art performance. In contrast, this thesis studies CapsNet by applying it to complex real-world datasets such as CIFAR10 and CIFAR100, where its performance is still unproven. We investigate the operational characteristics of CapsNet on the CIFAR10 problem and identify several practical limitations of capsules that inhibit their performance in an industrial setting. The contribution of this research is the introduction of residual blocks of primary capsule layers. We develop a novel architecture for CIFAR10 classification, called ResCapsNet, and find that it increases validation accuracy to 78.54%, from the 71.04% achieved by the baseline CapsNet, at the marginal cost of increasing the number of parameters from 22 million to 25 million. In addition, to extend capsules to deeper networks, we discuss the application of capsules as hidden layers in CIFAR100 classification and show that capsules are largely ineffective in a latent unsupervised setting. For active supervision of hidden capsules, we propose methods to train them as super-class detectors prior to final classification.
Application of capsule networks for image classification on complex datasets