# American Institute of Mathematical Sciences

November 2020, 3(4): 219-227. doi: 10.3934/mfc.2020013

## Sketch-based image retrieval via CAT loss with elastic net regularization

1 School of Statistics and Mathematics, Big Data and Educational Statistics Application Laboratory, Collaborative Innovation Development Center of Pearl River Delta Science & Technology Finance Industry, Guangdong University of Finance & Economics, Guangzhou, Guangdong, 510320, China

2 School of Statistics and Mathematics, Guangdong University of Finance & Economics, Guangzhou, Guangdong, 510320, China

3 Information Science School, Guangdong University of Finance & Economics, Guangzhou, Guangdong, 510320, China

* Corresponding author: Jia Cai

Received: December 2019. Revised: March 2020. Published: November 2020. Early access: June 2020.

Fund Project: The first author is partially supported by the National Natural Science Foundation of China (11871167, 11671171), the Science and Technology Program of Guangzhou (201707010228), the Special Support Plan for High-Level Talents of Guangdong Province (2019TQ05X571), the Foundation of Guangdong Educational Committee (2019KZDZX1023), the Project of Collaborative Innovation Development Center of Pearl River Delta Science & Technology Finance Industry (19XT01), the National Social Science Foundation (19AJY027), and the Natural Science Foundation of Guangdong (2016A030313710).

Fine-grained sketch-based image retrieval (FG-SBIR) is an important problem that uses free-hand human sketches as queries to perform instance-level retrieval of photos. Human sketches are generally highly abstract and iconic, which makes FG-SBIR a challenging task. Existing FG-SBIR approaches use triplet loss with $\ell_2$ regularization or a higher-order energy function to perform retrieval. These approaches neglect the feature gap between the two domains (sketches and photos) and need to select the weight layer matrix, which yields high computational complexity. In this paper, we define a new CAT loss function with elastic net regularization based on an attention model. It can close the feature gap between the different subnetworks and embody the sparsity of the sketches. Experiments demonstrate that the proposed approach is competitive with state-of-the-art methods.
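
To make the two ingredients named in the abstract concrete, the following is a minimal sketch of a standard triplet loss combined with an elastic net penalty. It is not the CAT loss of the paper, whose definition is not given on this page. The function names, the margin value, and the regularization weights `l1` and `l2` are all illustrative assumptions.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(sketch, positive, negative, margin=0.3):
    """Standard hinge triplet loss (assumed form, not the paper's CAT loss).

    The matching photo should be closer to the sketch than the
    non-matching photo by at least `margin`.
    """
    gap = squared_distance(sketch, positive) - squared_distance(sketch, negative)
    return max(0.0, margin + gap)


def elastic_net_penalty(weights, l1=1e-4, l2=1e-4):
    """Elastic net: l1 * ||w||_1 + l2 * ||w||_2^2 (weights are hypothetical)."""
    return l1 * sum(abs(w) for w in weights) + l2 * sum(w * w for w in weights)


def regularized_objective(triplets, weights, margin=0.3, l1=1e-4, l2=1e-4):
    """Mean triplet loss over (sketch, positive, negative) triples plus the penalty."""
    data_term = sum(triplet_loss(s, p, n, margin) for s, p, n in triplets) / len(triplets)
    return data_term + elastic_net_penalty(weights, l1, l2)
```

The $\ell_1$ term is what distinguishes the elastic net from plain $\ell_2$ regularization: it pushes some weights to exactly zero, which is one common way to encourage sparse representations.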

Citation: Jia Cai, Guanglong Xu, Zhensheng Hu. Sketch-based image retrieval via CAT loss with elastic net regularization. Mathematical Foundations of Computing, 2020, 3 (4) : 219-227. doi: 10.3934/mfc.2020013

Figure: Architecture of the model

Figure: Examples of stroke removal

Table: Network structure

| Index | Layer | Type | Filter size | Filter number | Stride | Pad | Output size |
|---|---|---|---|---|---|---|---|
| 0 | | Input | - | - | - | - | $225\times225$ |
| 1 | L1 | Conv | $15\times15$ | 64 | 3 | 0 | $71\times71$ |
| 2 | | ReLU | - | - | - | - | $71\times71$ |
| 3 | | Maxpool | $3\times3$ | - | 2 | 0 | $35\times35$ |
| 4 | L2 | Conv | $5\times5$ | 128 | 1 | 0 | $31\times31$ |
| 5 | | ReLU | - | - | - | - | $31\times31$ |
| 6 | | Maxpool | $3\times3$ | - | 2 | 0 | $15\times15$ |
| 7 | L3 | Conv | $3\times3$ | 256 | 1 | 1 | $15\times15$ |
| 8 | | ReLU | - | - | - | - | $15\times15$ |
| 9 | L4 | Conv | $3\times3$ | 256 | 1 | 1 | $15\times15$ |
| 10 | | ReLU | - | - | - | - | $15\times15$ |
| 11 | L5 | Conv | $3\times3$ | 256 | 1 | 1 | $15\times15$ |
| 12 | | ReLU | - | - | - | - | $15\times15$ |
| 13 | | Maxpool | $3\times3$ | - | 2 | 0 | $7\times7$ |
| 14 | L6 | Conv (= FC) | $7\times7$ | 512 | 1 | 0 | $1\times1$ |
| 15 | | ReLU | - | - | - | - | $1\times1$ |
| 16 | | Dropout (0.55) | - | - | - | - | $1\times1$ |
| 17 | L7 | Conv (= FC) | $1\times1$ | 256 | 1 | 0 | $1\times1$ |
| 18 | | ReLU | - | - | - | - | $1\times1$ |
| 19 | | Dropout (0.55) | - | - | - | - | $1\times1$ |
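
The output sizes in the network structure table follow from the usual convolution and pooling size rule, $\lfloor (n + 2p - k)/s \rfloor + 1$. The short sketch below traces them from the $225\times225$ input. The layer list is transcribed from the table, and the helper names `out_size` and `trace_sizes` are illustrative. ReLU and Dropout rows are omitted because they do not change the spatial size.

```python
def out_size(size, kernel, stride, pad):
    """Spatial output size of a conv or pooling layer (floor convention)."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, filter size, stride, pad), transcribed from the table.
LAYERS = [
    ("L1 Conv", 15, 3, 0),
    ("Maxpool", 3, 2, 0),
    ("L2 Conv", 5, 1, 0),
    ("Maxpool", 3, 2, 0),
    ("L3 Conv", 3, 1, 1),
    ("L4 Conv", 3, 1, 1),
    ("L5 Conv", 3, 1, 1),
    ("Maxpool", 3, 2, 0),
    ("L6 Conv (= FC)", 7, 1, 0),
    ("L7 Conv (= FC)", 1, 1, 0),
]


def trace_sizes(input_size=225):
    """Return (layer name, output side length) for each size-changing layer."""
    sizes = []
    size = input_size
    for name, kernel, stride, pad in LAYERS:
        size = out_size(size, kernel, stride, pad)
        sizes.append((name, size))
    return sizes


if __name__ == "__main__":
    for name, size in trace_sizes():
        print(f"{name:16s} {size}x{size}")
```

The $7\times7$ filter in L6 covers the whole $7\times7$ feature map, which is why that convolution acts as a fully connected layer and produces a $1\times1$ output.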

Table: Comparative results against baselines on QMUL-shoe dataset

| QMUL-shoe | Acc.@1 | Acc.@10 |
|---|---|---|
| HOG+BoW+RankSVM | 17.39% | 67.83% |
| Deep ISN | 20.00% | 62.61% |
| Triplet SN | 52.17% | 92.17% |
| Triplet DSSA | 61.74% | 94.78% |
| Our model | 56.52% | 96.52% |

Table: Comparative results against baselines on QMUL-chair dataset

| QMUL-chair | Acc.@1 | Acc.@10 |
|---|---|---|
| HOG+BoW+RankSVM | 28.87% | 67.01% |
| Deep ISN | 47.42% | 82.47% |
| Triplet SN | 72.16% | 98.96% |
| Triplet DSSA | 81.44% | 95.88% |
| Our model | 81.44% | 98.97% |

Table: Comparative results against baselines on QMUL-handbag dataset

| QMUL-handbag | Acc.@1 | Acc.@10 |
|---|---|---|
| HOG+BoW+RankSVM | 2.38% | 10.71% |
| Deep ISN | 9.52% | 44.05% |
| Triplet SN | 39.88% | 82.14% |
| Triplet DSSA | 49.40% | 82.74% |
| Our model | 54.76% | 88.69% |

Table: Contributions of different components

| Dataset | Method | Acc.@1 | Acc.@10 |
|---|---|---|---|
| QMUL-shoe | Triplet loss+data aug | 50.43% | 93.91% |
| QMUL-shoe | CAT loss+no data aug | 49.57% | 94.78% |
| QMUL-shoe | Our model | 54.78% | 96.52% |
| QMUL-chair | Triplet loss+data aug | 78.35% | 97.94% |
| QMUL-chair | CAT loss+no data aug | 76.29% | 96.91% |
| QMUL-chair | Our model | 81.44% | 98.97% |
| QMUL-handbag | Triplet loss+data aug | 51.19% | 86.31% |
| QMUL-handbag | CAT loss+no data aug | 51.79% | 86.90% |
| QMUL-handbag | Our model | 54.76% | 88.69% |
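
The Acc.@1 and Acc.@10 columns report retrieval accuracy at rank K. A minimal sketch of how such a metric is commonly computed is given below, assuming each query sketch has exactly one true-match photo and that photos are ranked by ascending distance to the sketch. The function name `acc_at_k` and the toy distance matrix are illustrative and are not taken from the paper.

```python
def acc_at_k(distance_rows, true_indices, k):
    """Fraction of queries whose true-match photo is among the k nearest.

    distance_rows[i][j] is the distance from query sketch i to gallery photo j;
    true_indices[i] is the gallery index of the photo matching sketch i.
    """
    hits = 0
    for row, truth in zip(distance_rows, true_indices):
        # Gallery indices sorted from nearest to farthest.
        ranked = sorted(range(len(row)), key=lambda j: row[j])
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_indices)


if __name__ == "__main__":
    # Toy example: 3 query sketches, 3 gallery photos (made-up distances).
    distances = [
        [0.1, 0.5, 0.9],
        [0.8, 0.2, 0.3],
        [0.4, 0.6, 0.5],
    ]
    truths = [0, 2, 1]
    for k in (1, 2, 3):
        print(k, acc_at_k(distances, truths, k))
```

By construction Acc.@K cannot decrease as K grows, which is consistent with every Acc.@10 entry in the tables being at least as large as the corresponding Acc.@1 entry.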
