Geometry and Generalization: Eigenvalues as predictors of where a network will fail to generalize

  • *Corresponding author: Susama Agarwala

  • We study the deformation of the input space by a trained autoencoder via the Jacobians of the trained weight matrices. In doing so, we prove bounds for the mean squared errors for points in the input space, under assumptions regarding the orthogonality of the eigenvectors. We also show that the trace and the product of the eigenvalues of the Jacobian matrices are good predictors of the mean squared errors on test points. This gives a dataset-independent means of testing an autoencoder's ability to generalize to new input: no knowledge of the dataset on which the network was trained is needed, only the parameters of the trained model.

    Mathematics Subject Classification: Primary: 53B50, 68Q99; Secondary: 53B12.
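
The quantities named in the abstract can be illustrated on a toy model. The following is a minimal sketch, not the authors' implementation: the weights `W_ENC` and `W_DEC`, the tanh encoder, the point `z0`, and the choice of a decode-then-encode map on a 2-dimensional latent space are all assumptions made for illustration. It estimates the Jacobian by finite differences and reads off the eigenvalues, their sum (the trace) and their product (the determinant), using only the model parameters and a single point.

```python
import cmath
import math

# Hypothetical toy weights (not from the paper): a 3 -> 2 encoder and a 2 -> 3 decoder.
W_ENC = [[0.9, 0.1, -0.2],
         [0.0, 0.8, 0.3]]
W_DEC = [[1.0, 0.0],
         [0.1, 1.1],
         [-0.3, 0.2]]


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def encode(x):
    return [math.tanh(t) for t in matvec(W_ENC, x)]


def decode(z):
    return matvec(W_DEC, z)


def latent_map(z):
    """Decode, then re-encode: a map from the latent space to itself."""
    return encode(decode(z))


def jacobian(f, z, h=1e-6):
    """Central finite-difference Jacobian of f at z (rows: outputs, columns: inputs)."""
    cols = []
    for j in range(len(z)):
        zp, zm = list(z), list(z)
        zp[j] += h
        zm[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(f(zp), f(zm))])
    return [[col[i] for col in cols] for i in range(len(cols[0]))]


def eig2(m):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    tr = m[0][0] + m[1][1]
    dt = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4 * dt)
    return (tr + disc) / 2, (tr - disc) / 2


z0 = [0.2, -0.1]  # an arbitrary latent point, chosen for illustration
J = jacobian(latent_map, z0)
lam1, lam2 = eig2(J)
trace = J[0][0] + J[1][1]
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]

print("eigenvalues:", lam1, lam2)
print("trace (sum of eigenvalues):", trace)
print("determinant (product of eigenvalues):", det)
print("orientation reversing at z0:", det < 0)
print("eigenvalues with modulus below 0.1:", sum(abs(l) < 0.1 for l in (lam1, lam2)))
```

Because the trace and determinant equal the sum and product of the eigenvalues, neither statistic requires an explicit eigendecomposition; the eigenvalues are computed here only to make that identity visible. For larger latent dimensions an automatic differentiation library and a numerical eigenvalue solver would replace the finite-difference and 2x2 helpers.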

  • Figure 1.  Commutative diagram of an autoencoder under the assumption of the data lying along a data manifold

    Figure 2.  The ratio of the $ L_1 $ norm of the difference in eigenvalues to the latent dimension (top) and the ratio of the $ L_2 $ norm of the difference in eigenvalues to the latent dimension (bottom) are small and decrease both in median and standard deviation as latent dimension increases

    Figure 3.  Distributions of the arguments of the eigenvalues of $ J_{ \mathcal{L}, n} $ for $ n \in \{3, 4,5\} $ (top). Distribution of the angle of rotation (absolute value of the arguments) of the eigenvalues of $ J_{ \mathcal{I}, 4} $ (bottom)

    Figure 4.  The arithmetic means of the eigenvalues start out close to one at low latent dimension, and decrease as latent dimension increases. This is true both in aggregate (top) and when the data is broken down by class (center). Note that at high latent dimension, the arithmetic means for class 1 are much lower than those for the rest of the classes. The median arithmetic means of $ \vec{\lambda}_ \mathcal{I} $ tend to be higher than the median arithmetic means of $ \vec{\lambda}_ \mathcal{L} $ (bottom)

    Figure 5.  The geometric means of the eigenvalues start out close to one at low latent dimension, and decrease as latent dimension increases. This is true both in aggregate (top) and when the data is broken down by class (center). Note that at high latent dimension, the geometric means for class 1 are much lower than those for the rest of the classes. The median geometric means of $ \vec{\lambda}_ \mathcal{I} $ tend to be higher than the median geometric means of $ \vec{\lambda}_ \mathcal{L} $ (bottom)

    Figure 6.  The number of eigenvalues which have absolute value less than $ 0.1 $, for $ J_{ \mathcal{L},n} $, $ n\in\{3,4,5\} $ (top), and the number of such eigenvalues, broken down by class, for $ J_{ \mathcal{L},n} $, $ n\in\{3,4,5\} $ (bottom). We observe that very few eigenvalues have absolute value less than $ 0.1 $, except in class 1. Even in class 1, the number of eigenvalues less than $ 0.1 $ is at most $ 2 $, in latent dimension $ 20 $

    Figure 7.  The proportion of points in $ \mathcal{D} $ such that $ \omega_{ \mathcal{L}, n}(x)<0 $ ($ n \in \{3,4,5\} $) is greater on the training points than on the training points (top). There are more orientation-reversing points for $ J_ \mathcal{I} $ than for $ J_ \mathcal{L} $ (bottom)

    Figure 8.  The expected increase (in units of standard deviation) of MSE for a test point given a standard deviation increase in log volume form increases with latent dimension. This figure shows the growth for $ J_{ \mathcal{L}, n} $, with $ n\in \{3,4,5\} $ in aggregate (top) and by class (bottom)

    Figure 9.  The expected increase (in units of standard deviation) of MSE for a test point given a standard deviation increase in trace increases with latent dimension. This figure shows the growth for $ J_{ \mathcal{L}, n} $, with $ n\in \{3,4,5\} $ in aggregate (top) and by class (bottom)

    Table 1.  Coefficients of the linear regression in equation (6) for $ \omega_ \mathcal{I} $ and $ \omega_ \mathcal{L} $ across latent dimensions. Run on data from seed 0, exp-default and 300 epochs

    | dim | $\mathcal{L}$: Intercept | $\mathcal{L}$: $\log \omega_{\mathcal{L}}$ | $\mathcal{L}$: test_ind | $\mathcal{L}$: test_log $\omega_{\mathcal{L}}$ | $\mathcal{L}$: test_slope | $\mathcal{I}$: Intercept | $\mathcal{I}$: $\log \omega_{\mathcal{I}}$ | $\mathcal{I}$: test_ind | $\mathcal{I}$: test_log $\omega_{\mathcal{I}}$ | $\mathcal{I}$: test_slope |
    |---|---|---|---|---|---|---|---|---|---|---|
    | 2 | -0.0144 | -0.0897 | 0.017 | 0.0066 | -0.083 | -0.0341 | 0.067 | 0.021 | -0.0004 | 0.067 |
    | 3 | -0.0251 | -0.0482 | 0.04 | -0.0059 | -0.054 | -0.0351 | 0.08 | 0.032 | 0.0225 | 0.103 |
    | 4 | -0.0244 | -0.0335 | 0.05 | -0.0205 | -0.054 | -0.0363 | 0.127 | 0.05 | 0.0092 | 0.136 |
    | 5 | -0.0319 | 0.0014 | 0.092 | -0.0364 | -0.035 | -0.0362 | 0.147 | 0.078 | 0.0505 | 0.197 |
    | 6 | -0.0199 | 0.1632 | 0.08 | -0.0102 | 0.153 | -0.0252 | 0.234 | 0.074 | 0.0394 | 0.273 |
    | 7 | -0.0166 | 0.2333 | 0.087 | -0.0128 | 0.221 | -0.0233 | 0.295 | 0.082 | 0.0225 | 0.317 |
    | 8 | -0.0159 | 0.2641 | 0.073 | -0.0087 | 0.255 | -0.0187 | 0.312 | 0.061 | 0.0179 | 0.33 |
    | 9 | -0.0111 | 0.343 | 0.066 | 0.0238 | 0.367 | -0.0138 | 0.393 | 0.064 | 0.0335 | 0.427 |
    | 10 | -0.0148 | 0.3665 | 0.073 | 0.0113 | 0.378 | -0.0179 | 0.382 | 0.063 | 0.0114 | 0.393 |
    | 11 | -0.0124 | 0.389 | 0.063 | 0.0199 | 0.409 | -0.0134 | 0.402 | 0.06 | 0.011 | 0.414 |
    | 12 | -0.0088 | 0.4204 | 0.037 | 0.0012 | 0.422 | -0.0106 | 0.428 | 0.038 | -0.0043 | 0.424 |
    | 13 | -0.0123 | 0.4223 | 0.058 | 0.0044 | 0.427 | -0.0122 | 0.426 | 0.058 | 0.0056 | 0.431 |
    | 14 | -0.0075 | 0.4214 | 0.052 | 0.0218 | 0.443 | -0.0106 | 0.426 | 0.046 | 0.0029 | 0.429 |
    | 15 | -0.0111 | 0.453 | 0.056 | 0.0109 | 0.464 | -0.012 | 0.457 | 0.051 | 0.0049 | 0.462 |
    | 16 | -0.0106 | 0.4541 | 0.052 | 0.0162 | 0.47 | -0.0123 | 0.458 | 0.049 | 0.0039 | 0.462 |
    | 17 | -0.0071 | 0.4947 | 0.033 | 0.005 | 0.5 | -0.0075 | 0.495 | 0.033 | 0.0068 | 0.501 |
    | 18 | -0.0095 | 0.5159 | 0.055 | 0.013 | 0.529 | -0.0099 | 0.515 | 0.053 | 0.0098 | 0.525 |
    | 19 | -0.0095 | 0.4933 | 0.053 | 0.0093 | 0.503 | -0.0092 | 0.496 | 0.048 | 0.0044 | 0.501 |
    | 20 | -0.0094 | 0.4887 | 0.043 | 0.0069 | 0.496 | -0.0094 | 0.492 | 0.04 | 0.0115 | 0.503 |

    Table 2.  Coefficients of the linear regression in equation (7) for $ {\rm{Tr}} J_ \mathcal{I} $ and $ {\rm{Tr}} J_ \mathcal{L} $ across latent dimensions. Run on data from seed 0, exp-default and 300 epochs

    | dim | $\mathcal{L}$: Intercept | $\mathcal{L}$: Trace | $\mathcal{L}$: test_ind | $\mathcal{L}$: test_Trace | $\mathcal{L}$: test_slope | $\mathcal{I}$: Intercept | $\mathcal{I}$: Trace | $\mathcal{I}$: test_ind | $\mathcal{I}$: test_Trace | $\mathcal{I}$: test_slope |
    |---|---|---|---|---|---|---|---|---|---|---|
    | 2 | -0.0024 | -0.0018 | 0.016 | -0.0526 | -0.0544 | -0.0024 | 0.096 | 0.017 | 0.0134 | 0.11 |
    | 3 | -0.0054 | 0.0038 | 0.038 | -0.0369 | -0.0331 | -0.0054 | 0.083 | 0.037 | 0.1177 | 0.2 |
    | 4 | -0.0079 | 0.0151 | 0.055 | -0.0096 | 0.0055 | -0.0079 | 0.071 | 0.057 | 0.4135 | 0.48 |
    | 5 | -0.0143 | 0.0658 | 0.1 | -0.034 | 0.0318 | -0.0126 | 0.221 | 0.089 | -0.0217 | 0.2 |
    | 6 | -0.0121 | 0.195 | 0.085 | 0.0112 | 0.2061 | -0.0117 | 0.32 | 0.082 | 0.0427 | 0.36 |
    | 7 | -0.0129 | 0.2323 | 0.09 | -0.0023 | 0.2299 | -0.0122 | 0.351 | 0.086 | 0.0473 | 0.4 |
    | 8 | -0.0108 | 0.2753 | 0.075 | -0.006 | 0.2693 | -0.0091 | 0.369 | 0.063 | 0.0474 | 0.42 |
    | 9 | -0.0103 | 0.3465 | 0.073 | 0.0345 | 0.3809 | -0.0093 | 0.435 | 0.065 | 0.0556 | 0.49 |
    | 10 | -0.0113 | 0.3638 | 0.08 | 0.016 | 0.3797 | -0.0099 | 0.404 | 0.069 | 0.0366 | 0.44 |
    | 11 | -0.0095 | 0.3906 | 0.067 | 0.0241 | 0.4147 | -0.0092 | 0.433 | 0.065 | 0.0287 | 0.46 |
    | 12 | -0.0059 | 0.4315 | 0.041 | 0.0026 | 0.4341 | -0.0056 | 0.46 | 0.04 | 0.0045 | 0.46 |
    | 13 | -0.0096 | 0.4296 | 0.067 | 0.0076 | 0.4372 | -0.0091 | 0.454 | 0.065 | 0.0228 | 0.48 |
    | 14 | -0.0085 | 0.4352 | 0.06 | 0.0227 | 0.4579 | -0.0073 | 0.455 | 0.052 | 0.0223 | 0.48 |
    | 15 | -0.0087 | 0.4595 | 0.061 | 0.017 | 0.4765 | -0.0085 | 0.483 | 0.06 | 0.0186 | 0.5 |
    | 16 | -0.0084 | 0.4722 | 0.06 | 0.0254 | 0.4975 | -0.0084 | 0.489 | 0.059 | 0.0223 | 0.51 |
    | 17 | -0.0067 | 0.5127 | 0.048 | 0.0083 | 0.5209 | -0.0064 | 0.513 | 0.045 | 0.0131 | 0.53 |
    | 18 | -0.0091 | 0.5361 | 0.065 | 0.022 | 0.5581 | -0.0087 | 0.544 | 0.062 | 0.0177 | 0.56 |
    | 19 | -0.009 | 0.5141 | 0.064 | 0.016 | 0.5302 | -0.0082 | 0.53 | 0.058 | 0.0147 | 0.54 |
    | 20 | -0.0078 | 0.5106 | 0.056 | 0.0239 | 0.5345 | 0.0072 | 0.52 | 0.052 | 0.0274 | 0.55 |
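
Equations (6) and (7) are not reproduced in this excerpt, so the reading of the columns below is an assumption: the column names suggest a regression with a main predictor (log volume form or trace), a test-point indicator, and their interaction, in which case the test slope would be the main coefficient plus the interaction coefficient. The short check below, a sketch using a few latent-space rows copied from Tables 1 and 2, confirms that the reported test slope matches that sum up to rounding for the sampled rows. The names `ROWS` and `implied_test_slope` are introduced here for illustration only.

```python
# Each entry: (table, latent dim, main coefficient, interaction coefficient, reported test slope).
# Values are copied from the latent-space columns of Tables 1 and 2.
ROWS = [
    ("Table 1", 2, -0.0897, 0.0066, -0.083),
    ("Table 1", 9, 0.343, 0.0238, 0.367),
    ("Table 1", 20, 0.4887, 0.0069, 0.496),
    ("Table 2", 2, -0.0018, -0.0526, -0.0544),
    ("Table 2", 9, 0.3465, 0.0345, 0.3809),
    ("Table 2", 20, 0.5106, 0.0239, 0.5345),
]


def implied_test_slope(main_coefficient, interaction_coefficient):
    """Slope on test points under the assumed interaction model."""
    return main_coefficient + interaction_coefficient


gaps = []
for table, dim, main, interaction, reported in ROWS:
    implied = implied_test_slope(main, interaction)
    gaps.append(abs(implied - reported))
    print(f"{table}, dim {dim}: implied {implied:.4f}, reported {reported}")

# Largest disagreement between the implied and reported slopes across the sampled rows.
max_gap = max(gaps)
print("largest gap:", round(max_gap, 4))
```

Under this reading, the test slope is the expected change in MSE (in standard deviations) per standard deviation change in the predictor for test points, which is the quantity plotted against latent dimension in Figures 8 and 9.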