Many learning machines with hierarchical structure or hidden variables are now used in information science, artificial intelligence, and bioinformatics. However, several of the learning machines used in these fields are not regular but singular statistical models, so their generalization performance remains unknown. To address this problem, we proved in previous papers new equations of statistical learning by which the Bayes generalization loss can be estimated from the Bayes training loss and the functional variance, under the condition that the true distribution is a singularity contained in the learning machine. In this paper, we prove that the same equations hold even if the true distribution is not contained in the parametric model. We also prove that, in a regular case, the proposed equations are asymptotically equivalent to the Takeuchi information criterion. Therefore, the proposed equations are always applicable, without any condition on the unknown true distribution.
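The estimate described above can be sketched numerically from posterior samples. This is a minimal sketch, not the authors' code, and it rests on stated assumptions: inverse temperature 1, S posterior draws w_1, ..., w_S, and a matrix `loglik[s, i] = log p(X_i | w_s)`. It uses the Bayes training loss T_n = -(1/n) Σ_i log E_w[p(X_i | w)] and the functional variance V_n = Σ_i Var_w[log p(X_i | w)], combined as T_n + V_n / n (the form known as WAIC). All function names are hypothetical.

```python
import numpy as np

def bayes_training_loss(loglik):
    """T_n = -(1/n) sum_i log E_w[p(X_i | w)], with E_w replaced by an average over draws.

    loglik has shape (S, n): loglik[s, i] = log p(X_i | w_s) for posterior draw w_s.
    """
    loglik = np.asarray(loglik, dtype=float)
    m = loglik.max(axis=0)  # per-data-point shift for a numerically stable log-mean-exp
    log_predictive = m + np.log(np.mean(np.exp(loglik - m), axis=0))
    return float(-np.mean(log_predictive))

def functional_variance(loglik):
    """V_n = sum_i (E_w[(log p(X_i|w))^2] - E_w[log p(X_i|w)]^2), summed over data points."""
    loglik = np.asarray(loglik, dtype=float)
    return float(np.sum(np.var(loglik, axis=0)))  # ddof=0 matches E[f^2] - E[f]^2

def estimated_generalization_loss(loglik):
    """Estimate of the Bayes generalization loss: T_n + V_n / n (inverse temperature 1 assumed)."""
    n = np.asarray(loglik).shape[1]
    return bayes_training_loss(loglik) + functional_variance(loglik) / n

# Usage: Gaussian location model N(mu, 1) with a flat prior, so the posterior is N(mean(x), 1/n).
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
mu_draws = rng.normal(x.mean(), 1.0 / np.sqrt(x.size), size=4000)
loglik = -0.5 * np.log(2 * np.pi) - 0.5 * (x[None, :] - mu_draws[:, None]) ** 2
print(bayes_training_loss(loglik), functional_variance(loglik), estimated_generalization_loss(loglik))
```

In this one-parameter regular example, V_n should come out close to 1, so the correction V_n / n is roughly 1/n.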
Shinichi NAKAJIMA, Sumio WATANABE
In unidentifiable models, Bayes estimation has an advantage in generalization performance over maximum likelihood estimation. However, accurately approximating the posterior distribution requires a huge computational cost. In this paper, we consider an alternative approximation method, which we call a subspace Bayes approach. A subspace Bayes approach is an empirical Bayes approach in which some of the parameters are regarded as hyperparameters. Consequently, in some three-layer models, this approach requires much less computational cost than Markov chain Monte Carlo methods. We show that, in three-layer linear neural networks, a subspace Bayes approach is asymptotically equivalent to a positive-part James-Stein type shrinkage estimation, and we theoretically clarify its generalization error and training error. We also discuss its domination over maximum likelihood estimation and its relation to the variational Bayes approach.
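For reference, the classical positive-part James-Stein estimator can be sketched as below. This is the textbook form for a mean vector θ observed once as y ~ N(θ, σ² I_d) with known σ² and d ≥ 3; it is not the paper's estimator for three-layer linear networks, whose exact shrinkage factor is not reproduced here, and the function names are hypothetical. The Monte Carlo helper illustrates, in this simple setting only, the kind of domination over maximum likelihood estimation mentioned above.

```python
import numpy as np

def positive_part_james_stein(y, sigma2):
    """Positive-part James-Stein estimate of theta from one draw y ~ N(theta, sigma2 * I_d), d >= 3."""
    y = np.asarray(y, dtype=float)
    d = y.size
    if d < 3:
        raise ValueError("James-Stein shrinkage needs dimension d >= 3")
    norm2 = float(y @ y)
    if norm2 == 0.0:
        return np.zeros_like(y)
    # Shrink toward the origin; clip the factor at 0 instead of letting it turn negative.
    factor = max(0.0, 1.0 - (d - 2) * sigma2 / norm2)
    return factor * y

def monte_carlo_risk(theta, sigma2, reps=2000, seed=0):
    """Average squared error of the MLE (y itself) and of the positive-part James-Stein estimate."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    err_mle = 0.0
    err_js = 0.0
    for _ in range(reps):
        y = theta + np.sqrt(sigma2) * rng.standard_normal(theta.size)
        err_mle += float(np.sum((y - theta) ** 2))
        err_js += float(np.sum((positive_part_james_stein(y, sigma2) - theta) ** 2))
    return err_mle / reps, err_js / reps

# Usage: at theta = 0 in 10 dimensions the shrinkage gain is large.
risk_mle, risk_js = monte_carlo_risk(np.zeros(10), sigma2=1.0)
print(risk_mle, risk_js)
```

Clipping the factor at zero is what distinguishes the positive-part variant; it dominates the plain James-Stein estimator, which in turn dominates the MLE for d ≥ 3.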
Many learning machines, such as normal mixtures and layered neural networks, are not regular but singular statistical models, because the map from a parameter to a probability distribution is not one-to-one. Conventional statistical asymptotic theory cannot be applied to such learning machines, because the likelihood function cannot be approximated by any normal distribution. Recently, a new statistical theory has been established based on algebraic geometry, and it has been shown that the generalization and training errors are determined by two birational invariants, the real log canonical threshold and the singular fluctuation. However, their concrete values remain unknown. In the present paper, we propose a new concept in statistical learning theory, the quasi-regular case. A quasi-regular case is not regular but singular; however, it has the same property as a regular case. In fact, we prove that, in a quasi-regular case, the two birational invariants are equal to each other, so that the generalization and training errors are symmetric. Moreover, the concrete values of the two birational invariants are obtained explicitly; hence the quasi-regular case is useful for studying statistical learning theory.
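The symmetry claim can be made concrete with the standard asymptotic expansions of singular learning theory. The notation here is assumed rather than taken from the abstract: λ is the real log canonical threshold, ν the singular fluctuation, β the inverse temperature of the posterior, and B_g, B_t the Bayes generalization and training errors for n samples. Setting λ = ν makes the two leading terms equal in size and opposite in sign, independently of β.

```latex
\begin{aligned}
\mathbb{E}[B_g] &= \frac{1}{n}\left(\frac{\lambda-\nu}{\beta} + \nu\right) + o\!\left(\frac{1}{n}\right), \\
\mathbb{E}[B_t] &= \frac{1}{n}\left(\frac{\lambda-\nu}{\beta} - \nu\right) + o\!\left(\frac{1}{n}\right), \\
\lambda = \nu \;\Longrightarrow\; \mathbb{E}[B_g] &= \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right), \qquad
\mathbb{E}[B_t] = -\frac{\lambda}{n} + o\!\left(\frac{1}{n}\right).
\end{aligned}
```

In a regular model with d parameters, λ = ν = d/2, which recovers the familiar values ±d/(2n).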
This paper proposes a practical training algorithm for artificial neural networks by which both the optimally pruned model and the optimally trained parameters that minimize the prediction error can be found simultaneously. In the proposed algorithm, the conventional information criterion is modified into a differentiable function of the weight parameters, which is then minimized while being controlled back to the conventional form. Since this method has several theoretical problems, its effectiveness is examined by computer simulations and by an application to practical ultrasonic image reconstruction.
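A minimal sketch of the general idea of smoothing an information criterion and then steering it back to its conventional form follows. Everything specific here is an assumption, not the paper's method: a linear model with Gaussian noise stands in for the neural network, AIC is the criterion, the parameter count is replaced by a hypothetical smooth count Σ_j w_j² / (w_j² + ε), and "controlling back" is modeled by annealing ε toward 0 with warm-started gradient descent. All function names are hypothetical.

```python
import numpy as np

# Hypothetical smooth surrogate for the parameter count k in AIC:
# w^2 / (w^2 + eps) tends to 1 for w != 0 and to 0 for w == 0 as eps -> 0.
def smooth_count(w, eps):
    return float(np.sum(w**2 / (w**2 + eps)))

def smooth_count_grad(w, eps):
    return 2.0 * w * eps / (w**2 + eps) ** 2

def criterion_and_grad(w, X, y, eps):
    """Smoothed AIC divided by 2n for a linear model with Gaussian noise (assumption):
    J(w) = 0.5 * log(RSS(w) / n) + smooth_count(w, eps) / n, plus its gradient."""
    n = y.size
    r = y - X @ w
    rss = float(r @ r)
    value = 0.5 * np.log(rss / n) + smooth_count(w, eps) / n
    grad = -(X.T @ r) / rss + smooth_count_grad(w, eps) / n
    return value, grad

def descend(w, X, y, eps, iters=300):
    """Gradient descent with Armijo backtracking on the smoothed criterion."""
    value, grad = criterion_and_grad(w, X, y, eps)
    for _ in range(iters):
        step = 1.0
        while True:
            w_new = w - step * grad
            value_new, grad_new = criterion_and_grad(w_new, X, y, eps)
            # Accept the step once it gives a sufficient decrease (or is negligibly small).
            if value_new <= value - 0.5 * step * float(grad @ grad) or step < 1e-12:
                break
            step *= 0.5
        w, value, grad = w_new, value_new, grad_new
    return w

def prune_and_train(X, y, eps_schedule=(1.0, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6)):
    """Anneal eps toward 0 so the smoothed criterion returns to ordinary AIC."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the unpruned least-squares fit
    for eps in eps_schedule:
        w = descend(w, X, y, eps)  # warm start from the previous, smoother stage
    keep = np.abs(w) > np.sqrt(eps_schedule[-1])  # weights left near zero count as pruned
    w_final = np.zeros_like(w)
    if keep.any():
        w_final[keep] = np.linalg.lstsq(X[:, keep], y, rcond=None)[0]  # refit the pruned model
    return w_final, keep

# Usage on synthetic data: only features 0 and 2 actually matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ w_true + 0.5 * rng.standard_normal(200)
w_hat, kept = prune_and_train(X, y)
print(kept, np.round(w_hat, 3))
```

Because the smooth count tends to the number of nonzero weights as ε shrinks, the last stage approximately minimizes ordinary AIC over both which weights survive and their values; the final least-squares refit on the kept columns is one simple way to read off the trained parameters of the pruned model.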
The test of homogeneity for normal mixtures has been used in various fields, but its theoretical understanding is limited because the parameter set for the null hypothesis corresponds to singular points in the parameter space. In this paper, we shed light on this issue from a new perspective, variational Bayes, and offer a theory for testing homogeneity based on it. The stochastic behavior of the variational free energy, which is necessary for constructing a hypothesis test, has remained unknown in conventional theory. We clarify it for the first time and construct a new test based on it. Numerical experiments show the validity of our results.