Simultaneous optimal predictions under two seemingly unrelated linear random-effects models

This paper considers simultaneous optimal prediction and estimation problems in the context of linear random-effects models. We assume a pair of seemingly unrelated linear random-effects models (SULREMs) in which the random effects and the error terms are correlated. Our aim is to derive analytical formulas for calculating the best linear unbiased predictors (BLUPs) of all unknown parameters in the two models by solving a constrained quadratic matrix optimization problem in the Löwner sense. We also present a variety of theoretical and statistical properties of the BLUPs under the two models.


1. Introduction. Linear regression models are classic objects in statistical theory and are the common roots of many branches of current statistics. Although relatively systematic research results concerning linear regression models and their applications have accumulated over the past century, one can still propose many theoretical and applied problems on linear models and approach them with various mathematical and statistical tools. In statistical data analysis and inference, people often encounter linear regression models that include random effects, namely, linear random-effects models (LREMs). Such models are commonly used to analyze longitudinal and correlated data, which occur in a variety of fields including biostatistics, public health, psychometrics, educational measurement, and sociology. LREMs are able to account for the variability of model parameters due to different factors that influence a response variable. The problems of statistical inference on LREMs are now an important part of data analysis, and a large literature on LREMs is spread across statistics and other disciplines. Seemingly unrelated linear random-effects models (SULREMs) are extensions of LREMs that allow correlated errors between the regression equations in the models.
We now introduce the modeling framework concerning LREMs. In statistical analysis of data collected from different time periods, we may meet cases involving two or more observable random vectors. For example, let y_1 and y_2 be n_1 × 1 and n_2 × 1 vectors of observable random variables that have the following two model structures:

M_1: y_1 = X_1β_1 + ε_1, (1.1)
M_2: y_2 = X_2β_2 + ε_2, (1.2)

where y_i ∈ R^{n_i×1} are vectors of observable response variables, X_i ∈ R^{n_i×p_i} are known matrices of arbitrary ranks, ε_i ∈ R^{n_i×1} are vectors of unobservable random errors, and β_i ∈ R^{p_i×1} are unknown vectors satisfying

β_i = A_iα_i + γ_i, i = 1, 2, (1.3)

where A_i ∈ R^{p_i×k_i} are two known matrices of arbitrary ranks, α_i ∈ R^{k_i×1} are two vectors of fixed but unknown parameters, and γ_i ∈ R^{p_i×1} are two vectors of unobservable random variables. In this situation, (1.1) and (1.2) are also classified as two-level hierarchical linear models; see, e.g., [2,8,9,29] for more exposition of the background of hierarchical linear models in statistical data analysis and inference.
On the other hand, LREMs can be viewed as special forms of linear mixed-effects models. To see this, substitute (1.3) into (1.1) and (1.2) to obtain the following two seemingly unrelated linear mixed-effects models with the fixed-effects vectors α_i and the random-effects vectors γ_i, i = 1, 2:

M_1: y_1 = X_1A_1α_1 + X_1γ_1 + ε_1, (1.4)
M_2: y_2 = X_2A_2α_2 + X_2γ_2 + ε_2. (1.5)

The two models are said to be seemingly unrelated because there are no common unknown parameter vectors in them. However, there still exists a possibility of doing statistical analysis simultaneously and obtaining more accurate inference results concerning the two models under certain assumptions. One such case is to assume that the random vectors in the two models are correlated. In this paper, we assume that the expectation and the covariance matrix of the joint vector of random effects and error terms in (1.4) and (1.5) have the general forms

E[γ_1′, γ_2′, ε_1′, ε_2′]′ = 0, Cov[γ_1′, γ_2′, ε_1′, ε_2′]′ = Σ = (Σ_ij), (1.6)

where Σ ∈ R^{(n+p)×(n+p)} is a known nonnegative definite matrix of arbitrary rank, the submatrices Σ_ij are allowed to be nonzero for i ≠ j, and n = n_1 + n_2 and p = p_1 + p_2. Here we give no further restrictions on the patterns of the submatrices Σ_ij in (1.6), although they are usually taken to have certain prescribed forms for a given linear random-effects model in the statistical literature. In other words, Σ may also be assumed unknown or given in some parametric form, such as a block-diagonal structure in which the σ_i² are arbitrary positive scaling factors. In practice, if these matrices or parameters are unknown, one may first estimate them using the observed data in (1.1) and (1.2) and then substitute the estimates into the models to make further statistical inference on (1.1) and (1.2). In order to make inference under (1.6) simultaneously, we assemble the two regression equations in (1.4) and (1.5) as

M: y = XAα + Xγ + ε, (1.8)

where y = [y_1′, y_2′]′, X = diag(X_1, X_2), A = diag(A_1, A_2), α = [α_1′, α_2′]′, γ = [γ_1′, γ_2′]′, and ε = [ε_1′, ε_2′]′. In this situation, (1.1) and (1.2) can be obtained from the transformed model, and the covariance matrices of the observed vectors follow from (1.6); for instance, Cov(y_1) = Cov(X_1γ_1 + ε_1) = [X_1, I_{n_1}] Cov([γ_1′, ε_1′]′) [X_1, I_{n_1}]′. Prediction analysis is a general inference method for predicting the accuracy of quantitative experiments.
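The setup above can be simulated directly. The following sketch is illustrative only: all dimensions, the ordering (γ_1′, γ_2′, ε_1′, ε_2′)′ of the joint vector, and the particular Σ are our assumptions, not the paper's.

```python
import numpy as np

# Simulate the pair of correlated models (1.4)-(1.5):
# y_i = X_i A_i alpha_i + X_i gamma_i + eps_i, with the joint vector of
# random effects and errors having a full (non-block-diagonal) covariance Sigma.
rng = np.random.default_rng(4)
n1, n2, p1, p2, k1, k2 = 5, 6, 3, 2, 2, 1
X1 = rng.standard_normal((n1, p1)); X2 = rng.standard_normal((n2, p2))
A1 = rng.standard_normal((p1, k1)); A2 = rng.standard_normal((p2, k2))
a1 = rng.standard_normal(k1);       a2 = rng.standard_normal(k2)

n, p = n1 + n2, p1 + p2
B = rng.standard_normal((n + p, n + p))
Sigma = B @ B.T  # nonnegative definite, with nonzero off-diagonal blocks

# Draw (gamma1', gamma2', eps1', eps2')' with covariance Sigma (assumed ordering).
z = np.linalg.cholesky(Sigma + 1e-9 * np.eye(n + p)) @ rng.standard_normal(n + p)
g1, g2 = z[:p1], z[p1:p]
e1, e2 = z[p:p + n1], z[p + n1:]

y1 = X1 @ (A1 @ a1 + g1) + e1  # model (1.4)
y2 = X2 @ (A2 @ a2 + g2) + e2  # model (1.5)
```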
The method is applicable to experiments in which the data are to be analyzed by means of various optimization methods, such as the least squares method, the weighted least squares method, and the best linear unbiased prediction method. A convenient way of simultaneously estimating/predicting all unknown parameters in (1.4), (1.5), and (1.8) is to construct the general vectors

φ_i = F_iα_i + G_iγ_i + H_iε_i, i = 1, 2, (1.14)
φ = Fα + Gγ + Hε, (1.15)

which encompass all the unknown vectors in (1.4), (1.5), and (1.8) as special cases, where F_i ∈ R^{s×k_i}, F ∈ R^{s×(k_1+k_2)}, G_i ∈ R^{s×p_i}, G ∈ R^{s×p}, H_i ∈ R^{s×n_i}, and H ∈ R^{s×n} are known matrices of arbitrary ranks. Thus (1.14) and (1.15) include all the vectors in (1.1)-(1.9) as special cases. In these settings, we can readily obtain the corresponding results for i = 1, 2. These preparations show that we can estimate/predict (1.14) and (1.15) from (1.1) and (1.2) separately or simultaneously. The idea of constructing general vectors as in (1.14) and (1.15) was first given in [26], which proved a lemma on the optimization of a matrix function in the Löwner partial ordering and established a unified theory of linear estimation/prediction of all unknown parameters in general linear models with fixed or mixed effects; see also [28, Lemma 4.7]. In addition, work on separate and simultaneous estimation/prediction of unknown parameters in different models can be found in [1,3-6,10-14,16,25,30,38-40]; some new results concerning simultaneous linear estimation/prediction of all unknown parameters in LREMs with original and future observations were obtained in [33,34] by solving certain constrained quadratic matrix-valued function optimization problems in the Löwner partial ordering.
To address general prediction/estimation problems for unknown parameters in a given linear regression model, it is common practice to first adopt a feasible procedure for obtaining exact expressions of the predictors/estimators of the unknown parameters in the model. Tian recently developed an analytical method in [33,34] for solving certain types of constrained quadratic matrix-valued function optimization problems in the Löwner partial ordering, and used the method to examine simultaneous linear estimation/prediction of all unknown parameters in LREMs with original and future observations; see also [7,11,15,17,18,31,32,35-37] for a series of related approaches.
Similarly to the preceding work on LREMs, we derive a group of theoretical inference conclusions under the assumptions in (1.1)-(1.18), including analytical formulas for calculating the best linear unbiased predictors (BLUPs) of φ_i and φ under the general assumptions in (1.1)-(1.18), and various mathematical and statistical properties and performances of these BLUPs under the given assumptions.
2. Notation and preliminaries. Let R^{m×n} denote the collection of all m × n real matrices; let A′, r(A), and R(A) denote the transpose, the rank, and the range (column space) of a matrix A ∈ R^{m×n}, respectively; and let I_m denote the identity matrix of order m. The Moore-Penrose generalized inverse of A, denoted by A⁺, is defined to be the unique solution G satisfying the four Penrose matrix equations AGA = A, GAG = G, (AG)′ = AG, and (GA)′ = GA. The Moore-Penrose inverse was specially studied and recognized because AA⁺, A⁺A, I_m − AA⁺, and I_n − A⁺A are orthogonal projectors onto the ranges and kernels of A and A′, so that it optimizes a number of interesting properties in many matrix computation problems. In this paper, we denote by A^⊥ = E_A = I_m − AA⁺ and F_A = I_n − A⁺A the two orthogonal projectors induced by A ("⊥" denotes the orthogonal projector onto the orthogonal complement of the range of a matrix); they satisfy E_A = F_{A′} and F_A = E_{A′}. Two symmetric matrices A and B of the same size are said to satisfy the inequality A ⪰ B in the Löwner partial ordering if A − B is nonnegative definite. We refer the reader to [19,20,22,27] for expositions of generalized inverses of matrices and their applications to linear models.
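The Penrose equations and the projector identities above can be checked numerically. A small NumPy sketch (illustrative, not part of the paper; the test matrix is arbitrary and rank-deficient):

```python
import numpy as np

# Verify the four Penrose equations for A^+ and the identities
# E_A = I - A A^+,  F_A = I - A^+ A,  E_A = F_{A'}  for a 5x3 matrix of rank 2.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))  # rank 2
Ap = np.linalg.pinv(A)  # Moore-Penrose inverse A^+

assert np.allclose(A @ Ap @ A, A)        # A G A = A
assert np.allclose(Ap @ A @ Ap, Ap)      # G A G = G
assert np.allclose((A @ Ap).T, A @ Ap)   # (A G)' = A G
assert np.allclose((Ap @ A).T, Ap @ A)   # (G A)' = G A

E_A = np.eye(5) - A @ Ap   # projector onto the orthogonal complement of R(A)
F_A = np.eye(3) - Ap @ A   # projector onto the kernel of A
assert np.allclose(E_A, np.eye(5) - np.linalg.pinv(A.T) @ A.T)  # E_A = F_{A'}

# Löwner ordering: S ⪰ A A' since S - A A' = I is nonnegative definite.
S = A @ A.T + np.eye(5)
assert np.min(np.linalg.eigvalsh(S - A @ A.T)) >= 0
```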
It is well known that statistics provides mathematicians with challenging and exciting problems at many levels, since most problems in statistics arise from real-life activities. To solve them, statisticians utilize knowledge from all parts of mathematics, from the very abstract to numerical computation and the interpretation of results. Moreover, every statistician is expected to find an optimal solution to a real problem by appeal to various optimization techniques. There are plenty of classic and novel discussions in the literature on derivations and representations of BLUPs under linear regression models, which motivate deeper considerations and explorations of universal-algebraic methods for dealing with BLUP problems. We now present two known fundamental results concerning analytical solutions of a matrix equation and of a constrained quadratic matrix optimization problem, which we shall use as active tools in establishing the BLUP theory under the preceding model assumptions. The first (Lemma 2.1) characterizes the consistency and the general solution of the linear matrix equation LA = B. The second (Lemma 2.2) gives a matrix L_0 such that f(L_0) ⪯ f(L) holds in the Löwner partial ordering for all solutions L of LA = B; in that case, the matrix L_0 satisfying the inequality is determined by a consistent matrix equation, and the general expression of L_0 and the corresponding f(L_0) and f(L) are given in terms of K = BA⁺C + D and T = (A^⊥CMC′A^⊥)⁺, where U ∈ R^{n×p} is arbitrary.
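The solvability criterion and general-solution formula for LA = B can be verified numerically. The sketch below states them under our reading of Lemma 2.1 (consistency iff BA⁺A = B, general solution L = BA⁺ + U(I − AA⁺) with U arbitrary); the matrices are illustrative.

```python
import numpy as np

# Check: L A = B is consistent iff B A^+ A = B, and then every
# L = B A^+ + U (I - A A^+) with arbitrary U solves the equation.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))
L_true = rng.standard_normal((3, 4))
B = L_true @ A                     # constructed so the equation is consistent
Ap = np.linalg.pinv(A)

assert np.allclose(B @ Ap @ A, B)  # consistency condition holds

U = rng.standard_normal((3, 4))    # arbitrary matrix
L = B @ Ap + U @ (np.eye(4) - A @ Ap)
assert np.allclose(L @ A, B)       # every such L solves L A = B
```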
3. Consistency of SULREMs and predictability of all unknown parameters in SULREMs. Parametric statistical inference is concerned with the prediction/estimation of unknown parameters in a given parametric regression model, which is a mathematical and computational process of drawing conclusions about scientific truths hidden behind the observed data. There are many modes of performing statistical inference, including statistical modeling, data-oriented strategies, and the explicit use of designs and randomization in analyses. In our approach to simultaneous optimal prediction under the two seemingly unrelated linear random-effects models described above, we use the conventional concepts, definitions, and statistical techniques of mainstream regression theory. We first introduce some notation and definitions on consistency and estimability under the general assumptions in (1.1)-(1.18), with the corresponding quantities defined for i = 1, 2. The resulting range equalities imply that the observed vectors y_i lie in the ranges of the corresponding model matrices with probability 1; in this case, (1.1) and (1.2) are said to be consistent, respectively (cf. [23,24]).
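The range-inclusion conditions behind consistency can be tested numerically: a vector v lies in R(M) exactly when MM⁺v = v. A small sketch (the matrix M and vectors are illustrative assumptions):

```python
import numpy as np

# Check membership in the column space R(M) via the orthogonal projector M M^+.
rng = np.random.default_rng(2)
M = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))  # rank 3
y_in = M @ rng.standard_normal(5)        # lies in R(M) by construction
y_out = rng.standard_normal(6)           # almost surely not in R(M)

P = M @ np.linalg.pinv(M)                # orthogonal projector onto R(M)
assert np.allclose(P @ y_in, y_in)       # y_in ∈ R(M)
assert not np.allclose(P @ y_out, y_out) # y_out ∉ R(M)
```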
It is well known in statistics that the establishment of BLUP theory under linear regression models is quite direct, requiring only that the expectations and covariance matrices related to the unknown parameters in the models be given. We next introduce the definitions of predictability and of the BLUPs of φ_i and φ in (1.14) and (1.15) for i = 1, 2.

Definition 3.2. Let φ_1, φ_2, and φ be as given in (1.14) and (1.15), respectively. If there exist linear statistics L_iy_i such that E(L_iy_i − φ_i) = 0 and the covariance matrices Cov(L_iy_i − φ_i) are minimal in the Löwner partial ordering, the linear statistics L_iy_i are defined to be the BLUPs of φ_i in (1.14), and are denoted by BLUP(φ_i).

Note from (1.4), (1.5), and (1.14) that the expectations and covariance matrices of L_iy_i − φ_i can be written out explicitly. With these definitions and statistical facts in mind, we can convert the constrained covariance matrix minimization problem in (3.2) into an underlying quadratic matrix-valued function minimization problem. In these cases, the matrix equations involved are consistent, their general solutions are expressed with arbitrary matrices U_i ∈ R^{s×n_i}, i = 1, 2, and the following results hold.
for i = 1, 2. (e) The following BLUP decomposition equalities hold for i = 1, 2. (f) If φ_1 and φ_2 are predictable under (1.1) and (1.2), respectively, then T_1φ_1 and T_2φ_2 are predictable as well under (1.1) and (1.2), respectively, with BLUP(T_iφ_i) = T_iBLUP(φ_i).

Proof. It is obvious from (1.14) that the stated range conditions hold. From Lemma 2.1, the matrix equations are consistent, respectively, if and only if (4.1) holds. In these cases, we see from Lemma 2.1 that the first parts of (4.2) are equivalent to finding solutions L_i of the consistent matrix equations L_iX_i = F_i such that (4.8) holds in the Löwner partial ordering. Further, by Lemma 2.2, there always exist solutions L_i of L_iX_i = F_i such that (4.8) holds, and these L_i are determined by the corresponding matrix equations.

Corollary. Let y_i be as given in (1.1) and (1.2), respectively. Then X_iβ_i, X_iA_iα_i, X_iγ_i, and ε_i are all predictable and estimable under (1.1) and (1.2), respectively, and the corresponding decomposition equalities always hold for i = 1, 2, or in their equivalent forms.

Proof. Setting φ_i = y_i, i = 1, 2, in (4.7) yields (4.9)-(4.12).
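As an illustration of the BLUP construction in Definition 3.2, the following sketch computes a Goldberger-type BLUP coefficient matrix for a single model y = Wα + Xγ + ε and verifies the unbiasedness condition LW = F. The simplifying assumptions Cov(γ, ε) = 0, invertible V = Cov(y), and all dimensions are ours, not the paper's general setting.

```python
import numpy as np

# BLUP of phi = F a + G g + H e under y = W a + X g + e,
# with Cov(g) = D, Cov(e) = R, Cov(g, e) = 0 (simplifying assumptions).
rng = np.random.default_rng(3)
n, p, k, s = 8, 4, 2, 3
X = rng.standard_normal((n, p))
A = rng.standard_normal((p, k))
W = X @ A                                   # fixed-effects design matrix
D, R = np.eye(p), np.eye(n)                 # assumed covariance components
F = rng.standard_normal((s, k))
G = rng.standard_normal((s, p))
H = rng.standard_normal((s, n))

V = X @ D @ X.T + R                         # Cov(y)
C = G @ D @ X.T + H @ R                     # Cov(phi, y)
Vi = np.linalg.inv(V)
P = np.linalg.inv(W.T @ Vi @ W) @ W.T @ Vi  # GLS coefficient for alpha
L = F @ P + C @ Vi @ (np.eye(n) - W @ P)    # BLUP coefficient matrix

# Unbiasedness: E(L y - phi) = (L W - F) alpha = 0 for every alpha.
assert np.allclose(L @ W, F)
```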
In the remainder of this section, we derive the BLUPs of φ_1, φ_2, and φ in (1.14) and (1.15). We first fix the corresponding notation. It follows from (1.8), (1.14), and (1.15) that, for i = 1, 2, the expectations and the covariance matrices of Ky − φ and K_iy − φ_i can be written out explicitly. In this case, the matrix equation in (4.20) is consistent under (4.19), and the general solution K and the corresponding BLUP of φ under (1.8) are given accordingly, where U ∈ R^{s×n} is arbitrary.
In this case, the matrix equations in (4.23) are consistent, respectively, under (4.22), and the general solutions K_i of the equations and the corresponding BLUPs of φ_i under (1.8) are given accordingly. (i) Xβ, X_iβ_i, XAα, X_iA_iα_i, Xγ, X_iγ_i, ε, and ε_i are always predictable and estimable under (1.8), respectively, i = 1, 2, and the corresponding BLUP decomposition equalities under (1.8) always hold, or in their equivalent forms.

Proof. It is obvious from (4.15), (4.16), and Lemma 2.1 that the stated results follow.

Although all the formulas in the preceding theorems are analytical, they cannot directly be used to make statistical inference if the covariance matrix structure in (1.6) is totally unknown. Instead, we can obtain various concrete conclusions with respect to specified covariance matrix structures in (1.6). Moreover, Theorems 4.1-4.3 provide a group of standard criteria for comparing the efficiency of other kinds of estimators, such as the ordinary least-squares and weighted least-squares estimators of β_i and α_i in (1.1) and (1.2), as well as in (1.4) and (1.5), which are obtained without using the assumptions in (1.6) and (1.7) and have statistical performances different from the BLUPs/BLUEs of the unknown parameters. We shall discuss these problems in forthcoming papers.

5. Concluding remarks. In summary, this study concentrates on some current points of interest in simultaneous optimal prediction under two seemingly unrelated linear random-effects models, for which we set up a theoretical analysis of the optimal prediction problems through some powerful mathematical and statistical optimization methods. We carefully extend some classic concepts and knowledge on BLUPs to reflect the wealth of relevant novel results obtained in the past several years.
Because all the formulas and facts in the preceding theorems are represented as analytical expressions, they can easily be reduced to various specific conclusions when the model matrices and the covariance matrix in (1.1) and (1.2) are given in certain prescribed formulations. For example, both (1.1) and (1.2) encompass certain types of LREMs that have partially common parameter vectors as special cases, such as

y_i = X_iβ + ε_i, β = Aα + γ, i = 1, 2,
y_i = X_iβ_i + ε_i, β_i = A_iα + γ_i, i = 1, 2,

under which more specific statistical inference results can be obtained; please refer to [15,36] for the corresponding work. Besides, similar theoretical approaches can be applied to more general types of seemingly unrelated regression models, for example, the following two seemingly unrelated linear mixed models

y_i = X_iα_i + Z_iγ_i + ε_i, i = 1, 2,

where the two observed random vectors y_1 and y_2 are statistically correlated. We believe that more theoretical results and facts about BLUPs and BLUEs under different kinds of linear regression models can be established by a similar approach. There is no doubt that previous and recent studies show that classic concepts such as BLUPs and BLUEs play vital roles in the statistical inference of regression models; they have thrown up many difficult problems concerning optimal prediction and estimation under various parametric model assumptions, and have thus led to broad and deep approaches in statistics and data analysis.
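The first special case above can be sketched numerically: the common-β models stack into one equation y = Xβ + ε with X = [X_1′, X_2′]′ rather than a block-diagonal design. All dimensions in the sketch are illustrative assumptions.

```python
import numpy as np

# Common-parameter special case: y_i = X_i beta + e_i, beta = A alpha + gamma,
# i = 1, 2, stacked into the single equation y = X A alpha + X gamma + e.
rng = np.random.default_rng(5)
n1, n2, p, k = 4, 5, 3, 2
X1 = rng.standard_normal((n1, p)); X2 = rng.standard_normal((n2, p))
A = rng.standard_normal((p, k))
alpha = rng.standard_normal(k); gamma = rng.standard_normal(p)
e = rng.standard_normal(n1 + n2)

X = np.vstack([X1, X2])        # common beta: designs are stacked, not block-diagonal
beta = A @ alpha + gamma
y = X @ beta + e

# The stacked equation reproduces the two original equations block by block.
assert np.allclose(y[:n1], X1 @ beta + e[:n1])
assert np.allclose(y[n1:], X2 @ beta + e[n1:])
```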