August  2020, 19(8): 4179-4189. doi: 10.3934/cpaa.2020187

A numerical method to compute Fisher information for a special case of heterogeneous negative binomial regression

1. Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China

2. Department of Sociology, The University of British Columbia, V6T 1Z1, Vancouver, BC, Canada

3. Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China

4. Department of Sociology and Social Science Research Institute, Duke University, 27708, Durham, NC, USA

* Corresponding author

Received September 2019; Revised November 2019; Published May 2020

Fund Project: The first author is partially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. PolyU 25301115).

Negative binomial regression has been widely applied in various research settings to account for counts with overdispersion. Yet when the gamma scale parameter, $\nu$, is parameterized, there is no direct algorithmic solution for the Fisher information matrix of the associated heterogeneous negative binomial regression, which seriously limits its application to a wide range of complex problems. In this research, we propose a numerical method to calculate the Fisher information of heterogeneous negative binomial regression and accordingly develop a preliminary framework for analyzing incomplete counts with overdispersion. The method is implemented in R and illustrated using an empirical example of teenage drug use in America.
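To make the idea of computing Fisher information numerically concrete, here is a minimal Python sketch for a single negative binomial count. It is not the authors' algorithm or their R implementation. It assumes a mean/shape parameterization with $E[Y] = \mu$ and $\mathrm{Var}[Y] = \mu + \mu^2/\nu$, approximates $E[ss^\top]$ (with $s$ the score vector) by a truncated sum over $y = 0, 1, 2, \dots$, and uses a simple heuristic stopping rule. The function name `nb_fisher_information` and the parameters `tol` and `max_terms` are hypothetical.

```python
import math


def nb_fisher_information(mu, nu, tol=1e-12, max_terms=1_000_000):
    """Approximate the 2x2 expected Fisher information of one count Y
    with respect to (mu, nu) by a truncated sum over y = 0, 1, 2, ...

    Assumed parameterization (not taken from the paper):
    E[Y] = mu and Var[Y] = mu + mu**2 / nu.
    Returns the matrix and the number of terms m that were summed.
    """
    q = mu / (nu + mu)
    pmf = (nu / (nu + mu)) ** nu      # P(Y = 0)
    harmonic = 0.0                    # sum_{k<y} 1/(nu+k) = digamma(y+nu) - digamma(nu)
    log_term = math.log(nu / (nu + mu)) + 1.0
    mass = 0.0
    i_mm = i_mn = i_nn = 0.0
    m = 0
    for y in range(max_terms):
        # Score components: derivatives of log P(Y = y) in mu and nu.
        s_mu = y / mu - (y + nu) / (nu + mu)
        s_nu = harmonic + log_term - (nu + y) / (nu + mu)
        i_mm += pmf * s_mu * s_mu
        i_mn += pmf * s_mu * s_nu
        i_nn += pmf * s_nu * s_nu
        mass += pmf
        m = y + 1
        # Heuristic stopping rule: stop once the unsummed mass is tiny.
        if 1.0 - mass < tol:
            break
        # Advance the recursions from y to y + 1.
        harmonic += 1.0 / (nu + y)
        pmf *= (y + nu) / (y + 1) * q
    return [[i_mm, i_mn], [i_mn, i_nn]], m


info, m = nb_fisher_information(mu=2.0, nu=1.5)
print(info, m)
```

Under this assumed parameterization the $(\mu, \mu)$ entry has the closed form $\nu / (\mu(\mu + \nu))$ and the off-diagonal entry is zero, which gives a convenient check on the truncated sum; the $(\nu, \nu)$ entry is the one that relies on the infinite series. When $\nu$ is small relative to $\mu$ the tail decays slowly, so the number of terms `m` can grow large.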

Citation: Xin Guo, Qiang Fu, Yue Wang, Kenneth C. Land. A numerical method to compute Fisher information for a special case of heterogeneous negative binomial regression. Communications on Pure & Applied Analysis, 2020, 19 (8) : 4179-4189. doi: 10.3934/cpaa.2020187


Figure 1.  Time complexity m for achieving relative errors
Table 1.  Heterogeneous negative-binomial regression analysis of lifetime marijuana use among American youth (Number of observations = 8,874). Data source: the 2012 wave of the Monitoring the Future study
| | Coefficient | Standard error | Z value | 95% confidence interval |
|---|---|---|---|---|
| Covariates for estimating μ | | | | |
| Intercept | 0.677*** | 0.183 | 3.696 | [0.318, 1.036] |
| 10th graders | 1.551*** | 0.153 | 10.145 | [1.251, 1.850] |
| 12th graders | 2.002*** | 0.168 | 11.927 | [1.673, 2.331] |
| Male | 1.268*** | 0.125 | 10.143 | [1.023, 1.513] |
| African American | -0.796*** | 0.149 | -5.361 | [-1.087, -0.505] |
| Metropolitan areas | 0.148 | 0.150 | 0.983 | [-0.147, 0.442] |
| Covariates for estimating ν | | | | |
| Intercept | -3.627*** | 0.082 | -44.331 | [-3.787, -3.466] |
| 10th graders | 0.972*** | 0.068 | 14.374 | [0.839, 1.104] |
| 12th graders | 1.332*** | 0.074 | 18.018 | [1.188, 1.477] |
| Male | -0.006 | 0.051 | -0.107 | [-0.106, 0.095] |
| African American | 0.268*** | 0.077 | 3.480 | [0.117, 0.418] |
| Metropolitan areas | 0.117. | 0.063 | 1.844 | [-0.007, 0.240] |

Goodness of fit: AIC = 18400; BIC = 18480; McFadden's $R^2$ = 0.04828; McFadden's adjusted $R^2$ = 0.04703
Note: *** p<0.001, ** p<0.01, * p<0.05, . p<0.1
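The Z values and intervals in Table 1 appear consistent with a Wald calculation in which the second numeric column is the standard error: $z = \hat\beta / \mathrm{se}$ and the interval is $\hat\beta \pm 1.96\,\mathrm{se}$. The short Python check below recomputes the intercept row for μ from the rounded table entries. The helper name `wald_summary` is hypothetical, and the use of a normal critical value is an assumption inferred from the reported numbers rather than stated in the source.

```python
def wald_summary(coef, se, z_crit=1.959964):
    """Return the Wald z statistic and 95% interval for one coefficient.

    Assumes a normal approximation; z_crit is the 97.5% standard normal quantile.
    """
    z = coef / se
    return z, (coef - z_crit * se, coef + z_crit * se)


# Intercept row for mu in Table 1 (rounded values as printed).
z, (lower, upper) = wald_summary(0.677, 0.183)
print(round(z, 3), round(lower, 3), round(upper, 3))
```

Because the inputs are already rounded to three decimals, the recomputed z statistic can differ from the printed 3.696 in the third decimal place, while the interval endpoints match [0.318, 1.036] after rounding.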
