August 2020, 19(8): 4179-4189. doi: 10.3934/cpaa.2020187

A numerical method to compute Fisher information for a special case of heterogeneous negative binomial regression

1. Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China

2. Department of Sociology, The University of British Columbia, V6T 1Z1, Vancouver, BC, Canada

3. Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China

4. Department of Sociology and Social Science Research Institute, Duke University, 27708, Durham, NC, USA

* Corresponding author

Received September 2019; Revised November 2019; Published May 2020

Fund Project: The first author is partially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. PolyU 25301115)

Negative binomial regression has been widely applied in various research settings to account for counts with overdispersion. Yet, when the gamma scale parameter, $\nu$, is parameterized, there is no direct algorithmic solution for the Fisher information matrix of the associated heterogeneous negative binomial regression, which seriously limits its application to a wide range of complex problems. In this research, we propose a numerical method to calculate the Fisher information of heterogeneous negative binomial regression and accordingly develop a preliminary framework for analyzing incomplete counts with overdispersion. This method is implemented in R and illustrated using an empirical example of teenage drug use in America.
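
As a rough illustration of what computing Fisher information numerically can look like, the following Python sketch approximates the expected information of a single count with respect to $(\mu, \nu)$. It is not the authors' algorithm, and it is written in Python rather than the R used in the paper. Everything in it is an assumption made for illustration: the parameterization (a Poisson-gamma mixture with gamma shape $\mu/\nu$ and scale $\nu$, so that the mean is $\mu$), the function names, the central-difference scores, and the truncation point `m` for the infinite sum over counts.

```python
import math

def nb_logpmf(y, mu, nu):
    """Log pmf of a Poisson-gamma mixture; ASSUMED gamma shape mu/nu and scale nu."""
    a = mu / nu  # gamma shape, chosen so that the mean a * nu equals mu
    return (math.lgamma(y + a) - math.lgamma(a) - math.lgamma(y + 1)
            - a * math.log1p(nu) + y * (math.log(nu) - math.log1p(nu)))

def score(y, mu, nu, h=1e-5):
    """Central-difference gradient of the log pmf with respect to (mu, nu)."""
    d_mu = (nb_logpmf(y, mu + h, nu) - nb_logpmf(y, mu - h, nu)) / (2 * h)
    d_nu = (nb_logpmf(y, mu, nu + h) - nb_logpmf(y, mu, nu - h)) / (2 * h)
    return d_mu, d_nu

def fisher_information(mu, nu, m=2000):
    """Approximate E[score * score^T] by summing over counts 0..m (truncation assumed)."""
    info = [[0.0, 0.0], [0.0, 0.0]]
    mass = 0.0  # probability mass covered by the truncated sum
    for y in range(m + 1):
        p = math.exp(nb_logpmf(y, mu, nu))
        s = score(y, mu, nu)
        mass += p
        for i in range(2):
            for j in range(2):
                info[i][j] += p * s[i] * s[j]
    return info, mass

info, mass = fisher_information(mu=2.0, nu=0.5)
print("covered probability mass:", mass)
print("approximate Fisher information:", info)
```

The returned `mass` indicates how much probability the truncated sum covers, which gives a quick check on whether `m` is large enough. In a regression setting where $\mu$ and $\nu$ both depend on covariates, the per-observation matrix would still have to be mapped to the regression coefficients via the chain rule, which this sketch does not attempt.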

Citation: Xin Guo, Qiang Fu, Yue Wang, Kenneth C. Land. A numerical method to compute Fisher information for a special case of heterogeneous negative binomial regression. Communications on Pure & Applied Analysis, 2020, 19(8): 4179-4189. doi: 10.3934/cpaa.2020187


Figure 1.  Time complexity m for achieving relative errors
Table 1. Heterogeneous negative-binomial regression analysis of lifetime marijuana use among American youth (Number of observations = 8,874). Data source: the 2012 wave of the Monitoring the Future study

| Covariate | Coefficient | Standard error | Z value | 95% confidence interval |
|---|---|---|---|---|
| Covariates for estimating µ | | | | |
| Intercept | 0.677*** | 0.183 | 3.696 | [0.318, 1.036] |
| 10th graders | 1.551*** | 0.153 | 10.145 | [1.251, 1.850] |
| 12th graders | 2.002*** | 0.168 | 11.927 | [1.673, 2.331] |
| Male | 1.268*** | 0.125 | 10.143 | [1.023, 1.513] |
| African American | -0.796*** | 0.149 | -5.361 | [-1.087, -0.505] |
| Metropolitan areas | 0.148 | 0.150 | 0.983 | [-0.147, 0.442] |
| Covariates for estimating ν | | | | |
| Intercept | -3.627*** | 0.082 | -44.331 | [-3.787, -3.466] |
| 10th graders | 0.972*** | 0.068 | 14.374 | [0.839, 1.104] |
| 12th graders | 1.332*** | 0.074 | 18.018 | [1.188, 1.477] |
| Male | -0.006 | 0.051 | -0.107 | [-0.106, 0.095] |
| African American | 0.268*** | 0.077 | 3.480 | [0.117, 0.418] |
| Metropolitan areas | 0.117. | 0.063 | 1.844 | [-0.007, 0.240] |

Goodness of fit: AIC 18400; BIC 18480; McFadden's R² 0.04828; McFadden's adjusted R² 0.04703

Note: *** p<0.001, ** p<0.01, * p<0.05, . p<0.1
