April  2016, 12(2): 515-527. doi: 10.3934/jimo.2016.12.515

A combinatorial optimization approach to the selection of statistical units

1. 

Università di Roma "Sapienza", Dip. di Ing. Informatica, Automatica e Gestionale (DIAG), Via Ariosto 25, Roma, 00185, Italy

2. 

Italian National Statistic Office “Istat", Dip. per i Censimenti e gli Archivi Amm. e Statistici (DICA), Viale Oceano Pacifico 171, Roma, 00144, Italy

3. 

Italian National Statistic Office “Istat”, Dip. per i Censimenti e gli Archivi Amm. e Statistici (DICA), Viale Oceano Pacifico 171, Roma, 00144, Italy

Received  April 2014 Revised  November 2014 Published  June 2015

In the case of some large statistical surveys, the set of units that will constitute the scope of the survey must be selected. We focus on the real case of a Census of Agriculture, where the units are farms. Surveying each unit has a cost and brings a different portion of the whole information. In this case, one wants to determine a subset of units producing the minimum total cost for being surveyed and representing at least a certain portion of the total information. Uncertainty aspects also occur, because the portion of information corresponding to each unit is not perfectly known before surveying it. The proposed approach is based on combinatorial optimization, and the arising decision problems are modeled as multidimensional binary knapsack problems. Experimental results show the effectiveness of the proposed approach.
Citation: Renato Bruni, Gianpiero Bianchi, Alessandra Reale. A combinatorial optimization approach to the selection of statistical units. Journal of Industrial & Management Optimization, 2016, 12 (2) : 515-527. doi: 10.3934/jimo.2016.12.515
References:
[1]

E. Balas, Facets of the knapsack polytope,, Mathematical Programming, 8 (1975), 146.  doi: 10.1007/BF01580440.  Google Scholar

[2]

R. M. Bell and M. L. Cohen, Coverage Measurement in the 2010 Census - Panel on Correlation Bias and Coverage Measurement in the 2010 Decennial Census,, The National Academic Press, (2008).   Google Scholar

[3]

G. Bianchi, R. Bruni and A. Reale, Information reconstruction via discrete optimization for agricultural census data,, Applied Mathematical Sciences, 6 (2012), 6241.   Google Scholar

[4]

G. Bianchi, R. Bruni and A. Reale, Balancing of agricultural census data by using discrete optimization,, Optimization Letters, 8 (2014), 1553.  doi: 10.1007/s11590-013-0652-3.  Google Scholar

[5]

G. Bianchi, R. Bruni and A. Reale, Open source integer linear programming solvers for error localization in numerical data,, In Advances in Theoretical and Applied Statistics (eds. N. Torelli, (2012).  doi: 10.1007/978-3-642-35588-2_28.  Google Scholar

[6]

E. Boros, A. Scozzari, F. Tardella and P. Veneziani, Polynomially computable bounds for the probability of the union of events,, Mathematics of Operations Research, 39 (2014), 1311.  doi: 10.1287/moor.2014.0657.  Google Scholar

[7]

R. Bruni, Discrete models for data imputation,, Discrete Applied Mathematics, 144 (2004), 59.  doi: 10.1016/j.dam.2004.04.004.  Google Scholar

[8]

R. Bruni, Error correction for massive data sets,, Optimization Methods and Software, 20 (2005), 295.  doi: 10.1080/10556780512331318281.  Google Scholar

[9]

European Parliament, Regulation of the European Parliament,, N. 1166/2008, (1166).   Google Scholar

[10]

Food and Agriculture Organization of the United Nations (FAO), A System of Integrated Agricultural Censuses and Surveys,, Vol.1 - World Programme for the Census of Agriculture 2010. FAO Statistical Development Series (2005)., (2005).   Google Scholar

[11]

M. Ferri and M. Piccioni, Optimal selection of statistical units: An approach via simulated annealing,, Computational Statistics & Data Analysis, 13 (1992), 47.  doi: 10.1016/0167-9473(92)90153-7.  Google Scholar

[12]

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-completeness,, W.H. Freeman, (1979).   Google Scholar

[13]

R. M. Groves, F. J. Jr. Fowler, M. P. Couper, J. M. Lepkowski, E. Singer and R. Tourangeau, Survey Methodology,, Wiley Series in Survey Methodology, (2009).   Google Scholar

[14]

T. Hailperin, Best possible inequalities for the probability of a logical function of events,, The American Mathematical Monthly, 72 (1965), 343.  doi: 10.2307/2313491.  Google Scholar

[15]

P. L. Hammer, E. L. Johnson and U. N. Peled, Facets of regular 0-1 polytopes,, Mathematical Programming, 8 (1975), 179.  doi: 10.1007/BF01580442.  Google Scholar

[16]

W. K. Kremers, Completeness and unbiased estimation for sum-quota sampling,, Journal of the American Statistical Association, 81 (1986), 1070.   Google Scholar

[17]

H. Marchand, A. Martin, R. Weismantel and L. Wolsey, Cutting planes in integer and mixed integer programming,, Discrete Applied Mathematics, 123 (2002), 397.  doi: 10.1016/S0166-218X(01)00348-1.  Google Scholar

[18]

C. A. Moser, Quota sampling,, Journal of the Royal Statistical Society, 115 (1952), 411.  doi: 10.2307/2980740.  Google Scholar

[19]

A. Mucherino, P. Papajorgji and P. M. Pardalos, Data Mining in Agriculture,, Springer, (2009).  doi: 10.1007/978-0-387-88615-2.  Google Scholar

[20]

K. G. Murty, Linear Programming,, John Wiley & Sons Inc., (1983).   Google Scholar

[21]

G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization,, John Wiley, (1988).  doi: 10.1002/9781118627372.  Google Scholar

[22]

T. M. F. Smith, Populations and selection: Limitations of statistics,, Journal of the Royal Statistical Society Series A (Statistics in Society), 156 (1993), 144.  doi: 10.2307/2982726.  Google Scholar

[23]

S. Sudman, Probability sampling with quotas,, Journal of the American Statistical Association, 61 (1966), 749.  doi: 10.1080/01621459.1966.10480903.  Google Scholar

[24]

L. A. Wolsey, Faces for a linear inequality in 0-1 variables,, Mathematical Programming, 8 (1975), 165.  doi: 10.1007/BF01580441.  Google Scholar

[25]

Y. Zhang, F. Zhang and M. Cai, Some new results on multi-dimension Knapsack problem,, Journal of Industrial and Management Optimization, 1 (2005), 315.  doi: 10.3934/jimo.2005.1.315.  Google Scholar

show all references

References:
[1]

E. Balas, Facets of the knapsack polytope,, Mathematical Programming, 8 (1975), 146.  doi: 10.1007/BF01580440.  Google Scholar

[2]

R. M. Bell and M. L. Cohen, Coverage Measurement in the 2010 Census - Panel on Correlation Bias and Coverage Measurement in the 2010 Decennial Census,, The National Academic Press, (2008).   Google Scholar

[3]

G. Bianchi, R. Bruni and A. Reale, Information reconstruction via discrete optimization for agricultural census data,, Applied Mathematical Sciences, 6 (2012), 6241.   Google Scholar

[4]

G. Bianchi, R. Bruni and A. Reale, Balancing of agricultural census data by using discrete optimization,, Optimization Letters, 8 (2014), 1553.  doi: 10.1007/s11590-013-0652-3.  Google Scholar

[5]

G. Bianchi, R. Bruni and A. Reale, Open source integer linear programming solvers for error localization in numerical data,, In Advances in Theoretical and Applied Statistics (eds. N. Torelli, (2012).  doi: 10.1007/978-3-642-35588-2_28.  Google Scholar

[6]

E. Boros, A. Scozzari, F. Tardella and P. Veneziani, Polynomially computable bounds for the probability of the union of events,, Mathematics of Operations Research, 39 (2014), 1311.  doi: 10.1287/moor.2014.0657.  Google Scholar

[7]

R. Bruni, Discrete models for data imputation,, Discrete Applied Mathematics, 144 (2004), 59.  doi: 10.1016/j.dam.2004.04.004.  Google Scholar

[8]

R. Bruni, Error correction for massive data sets,, Optimization Methods and Software, 20 (2005), 295.  doi: 10.1080/10556780512331318281.  Google Scholar

[9]

European Parliament, Regulation of the European Parliament,, N. 1166/2008, (1166).   Google Scholar

[10]

Food and Agriculture Organization of the United Nations (FAO), A System of Integrated Agricultural Censuses and Surveys,, Vol.1 - World Programme for the Census of Agriculture 2010. FAO Statistical Development Series (2005)., (2005).   Google Scholar

[11]

M. Ferri and M. Piccioni, Optimal selection of statistical units: An approach via simulated annealing,, Computational Statistics & Data Analysis, 13 (1992), 47.  doi: 10.1016/0167-9473(92)90153-7.  Google Scholar

[12]

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-completeness,, W.H. Freeman, (1979).   Google Scholar

[13]

R. M. Groves, F. J. Jr. Fowler, M. P. Couper, J. M. Lepkowski, E. Singer and R. Tourangeau, Survey Methodology,, Wiley Series in Survey Methodology, (2009).   Google Scholar

[14]

T. Hailperin, Best possible inequalities for the probability of a logical function of events,, The American Mathematical Monthly, 72 (1965), 343.  doi: 10.2307/2313491.  Google Scholar

[15]

P. L. Hammer, E. L. Johnson and U. N. Peled, Facets of regular 0-1 polytopes,, Mathematical Programming, 8 (1975), 179.  doi: 10.1007/BF01580442.  Google Scholar

[16]

W. K. Kremers, Completeness and unbiased estimation for sum-quota sampling,, Journal of the American Statistical Association, 81 (1986), 1070.   Google Scholar

[17]

H. Marchand, A. Martin, R. Weismantel and L. Wolsey, Cutting planes in integer and mixed integer programming,, Discrete Applied Mathematics, 123 (2002), 397.  doi: 10.1016/S0166-218X(01)00348-1.  Google Scholar

[18]

C. A. Moser, Quota sampling,, Journal of the Royal Statistical Society, 115 (1952), 411.  doi: 10.2307/2980740.  Google Scholar

[19]

A. Mucherino, P. Papajorgji and P. M. Pardalos, Data Mining in Agriculture,, Springer, (2009).  doi: 10.1007/978-0-387-88615-2.  Google Scholar

[20]

K. G. Murty, Linear Programming,, John Wiley & Sons Inc., (1983).   Google Scholar

[21]

G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization,, John Wiley, (1988).  doi: 10.1002/9781118627372.  Google Scholar

[22]

T. M. F. Smith, Populations and selection: Limitations of statistics,, Journal of the Royal Statistical Society Series A (Statistics in Society), 156 (1993), 144.  doi: 10.2307/2982726.  Google Scholar

[23]

S. Sudman, Probability sampling with quotas,, Journal of the American Statistical Association, 61 (1966), 749.  doi: 10.1080/01621459.1966.10480903.  Google Scholar

[24]

L. A. Wolsey, Faces for a linear inequality in 0-1 variables,, Mathematical Programming, 8 (1975), 165.  doi: 10.1007/BF01580441.  Google Scholar

[25]

Y. Zhang, F. Zhang and M. Cai, Some new results on multi-dimension Knapsack problem,, Journal of Industrial and Management Optimization, 1 (2005), 315.  doi: 10.3934/jimo.2005.1.315.  Google Scholar

[1]

Xin Guo, Lei Shi. Preface of the special issue on analysis in data science: Methods and applications. Mathematical Foundations of Computing, 2020, 3 (4) : i-ii. doi: 10.3934/mfc.2020026

[2]

Reza Lotfi, Zahra Yadegari, Seyed Hossein Hosseini, Amir Hossein Khameneh, Erfan Babaee Tirkolaee, Gerhard-Wilhelm Weber. A robust time-cost-quality-energy-environment trade-off with resource-constrained in project management: A case study for a bridge construction project. Journal of Industrial & Management Optimization, 2020  doi: 10.3934/jimo.2020158

[3]

Javier Fernández, Cora Tori, Marcela Zuccalli. Lagrangian reduction of nonholonomic discrete mechanical systems by stages. Journal of Geometric Mechanics, 2020, 12 (4) : 607-639. doi: 10.3934/jgm.2020029

[4]

Min Xi, Wenyu Sun, Jun Chen. Survey of derivative-free optimization. Numerical Algebra, Control & Optimization, 2020, 10 (4) : 537-555. doi: 10.3934/naco.2020050

[5]

Marion Darbas, Jérémy Heleine, Stephanie Lohrengel. Numerical resolution by the quasi-reversibility method of a data completion problem for Maxwell's equations. Inverse Problems & Imaging, 2020, 14 (6) : 1107-1133. doi: 10.3934/ipi.2020056

[6]

Dan Zhu, Rosemary A. Renaut, Hongwei Li, Tianyou Liu. Fast non-convex low-rank matrix decomposition for separation of potential field data using minimal memory. Inverse Problems & Imaging, , () : -. doi: 10.3934/ipi.2020076

[7]

Predrag S. Stanimirović, Branislav Ivanov, Haifeng Ma, Dijana Mosić. A survey of gradient methods for solving nonlinear optimization. Electronic Research Archive, 2020, 28 (4) : 1573-1624. doi: 10.3934/era.2020115

[8]

Wolfgang Riedl, Robert Baier, Matthias Gerdts. Optimization-based subdivision algorithm for reachable sets. Journal of Computational Dynamics, 2021, 8 (1) : 99-130. doi: 10.3934/jcd.2021005

[9]

Xinpeng Wang, Bingo Wing-Kuen Ling, Wei-Chao Kuang, Zhijing Yang. Orthogonal intrinsic mode functions via optimization approach. Journal of Industrial & Management Optimization, 2021, 17 (1) : 51-66. doi: 10.3934/jimo.2019098

[10]

Lars Grüne, Matthias A. Müller, Christopher M. Kellett, Steven R. Weller. Strict dissipativity for discrete time discounted optimal control problems. Mathematical Control & Related Fields, 2020  doi: 10.3934/mcrf.2020046

[11]

Cuicui Li, Lin Zhou, Zhidong Teng, Buyu Wen. The threshold dynamics of a discrete-time echinococcosis transmission model. Discrete & Continuous Dynamical Systems - B, 2020  doi: 10.3934/dcdsb.2020339

[12]

Veena Goswami, Gopinath Panda. Optimal customer behavior in observable and unobservable discrete-time queues. Journal of Industrial & Management Optimization, 2021, 17 (1) : 299-316. doi: 10.3934/jimo.2019112

[13]

Alberto Bressan, Sondre Tesdal Galtung. A 2-dimensional shape optimization problem for tree branches. Networks & Heterogeneous Media, 2020  doi: 10.3934/nhm.2020031

[14]

Yi An, Bo Li, Lei Wang, Chao Zhang, Xiaoli Zhou. Calibration of a 3D laser rangefinder and a camera based on optimization solution. Journal of Industrial & Management Optimization, 2021, 17 (1) : 427-445. doi: 10.3934/jimo.2019119

[15]

Haodong Yu, Jie Sun. Robust stochastic optimization with convex risk measures: A discretized subgradient scheme. Journal of Industrial & Management Optimization, 2021, 17 (1) : 81-99. doi: 10.3934/jimo.2019100

[16]

Ripeng Huang, Shaojian Qu, Xiaoguang Yang, Zhimin Liu. Multi-stage distributionally robust optimization with risk aversion. Journal of Industrial & Management Optimization, 2021, 17 (1) : 233-259. doi: 10.3934/jimo.2019109

[17]

Yasmine Cherfaoui, Mustapha Moulaï. Biobjective optimization over the efficient set of multiobjective integer programming problem. Journal of Industrial & Management Optimization, 2021, 17 (1) : 117-131. doi: 10.3934/jimo.2019102

[18]

Liping Tang, Ying Gao. Some properties of nonconvex oriented distance function and applications to vector optimization problems. Journal of Industrial & Management Optimization, 2021, 17 (1) : 485-500. doi: 10.3934/jimo.2020117

[19]

Leilei Wei, Yinnian He. A fully discrete local discontinuous Galerkin method with the generalized numerical flux to solve the tempered fractional reaction-diffusion equation. Discrete & Continuous Dynamical Systems - B, 2020  doi: 10.3934/dcdsb.2020319

[20]

Yuri Fedorov, Božidar Jovanović. Continuous and discrete Neumann systems on Stiefel varieties as matrix generalizations of the Jacobi–Mumford systems. Discrete & Continuous Dynamical Systems - A, 2020  doi: 10.3934/dcds.2020375

2019 Impact Factor: 1.366

Metrics

  • PDF downloads (37)
  • HTML views (0)
  • Cited by (0)

[Back to Top]