April  2016, 12(2): 515-527. doi: 10.3934/jimo.2016.12.515

A combinatorial optimization approach to the selection of statistical units

1. 

Università di Roma "Sapienza", Dip. di Ing. Informatica, Automatica e Gestionale (DIAG), Via Ariosto 25, Roma, 00185, Italy

2. 

Italian National Statistic Office “Istat", Dip. per i Censimenti e gli Archivi Amm. e Statistici (DICA), Viale Oceano Pacifico 171, Roma, 00144, Italy

3. 

Italian National Statistic Office “Istat”, Dip. per i Censimenti e gli Archivi Amm. e Statistici (DICA), Viale Oceano Pacifico 171, Roma, 00144, Italy

Received  April 2014 Revised  November 2014 Published  June 2015

In the case of some large statistical surveys, the set of units that will constitute the scope of the survey must be selected. We focus on the real case of a Census of Agriculture, where the units are farms. Surveying each unit has a cost and brings a different portion of the whole information. In this case, one wants to determine a subset of units producing the minimum total cost for being surveyed and representing at least a certain portion of the total information. Uncertainty aspects also occur, because the portion of information corresponding to each unit is not perfectly known before surveying it. The proposed approach is based on combinatorial optimization, and the arising decision problems are modeled as multidimensional binary knapsack problems. Experimental results show the effectiveness of the proposed approach.
Citation: Renato Bruni, Gianpiero Bianchi, Alessandra Reale. A combinatorial optimization approach to the selection of statistical units. Journal of Industrial & Management Optimization, 2016, 12 (2) : 515-527. doi: 10.3934/jimo.2016.12.515
References:
[1]

E. Balas, Facets of the knapsack polytope,, Mathematical Programming, 8 (1975), 146.  doi: 10.1007/BF01580440.  Google Scholar

[2]

R. M. Bell and M. L. Cohen, Coverage Measurement in the 2010 Census - Panel on Correlation Bias and Coverage Measurement in the 2010 Decennial Census,, The National Academic Press, (2008).   Google Scholar

[3]

G. Bianchi, R. Bruni and A. Reale, Information reconstruction via discrete optimization for agricultural census data,, Applied Mathematical Sciences, 6 (2012), 6241.   Google Scholar

[4]

G. Bianchi, R. Bruni and A. Reale, Balancing of agricultural census data by using discrete optimization,, Optimization Letters, 8 (2014), 1553.  doi: 10.1007/s11590-013-0652-3.  Google Scholar

[5]

G. Bianchi, R. Bruni and A. Reale, Open source integer linear programming solvers for error localization in numerical data,, In Advances in Theoretical and Applied Statistics (eds. N. Torelli, (2012).  doi: 10.1007/978-3-642-35588-2_28.  Google Scholar

[6]

E. Boros, A. Scozzari, F. Tardella and P. Veneziani, Polynomially computable bounds for the probability of the union of events,, Mathematics of Operations Research, 39 (2014), 1311.  doi: 10.1287/moor.2014.0657.  Google Scholar

[7]

R. Bruni, Discrete models for data imputation,, Discrete Applied Mathematics, 144 (2004), 59.  doi: 10.1016/j.dam.2004.04.004.  Google Scholar

[8]

R. Bruni, Error correction for massive data sets,, Optimization Methods and Software, 20 (2005), 295.  doi: 10.1080/10556780512331318281.  Google Scholar

[9]

European Parliament, Regulation of the European Parliament,, N. 1166/2008, (1166).   Google Scholar

[10]

Food and Agriculture Organization of the United Nations (FAO), A System of Integrated Agricultural Censuses and Surveys,, Vol.1 - World Programme for the Census of Agriculture 2010. FAO Statistical Development Series (2005)., (2005).   Google Scholar

[11]

M. Ferri and M. Piccioni, Optimal selection of statistical units: An approach via simulated annealing,, Computational Statistics & Data Analysis, 13 (1992), 47.  doi: 10.1016/0167-9473(92)90153-7.  Google Scholar

[12]

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-completeness,, W.H. Freeman, (1979).   Google Scholar

[13]

R. M. Groves, F. J. Jr. Fowler, M. P. Couper, J. M. Lepkowski, E. Singer and R. Tourangeau, Survey Methodology,, Wiley Series in Survey Methodology, (2009).   Google Scholar

[14]

T. Hailperin, Best possible inequalities for the probability of a logical function of events,, The American Mathematical Monthly, 72 (1965), 343.  doi: 10.2307/2313491.  Google Scholar

[15]

P. L. Hammer, E. L. Johnson and U. N. Peled, Facets of regular 0-1 polytopes,, Mathematical Programming, 8 (1975), 179.  doi: 10.1007/BF01580442.  Google Scholar

[16]

W. K. Kremers, Completeness and unbiased estimation for sum-quota sampling,, Journal of the American Statistical Association, 81 (1986), 1070.   Google Scholar

[17]

H. Marchand, A. Martin, R. Weismantel and L. Wolsey, Cutting planes in integer and mixed integer programming,, Discrete Applied Mathematics, 123 (2002), 397.  doi: 10.1016/S0166-218X(01)00348-1.  Google Scholar

[18]

C. A. Moser, Quota sampling,, Journal of the Royal Statistical Society, 115 (1952), 411.  doi: 10.2307/2980740.  Google Scholar

[19]

A. Mucherino, P. Papajorgji and P. M. Pardalos, Data Mining in Agriculture,, Springer, (2009).  doi: 10.1007/978-0-387-88615-2.  Google Scholar

[20]

K. G. Murty, Linear Programming,, John Wiley & Sons Inc., (1983).   Google Scholar

[21]

G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization,, John Wiley, (1988).  doi: 10.1002/9781118627372.  Google Scholar

[22]

T. M. F. Smith, Populations and selection: Limitations of statistics,, Journal of the Royal Statistical Society Series A (Statistics in Society), 156 (1993), 144.  doi: 10.2307/2982726.  Google Scholar

[23]

S. Sudman, Probability sampling with quotas,, Journal of the American Statistical Association, 61 (1966), 749.  doi: 10.1080/01621459.1966.10480903.  Google Scholar

[24]

L. A. Wolsey, Faces for a linear inequality in 0-1 variables,, Mathematical Programming, 8 (1975), 165.  doi: 10.1007/BF01580441.  Google Scholar

[25]

Y. Zhang, F. Zhang and M. Cai, Some new results on multi-dimension Knapsack problem,, Journal of Industrial and Management Optimization, 1 (2005), 315.  doi: 10.3934/jimo.2005.1.315.  Google Scholar

show all references

References:
[1]

E. Balas, Facets of the knapsack polytope,, Mathematical Programming, 8 (1975), 146.  doi: 10.1007/BF01580440.  Google Scholar

[2]

R. M. Bell and M. L. Cohen, Coverage Measurement in the 2010 Census - Panel on Correlation Bias and Coverage Measurement in the 2010 Decennial Census,, The National Academic Press, (2008).   Google Scholar

[3]

G. Bianchi, R. Bruni and A. Reale, Information reconstruction via discrete optimization for agricultural census data,, Applied Mathematical Sciences, 6 (2012), 6241.   Google Scholar

[4]

G. Bianchi, R. Bruni and A. Reale, Balancing of agricultural census data by using discrete optimization,, Optimization Letters, 8 (2014), 1553.  doi: 10.1007/s11590-013-0652-3.  Google Scholar

[5]

G. Bianchi, R. Bruni and A. Reale, Open source integer linear programming solvers for error localization in numerical data,, In Advances in Theoretical and Applied Statistics (eds. N. Torelli, (2012).  doi: 10.1007/978-3-642-35588-2_28.  Google Scholar

[6]

E. Boros, A. Scozzari, F. Tardella and P. Veneziani, Polynomially computable bounds for the probability of the union of events,, Mathematics of Operations Research, 39 (2014), 1311.  doi: 10.1287/moor.2014.0657.  Google Scholar

[7]

R. Bruni, Discrete models for data imputation,, Discrete Applied Mathematics, 144 (2004), 59.  doi: 10.1016/j.dam.2004.04.004.  Google Scholar

[8]

R. Bruni, Error correction for massive data sets,, Optimization Methods and Software, 20 (2005), 295.  doi: 10.1080/10556780512331318281.  Google Scholar

[9]

European Parliament, Regulation of the European Parliament,, N. 1166/2008, (1166).   Google Scholar

[10]

Food and Agriculture Organization of the United Nations (FAO), A System of Integrated Agricultural Censuses and Surveys,, Vol.1 - World Programme for the Census of Agriculture 2010. FAO Statistical Development Series (2005)., (2005).   Google Scholar

[11]

M. Ferri and M. Piccioni, Optimal selection of statistical units: An approach via simulated annealing,, Computational Statistics & Data Analysis, 13 (1992), 47.  doi: 10.1016/0167-9473(92)90153-7.  Google Scholar

[12]

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-completeness,, W.H. Freeman, (1979).   Google Scholar

[13]

R. M. Groves, F. J. Jr. Fowler, M. P. Couper, J. M. Lepkowski, E. Singer and R. Tourangeau, Survey Methodology,, Wiley Series in Survey Methodology, (2009).   Google Scholar

[14]

T. Hailperin, Best possible inequalities for the probability of a logical function of events,, The American Mathematical Monthly, 72 (1965), 343.  doi: 10.2307/2313491.  Google Scholar

[15]

P. L. Hammer, E. L. Johnson and U. N. Peled, Facets of regular 0-1 polytopes,, Mathematical Programming, 8 (1975), 179.  doi: 10.1007/BF01580442.  Google Scholar

[16]

W. K. Kremers, Completeness and unbiased estimation for sum-quota sampling,, Journal of the American Statistical Association, 81 (1986), 1070.   Google Scholar

[17]

H. Marchand, A. Martin, R. Weismantel and L. Wolsey, Cutting planes in integer and mixed integer programming,, Discrete Applied Mathematics, 123 (2002), 397.  doi: 10.1016/S0166-218X(01)00348-1.  Google Scholar

[18]

C. A. Moser, Quota sampling,, Journal of the Royal Statistical Society, 115 (1952), 411.  doi: 10.2307/2980740.  Google Scholar

[19]

A. Mucherino, P. Papajorgji and P. M. Pardalos, Data Mining in Agriculture,, Springer, (2009).  doi: 10.1007/978-0-387-88615-2.  Google Scholar

[20]

K. G. Murty, Linear Programming,, John Wiley & Sons Inc., (1983).   Google Scholar

[21]

G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization,, John Wiley, (1988).  doi: 10.1002/9781118627372.  Google Scholar

[22]

T. M. F. Smith, Populations and selection: Limitations of statistics,, Journal of the Royal Statistical Society Series A (Statistics in Society), 156 (1993), 144.  doi: 10.2307/2982726.  Google Scholar

[23]

S. Sudman, Probability sampling with quotas,, Journal of the American Statistical Association, 61 (1966), 749.  doi: 10.1080/01621459.1966.10480903.  Google Scholar

[24]

L. A. Wolsey, Faces for a linear inequality in 0-1 variables,, Mathematical Programming, 8 (1975), 165.  doi: 10.1007/BF01580441.  Google Scholar

[25]

Y. Zhang, F. Zhang and M. Cai, Some new results on multi-dimension Knapsack problem,, Journal of Industrial and Management Optimization, 1 (2005), 315.  doi: 10.3934/jimo.2005.1.315.  Google Scholar

[1]

Subrata Dasgupta. Disentangling data, information and knowledge. Big Data & Information Analytics, 2016, 1 (4) : 377-389. doi: 10.3934/bdia.2016016

[2]

Jose-Luis Roca-Gonzalez. Designing dynamical systems for security and defence network knowledge management. A case of study: Airport bird control falconers organizations. Discrete & Continuous Dynamical Systems - S, 2015, 8 (6) : 1311-1329. doi: 10.3934/dcdss.2015.8.1311

[3]

Qinglei Zhang, Wenying Feng. Detecting coalition attacks in online advertising: A hybrid data mining approach. Big Data & Information Analytics, 2016, 1 (2&3) : 227-245. doi: 10.3934/bdia.2016006

[4]

Zhen Mei. Manifold data mining helps businesses grow more effectively. Big Data & Information Analytics, 2016, 1 (2&3) : 275-276. doi: 10.3934/bdia.2016009

[5]

Sunmoo Yoon, Maria Patrao, Debbie Schauer, Jose Gutierrez. Prediction models for burden of caregivers applying data mining techniques. Big Data & Information Analytics, 2017, 2 (5) : 1-9. doi: 10.3934/bdia.2017014

[6]

Anupama N, Sudarson Jena. A novel approach using incremental under sampling for data stream mining. Big Data & Information Analytics, 2017, 2 (5) : 1-13. doi: 10.3934/bdia.2017017

[7]

Feyza Gürbüz, Panos M. Pardalos. A decision making process application for the slurry production in ceramics via fuzzy cluster and data mining. Journal of Industrial & Management Optimization, 2012, 8 (2) : 285-297. doi: 10.3934/jimo.2012.8.285

[8]

Sarah Ibri. An efficient distributed optimization and coordination protocol: Application to the emergency vehicle management. Journal of Industrial & Management Optimization, 2015, 11 (1) : 41-63. doi: 10.3934/jimo.2015.11.41

[9]

Zhongbao Zhou, Ximei Zeng, Helu Xiao, Tiantian Ren, Wenbin Liu. Multiperiod portfolio optimization for asset-liability management with quadratic transaction costs. Journal of Industrial & Management Optimization, 2019, 15 (3) : 1493-1515. doi: 10.3934/jimo.2018106

[10]

Tran Ngoc Thach, Nguyen Huy Tuan, Donal O'Regan. Regularized solution for a biharmonic equation with discrete data. Evolution Equations & Control Theory, 2019, 0 (0) : 0-0. doi: 10.3934/eect.2020008

[11]

Xueting Cui, Xiaoling Sun, Dan Sha. An empirical study on discrete optimization models for portfolio selection. Journal of Industrial & Management Optimization, 2009, 5 (1) : 33-46. doi: 10.3934/jimo.2009.5.33

[12]

Zhong Wan, Jingjing Liu, Jing Zhang. Nonlinear optimization to management problems of end-of-life vehicles with environmental protection awareness and damaged/aging degrees. Journal of Industrial & Management Optimization, 2017, 13 (5) : 1-23. doi: 10.3934/jimo.2019046

[13]

Dan Li, Li-Ping Pang, Fang-Fang Guo, Zun-Quan Xia. An alternating linearization method with inexact data for bilevel nonsmooth convex optimization. Journal of Industrial & Management Optimization, 2014, 10 (3) : 859-869. doi: 10.3934/jimo.2014.10.859

[14]

Lianshuan Shi, Enmin Feng, Huanchun Sun, Zhaosheng Feng. A two-step algorithm for layout optimization of structures with discrete variables. Journal of Industrial & Management Optimization, 2007, 3 (3) : 543-552. doi: 10.3934/jimo.2007.3.543

[15]

Guillaume Bal, Eric Bonnetier, François Monard, Faouzi Triki. Inverse diffusion from knowledge of power densities. Inverse Problems & Imaging, 2013, 7 (2) : 353-375. doi: 10.3934/ipi.2013.7.353

[16]

Donglei Du, Tianping Shuai. Errata to:''Optimal preemptive online scheduling to minimize $l_{p}$ norm on two processors''[Journal of Industrial and Management Optimization, 1(3) (2005), 345-351.]. Journal of Industrial & Management Optimization, 2008, 4 (2) : 339-341. doi: 10.3934/jimo.2008.4.339

[17]

Tao Zhang, Yue-Jie Zhang, Qipeng P. Zheng, P. M. Pardalos. A hybrid particle swarm optimization and tabu search algorithm for order planning problems of steel factories based on the Make-To-Stock and Make-To-Order management architecture. Journal of Industrial & Management Optimization, 2011, 7 (1) : 31-51. doi: 10.3934/jimo.2011.7.31

[18]

Rodrigo I. Brevis, Jaime H. Ortega, David Pardo. A source time reversal method for seismicity induced by mining. Inverse Problems & Imaging, 2017, 11 (1) : 25-45. doi: 10.3934/ipi.2017002

[19]

Julius Fergy T. Rabago, Jerico B. Bacani. Shape optimization approach for solving the Bernoulli problem by tracking the Neumann data: A Lagrangian formulation. Communications on Pure & Applied Analysis, 2018, 17 (6) : 2683-2702. doi: 10.3934/cpaa.2018127

[20]

Jian-Wu Xue, Xiao-Kun Xu, Feng Zhang. Big data dynamic compressive sensing system architecture and optimization algorithm for internet of things. Discrete & Continuous Dynamical Systems - S, 2015, 8 (6) : 1401-1414. doi: 10.3934/dcdss.2015.8.1401

2018 Impact Factor: 1.025

Metrics

  • PDF downloads (12)
  • HTML views (0)
  • Cited by (0)

[Back to Top]