2016, 1(2&3): 185-216. doi: 10.3934/bdia.2016004

How do I choose the right NoSQL solution? A comprehensive theoretical and experimental survey

Center of Excellence for Research in Adaptive Systems, York University, Toronto, Ontario, Canada

Received October 2015 Revised December 2015 Published September 2016

With the advent of the Internet of Things (IoT) and cloud computing, the need for data stores that would be able to store and process big data in an efficient and cost-effective manner has increased dramatically. Traditional data stores seem to have numerous limitations in addressing such requirements. NoSQL data stores have been designed and implemented to address the shortcomings of relational databases by compromising on ACID and transactional properties to achieve high scalability and availability. These systems are designed to scale to thousands or millions of users performing updates, as well as reads, in contrast to traditional RDBMSs and data warehouses. Although there is a plethora of potential NoSQL implementations, there is no one-size-fits-all solution to satisfy even the main requirements. In this paper, we explore popular and commonly used NoSQL technologies and elaborate on their documentation, existing literature and performance evaluation. More specifically, we will describe the background, characteristics, classification, data model and evaluation of NoSQL solutions that aim to provide the capabilities for big data analytics. This work is intended to help users, individuals or organizations, to obtain a clear view of the strengths and weaknesses of well-known NoSQL data stores and select the right technology for their applications and use cases. To do so, we first present a systematic approach to narrow down the proper NoSQL candidates and then adopt an experimental methodology that can be repeated by anyone to find the best among shortlisted candidates considering their specific requirements.
Citation: Hamzeh Khazaei, Marios Fokaefs, Saeed Zareian, Nasim Beigi-Mohammadi, Brian Ramprasad, Mark Shtern, Purwa Gaikwad, Marin Litoiu. How do I choose the right NoSQL solution? A comprehensive theoretical and experimental survey. Big Data & Information Analytics, 2016, 1 (2&3) : 185-216. doi: 10.3934/bdia.2016004

