doi: 10.3934/bdia.2018001
Online First

Understanding AI in a world of big data

Environics Analytics, 33 Bloor St. East, Toronto, Ont. M4W3H1, Canada

Early access October 2018

Big Data and AI are now popular concepts in the public lexicon. Yet much confusion exists as to what these concepts actually mean and, more importantly, why they are significant forces in the world today. New tools and technologies now provide better access to data and facilitate its analysis for better decision-making. But the discipline of data science, with its four-step process for conducting any analysis, is the key to success in both non-advanced and advanced analytics, which would, of course, include the use of AI. This paper attempts to demystify these concepts from a data science perspective. In attempting to understand Big Data and AI, we look at the history of data science and how these more recent concepts have helped to optimize solutions within this four-step process.

Citation: Richard Boire. Understanding AI in a world of big data. Big Data & Information Analytics, doi: 10.3934/bdia.2018001
References:

[1] Figure 1: The 5 V's of big data, Environics Analytics: Best Practices and Considerations in Big Data Analytics, June 2018.

[2] Figure 2: Moore's Law, https://www.google.ca/search?hl=en&tbm=isch&source=hp&biw=1366&bih=651&ei=wd3pWuPdMqqPjwSUr4SQDQ&q=exponential+growth+in+computing+power&oq=growth+in+computing+power&gs_l=img.1.1.0j0i5i30k1.4574.14021.0.16389.28.27.0.1.0.0.182.2657.16j10.26.0....0...1ac.1.64.img..1.25.2467.0..0i24k1j0i8i30k1.0.eDlGB4j2AdI#imgrc=jhm-BdlhnmB2HM:.

[3] Figure 3: Columnar file formats, https://www.google.ca/search?hl=en&tbm=isch&source=hp&biw=1366&bih=651&ei=wd3pWuPdMqqPjwSUr4SQDQ&q=exponential+growth+in+computing+power&oq=growth+in+computing+power&gs_l=img.1.1.0j0i5i30k1.4574.14021.0.16389.28.27.0.1.0.0.182.2657.16j10.26.0....0...1ac.1.64.img..1.25.2467.0..0i24k1j0i8i30k1.0.eDlGB4j2AdI#imgrc=jhm-BdlhnmB2HM:.

[4] Index compression, https://nlp.stanford.edu/IR-book/html/htmledition/index-compression-1.html.

[5] Figure 6: Sequential vs. parallel data processing, https://www.google.ca/search?biw=1607&bih=678&tbm=isch&sa=1&ei=UVPwWu_uGoeYjwSkqrjwBA&q=sequential+db+processing&oq=sequential+db+processing&gs_l=img.3...0.0.0.123836.0.0.0.0.0.0.0.0..0.0....0...1c..64.img..0.0.0....0.jkNEKg1fCW0#imgdii=kH8ag2orN-LWNM:&imgrc=pBOBcUMsqlXNGM:&spf=1525699534175.

[6] Turn to in-memory processing when performance matters, https://searchdatacenter.techtarget.com/feature/Turn-to-in-memory-processing-when-performance-matters.

[7] Figure 8: Schematic of weights within neural net structure, https://www.google.ca/search?hl=en&tbm=isch&source=hp&biw=1366&bih=651&ei=bpvwWv2FM82O5wLqzLigCA&q=neural+net+simple+network&oq=neural+net+simple+network&gs_l=img.3...1065.21853.0.22452.38.24.0.14.14.0.120.1836.21j2.23.0....0...1ac.1.64.img..1.13.1052.0..0j0i24k1j0i10i24k1j0i10k1j0i7i30k1.0.nu7gREvNHkk#imgrc=13gO7BFb0GYZqM:.

[8] Figure 9: Examples of some optimization algorithms, https://www.google.ca/search?hl=en&tbm=isch&q=logistic+function&chips=q:logistic+function,g_5:logistical&sa=X&ved=0ahUKEwjw-KD5oPTaAhWkpFkKHSxSDJwQ4lYIMCgA&biw=1366&bih=651&dpr=1#imgrc=oAHIGiD5uTjw2M: https://www.google.ca/search?hl=en&tbm=isch&q=tan+function+graph&chips=q:tan+function+graph,g_1:tangent,online_chips:cos+tan&sa=X&ved=0ahUKEwjK-IGYovTaAhVQwlkKHUBnC0cQ4lYIKygC&biw=1366&bih=651&dpr=1#imgrc=gWnErav-9CIbGM:.

[9] "Is predictive analytics for marketers really that accurate?", Journal of Marketing Analytics, May 2013. https://link.springer.com/article/10.1057/jma.2013.8.

[10] "Data Mining for Managers: How to use data (big and small) to solve business problems", Palgrave Macmillan, Oct. 2014.

Figure 1.  [1] The 5 V's of Big Data
Figure 2.  [2] Moore's Law
Figure 3.  [3] Columnar File Format
Figure 4.  Example of Structured Data
Figure 5.  Example of Twitter Data
Figure 6.  Sequential vs. Parallel Data Processing
Figure 7.  Schematic of Simple Neural Net-One Hidden layer
Figure 8.  [7] Schematic of Weights within Neural Net Structure
Figure 9.  [8] Examples of some Optimization Algorithms
Figure 10.  Examples of Neural Nets
Figure 11.  Sample of 3 records
Figure 12.  Sample of 3 records-Fixed
Figure 13.  Frequency Distribution of Numeric Variable
Figure 14.  Frequency Distribution of Character Variable
Figure 15.  Example of Data Diagnostics
Figure 16.  Example of Alteryx Software
Figure 17.  Example of Gains/Decile Table
Figure 18.  Example of Final Model Variable Contribution Report
Figure 19.  Example of Final Model Variable Contribution Report