Computer Science: AI and Machine Learning in Big Data Questions

Question 1:


The reading material this week covers the context of how artificial intelligence (AI) and machine learning (ML) influence the capabilities of big data analytics. Address the following in your discussion:

  • Identify and discuss two advantages of machine learning and AI in big data analytics.
  • How has the use of AI and ML impacted businesses such as Amazon, Walmart, or the travel industry?

    Question 2:

    A variety of AI analytics technologies and tools exist in the market today. Review the list of current analytics tools below:

  • Google AI Platform & Google Analytics
  • Microsoft Azure
  • Ayasdi
  • Watson Studio
  • Rainbird
  • Dialogflow

    In a paper, provide a brief summary of AI and ML in data analytics. Select two analytic tools from the list above and create a comparison table addressing the features and functions their systems provide. Following your comparison, address what a business would need to do to effectively implement and use one of these analytics tools.

    Refer to these links:

  • Melnichuk, A. (2020, January 21). How big data and AI work together. Ncube. https://ncube.com/blog/big-data-and-ai
  • Marr, B. (2016, December 6). What is the difference between artificial intelligence and machine learning? Forbes. https://www.forbes.com/sites/bernardmarr/2016/12/06/what-is-the-difference-between-artificial-intelligence-and-machine-learning/?sh=4ac57c602742
  • Hernandez, A. (2019). The best 7 free and open-source artificial intelligence software. GoodFirms. https://www.goodfirms.co/blog/best-free-open-source-Artificial-Intelligence-software
  • Big Data Analytics: Applications in Business and Marketing
    Kiran Chaudhary and Mansaf Alam
    First edition published 2022
    by CRC Press
    6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487–2742
    and by CRC Press
    4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
    © 2022 Taylor & Francis Group, LLC
    CRC Press is an imprint of Taylor & Francis Group, LLC
    Reasonable efforts have been made to publish reliable data and information, but the author
    and publisher cannot assume responsibility for the validity of all materials or the consequences
    of their use. The authors and publishers have attempted to trace the copyright holders of all
    material reproduced in this publication and apologize to copyright holders if permission
    to publish in this form has not been obtained. If any copyright material has not been
    acknowledged please write and let us know so we may rectify in any future reprint.
    Except as permitted under U.S. Copyright Law, no part of this book may be reprinted,
    reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means,
    now known or hereafter invented, including photocopying, microfilming, and recording, or in
    any information storage or retrieval system, without written permission from the publishers.
    For permission to photocopy or use material electronically from this work, access
    www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive,
    Danvers, MA 01923, 978–750–8400. For works that are not available on CCC please contact
    mpkbookspermissions@tandf.co.uk
    Trademark notice: Product or corporate names may be trademarks or registered trademarks and
    are used only for identification and explanation without intent to infringe.
    Library of Congress Cataloging‑in‑Publication Data
    A catalog record for this book has been requested
    ISBN: 978-1-032-00788-5 (hbk)
    ISBN: 978-1-032-18766-2 (pbk)
    ISBN: 978-1-003-17571-1 (ebk)
    DOI: 10.1201/9781003175711
    Typeset in Garamond
    by Apex CoVantage, LLC
    Contents
    Preface
    Editors
    Contributors
    1 Embrace the Data Analytics Chase: A Journey from Basics to Business
    SUZANEE MALHOTRA
    2 Big Data Analytics and Algorithms
    ALOK KUMAR, LAKSHITA BHARGAVA, AND ZAMEER FATIMA
    3 Market Basket Analysis: An Effective Data-Mining Technique for Anticipating Consumer Purchase Behavior
    SAMALA NAGARAJ
    4 Customer View—Variation in Shopping Patterns
    AMBIKA N
    5 Big Data Analytics for Market Intelligence
    MD. RASHID FAROOQI, ANUSHKA TIWARI, SANA SIDDIQUI, NEERAJ KUMAR, AND NAIYAR IQBAL
    6 Advancements and Challenges in Business Applications of SAR Images
    PRACHI KAUSHIK AND SURAIYA JABIN
    7 Exploring Quantum Computing to Revolutionize Big Data Analytics for Various Industrial Sectors
    PREETI AGARWAL AND MANSAF ALAM
    8 Evaluation of Green Degree of Reverse Logistic of Waste Electrical Appliances
    LI QIN HU, AMIT YADAV, HONG LIU, AND RUMESH RANJAN
    9 Nonparametric Approach of Comparing Company Performance: A Grey Relational Analysis
    TIHANA ŠKRINJARIĆ
    10 Applications of Big Data Analytics in Supply-Chain Management
    NABEELA HASAN AND MANSAF ALAM
    11 Evaluation Study of Churn Prediction Models for Business Intelligence
    SHOAIB AMIN BANDAY AND SAMIYA KHAN
    12 Big Data Analytics for Marketing Intelligence
    TRIPTI PAUL AND SANDIP RAKSHIT
    13 Demystifying the Cult of Data Analytics for Consumer Behavior: From Insights to Applications
    SUZANEE MALHOTRA
    Index
    Preface
    Big Data Analytics: Applications in Business and Marketing is a book that focuses on business and marketing analytics. The objective of this book is to explore the concepts and applications related to marketing and business. In addition, it provides future research directions in this domain. It is an emerging field that can be extended to performance management and to a better understanding of business dynamics for better decision-making. Investment in business and marketing analytics can create value through proper allocation of resources and resource orchestration processes. Data analytics tools can be used to diagnose and improve performance.
    This book is divided into five parts: Introduction, Applications of Business Analytics, Business Intelligence, Analytics for Marketing Decision Making, and Digital Marketing. Part I introduces data science, big data, data analytics, and so forth. Part II focuses on applications of business analytics, which include big data analytics and algorithms, market basket analysis, customer view (variation in shopping patterns), big data analytics for market intelligence, advancements and challenges in business applications of SAR images, and exploring quantum computing to revolutionize big data analytics for various industrial sectors. Part III includes a chapter related to business intelligence featuring an evaluation study of churn prediction models for business intelligence. Part IV is dedicated to analytics for marketing decision-making, including big data analytics for market intelligence, data analytics and consumer behavior, and the responsibility of big data analytics in organizational decision-making. Part V covers digital marketing and includes the prediction of marketing by consumer analytics, web analytics for digital marketing, smart retailing, leveraging web analytics for optimizing digital marketing strategies, and so forth.
    This book includes various topics related to marketing and business analytics, which help organizations increase their profits by making better and timely decisions with the use of data analytics. This book is meant for students, practitioners, industry professionals, researchers, and faculty working in the fields of commerce and marketing, big data analytics, and organizational decision-making.
    Kiran Chaudhary
    Mansaf Alam
    New Delhi, India
    Editors
    Dr. Kiran Chaudhary is assistant professor in the Department of Commerce, Shivaji College, University of Delhi. She has 12 years of teaching and research experience. She completed a Ph.D. in marketing (commerce) at Kurukshetra University, Kurukshetra, Haryana. Her areas of research include marketing, the Cyber Security Act, big data and social media analytics, machine learning, human resource management, organizational behavior, and business and corporate law. She was district topper in M.Com and among the top 10 at Kurukshetra University, recipient of the Radha Krishnan Scholarship of Merit in her M.Com final year (2007), and topper with 88% marks in financial management in B.Com. She has published a book on probability and statistics. She has also published several research articles in reputed international journals and in the proceedings of reputed international conferences. She has delivered various invited talks and chaired sessions at international conferences.
    Dr. Mansaf Alam is associate professor in the Department of Computer Science, Faculty of Natural Sciences, Jamia Millia Islamia, New Delhi-110025, Young Faculty research fellow, DeitY, Govt. of India, and editor-in-chief, Journal of Applied Information Science. He has published several research articles in reputed international journals and in the proceedings of reputed international conferences published by IEEE, Springer, Elsevier Science, and ACM. His areas of research include big data analytics, machine learning and deep learning, cloud computing, cloud database management systems (CDBMS), object-oriented database systems (OODBMS), information retrieval, and data mining. He serves as a reviewer for various journals of international repute, such as Information Science, published by Elsevier Science. He is also a member of the program committees of various reputed international conferences. He is an editorial board member of some reputed international journals in computer science. He has published Digital Logic Design with PHI, Concepts of Multimedia with Arihant, and Internet of Things: Concepts and Applications with Springer.
    Contributors
    Preeti Agarwal
    Department of Computer Science,
    Faculty of Natural Sciences, Jamia
    Millia Islamia
    New Delhi, India
    Maharaja Agrasen
    Institute of Technology
    New Delhi, India
    Mansaf Alam
    Department of Computer Science,
    Faculty of Natural Sciences, Jamia
    Millia Islamia
    New Delhi, India
    Shoaib Amin Banday
    Department of Electronics and
    Communication Engineering,
    Islamic University of Science and
    Technology
    Awantipora, India
    Lakshita Bhargava
    Institute of Technology
    New Delhi, India
    Sandeep B.L.
    Department of Information Science
    and Engineering, M.S. Ramaiah
    Institute of Technology
    Bangalore, India
    Krishnaveer Abhishek Challa
    Andhra University
    Andhra Pradesh, India
    Iffat Sabir Chaudhry
    College of Business, Al Ain University
    Al Ain, United Arab Emirates
    Kiran Chaudhary
    Shivaji College, University of Delhi
    New Delhi, India
    Tarun Krishnan Louie Antony
    Department of Information Science
    and Engineering, M.S. Ramaiah
    Institute of Technology
    Bangalore, India
    Md Rashid Farooqi
    Department of Commerce and
    Management, Maulana Azad
    National Urdu University (Central
    University)
    Hyderabad, India
    Ezeifekwuaba Tochukwu Benedict
    University of Lagos
    Lagos, Nigeria
    Zameer Fatima
    Institute of Technology
    New Delhi, India
    Siddhartha Ghosh
    Mohan Malaviya School of Commerce
    and Management Sciences,
    Mahatma Gandhi Central
    University
    Bihar, India
    Siddesh G.M.
    Department of Information Science
    and Engineering, M.S. Ramaiah
    Institute of Technology
    Bangalore, India
    Nabeela Hasan
    Department of Computer Science,
    Jamia Millia Islamia
    Delhi, India
    Li Qin Hu
    Department of Information
    Management, Chengdu Neusoft
    University
    Chengdu, China
    Suraiya Jabin
    Department of Computer Science,
    Faculty of Natural Sciences, Jamia
    Millia Islamia
    New Delhi, India
    C.C. Jayasundara
    University of Kelaniya
    Colombo, Sri Lanka
    Pankaj Kakati
    Department of Mathematics
    Jagannath Barooah College
    Jorhat, India
    Prachi Kaushik
    Department of Computer Science,
    Faculty of Natural Sciences, Jamia
    Millia Islamia
    New Delhi, India
    Samiya Khan
    School of Mathematics and
    Computer Science, University of
    Wolverhampton
    Wolverhampton, United Kingdom
    Alok Kumar
    Institute of Technology
    New Delhi, India
    Neeraj Kumar
    Department of Business Management,
    L.N. Mishra College
    Muzaffarpur, Bihar, India
    Pavnesh Kumar
    Mohan Malaviya School of Commerce
    and Management Sciences,
    Mahatma Gandhi Central
    University
    Bihar, India
    Hong Liu
    Department of Human Resource,
    Chengdu University of Technology
    Chengdu, China
    Suzanee Malhotra
    Shaheed Bhagat Singh Evening College,
    University of Delhi Sheikh Sarai,
    New Delhi, India
    Venkata Rajasekhar Moturu
    Indian Institute of Management
    Visakhapatnam, India
    Farooq Mughal
    School of Management, University of
    Bath
    Bath, United Kingdom
    Ambika N.
    St. Francis College
    Bangalore, India
    Samala Nagaraj
    Woxsen University
    Hyderabad, India
    Srinivas Dinakar Nethi
    Indian Institute of Management
    Visakhapatnam, India
    Ghanshyam Parmar
    Constituent College of CVM University:
    Natubhai V. Patel College of Pure
    and Applied Sciences
    Anand, India
    Tripti Paul
    Indian Institute of Technology (Indian
    School of Mines)
    Dhanbad, India
    S.R. Mani Sekhar
    Department of Information Science
    and Engineering, M.S. Ramaiah
    Institute of Technology
    Bangalore, India
    Sana Siddiqui
    Department of Computer Science,
    Jamia Millia Islamia
    New Delhi, India
    Tihana Škrinjarić
    University of Zagreb
    Zagreb, Croatia
    Sapna Sood
    Accenture
    Dublin, Ireland
    Saifur Rahman
    Department of Mathematics, Rajiv
    Gandhi University
    Itanagar, India
    Anushka Tiwari
    Department of Computer Science,
    Jamia Millia Islamia
    New Delhi, India
    Sandip Rakshit
    American University of Nigeria
    Yola, Nigeria
    Muhammad Nawaz Tunio
    Alpen Adria University
    Klagenfurt, Austria
    Rumesh Ranjan
    Department of Plant Breeding and
    Genetics, Punjab Agriculture
    University
    Punjab, India
    Amit Yadav
    Department of Information and
    Software Engineering, Chengdu
    Neusoft University
    Chengdu, China
    Chapter 1
    Embrace the Data Analytics Chase: A Journey from Basics to Business
    Suzanee Malhotra
    Contents
    1.1 Overview
    1.1.1 Data Science
    1.1.2 Big Data
    1.1.3 Data Science vs. Big Data
    1.2 Data Analytics
    1.2.1 Relationship Among Big Data, Data Science, and Data Analytics
    1.2.2 Types of Data Analytics
    1.2.2.1 Descriptive Analytics
    1.2.2.2 Diagnostic Analytics
    1.2.2.3 Predictive Analytics
    1.2.2.4 Prescriptive Analytics
    1.3 Business Data Analytics
    1.3.1 Applications of Data Analytics in Business
    1.4 Data Mining, Data Warehouse Management, and Data Visualization
    1.4.1 Data Mining
    1.4.2 Data Warehouse Management
    1.4.3 Data Visualization
    1.5 Insights in Action: Gains from Insights Generated out of Data Analytics
    1.6 Machine Learning and Artificial Intelligence
    1.7 Course of the Book
    References
    DOI: 10.1201/9781003175711-1
    1.1 Overview
    The coming age of business has introduced new terminology into the business dictionary, including ‘data science’, ‘big data’, ‘analytics’, and many more puzzling terms. With data coming to the center stage of business, data collection, data storage, data processing, and data analytics have all become fields in themselves. Further, new data keeps being added to existing data sets at tremendous speed. With rapid advances at the business front, companies place data on the same pedestal as other corporate assets, for it offers the potential and capability to derive many important findings. The following sections explain the meanings of data science and big data and compare the two.
    1.1.1 Data Science
    With data and data-related processes becoming more and more valuable, data science has become the need of the hour. Data science refers to the scientific management of data and the data-related processes, techniques, and skills used to derive viable information, findings, and knowledge from data belonging to various fields (Dhar 2013). It is a complex term that deals with collection, extraction, purification, manipulation, enumeration, tabulation, combination, examination, interpretation, simulation, visualization, and other such processes applied to data (Provost and Fawcett 2013). The various processes and techniques applied to data are derived from many different disciplines, such as computer science, mathematics, and statistical analysis (Dhar 2013). But it is not limited to these disciplines and finds equal and substantial application in the fields of national defense and safety, medical science, architectonics, the social sciences, and business management areas like marketing, production, finance, and even training and development (Provost and Fawcett 2013). In simple terms, data science is an all-encompassing term for the tools and methods used to derive insightful information from data.
    1.1.2 Big Data
    Big data is often described as “high volume, high variety and high velocity” data (McAfee and Brynjolfsson 2012). Big data is the enormous repository of data garnered by organizations from a variety of sources, such as smartphones and other multimedia devices, mobile applications, geolocation tracking devices, remote sensing and radio-wave reading devices, wireless sensing devices, and other similar sources (Yin and Kaynak 2015). The global research and advisory firm Gartner considers “big data as high-volume, and high velocity or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation” (Gartner Inc. 2021). Many organizations add another ‘v’, that is, veracity, to the definition of big data (Yin and Kaynak 2015). Big data represents the important and huge amount of data that is not amenable to traditional data-processing tools but has the potential to guide businesses toward strategic decision-making through the insights derived from it (Khan et al. 2017).
    Big data is categorized into structured, unstructured, or semistructured types of data sets (McAfee and Brynjolfsson 2012). Structured data refers to well-organised and systematic data (like the data stored in DBMS software). Data that is simply stored in its raw version (like analogue data generated from a seismometer) without any systematic order or structure is known as unstructured data (Alam 2012b). In between these two lies semistructured data, where some part of the data is unstructured and some structured (like data stored in XML or HTML formats).
    Data sets can also be categorised on the basis of time, viz., historical (past information) or current (novel and most recently collected information). On the basis of the source of data collection, data sets can be categorised as first-party data (collected by the company directly from its consumers), second-party data (purchased from another organization), and third-party data (composite data obtained from a marketplace). Organizations often keep customized and dedicated software for the storage of big data, from which it can easily be put to computation and analysis to discover insightful trends in the data in relation to various stakeholders.
    1.1.3 Data Science vs. Big Data
    With a basic understanding of these two data-revolutionizing ideas, let us explain the boundaries separating them.
    Data science is an extended domain of knowledge, composed of various disciplines like computer science, mathematics, and statistics. In contrast, big data is a varied pool of data from varied sources, so huge in volume that it requires special treatment. Big data can be anything and everything, from content choices to ad inclinations, search results or browsing history, purchasing-pattern trends, and much more (Khan et al. 2015). Data science provides a number of ways to deal with big data and compress it into feasible sets for further analysis. Data science is a superset that provides both theoretical and practical aid for sorting, cleaning, and churning the subset, big data, for the purpose of deriving useful insights from it. If big data is the big Pandora’s box waiting to be opened, then data science is the tool in the hands of an organization to do the honours. Thus, one can say that, if data science is an area of study, then big data is the pool of data to be studied under that area.
    Having explained the concepts of data science and big data, let us now turn to data analytics and its related concepts.
    1.2 Data Analytics
    Data analytics is the application of algorithmic techniques and methods, or code, to big data or subsets of it to derive useful and pertinent conclusions (Aalst 2016). Thus, when one uses the analytical part of data science on big data or raw data in order to derive meaningful insights and information, it is called data analytics. It has gained a lot of attention and practical application across industries for strategic decision-making, theory building, theory testing, and theory disproving. The thrust of data analytics is on the inferential conclusions that are arrived at after the computation of analytical algorithms. Data analytics involves the manipulation of big data to obtain contextual meanings through which business strategies can be formulated. Organizations use a blend of machine-learning algorithms, artificial intelligence, and other systems or tools for data-analytics tasks to support insightful decision-making, creative strategy planning, serving consumers in the best manner, and improving performance to fire up their revenues while ensuring sustainable bottom lines.
    1.2.1 Relationship Among Big Data, Data Science, and Data Analytics
    Data, defined as a collection of facts and bits of information, is nothing novel to organizations, but its importance and relevance have reached a new pedestal in current times. With global data generation growing on the scale of exabytes and zettabytes, it has indeed become an integral part of the business-management domain. Dealing with a mass of data that exists in many layers and cuts across many domains is the common link connecting data science, big data, and data analytics. Table 1.1 summarizes the interconnected relationship among big data, data science, and data analytics.
    1.2.2 Types of Data Analytics
    It is vital to get a clear understanding of the different variants of data analytics available so as to leverage the stack of data for material benefits. The four variants of data analytics are descriptive, diagnostic, predictive, and prescriptive. The types of data analytics are shown in Figure 1.1. A combined usage of the different variants of data analytics and their corresponding tools and systems adds clarity to the puzzle of where the firm is standing and the journey to where it can reach by achieving its goals. A discussion of the four types is provided in the following paragraphs.

    Table 1.1 Interconnected Relationship among Big Data, Data Science, and Data Analytics
    Big Data → Data Science → Data Analytics
    Big Data: Big data is data of humongous volume, value, and variety, gathered from different sources, requiring further dissection and polishing using data science and data analytics for important inferences to be derived from it.
    Data Science: Data science refers to a multidisciplinary field that involves collection, mining, manipulation, management, storage, and handling of big data for smooth utilization and analysis of data.
    Data Analytics: Data analytics is an approach to derive trends and conclusions from the chunks of processed big data made available after the initial mining and management processes run under the domain of data science, revealing intriguing and influential insights amenable to practical application.

    Figure 1.1 Types of Data Analytics (descriptive, diagnostic, predictive, and prescriptive).
    1.2.2.1 Descriptive Analytics
    As the name suggests, descriptive analytics describes the data in a manner that is orderly, logical, and consistent (Sun, Strang and Firmin 2017). It simply answers the question of ‘what the data shows’. It is further used by all the other types of data analytics to make sense of the complete data. Descriptive analytics collates data, performs number crunching on it, and presents the results in visual reports. Serving as the primary layer of data analytics, it is the most widely used type across all fields, from healthcare to marketing to banking and finance. The tools and methods applied in the process of descriptive analytics present the data in a summarized form. Data collated from consumers’ mailing records, describing their mail ID, name, and contact details, is an example.
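    The collate-and-summarize step described above can be illustrated with a minimal Python sketch. The order records, the segment names, and the `describe` helper are hypothetical and chosen only for illustration; they are not taken from the book.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical order records (customer segment, order value); illustrative only.
orders = [
    ("retail", 120.0),
    ("retail", 80.0),
    ("wholesale", 950.0),
    ("wholesale", 1050.0),
    ("online", 60.0),
]


def describe(records):
    """Collate order values by segment and summarize each group."""
    by_segment = defaultdict(list)
    for segment, value in records:
        by_segment[segment].append(value)
    return {
        segment: {"count": len(values), "total": sum(values), "mean": mean(values)}
        for segment, values in by_segment.items()
    }


summary = describe(orders)
for segment, stats in sorted(summary.items()):
    print(f"{segment:10s} n={stats['count']} "
          f"total={stats['total']:.2f} mean={stats['mean']:.2f}")
```

    The output only restates what the data shows (counts, totals, and averages per segment); it makes no claim about causes or future values, which is the job of the other three types.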
    1.2.2.2 Diagnostic Analytics
    As suggested by the name, diagnostic analytics looks into the reasons or causes of any event or happening and supplements the findings of descriptive analytics (Aalst 2016). It simply answers the question ‘why or what led to any specific event?’ by delving into the facts to direct the future course of planning. It aims at first diagnosing the problems in the data sets and then dissecting the reasons behind them by using techniques like regression or probability analysis. This type of analytics is widely used across fields: in medicine to diagnose the cause of a problem, in marketing to learn the specific reasons behind consumer behavior, and in finance to understand the cause behind an investment decision. For example, when diagnostic analytics is applied in the area of human resources, it can provide important details like the reasons behind employee performance or which kinds of training and development programs improve employee efficiency.
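    As a rough sketch of the regression technique mentioned above, the following Python snippet fits a straight line relating training hours to a performance score using ordinary least squares. The data values and the names `training_hours`, `performance_score`, and `fit_line` are assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical HR data: hours of training and a later performance score.
training_hours = [0, 5, 10, 15, 20]
performance_score = [50, 58, 71, 79, 92]

slope, intercept = fit_line(training_hours, performance_score)
print(f"Each extra training hour is associated with {slope:.2f} more points "
      f"(baseline {intercept:.2f}).")
```

    A positive slope suggests that training is associated with better performance in this sample. A fitted line shows association only; establishing that training actually caused the improvement would need further analysis.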
    1.2.2.3 Predictive Analytics
    As suggested by the name, predictive analytics aims to predict or prognose what could happen in the future (Sun, Strang and Firmin 2017). It simply answers the question ‘what events could unfold in future, or what events could flare up?’ One of the key features of business is staying ahead of others, and predictive analytics helps business firms maintain their lead by foreseeing what can happen in the future, along with some probabilities. Within the available data sets, predictive analytics searches for patterns or trends that indicate events that could pan out in the future and then estimates the probabilities of those events. It provides predictive insights in retailing and commerce for rolling out products aligned with consumer preferences, in stock markets for predicting future stock prices, and in project appraisal for forecasting the risks posed. There is no certainty that these estimated probabilities will turn into reality, but having the information at hand is still better for the business than moving forward in a dark alley.
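    One simple way to picture the "find a pattern, then estimate a probability" idea is a relative-frequency estimate from historical records. The sketch below is a deliberately small illustration under assumed data: the weekly `history` pairs and the `estimate_probability` helper are hypothetical, and real predictive models are usually far richer.

```python
# Hypothetical weekly history: (promotion_ran, sales_rose); illustrative only.
history = [
    (True, True), (True, True), (True, False), (True, True),
    (False, False), (False, True), (False, False), (False, False),
]


def estimate_probability(records, condition):
    """Relative frequency of the outcome among records matching the condition."""
    outcomes = [outcome for cond, outcome in records if cond == condition]
    if not outcomes:
        raise ValueError("no historical records match the condition")
    # True counts as 1 and False as 0, so the sum is the number of rises.
    return sum(outcomes) / len(outcomes)


p_with_promo = estimate_probability(history, True)
p_without_promo = estimate_probability(history, False)
print(f"Estimated P(sales rise | promotion)    = {p_with_promo:.2f}")
print(f"Estimated P(sales rise | no promotion) = {p_without_promo:.2f}")
```

    These figures are estimates from past weeks, not guarantees, which matches the caveat in the text that the estimated probabilities may not turn into reality.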
    1.2.2.4 Prescriptive Analytics
    As the name suggests, prescriptive analytics prescribes a course of action to be adopted by the firm (Sun, Strang and Firmin 2017). It simply answers the question of what the firm should do in the future. Descriptive analytics describes a scenario, diagnostic analytics identifies the important issues of the scenario, and predictive analytics predicts what surprises the future holds, but it is prescriptive analytics that finally guides a business firm through those events. While prescriptive analytics may suggest grabbing hold of strengthening opportunities, its findings may also help a firm ward off any danger it may face by stepping into scenarios that could be threatening to it. It can be leveraged across fields: in business management for budget preparation or inventory management, in healthcare for prescribing suitable treatment, and in construction for streamlining operations.
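    For the inventory-management use mentioned above, a prescriptive step can be sketched as choosing the action with the best expected outcome given a forecast. The example below is a minimal sketch under stated assumptions: the demand forecast, the cost and price figures, the assumption that unsold units are worth nothing, and the helper names are all hypothetical.

```python
# Hypothetical demand forecast: units demanded -> estimated probability.
demand_forecast = {10: 0.2, 20: 0.5, 30: 0.3}
UNIT_COST = 4.0    # assumed purchase cost per unit
UNIT_PRICE = 10.0  # assumed selling price per unit


def expected_profit(order_qty, forecast, cost, price):
    """Expected profit of an order, assuming unsold units have no value."""
    expected_revenue = sum(
        prob * price * min(order_qty, demand)
        for demand, prob in forecast.items()
    )
    return expected_revenue - cost * order_qty


def recommend_order(candidates, forecast, cost, price):
    """Prescribe the candidate order quantity with the highest expected profit."""
    return max(candidates, key=lambda q: expected_profit(q, forecast, cost, price))


best_qty = recommend_order([10, 20, 30], demand_forecast, UNIT_COST, UNIT_PRICE)
best_profit = expected_profit(best_qty, demand_forecast, UNIT_COST, UNIT_PRICE)
print(f"Recommended order: {best_qty} units (expected profit {best_profit:.2f})")
```

    The forecast plays the role of the predictive output, and the recommendation is the prescriptive part: it turns estimated probabilities into a concrete course of action.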
    Data analytics has found a place in many fields, from life-saving medicine and surgery (Kaur and Alam 2013) to money-making and finance, from administering government and public works to controlling money supply and banking, from the nation-building education sector (Khan, Shakil and Alam 2016, 2019; Khan et al. 2019; Khanna, Singh and Alam 2016) to entertaining media and hospitality, and from automated manufacturing to self-driven cars and trucks, which are a gift of artificial intelligence. Across all these fields, data analytics has made core contributions and continues to make further improvements on the road ahead (Syed, Afan and Alam 2019). One such area of utilization is the business domain, where business data analytics has become a field of its own. Let us understand the intricacies of business data analytics in the sections that follow.
    1.3 Business Data Analytics
    With data piling up every nanosecond, the working of business institutions has changed drastically. Though ‘data’ is considered a business asset in
    current times, what would a clump of data do by itself? What benefit would it yield on
    its own? Would the numbers, or the bit language of 0s and 1s, lead to any favorable
    change in the existing company position and turnover?
    A clear-cut understanding of the ‘whys and why nots’ that one
    wants the data sets to answer can help business firms to dive for precious pearls.
    Their discovery can indeed provide mileage to the firms in profitability, revenue
    generation, and productivity. Business analytics involves the application of varying
    data analytics tools, techniques, and systems to a big-data pool to derive intriguing
    insights, simulation models, strategic decisions, and tactical plans (Albright
    and Winston 2015). A proper and channelized utilization of analytics in business
    can help firms to face future hiccups in operating the business in a demanding environment. Those firms that miss out on tapping the benefits offered by
    analytics lose tons of add-on value compared to their peers
    (Amankwah-Amoah and Adomako 2019).
    The power of business analytics is not restricted to decision-making only;
    many withering industries and firms also apply the power of analytics in
    industrial, business, and process reengineering. Due to this, many companies
    have recently changed their orientation and approach toward data collection, storage, maintenance, and manipulation. From exploration to new discoveries out of
    big data (Khan, Shakil and Alam 2017), the quantitative tools are applied to make
    progressive traction in the business growth curve.
    Business analytics refers to the deployment of statistical, mathematical, and
    computing tools (Khan, Shakil and Alam 2018; Kumar et al. 2018; Shakil
    and Alam 2018), techniques, or systems on the big-data pool for the discovery,
    simulation, examination, extrapolation, interpretation, and communication of
    insightful results to business executives for formidable execution and
    preparation (LaValle et al. 2011). Business data analytics offers plenty of real-world solutions across multiple business domains. Driven by questioning
    and intuition, a sound know-how of computing and statistics, leveraged along
    with trending technologies, provides solutions to many hard-hitting issues and
    problems.
    1.3.1 Applications of Data Analytics in Business
    With daily additions to the existing data pile, the use of data analytics in the business domain is crossing new thresholds, offering novel opportunities to be grabbed
    and threats to be warded off by business firms. The approach that
    business firms use to exploit the merits of data analytics can affect their strengths and
    weaknesses in competitive markets. Table 1.2 lists applications of business data analytics and the contributions of analytics in the
    world of business, showcasing that analytics is more relevant to this sector
    than ever before.
    The wide applications of big data analytics (Alam and Shakil 2016; Khan,
    Shakil and Alam 2018; Malhotra et al. 2017) are capable of making critical contributions to many different fields and arenas, offering potential competitive edges to
    move forward. Along with the ‘buzz’ of the concepts like ‘data science’, ‘big data’,
    Table 1.2 Applications of Data Analytics in Business
    Production and
    Inventory
    Management
    • In product development for gaining knowledge about
    consumer needs and wants, preferences, and the latest
    trends
    • In supply chain management for keeping flow of
    inbound logistics
    • In inventory management for maintaining economic
    order quantity, just-in-time purchases, and ABC
    analysis of stock items
    • In production process for seeking productive efficiency
    gains from the resources put to use
    Sales and
    Operations
    Management
    • In retail-sales management for product shelf display
    and replenishment, running special discount sales and
    loyalty programs
    • In outbound logistics to ensure proper physical
    distribution to different business locations
    • In warehouse and storage management for maintaining
    proper upkeep and ready-to-serve features
    Price Setting and
    Optimization
    • In price determination of goods and services, for
    analysis of the indicators like factor input costs,
    competitors’ price-lists, price elasticity trends, etc.
    • In tax and duty adjustments regarding different duties,
    levies and taxes, computations, and calculations
    • In determining features like discounts, rebates, special
    prices or coupons
    • In optimization of input costs and overhead costs for
    maintaining sustainable profitability
    Finance and
    Investment
    • In the stock market to track stock performance, future
    trend, and company’s future earning potential
    • In capital budgeting decisions for making investment
    decisions, dividend decisions, or determining the
    valuation of a firm
    • In investment banking for the tasks of lead book
    running, arriving at mergers, and amalgamations
    decisions
    • In credit rating generation, financial fraud detection or
    prevention, portfolio creation, management or
    diversification
    Marketing
    Research
    • In formulating segmenting, targeting, and positioning
    strategy
    • For the search-engine optimization process, to return
    the best and relevant results from search queries run
    in real time
    • In advertising from the idea conceptualization to
    content creation and designing of banners or
    billboards or directing the advertisement
    • In creating a recommendation system in this era of
    ecommerce so that products or services reach the
    appropriate and targeted audiences
    • In consumer-relationship building activities by
    maintaining close links and contacts with consumers,
    for personalized marketing activities for brand loyalty,
    and to constantly better the business in providing
    memorable consumer experiences
    Human
    Resource
    Management
    • In recruitment and selection for conducting
    background checks, screening candidates, and calling
    eligible candidates for interviews
    • In training and development schemes for building and
    polishing the skills that employees lack or for the
    infusion of new skills as per trending needs
    • In compensation management for successful
    motivation, retention, and satisfaction of employees by
    giving them a good mix of both pecuniary and
    nonpecuniary motives
    • In performance appraisal for seeking information
    regarding employee promotion and transfers, career
    development, and attrition rate
    ‘data analytics’, and ‘business data analytics’, other terminologies like ‘data mining’, ‘data warehouse’, and ‘data visualization’ have come to the fore. Let us explain
    them now.
    1.4 Data Mining, Data Warehouse Management,
    and Data Visualization
    1.4.1 Data Mining
    Every diamond, before gleaming on a beautiful finger, requires polishing. By
    analogy, data needs to be polished and refined before yielding intriguing
    insights. This useful service is what data mining does. Data mining is one of the
    first steps of the systematic process of big data analytics. It is described as the process of drawing out data from varied raw sources like databases (Alam
    2012a), email or spam filtering, or consumer surveys (Tan, Steinbach and Kumar
    2014). The tasks of extraction, transformation, and loading of data (ETL) are key
    components of the data-mining process (Ge et al. 2017). These tasks help to
    produce usable data sets in a proper format for further data analysis and for maintenance of a data repository. Data mining is one of the most integral but strenuous
    tasks in the whole data analytics process.
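The extraction, transformation, and loading steps described above can be sketched in a few lines. This is a minimal illustration only, assuming a small in-line CSV of consumer-survey answers and an in-memory SQLite table as the repository; the column names, the cleaning rules, and the `survey` table are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw consumer-survey export with untidy values
RAW_SURVEY_CSV = """customer_id,age,rating
 101 , 34 , 5
102,,4
103,29,not answered
104,41,3
"""


def extract(raw_text):
    """Extract: read the raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: trim whitespace, cast types, drop rows without a numeric rating."""
    cleaned = []
    for row in rows:
        rating = row["rating"].strip()
        if not rating.isdigit():
            continue  # unusable answer, skip the row
        age = row["age"].strip()
        cleaned.append(
            (int(row["customer_id"].strip()), int(age) if age.isdigit() else None, int(rating))
        )
    return cleaned


def load(records, connection):
    """Load: write the cleaned records into the repository table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS survey "
        "(customer_id INTEGER PRIMARY KEY, age INTEGER, rating INTEGER)"
    )
    connection.executemany("INSERT INTO survey VALUES (?, ?, ?)", records)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_SURVEY_CSV)), connection)
row_count, average_rating = connection.execute(
    "SELECT COUNT(*), AVG(rating) FROM survey"
).fetchone()
print(row_count, average_rating)
```

Keeping the three steps as separate functions mirrors the ETL split: each stage can be replaced (a different source, stricter cleaning rules, another database) without touching the others.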
    1.4.2 Data Warehouse Management
    Maintenance of a data repository is essential for proper and well-managed data
    storage (Shakil et al. 2018). It is termed data management or data warehouse management
    in the process of data analytics (Santoso and Yulia 2017). Data warehouse
    management involves a well-planned and structured database designed (Malhotra
    et al. 2018) to give straightforward and simplified access to data for data manipulation or future reference (Agapito, Zucco and Cannataro 2020). The simplest form
    of a maintained data warehouse is known as a data mart (Mbala and Poll 2020).
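As a rough illustration of the relationship between a warehouse and a data mart, the sketch below treats a data mart as a narrower, subject-specific slice of the warehouse. That reading, along with the `sales` table, the `marketing_mart` view, and the sample figures, is an assumption made for the example rather than something the cited sources prescribe.

```python
import sqlite3

# Hypothetical warehouse: one structured table holding data for all departments
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales "
    "(order_id INTEGER PRIMARY KEY, department TEXT, region TEXT, amount REAL)"
)
warehouse.executemany(
    "INSERT INTO sales (department, region, amount) VALUES (?, ?, ?)",
    [
        ("marketing", "north", 120.0),
        ("marketing", "south", 80.0),
        ("finance", "north", 300.0),
        ("marketing", "north", 50.0),
    ],
)

# Hypothetical data mart: a view exposing only the marketing slice, pre-aggregated
warehouse.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT region, SUM(amount) AS total_amount FROM sales "
    "WHERE department = 'marketing' GROUP BY region"
)

mart_rows = warehouse.execute(
    "SELECT region, total_amount FROM marketing_mart ORDER BY region"
).fetchall()
print(mart_rows)
```

Analysts in one department can then query the smaller view without needing to know the layout of the full warehouse.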
    1.4.3 Data Visualization
    It is often said that a picture is worth a thousand words. This is so in the
    case of data analytics, where data presentation or data visualization is capable of
    independently summarizing tons of data in visually appealing forms for important
    stakeholders (Ge et al. 2017). Effective and reasonable data visualization forms or
    charts can narrate the core meaning of the data and give important insights to
    all the decision-making executives (Tan, Steinbach and Kumar 2014). It involves
    the use of charts, graphs, captivating diagrams, or simple tables to represent all types of data, aiding quicker understanding of the analytics.
    1.5 Insights in Action: Gains from Insights
    Generated out of Data Analytics
    In this digital age, where consumers keep expressing their preferences at a click
    or tap, each of their clicks or taps speaks volumes. That is to
    say, every tap or click reflects usable information for business firms and thus
    becomes potential data for business analytics. It can yield important information
    like the picture of the segmented or target market or how to position the brand
    message in a specific segment or target market. Even consumer likes, comments, or reviews can serve as usable data sources. By tapping the data regarding
    a consumer’s likes or comments, the marketer can form an understanding
    of the consumer’s demographic or psychographic picture and use the generated
    insights to hone future consumer experiences or pass on the insightful knowledge
    to other advertisers for better consumer connect.
    The latest Apple iPhone 12 provides a vivid application of data analytics in actionable product development. Sensing that long-standing competitors
    like Samsung and upcoming rivals like Realme, Oppo, and Vivo were capturing
    a larger market share on the grounds of improved camera features with the added
    advantage of night mode for dim-light pictures, Apple looked at the consumer data
    along with churning the data regarding demographic, psychographic, and behavioral segmentation to deliver the most advanced version of the iPhone, loaded with
    features like a fast Bionic processing chip, Retina XDR display, protective
    Ceramic Shield, Dolby Vision for video recording, and advanced night mode
    for all cameras. It indeed indicates the power of data analytics, which helps business firms in bettering their products and services to cut through the competition.
    Two important helping hands in the growth and prevalence of big data and
    data analytics are machine learning and artificial intelligence, which are discussed
    in the sections ahead.
    1.6 Machine Learning and Artificial Intelligence
    In a 2020 Netflix Korean drama called Start-Up, the lead couple were depicted
    having a conversation regarding the meaning of ‘machine learning’. The female
    lead had no clue about it, and the male lead drew an analogy from the characters
    of ‘Tarzan’ and ‘Jane’ from the famous Disney film Tarzan, where Tarzan, with
    no previous human encounter (especially with the opposite sex), being in a jungle,
    learns by and by what things make Jane happy. Similarly, the lead hero explained
    that, in machine learning, the computer learns from the data by and by to perform
    operations and present results, making its users happy.
    Machine learning is defined as “the machine’s ability to keep improving its performance without humans having to explain exactly how to accomplish all the tasks
    it’s given” (Brynjolfsson and McAfee 2017, 2). Thus, when a machine learns to perform some functions on its own, without the need for explicit programming, to improve the user experience, it is referred to as machine learning (Canhoto and Clear
    2020; Kibria et al. 2018). Machine learning is concerned with
    computer algorithms (Alam, Sethi and Shakil 2015) that let computer
    programs improve automatically through continuous experience (Mitchell 1997).
    One practical application of machine learning, utilized by music-streaming
    apps like Spotify or Gaana.com, is matching the user’s music preferences with
    the music composition details, like the singer or genre information, to automate
    likely recommendations for the user in the future (Le 2018). Similarly, in the medical
    field, machine learning can automate the reading of x-ray images by detecting the patterns
    emerging out of them to aid medical analysis (Iriondo 2020).
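The matching of listening preferences with singer and genre details can be sketched as a simple content-based recommender. This is an illustrative toy, not the method used by any streaming service: the catalogue, the scoring rule (count how often a track's singer and genre occur in the listening history), and all names are hypothetical.

```python
from collections import Counter

# Hypothetical catalogue: track title -> composition details
CATALOGUE = {
    "Track A": {"singer": "Singer 1", "genre": "pop"},
    "Track B": {"singer": "Singer 2", "genre": "rock"},
    "Track C": {"singer": "Singer 1", "genre": "rock"},
    "Track D": {"singer": "Singer 3", "genre": "jazz"},
    "Track E": {"singer": "Singer 2", "genre": "pop"},
}


def build_profile(listened):
    """Count how often each singer and genre appears in the listening history."""
    profile = Counter()
    for title in listened:
        details = CATALOGUE[title]
        profile[("singer", details["singer"])] += 1
        profile[("genre", details["genre"])] += 1
    return profile


def recommend(listened, top_n=2):
    """Rank unheard tracks by how well their details match the user's profile."""
    profile = build_profile(listened)
    scores = {
        title: profile[("singer", details["singer"])] + profile[("genre", details["genre"])]
        for title, details in CATALOGUE.items()
        if title not in listened
    }
    # Highest score first; ties are broken alphabetically for a stable result
    ranked = sorted(scores, key=lambda title: (-scores[title], title))
    return ranked[:top_n]


history = ["Track A", "Track C"]
recommendations = recommend(history)
print(recommendations)
```

As the listening history grows, the profile counts change and so do the recommendations, which is the sense in which such a system keeps improving with more data.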
    Machine learning is of three types, viz., supervised (where the model learns
    from examples that already carry labels and assigns new data to those labelled patterns), unsupervised (where the model
    groups unlabelled data under novel patterns that it discovers itself), and
    reinforcement (where the model learns by constantly taking cues from
    the environment and using them to improve its future outputs) (Fumo
    2017). With the abilities and advances offered by machine learning, it has really
    become a ‘dazzlingly magical buzzword’ in the business domain (Stanford, Iriondo
    and Shukla 2020).
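The contrast between the first two types can be made concrete with a small sketch on one-dimensional data. It assumes two deliberately simple techniques, a nearest-centroid classifier for the supervised case and a two-cluster k-means for the unsupervised case; the spending figures, labels, and function names are hypothetical, and reinforcement learning is left out because it needs an interactive environment.

```python
def fit_centroids(points, labels):
    """Supervised: use the given labels to compute the mean of each labelled group."""
    sums, counts = {}, {}
    for value, label in zip(points, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Assign a new value to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def two_means(points, iterations=10):
    """Unsupervised: split unlabelled values into two clusters (k-means with k=2).

    Assumes the data contains at least two distinct values.
    """
    low, high = min(points), max(points)
    for _ in range(iterations):
        cluster_low = [v for v in points if abs(v - low) <= abs(v - high)]
        cluster_high = [v for v in points if abs(v - low) > abs(v - high)]
        low = sum(cluster_low) / len(cluster_low)
        high = sum(cluster_high) / len(cluster_high)
    return sorted(cluster_low), sorted(cluster_high)


# Hypothetical monthly spend per customer
spend = [10, 12, 11, 50, 55, 52]

# Supervised: labels are supplied with the training data
centroids = fit_centroids(spend, ["budget"] * 3 + ["premium"] * 3)
label_for_20 = predict(centroids, 20)
label_for_45 = predict(centroids, 45)

# Unsupervised: the same values, but no labels are given
clusters = two_means(spend)
print(label_for_20, label_for_45, clusters)
```

Both routines end up separating low spenders from high spenders; the difference is that the supervised one is told the group names in advance, while the unsupervised one only discovers that two groups exist.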
    A cinematic delight from director Steven Spielberg, A.I. Artificial Intelligence
    beautifully puts forth the meaning and domain of artificial intelligence, popularly dubbed AI: an 11-year-old boy, appearing so real with real love-like
    emotions, happens to be a robot. His journey leads audiences at large to discover a new meaning. Five decades back, with the inception of chess-playing
    computer programs, AI came to the forefront (Brynjolfsson and McAfee 2017).
    However, recently it has acquired a new meaning with changing times and technology (Iriondo 2020).
    The term ‘artificial intelligence’ means a human-made manner of doing or understanding things and carrying out operations in a system (Kibria et al. 2018). Thus,
    when human-like intelligence is added to machines or computers for performing
    functions or activities, it is termed artificial intelligence or AI (Canhoto and Clear
    2020; Iriondo 2020). Andrew Moore, once dean at Carnegie Mellon University, has
    described AI as “the science and engineering of making computers behave in ways
    that, until recently, we thought required human intelligence” (High 2017, 4).
    Business firms are now actively using both machine learning and AI to collect
    consumer data in an effort to improve their brand experiences in the future (Canhoto
    and Clear 2020). While machine learning is a step toward AI (Mitchell 1997), the
    domain of AI is far- and wide-ranging (Kibria et al. 2018). By studying the patterns
    of big-data sets, new trends and subtle details can be explored for actuating strategies (Brynjolfsson and McAfee 2017).
    Recent voice assistants like Siri and Alexa, equipped with human-like skills, are revolutionizing the AI industry, which further pulls the strings for app development
    and content creation. Siri and Alexa have now become human-like personal assistants, aiding humans while providing data for brand building (Brynjolfsson and
    McAfee 2017; Iriondo 2020).
    While AI makes a computer do smart work, solving complex issues with
    human-like intelligence (Kibria et al. 2018), machine learning analyzes data
    patterns to automate functions, boosting efficiency and effectiveness (Han et
    al. 2017). AI runs on the key theme of spontaneity, and machine learning broadly
    runs on premeditated algorithms. However, both serve as important decision tools
    for business strategy formulation. One can certainly agree that, with the continuing technological pace, sometime in the future today’s revered Siri and Alexa may
    become obsolete like chess-playing programs, and many new things are
    waiting to unfold in the tech-savvy future (High 2017; Iriondo 2020).
    1.7 Course of the Book
    With the changing times, ‘analytics’ is occupying center stage in the business
    world. The key actors playing an influential role in helping business firms embrace
    these changing times are ‘big data’, ‘data science’, and ‘data analytics’. This book
    provides a route into these domains, with a special focus on the marketing perspective. The book focuses on exploring these data-centered concepts and their application from marketing, business, and research angles. The linkages among big data,
    data science, and data analytics are shown in Figure 1.2.
    Initial parts of the book provide a conceptual understanding of the contemporary business problems encountered by organizations, big-data analytics and related
    algorithms, the data mining process, and others. From the conceptual, progress is
    made toward the complexities surfacing in the globalization era and how
    the big-data management approach of businesses can provide unconventional aid in
    the decision-making of the business world. This is followed by a discussion of the
    role of big data in contributing intelligent inputs for project life cycle management,
    decision support systems, and performance management and monitoring. The roles
    of big-data intelligence and analytics in strategic decisions like supply-chain management, planning, and organizing are further discussed.
    Figure 1.2 Linkages among Big Data, Data Science, and Data Analytics.
    Then the discussion turns toward the helping hand that analytics lends
    in the marketing domain specifically. The marketing intelligence derived
    from data analytics and used in different marketing decisions and strategies, like
    designing the marketing mix, value delivery, product life cycle decisions, understanding consumer behavior and decision-making, and making strategic product and
    service decisions, is discussed at length and in depth. The application of analytics
    in the digital and online marketing domain is covered next. Then the patterns
    emerging from online marketing, predicting trends from consumer analytics, web-analytics trends, and the usage of marketing intelligence for optimization of marketing efforts are discussed for deriving useful insights, coupled with smart retailing
    and advertising trends.
    So, brace yourself, readers, for we are going to take you all through an insightful
    and intriguing journey driven by the knowledge and understanding of the buzz of
    the hour – ‘data analytics’ in the marketing and business world.
    References
    Aalst, Will van der. 2016. Process Mining: Data Science in Action. Heidelberg: Springer.
    Agapito, Giuseppe, Chiara Zucco, and Mario Cannataro. 2020. “COVID-warehouse: A
    Data Warehouse of Italian COVID-19, Pollution, and Climate Data.” International
    Journal of Environmental Research and Public Health 17, no. 5596, 1–22.
    Alam, Mansaf. 2012a. “Cloud Algebra for Cloud Database Management System.” The
    Second International Conference on Computational Science, Engineering and Information
    Technology (CCSEIT‑2012). Coimbatore, India: ACM, 26–28.
    Alam, Mansaf. 2012b. “Cloud Algebra for Handling Unstructured Data in Cloud
    Database Management System.” International Journal on Cloud Computing: Services
    and Architecture (IJCCSA) 2, no. 6, 2231–5853 [Online]; 2231–6663 [Print]. https://
    doi.org/10.5121/ijccsa.2012.2603
    Alam, Mansaf, Shuchi Sethi, and Kashish Ara Shakil. 2015. “Distributed Machine Learning
    Based Biocloud Prototype.” International Journal of Applied Engineering Research 10,
    no. 17, 37578–37583.
    Alam, Mansaf, and Kashish Ara Shakil. 2016. “Big Data Analytics in Cloud Environment
    Using Hadoop.” International Conferences on Mathematics, Physics & Allied Sciences.
    Goa, India: ICMPAS.
    Albright, S. Christian, and Wayne L. Winston. 2015. Business Analytics: Data Analysis and
    Decision Making. Stamford, CT: Cengage.
    Amankwah-Amoah, Joseph, and Samuel Adomako. 2019. “Big Data Analytics and Business
    Failures in Data-Rich Environments: An Organizing Framework.” Computers in
    Industry 105, 204–212.
    Brynjolfsson, Erik, and Andrew McAfee. 2017. “The Business of Artificial Intelligence.”
    Harvard Business Review 7, 3–11.
    Canhoto, Ana Isabel, and Fintan Clear. 2020. “Artificial Intelligence and Machine Learning
    as Business Tools: A Framework for Diagnosing Value Destruction Potential.” Business
    Horizons 63, no. 2, 183–193.
    Dhar, Vasant. 2013. “Data Science and Prediction.” Communications of the ACM 56, no.
    12, 64–73.
    Fumo, David. 2017. “Types of Machine Learning Algorithms You Should Know.” Towards
    Data Science, June 15. Accessed December 28, 2020. https://towardsdatascience.com/
    types-of-machine-learning-algorithms-you-should-know-953a08248861.
    Gartner Inc. 2021. Gartner Glossary, Information Technology. Accessed January 28, 2021.
    www.gartner.com/en/information-technology/glossary/big data.
    Ge, Zhiqiang, Zhihuan Song, Steven X. Ding, and Biao Huang. 2017. “Data Mining and
    Analytics in the Process Industry: The Role of Machine Learning.” IEEE Access 5,
    20590–20616.
    Han, Shuangfeng, I. Chih-Lin, Gang Li, Sen Wang, and Qi Sun. 2017. “Big Data Enabled
    Mobile Network Design for 5G and Beyond.” IEEE Communications Magazine 55,
    no. 9, 150–157.
    High, Peter. 2017. “Carnegie Mellon Dean of Computer Science on the Future of AI.”
    Forbes, October 30. Accessed December 28, 2020. www.forbes.com/sites/peterhigh/2017/10/30/carnegie-mellon-dean-of-computer-science-on-the-future-ofai/?sh=4d9a1a8c2197.
    Iriondo, Roberto. 2020. “Machine Learning (ML) vs. Artificial Intelligence (AI)—Crucial
    Differences.” Medium, November 12. Accessed December 28, 2020. https://medium.
    com/towards-artificial-intelligence/differences-between-ai-and-machine-learningand-why-it-matters-1255b182fc6.
    Kaur, Aankita, and Mansaf Alam. 2013. “Role of Knowledge Engineering in the
    Development of a Hybrid Knowledge Based Medical Information System for Atrial
    Fibrillation.” American Journal of Industrial and Business Management 3, no. 1, 36–41.
    https://doi.org/10.4236/ajibm.
    Khan, Imran, Shane Kazim Naqvi, Mansaf Alam, and S.N.A. Rizvi. 2015. “Data Model for
    Big Data in Cloud Environment.” 2015 2nd International Conference on Computing
    for Sustainable Global Development (INDIACom). New Delhi, India: IEEE, 582–585.
    Khan, Samiya, Xiufeng Liu, Kashish Ara Shakil, and Mansaf Alam. 2017. “A Survey on
    Scholarly Data: From Big Data Perspective.” Information Processing & Management
    53, no. 4, 923–944.
    Khan, Samiya, Xiufeng Liu, Kashish Ara Shakil, and Mansaf Alam. 2019. “Big Data
    Technology—Enabled Analytical Solution for Quality Assessment of Higher Education
    Systems.” International Journal of Advanced Computer Science and Applications
    (IJACSA) 10, no. 6, 292–304. https://doi.org/10.14569/IJACSA.2019.0100640.
    Khan, Samiya, Kashish Ara Shakil, and Mansaf Alam. 2016. “Educational Intelligence:
    Applying Cloud-based Big Data Analytics to the Indian Education Sector.” 2016 2nd
    International Conference on Contemporary Computing and Informatics (IC3I). Noida,
    India: IEEE, 29–34.
    Khan, Samiya, Kashish Ara Shakil, and Mansaf Alam. 2017. “Big Data Computing Using
    Cloud-Based Technologies: Challenges and Future Perspectives.” In: Mahmoud
    Elkhodr, Qusay F. Hassan, and Seyed Shahrestani (eds.), Networks of the Future:
    Architectures, Technologies and Implementations. London: Chapman and Hall.
    Khan, Samiya, Kashish Ara Shakil, and Mansaf Alam. 2018. “Cloud-Based Big Data
    Analytics—A Survey of Current Research and Future Directions.” In: V.B. Aggarwal,
    Vasudha Bhatnagar, and Durgesh Kumar Mishra (eds.), Big Data Analytics.
    Advances in Intelligent Systems and Computing 654. Singapore: Springer. https://doi.
    org/10.1007/978-981-10-6620-7_57.
    Khan, Samiya, Kashish Ara Shakil, and Mansaf Alam. 2019. “PABED—A Tool for Big
    Education Data Analysis.” 2019 20th IEEE International Conference on Industrial
    Technology (ICIT 2019), Melbourne, Australia: IEEE, 794–799.
    Khanna, Leena, Shailendra Narayan Singh, and Mansaf Alam. 2016. “Educational
    Data Mining and Its Role in Determining Factors Affecting Students’ Academic
    Performance: A Systematic Review.” 2016 1st India International Conference on
    Information Processing (IICIP). Delhi, India: IEEE, 1–7.
    Kibria, Mirza Golam, Kien Nguyen, Gabriel Porto Villardi, Ou Zhao, Kentaro Ishizu,
    and Fumihide Kojima. 2018. “Big Data Analytics, Machine Learning, and Artificial
    Intelligence in Next-generation Wireless Networks.” IEEE Access 6, 32328–32338.
    Kumar, Vinod, Rajendra Kumar, Santosh Kumar Pandey, and Mansaf Alam. 2018.
    “Fully Homomorphic Encryption Scheme with Probabilistic Encryption Based
    on Euler’s Theorem and Application in Cloud Computing.” In: V.B. Aggarwal,
    Vasudha Bhatnagar, and Durgesh Kumar Mishra (eds.), Big Data Analytics.
    Advances in Intelligent Systems and Computing 654. Singapore: Springer. https://doi.
    org/10.1007/978-981-10-6620-7_58.
    LaValle, Steve, Eric Lesser, Rebecca Shockley, Michael S. Hopkins, and Nina Kruschwitz.
    2011. “Big Data, Analytics and the Path from Insights to Value.” MIT Sloan
    Management Review 52, no. 2, 21–32.
    Le, James. 2018. “Spotify’s ‘This Is’ Playlists: The Ultimate Song Analysis for 50
    Mainstream Artists.” Towards Data Science, July 11. Accessed December 28, 2020.
    https://towardsdatascience.com/spotifys-this-is-playlists-the-ultimate-song-analysisfor-50-mainstream-artists-c569e41f8118.
    Malhotra, Shweta, Mohammad Najmud Doja, Bashir Alam, and Mansaf Alam. 2017.
    “Bigdata Analysis and Comparison of Bigdata Analytic Approaches.” 2017
    International Conference on Computing, Communication and Automation (ICCCA).
    Noida, India: IEEE, 309–314.
    Malhotra, Shweta, Mohammad Najmud Doja, Bashir Alam, and Mansaf Alam. 2018.
    “Generalized Query Processing Mechanism in Cloud Database Management
    System.” In: V.B. Aggarwal, Vasudha Bhatnagar, and Durgesh Kumar Mishra (eds.),
    Big Data Analytics. Advances in Intelligent Systems and Computing 654. Singapore:
    Springer. https://doi.org/10.1007/978-981-10-6620-7_61.
    Mbala, Isaac Nkongolo, and John Andrew van der Poll. Nov. 16–17, 2020. “Towards a
    Formal Modelling of Data Warehouse Systems Design.” 18th JOHANNESBURG
    International Conference on Science, Engineering, Technology & Waste Management
    (SETWM‑20). Johannesburg, SA: EARET, 323–329.
    McAfee, Andrew, and Erik Brynjolfsson. 2012. “Big Data: The Management Revolution.”
    Harvard Business Review 90, no. 10, 60–68.
    Mitchell, Tom M. 1997. “Does Machine Learning Really Work?” AI Magazine 18, no. 3,
    11–20.
    Provost, Foster, and Tom Fawcett. 2013. Data Science for Business: What You Need to Know
    About Data Mining and Data-analytic Thinking. Sebastopol, CA: O’Reilly.
    Santoso, Leo Willyanto, and Yulia. 2017. “Data Warehouse with Big Data Technology for
    Higher Education.” Procedia Computer Science 124, 93–99.
    Shakil, Kashish Ara, and Mansaf Alam. 2018. “Cloud Computing in Bioinformatics
    and Big Data Analytics: Current Status and Future Research.” In: V.B. Aggarwal,
    Vasudha Bhatnagar, and Durgesh Kumar Mishra (eds.), Big Data Analytics.
    Advances in Intelligent Systems and Computing 654. Singapore: Springer. https://doi.
    org/10.1007/978-981-10-6620-7_60.
    Shakil, Kashish Ara, Mansaf Alam, Shabih Shakeel, Ari Ora, and Samiya Khan. 2018.
    “Exploiting Data Reduction Principles in Cloud-based Data Management for
    Cryo-image Data.” ICCMB ’18: Proceedings of the 2018 International Conference
    on Computers in Management and Business. Oxford: Association for Computing
    Machinery, New York, NY, 61–66. https://doi.org/10.1145/3232174.3232177.
    Stanford, Stacy, Roberto Iriondo, and Pratik Shukla. 2020. “Best Public Datasets for
    Machine Learning and Data Science.” Medium, August 7. Accessed December 28,
    2020. https://medium.com/towards-artificial-intelligence/best-datasets-for-machinelearning-data-science-computer-vision-nlp-ai-c9541058cf4f.
    Sun, Zhaohao, Kenneth Strang, and Sally Firmin. 2017. “Business Analytics-based
    Enterprise Information Systems.” Journal of Computer Information Systems 57, no. 2,
    169–178.
    Syed, Arshad Ali, Mohammad Affan, and Mansaf Alam. 2019. “A Study of Efficient Energy
    Management Techniques for Cloud Computing Environment.” 9th International
    Conference on Cloud Computing, Data Science & Engineering (Confluence). Noida,
    India: IEEE, 13–18. https://doi.org/10.1109/CONFLUENCE.2019.8776977.
    Tan, Pang-Ning, Michael Steinbach, and Vipin Kumar. 2014. Introduction to Data Mining.
    Harlow: Pearson.
    Yin, Shen, and Okyay Kaynak. 2015. “Big Data for Modern Industry: Challenges and
    Trends [point of view].” Proceedings of the IEEE 103, no. 2, 143–146.
    Chapter 2
    Big Data Analytics
    and Algorithms
    Alok Kumar, Lakshita Bhargava, and Zameer Fatima
    DOI: 10.1201/9781003175711-2
    Contents
    2.1 Introduction
    2.2 Big Data Analytics
    2.3 Categories of Big Data Analytics
    2.3.1 Predictive Analytics
    2.3.2 Prescriptive Analytics
    2.3.2.1 How Prescriptive Analytics Works
    2.3.2.2 Examples of Prescriptive Analytics
    2.3.2.3 Benefits of Prescriptive Analytics
    2.3.3 Descriptive Analytics
    2.3.4 Diagnostic Analytics
    2.3.4.1 Benefits of Diagnostic Analytics
    2.4 Big Data Analytics Algorithms
    2.4.1 Linear Regression
    2.4.1.1 Preparing a Linear-Regression Model
    2.4.1.2 Applications of Linear Regression
    2.4.2 Logistic Regression
    2.4.2.1 Types of Logistic Regression
    2.4.2.2 Applications of Logistic Regression
    2.4.3 Naive Bayes Classifiers
    2.4.3.1 Equation of the Naive Bayes Classifiers
    2.4.3.2 Application of Naive Bayes Classifiers
    2.4.4 Classification and Regression Trees
    2.4.4.1 Representation of CART Model
    2.4.4.2 Application of Classification and Regression Trees
    2.4.5 K-Means Clustering
    2.4.5.1 How K-Means Clustering Works
    2.4.5.2 The K-Means Clustering Algorithm
    2.4.5.3 Application of K-Means Clustering Algorithms
    2.5 Conclusion and Future Scope
    References
    2.1 Introduction
    There is no denying that the digital era has arrived, and it is here to
    stay. In this digital era, a shift is occurring from an industry-based to an information-based
    economy, which has caused a large amount of data to accumulate, with a
    mind-boggling increase every single day. It is estimated that by 2025 we will be generating
    463 exabytes of data every day. This staggering amount of data is both
    a boon and a curse for humanity. Improper handling of data can lead to breaches of
    privacy, an increase in fraud, data loss, and much more. If handled properly, tremendous
    growth and enhancement in technology can be achieved. Traditional
    methods of handling and analyzing data, such as storing data in traditional relational
    databases, usually perform very poorly on big data because of the sheer
    size of the data. This is where the power of big-data analytics comes into full swing.
    The key highlights and main contributions of the chapter include the following:
    • The main idea behind this chapter is to provide a detailed and structured overview of big-data analytics along with the various tools and technologies
    used in the process.
    • The chapter provides a clear picture of what big-data analytics is and why it is
    an extremely important and dominant technology in the current digital era.
    • We also discuss different techniques of big-data analytics along with
    their relevance in different scenarios.
    • A later section of the chapter focuses on some of the most popular and cutting-edge algorithms used in the process of big-data analytics.
    • The chapter concludes with a final section discussing the shortcomings of
    current data-analytics techniques, along with a brief discussion of upcoming
    technologies that can bridge the gaps present in current techniques.
    2.2 Big Data Analytics
    Big-data analytics, in very simple terms, is the process of finding meaningful patterns in a large, seemingly unorganized amount of data. The primary
    goal of big-data analysis is always to provide insights into the source that is
    responsible for the generation of data. These insights can be extremely valuable for companies to understand the behavior of their customers and how well
    their product is working in the market. Big-data analytics is also extensively
    used for revealing product groupings as well as products that are more likely
    to be purchased together. A mind-boggling real-world example of this is the
    ‘diaper-beer’ product association found by Walmart upon analyzing its customers’ data. The finding suggested that working men tend to purchase beers
    for themselves and diapers for their kids together when coming back home
    from work on Friday night. This led Walmart to put these items together,
    which saw an increase in the sales of both the items. This finding gives a clear
    demonstration of the power of big-data analytics for finding product associations, as by using classical product-association techniques it is nearly impossible to find such a bizarre correlation. To get a better understanding of how
    the process of big-data analytics works in the real world, let’s take an example
    of how an ecommerce company can leverage the power of big-data analytics to increase the sales of its products. In this example, we consider
    the broad analysis of two categories of data: data generated by users in
    the course of purchasing a product and data generated in after-sales customer
    service. Big-data analytics techniques like market-basket analysis, customer-product analysis, etc. can be applied to the first kind of dataset to find associations like product–product association, customer–product association, or
    customer–customer association. These findings can be used by the company
    to improve its product-recommendation system as well as product placement
    on its portal. Similarly, the results obtained after analysis of after-sales data
    like customer care phone calls, complaint emails, etc. can be used for training customer-care personnel or even in the development and improvement
    of smart chatbots. These factors combined can increase the overall customer
    satisfaction, which can boost the sales number and also help in new-customer
    acquisition. A surface-level picture of the process is provided in Figure 2.1.
    Big-data analytics has also found widespread application in the field of medical science. Various data-mining and analytics techniques have been used in a
    variety of medical applications like disease prediction, genetic programming,
    patient data management, etc. [1–3]. Data analytics can also be used in the education sector to analyze students’ data and generate better frameworks for
    enhancing their education [4–5].
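    As a rough illustration of the market-basket analysis mentioned above, the following sketch computes support, confidence, and lift for item pairs. The baskets are invented for illustration (they are not Walmart data), and the function name pair_metrics is a hypothetical helper, not part of any library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets for illustration only (not real retail data).
transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"chips", "milk"},
    {"diapers", "beer", "milk"},
    {"bread", "milk"},
]


def pair_metrics(baskets):
    """Return {(a, b): (support, confidence of a -> b, lift)} for each item pair."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    metrics = {}
    for (a, b), count in pair_counts.items():
        support = count / n                       # share of baskets containing both items
        confidence = count / item_counts[a]       # P(b in basket | a in basket)
        lift = confidence / (item_counts[b] / n)  # above 1 suggests a positive association
        metrics[(a, b)] = (support, confidence, lift)
    return metrics


metrics = pair_metrics(transactions)
support, confidence, lift = metrics[("beer", "diapers")]
print(f"beer -> diapers: support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

    A lift above 1 for a pair indicates that the two items appear together more often than would be expected if they were purchased independently, which is the kind of signal a product-placement or recommendation system could act on.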
    2.3 Categories of Big Data Analytics
    Big-data analytics is usually classified into four main categories as shown in
    Figure 2.2. In this section, we will be looking into each of these categories in detail
    as a separate subsection.
    Figure 2.1 Leveraging Big-Data Analytics in an Ecommerce Company.
    Figure 2.2 Categories of Big-Data Analytics.
    Figure 2.3 Process of Predictive Analytics.
    2.3.1 Predictive Analytics
    Predictive analytics is a form of big-data analytics that is used to make predictions based on the analysis of current data. In predictive analytics, historical
    and transactional data are usually used to identify risks and opportunities for the future.
    Predictive analytics gives organizations a concrete base on which
    they can plan their future actions. This allows them to make decisions that are
    more accurate and fruitful than ones based on pure assumptions
    or manual analysis of data, and it helps them become proactive and forward-looking
    organizations. Predictive analytics can even be extended further to include
    a set of probable decisions that can be made based on the analytics obtained during
    the process. The whole process of predictive analytics can be broken down into a set
    of steps as shown in Figure 2.3.
    Steps involved in the predictive analytics process:
    1. Define the project: The first and one of the most important steps in the
    process of predictive analytics is defining the project. This step consists of
    identifying different variables like the scope and the outcome, as well as identifying the dataset on which predictive analytics needs to be executed. This step
    is crucial as it lays the foundation for the whole process of
    data analytics.
    2. Data collection: Data is the most fundamental piece of every data-analytics
    process, and predictive analytics is no exception. In the data-collection stage, organizations collect the various types of data on which analytics
    will be performed. The type of data that needs to be
    collected usually depends on the desired outcome of the process established
    during the project-definition stage.
    3. Data analysis: The data-analysis stage comprises cleaning, transforming,
    and inspecting data. It is in this stage that patterns, correlations, and useful
    information about the data are found.
    4. Statistics: This is an intermediate stage in which the hypotheses and
    assumptions behind the model architecture are validated using existing
    statistical methods. This step is crucial as it helps in pointing out
    any flaws in the logic and highlights inaccuracies that may plague the actual
    model if unnoticed.
    5. Modeling: This stage involves developing a model that can automatically make predictions based on information derived during the data-analysis
    stage. A self-learning module is usually
    integrated, which helps in increasing the accuracy of the model over time.
    6. Deployment: In the deployment stage, the model is finally deployed on a
    production-grade server, where it can automatically make decisions and send
    automated decision reports based on them. It can also be exposed in the form
    of an application programming interface (API), which can be leveraged by
    other modules while abstracting away the actual complicated logic.
    7. Monitoring: Once the deployment is done, it is advisable to monitor the
    model and verify its predictions against actual results. This
    can help in enhancing the model and rectifying any minor or major issues
    that could cripple the performance of the model.
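    The steps above can be sketched end to end on a toy problem. This is a minimal sketch under stated assumptions: the (ad_spend, sales) records are synthetic, the model is a simple least-squares line, and "deployment" is reduced to exposing a function; none of these choices come from the chapter itself. The statistical validation step is omitted for brevity.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Step 2, data collection: synthetic (ad_spend, sales) records; None marks a missing value.
records = [(float(x), 3.0 * x + 5.0 + random.gauss(0, 1)) for x in range(1, 41)]
records.append((None, 10.0))

# Step 3, data analysis: drop incomplete records.
clean = [(x, y) for x, y in records if x is not None and y is not None]

# Hold back the last records so the model can be checked on unseen data.
train, holdout = clean[:30], clean[30:]


def fit_line(points):
    """Step 5, modeling: fit y = slope * x + intercept by least squares."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(train)


def predict(x):
    """Step 6, deployment: the fitted model exposed as a callable."""
    return slope * x + intercept


# Step 7, monitoring: compare predictions with actual outcomes on held-out data.
mean_abs_error = statistics.mean(abs(predict(x) - y) for x, y in holdout)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, holdout MAE={mean_abs_error:.2f}")
```

    In a real deployment the monitoring step would run continuously on fresh data, and a rising error would be the trigger for retraining the model.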
    Predictive analytics is being used extensively to tackle a wide variety of problems,
    ranging from simple problems like predicting consumers’ behavior on ecommerce
    platforms to highly sophisticated ones like predicting the chance of occurrence of a
    disease in a person based on their medical records. With the advancement in the field
    of data analytics, the accuracy of predictive analytics models has increased substantially over the past decade, which has enabled their use in the field of medical science.
    Maryam et al. have discussed various predictive analytics techniques for predicting
    drug-target interactions (DTIs) based on analysis of standard datasets [6]. Shakil et
    al. have proposed a method for predicting dengue disease outbreaks using the predictive
    analytics tool Weka [1].
    2.3.2 Prescriptive Analytics
    Prescriptive analytics is a branch of data analytics that helps in determining the
    best possible course of action for a particular scenario.
    Unlike predictive analytics, prescriptive analytics doesn’t predict a direct outcome
    but rather provides a strategy for finding the optimal solution for a given scenario.
    Of all the forms of business analytics, prescriptive analytics is the most sophisticated and is capable of bringing the highest amount of
    intelligence and value to businesses [7].
    2.3.2.1 How Prescriptive Analytics Works
    Prescriptive analytics usually relies on advanced techniques of artificial intelligence, like machine learning and deep learning, to learn and advance from the
    data it acquires, working as an autonomous system without requiring any
    human intervention. Prescriptive-analytics models can also adjust
    their results automatically as new datasets become available.
    2.3.2.2 Examples of Prescriptive Analytics
    The power of prescriptive analytics can be leveraged by any data-intensive business
    or government agency. For example, a space agency can use prescriptive analytics to determine
    whether constructing a new launch site would endanger a species of lizard living
    nearby. This analysis can inform the decision to relocate that
    species to some other location or to change the location of the launch site itself.
    2.3.2.3 Benefits of Prescriptive Analytics
    Prescriptive analytics is one of the most efficient and powerful tools available in the
    arsenal of an organization’s business intelligence. Prescriptive analytics gives an
    organization the ability to:
    1. Discover the path to success: Prescriptive-analytics models can combine
    data and operations to provide a road map of what to do and how to do it
    most efficiently with minimum error.
    2. Minimize the time required for planning: The outcome generated by prescriptive-analytics models helps in reducing the time and effort required by
    the data team of the organization to plan a solution, which enables them to
    quickly design and deploy an efficient solution.
    3. Minimize human interventions and errors: Prescriptive-analytics models
    are usually fully automated and require very little human intervention, which
    makes them highly reliable and less prone to error compared to the manual
    analysis done by data scientists.
    2.3.3 Descriptive Analytics
    Descriptive analytics answers the question of what has happened. The process of
    descriptive analytics uses a large amount of data to find what has happened in a business
    over a given period and how it differs from another comparable period. Descriptive
    analytics is one of the most basic forms of analytics used by any organization for getting an overview of what has happened in the business. Using descriptive analytics on
    historical data, decision-makers within the organization can get a complete view of the
    trends on which they can base their business strategy. It also helps in identifying the
    strengths and weaknesses within an organization. Being an elementary
    analytics technique, it is usually used in conjunction with more advanced techniques
    like predictive and prescriptive analytics to generate meaningful results.
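    The period-over-period comparison described above can be illustrated with a few summary statistics. The revenue figures below are hypothetical, and summarize is an illustrative helper rather than a standard function.

```python
import statistics

# Hypothetical monthly revenue (in thousands) for two comparable quarters.
q1_sales = [120, 135, 128]
q2_sales = [140, 150, 162]


def summarize(values):
    """Describe what happened in one period with basic summary statistics."""
    return {
        "total": sum(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


q1, q2 = summarize(q1_sales), summarize(q2_sales)

# How the second period differs from the first, as a percentage of the first.
change_pct = (q2["total"] - q1["total"]) / q1["total"] * 100
print(q1, q2, f"change: {change_pct:.1f}%")
```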
    2.3.4 Diagnostic Analytics
    Diagnostic analytics comprises a set of tools and techniques that
    are used to answer the question of why certain things happened.
    Diagnostic analytics takes a deep dive into the data and tries to find valuable hidden insights. Diagnostic analytics is usually the first step in the process of business
    analytics in an organization. Diagnostic analytics, unlike predictive or prescriptive analytics, doesn’t generate any new outcome; rather, it provides the reasoning
    behind already known results. Techniques like data discovery, data mining, drill-down, etc. are used in the process of diagnostic analytics.
    2.3.4.1 Benefits of Diagnostic Analytics
    Diagnostic analytics allows analysts to translate complex data into meaningful
    visualizations and insights that everyone can take advantage of. Diagnostic
    analytics also provides insight into why a certain result occurred. This insight
    can be used to build predictive- or prescriptive-analytics models.
    A comparison of these four analytics processes is shown in Table 2.1, and the critical question answered by each of them is shown in Figure 2.4.
    2.4 Big Data Analytics Algorithms
    In the current digital era, data is the new gold. Every organization nowadays understands the importance of having a stockpile of data at its disposal. Companies like
    Google, Microsoft, and Facebook are dominating the modern era, and much of the credit
    for that goes to the mammoth data stores they have at their disposal. Having such
    huge data stores has enabled these companies to push the boundaries of technological advancement in a way never seen before. A striking
    example that exhibits the power of data and what can be achieved through its
    proper analysis is Google Maps. Built on top of data pipelines containing a huge
    amount of dynamic and diverse data collected by Google from multiple sources, it
    is a piece of technology that seems like something straight from the future.
    Table 2.1 Comparison of Different Categories of Data Analytics
    Category of classification | Predictive | Prescriptive | Descriptive | Diagnostic
    Source of data | Uses historical data | Uses historical data | Uses historical data | Uses historical data
    Data manipulation | Fills in gaps in available data | Estimates outcomes based on variables | Reconfigures data into easy-to-read format | Identifies anomalies
    Role of analytics | Creates data models | Offers suggestions about outcomes | Describes the state of business operation | Highlights data trends
    Technique used | Forecasts potential future outcomes | Uses algorithms, machine learning, and AI | Learns from the past | Investigates underlying issues
    Critical question answered | Answers ‘What might happen?’ | Answers ‘If, then’ questions | Answers ‘What’ questions | Answers ‘Why’ questions
    Figure 2.4 Critical Questions Answered by Different Analytics Techniques.
    Figure 2.5 Big-Data Analytics Algorithms.
    But having data alone is not sufficient. Data on its own is useless and becomes
    meaningful only when it is properly analyzed. With an unprecedented
    increase in the amount of data generated in the last couple of years, it has become
    more necessary than ever to have fast and efficient data-analytics algorithms
    at our disposal, as classical methods of data analysis using graphs or charts are
    simply not enough to keep up with this huge amount of data, otherwise known
    as big data. To solve this problem, data scientists all over the world have developed and are still developing new advanced algorithms for analyzing
    big data efficiently. Discussing all of these algorithms is beyond the scope of this
    chapter; hence we will focus on the five most popular big-data analytics
    algorithms, which usually form the basis of the majority of high-performance analytics models. These algorithms are shown in Figure 2.5 and discussed below.
    2.4.1 Linear Regression
    Linear regression is a statistical technique applied to a dataset to define and find
    the relation between the variables under consideration [8]. Linear regression is one of the most
    popular and frequently used statistical analysis algorithms. Being a very simple yet
    extremely powerful algorithm for data analysis, it is used extensively by data scientists for designing simple as well as complicated analytical models.
    Linear regression, as the name suggests, is a simple linear equation that combines
    the input values (x) to generate a predicted output (y). In
    the linear-regression model, a scale factor is assigned to each of the input values or
    independent variables; it is also known as a coefficient and is symbolized using
    the Greek letter beta ($\beta$). An extra coefficient, also known as the intercept or bias coefficient, is added to the equation, which provides an additional degree of freedom
    to the line. If the linear-regression equation contains a single dependent variable
    (y) and a single independent variable (x), it is known as univariate regression and is
    represented by equation 2–1:
    $$y = \beta_1 x + \beta_0$$
    (2–1)
    $y$ = dependent variable
    $x$ = independent variable
    $\beta_1$ = scale factor
    $\beta_0$ = bias coefficient
    A regression model with more than one independent variable is known as multivariate
    regression. In a multivariate-regression model, an attempt is made to account
    for the effect of all the independent variables on the dependent variable simultaneously
    [9]. The equation of multivariate regression is an extension of univariate regression
    and is represented in equation 2–2:
    $$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n + \varepsilon$$
    (2–2)
    $y$ = dependent variable
    $x_1, \ldots, x_n$ = independent variables
    $\beta_1, \ldots, \beta_n$ = scale factors
    $\beta_0$ = bias coefficient
    $\varepsilon$ = error
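    To make equation 2–2 concrete, the following sketch evaluates the linear combination for a single observation. The coefficient and feature values are hypothetical, and the error term is left out because it is not known at prediction time.

```python
def predict(coefficients, intercept, features):
    """Evaluate y = b0 + b1*x1 + ... + bn*xn for one observation."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return intercept + sum(b * x for b, x in zip(coefficients, features))


# Hypothetical model with three independent variables.
y_hat = predict(coefficients=[2.0, -1.0, 0.5], intercept=4.0, features=[3.0, 2.0, 10.0])
print(y_hat)
```

    With a single coefficient and a single feature, the same function reduces to the univariate form of equation 2–1.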
    2.4.1.1 Preparing a Linear-Regression Model
    Preparing a linear-regression model, also known as model training, is the process of estimating the coefficients of the equation to find the best-fitting line for
    our dataset. There are several methods for training a linear-regression model. In
    this section, we will discuss three of the most commonly used
    among them.
    1. Simple linear regression: Simple linear regression is a technique for
    training linear-regression models when there is only one input (that is,
    only one independent variable) in the equation. In
    simple linear regression, statistical properties of the data such as
    means, standard deviations, correlations, and covariance are calculated
    and used to estimate the coefficients and hence find the best-fitting line.
    2. Least squares: The method of least squares is used when there are multiple
    independent variables and the values of the coefficients need to be
    estimated. This procedure seeks to minimize the sum of the squared residuals. For a given regression line, we calculate
    the distance from each data point to the line, square it, and
    sum all of the squared errors. This sum is the value that the
    method of least squares seeks to minimize.
    3. Gradient descent: The method of gradient descent is used
    when there are one or more inputs and the values of the coefficients are optimized by iteratively minimizing
    the error of the model on the training data. The algorithm starts by assigning
    random values to every coefficient. The next step is to calculate the sum of squared errors for
    all pairs of input and output values. A learning rate acts as a multiplier that scales each update
    of the coefficients in the direction that reduces the error.
    The process terminates when either a minimum sum of squared errors has been
    achieved or no further improvement is possible.
    Gradient descent is commonly introduced through the linear-regression model
    because it is relatively straightforward to understand in that setting. The
    algorithm is particularly useful when the dataset is too large
    to fit into memory.
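    The training methods above can be compared on a small example. The sketch below fits a univariate model in two ways: in closed form from the means and covariance of the data, and by gradient descent. Several choices here are assumptions made for illustration: the data points are invented, the coefficients start at zero rather than at random values so the run is reproducible, the mean (not the sum) of squared errors is minimized so the learning rate does not depend on the dataset size, and the loop runs for a fixed number of epochs instead of testing for convergence.

```python
def fit_closed_form(xs, ys):
    """Estimate (b0, b1) from the means and covariance of the data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    b1 = cov_xy / var_x
    return mean_y - b1 * mean_x, b1


def fit_gradient_descent(xs, ys, learning_rate=0.01, epochs=5000):
    """Estimate (b0, b1) by iteratively reducing the mean squared error."""
    b0, b1 = 0.0, 0.0  # fixed start for reproducibility; random values also work
    n = len(xs)
    for _ in range(epochs):
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad_b0 = 2 * sum(errors) / n
        grad_b1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        # The learning rate scales each step taken against the gradient.
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


# Hypothetical data lying roughly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

cf_b0, cf_b1 = fit_closed_form(xs, ys)
gd_b0, gd_b1 = fit_gradient_descent(xs, ys)
print(f"closed form:      b0={cf_b0:.3f}, b1={cf_b1:.3f}")
print(f"gradient descent: b0={gd_b0:.3f}, b1={gd_b1:.3f}")
```

    On a dataset this small both approaches should agree closely; the practical difference is that gradient descent only needs to process the data in passes, which is why it remains usable when the data cannot be held in memory at once.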
    2.4.1.2 Applications of Linear Regression
    Linear regression is a simple yet very powerful algorithm that finds application
    in a wide variety of fields. Roy et al. have proposed a Lasso linear-regression model
    for stock-market forecasting [9]. Zameer et al. have used a linear-regression-based
    model for predicting crude-oil consumption [10]. In general, linear-regression models perform quite well in predictive data analytics.
    2.4.2 Logistic Regression
    The technique of logistic regression in big-data analytics is used when the dependent variable
    is dichotomous (binary). Logistic regression, just
    like all other regression techniques, is a predictive analysis. It is employed
    to describe data and to explain the relationship between one dependent binary
    variable and one or more nominal, ordinal, interval, or ratio-level independent
    variables.
    Logistic regression works on the concept of the logit, the natural logarithm of an
    odds ratio [11]. This type of regression model works quite well when the dependent
    variable is categorical. Some examples of real-world problems where the dependent
    variable is categorical are predicting whether an email is spam (1) or not (0) or whether a
    tumor is malignant (1) or benign (0). Logistic regression is a member of a larger
    class of algorithms referred to as the generalized linear model (GLM). In 1972,
    Nelder and Wedderburn proposed this model in an attempt to provide a way of
    using linear regression for problems that were not directly suited to
    linear regression. They proposed a class of models (linear
    regression, ANOVA, Poisson regression, etc.) that includes logistic regression as a special case. Equation 2–3 represents a general equation of logistic regression.
    $$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$$
    (2–3)
    $p/(1-p)$ = odds ratio
    $x$ = independent variable
    $\beta_1$ = scale factor
    $\beta_0$ = bias coefficient
    In this equation, $p$ is the probability of success and $p/(1-p)$ is the odds ratio. A positive log of the odds ratio
    translates into a probability of success greater than 50%. A sample plot of logistic
    regression is shown in Figure 2.6.
    Figure 2.6 A Sample Logistic-Regression Plot.
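    The relationship between the log-odds in equation 2–3 and the predicted probability can be sketched as follows. The sigmoid function is the inverse of the logit, so it turns the linear combination back into a probability. The coefficient values are hypothetical.

```python
import math


def logit(p):
    """Natural log of p / (1 - p), the left-hand side of equation 2-3."""
    return math.log(p / (1 - p))


def sigmoid(z):
    """Inverse of the logit: maps any real log-odds value to a probability."""
    return 1 / (1 + math.exp(-z))


def predict_probability(x, b0, b1):
    """Probability of the positive class for input x under equation 2-3."""
    return sigmoid(b0 + b1 * x)


# Hypothetical coefficients for illustration.
b0, b1 = -4.0, 2.0
for x in (1.0, 2.0, 3.0):
    log_odds = b0 + b1 * x
    print(f"x={x}: log-odds={log_odds:+.1f}, p={predict_probability(x, b0, b1):.3f}")
```

    Inputs whose log-odds are negative map to probabilities below 0.5, a log-odds of zero maps to exactly 0.5, and positive log-odds map to probabilities above 0.5, which produces the S-shaped curve of a logistic-regression plot.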
    2.4.2.1 Types of Logistic Regression
    1. Binary Logistic Regression
    In binary logistic regression, the categorical response has only two possible
    outcomes. Example: an email is spam or not spam.
    2. Multinomial Logistic Regression
    In multinomial logistic regression, the dependent (target) variable can have three
    or more categories without ordering. Example: predicting which food is preferred (Veg, Non-Veg, Vegan).
    3. Ordinal Logistic Regression
    Ordinal logistic regression is a variant of multinomial logistic regression in
    which the dependent (target) variable can have three or more categories in a
    defined order. Example: a movie rating from 1 to 5.
    2.4.2.2 Applications of Logistic Regression
    Logistic regression is a simple yet efficient algorithm that finds application in a wide
    variety of fields. Due to its predictive nature, logistic regression is used in
    fields ranging from education to healthcare. Ramosaco et al. have developed a logistic-regression-based model to study students’ performance levels [12]. Alzen et al. have
    proposed another logistic-regression-based model to find the relationship between the
    learning assistant model and failure rates in introductory STEM courses [13].
    Although linear regression and logistic regression are both regression-based
    models, they differ in many ways. These differences are shown in Table 2.2.
    Table 2.2 Difference between Linear and Logistic Regression
    Linear Regression | Logistic Regression
    Linear regression is used to predict a continuous dependent variable using a given set of independent variables. | Logistic regression is used to predict a categorical dependent variable using a given set of independent variables.
    Linear regression is used for solving regression problems. | Logistic regression is used for solving classification problems.
    In linear regression, we predict the value of continuous variables. | In logistic regression, we predict the values of categorical variables.
    In linear regression, we find the best-fitting line, by which we can easily predict the output. | In logistic regression, we find the S-curve by which we can classify the samples.
    The least-squares estimation method is used to estimate the model coefficients. | The maximum-likelihood estimation method is used to estimate the model coefficients.
    The output of linear regression must be a continuous value, such as price, age, etc. | The output of logistic regression must be a categorical value such as 0 or 1, Yes or No, etc.
    In linear regression, it is required that the relationship between the dependent variable and independent variable be linear. | In logistic regression, it is not required to have a linear relationship between the dependent and independent variable.
    In linear regression, there may be collinearity between the independent variables. | In logistic regression, there should not be collinearity between the independent variables.
    2.4.3 Naive Bayes Classifiers
    Naive Bayes classifiers are a set of classification algorithms based on Bayes’
    theorem. Naive Bayes is not a single algorithm but a family of algorithms that all share
    a common principle, i.e. every pair of features being classified is independent of
    each other.
    Naive Bayes uses a probabilistic approach for constructing classifiers. These
    classifiers simplify learning by assuming that features are independent given the
    class [14]. Naive Bayes classification is a subset of Bayesian decision theory. It’s
    called naive because the formulation makes some naive assumptions [15].
    The main assumption that Naive Bayes classifiers make is that the value of a
    specific feature is independent of the value of any other feature. Despite this
    oversimplified assumption, Naive Bayes classifiers tend to perform well even in
    complex real-world scenarios. The main advantage that Naive Bayes classifiers have
    over other classification algorithms is that they require only a small amount of training
    data to estimate the parameters necessary for classification, and they can be
    trained incrementally.
    2.4.3.1 Equation of the Naive Bayes Classifiers
    To understand the equation of Naive Bayes classifiers, we need to understand Bayes’
    theorem, which is the fundamental theorem on which Naive Bayes classifiers work.
    Bayes’ theorem
    Bayes’ theorem finds the probability of the occurrence of an event,
    given the probability of another event that has already occurred. Bayes’
    theorem is stated mathematically as shown in equation 2–4:
    $$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
    (2–4)
    $P(A)$ = probability of occurrence of event A
    $P(B)$ = probability of occurrence of event B
    $P(A \mid B)$ = probability of A given B
    $P(B \mid A)$ = probability of B given A
    Bayes’ theorem can be extended to find the equations of various Naive
    Bayes classifiers.
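    A small numerical sketch may help here. The first function applies equation 2–4 directly, expanding P(B) over the two cases "A" and "not A". The second shows one common way the theorem is extended under the naive independence assumption: the per-feature likelihoods are simply multiplied. All probabilities below are invented for illustration, and both function names are hypothetical.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) from equation 2-4, with P(B) expanded over A and not-A."""
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / evidence


def naive_bayes_posterior(prior_a, likelihoods_a, likelihoods_not_a):
    """P(A | features), treating the features as independent given the class."""
    joint_a = prior_a
    for p in likelihoods_a:
        joint_a *= p
    joint_not_a = 1 - prior_a
    for p in likelihoods_not_a:
        joint_not_a *= p
    return joint_a / (joint_a + joint_not_a)


# Hypothetical spam example: A = "email is spam", B = "email contains a given word".
single = bayes_posterior(prior_a=0.2, p_b_given_a=0.6, p_b_given_not_a=0.05)

# Two observed words, each with its own likelihood under spam and not-spam.
combined = naive_bayes_posterior(0.2, [0.6, 0.5], [0.05, 0.1])
print(f"one word: {single:.4f}, two words: {combined:.4f}")
```

    With a single feature the two functions give the same answer; each additional feature contributes one more factor to the products, which is what keeps the number of parameters to estimate small.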
    2.4.3.2 Application of Naive Bayes Classifiers
    Naive Bayes classifiers, despite having certain limitations and assumptions, work
    quite well for solving classification problems. Karthika and Sairam propose a classification methodology that uses the Naive Bayesian classification algorithm to
    classify persons into different classes based on various attributes representing their educational qualifications [16]. Qin et al. study the classification of multilabel
    data based on Naive Bayes classifiers, which can be extended to multilabel
    learning [17].
    2.4.4 Classifcation and Regression Trees
    Classifcation and regression trees (CART) is a term coined by Leo Breiman to
    allude to the decision tree class of algorithms that are used to solve the classifcation
    and regression predictive analytics problems.
    Traditionally, this calculation is alluded to as ‘decision trees’; however, in certain programming languages like R they are alluded to by the more present-day
    term CART. Te CART algorithms give an establishment for some other signifcant algorithms like bagged decision-tree algorithms, random-forest algorithms,
    and boosted decision-tree algorithms.
    2.4.4.1 Representation of CART Model
    The CART model can be represented as a binary tree. Each internal node in the tree represents a single input variable (x) and a split point on that variable, and each leaf node
    holds an output variable (y), which is used for prediction.
    For example, suppose a dataset has two input variables (x), the height of a person in centimeters and the weight in kilograms, and the output variable (y) tells whether
    the sex of the person is male or female. Figure 2.7 represents a very simple binary
    decision tree model.
    Figure 2.7 Representation of Binary Decision-Tree Model.
    A straightforward way to make predictions with the CART model is to use
    its binary tree representation. Traversal starts at the root node, where a
    specific input is evaluated, and follows one branch at each split until a leaf
    is reached. Each input variable in the CART model can be thought of as a
    dimension in an n-dimensional space. The decision tree splits this space into
    rectangles for two input variables, or into hyperrectangles for more inputs.
    The input data is filtered through the tree and lands in one of the rectangles,
    and the prediction made by the model is the output value for that rectangle.
    This gives us some idea of the type of decisions that a CART model is capable
    of making, e.g. boxy decision boundaries.
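The traversal described above can be sketched in a few lines. This is a minimal illustration, not the chapter's implementation: the `Node` and `Leaf` classes, the feature names, and the split points (180 cm, 80 kg) are assumptions made for this example and are not taken from Figure 2.7.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # output variable (y) returned as the prediction


@dataclass
class Node:
    feature: str                 # input variable (x) tested at this node
    threshold: float             # split point on that variable
    left: Union["Node", Leaf]    # followed when value <= threshold
    right: Union["Node", Leaf]   # followed when value > threshold


def predict(tree: Union[Node, Leaf], sample: Dict[str, float]) -> str:
    """Walk from the root to a leaf, choosing one branch at each split."""
    node = tree
    while isinstance(node, Node):
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# Hypothetical tree: split on height first, then on weight.
tree = Node(
    feature="height_cm",
    threshold=180.0,
    left=Node(
        feature="weight_kg",
        threshold=80.0,
        left=Leaf("female"),
        right=Leaf("male"),
    ),
    right=Leaf("male"),
)

print(predict(tree, {"height_cm": 165.0, "weight_kg": 60.0}))
```

Each leaf corresponds to one rectangle in the height-weight plane, which is why the resulting decision boundaries are axis-aligned.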
    2.4.4.2 Application of Classification and Regression Trees
    Pham et al. have used a classification and regression tree-based model to predict
    rainfall-induced shallow landslides in India, based on a dataset of
    430 historic landslide locations [18]. Pouliakis et al. have studied CART-based models to estimate the risk of cervical intraepithelial neoplasia [19]. Iliev et
    al. have proposed a CART-based model for modeling the laser output power of a
    copper bromide vapor laser [20].
    2.4.5 K-Means Clustering
    K-means clustering is a very simple yet popular data-analytics algorithm. It is an
    unsupervised algorithm, as it is capable of drawing conclusions from datasets that have
    only input variables, without requiring known or labeled outcomes.
    The goal of the K-means algorithm is very basic: group similar data points and
    reveal the pattern present in the dataset. K-means tries to find a predefined number
    (k) of clusters in the dataset. A cluster, in very simple terms, can be thought of
    as a group of similar data points. The prerequisite of the algorithm is the target
    number k, which denotes the number of centroids required. A centroid can
    be either a real or an imaginary point that represents the center of a single cluster. Each data point is assigned to one of the clusters by reducing
    the in-cluster sum of squares. The K-means algorithm identifies the predefined
    number of centroids and then allocates each data point to the nearest cluster, with
    the goal of keeping each cluster as compact as possible. The ‘means’ in
    K-means refers to averaging the data, that is, finding the centroid.
    2.4.5.1 How K-Means Clustering Works
    To process the training data, the K-means algorithm in data analytics
    begins with a set of randomly selected centroids, which are used as the starting
    point for each cluster, and then performs iterative calculations to optimize the
    positions of the centroids.
    It stops creating and optimizing clusters when either of the following conditions is met:
    • The centroids have stabilized, i.e. their values no longer change because the
    clustering has been successful.
    • The predefined number of iterations has been reached.
    2.4.5.2 The K-Means Clustering Algorithm
    The K-means clustering algorithm follows the approach of expectation-maximization. The expectation step assigns each data point to the closest cluster. The
    maximization step finds the centroid of each of these clusters. The final goal of
    the K-means algorithm is to minimize the value of the squared error function, given as:

    $$J(V) = \sum_{i=1}^{c} \sum_{j=1}^{c_i} \left( \lVert x_i - v_j \rVert \right)^2$$

    where $\lVert x_i - v_j \rVert$ is the Euclidean distance between $x_i$ and $v_j$, $c$ is the
    number of clusters, and $c_i$ is the number of data points in cluster $i$.
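The two alternating steps and both stopping conditions can be sketched in plain Python. This is a minimal sketch under stated assumptions, not a reference implementation: the function names, the `seed` parameter, the handling of empty clusters, and the six sample points are all choices made for this example.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points: List[Point], k: int, max_iterations: int = 100,
            seed: int = 0) -> Tuple[List[Point], List[int]]:
    rng = random.Random(seed)
    # Start from k randomly selected data points as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(max_iterations):  # stop 2: iteration limit reached
        # Expectation step: assign each point to its closest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Maximization step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # stop 1: centroids have stabilized
            break
        centroids = new_centroids
    return centroids, assignments


def within_cluster_sum_of_squares(points: List[Point],
                                  centroids: List[Point],
                                  assignments: List[int]) -> float:
    """Squared error: total squared distance of points to their centroid."""
    return sum(squared_distance(p, centroids[a])
               for p, a in zip(points, assignments))


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, assignments = k_means(points, k=2)
wcss = within_cluster_sum_of_squares(points, centroids, assignments)
print(sorted(centroids), assignments, wcss)
```

Because the result depends on the random starting centroids, practical use typically repeats the run with several seeds and keeps the clustering with the smallest squared error.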
    2.4.5.3 Application of K-Means Clustering Algorithms
    Being a high-performing, unsupervised learning algorithm, K-means finds application in a wide variety of fields. Due to its popularity, researchers have created different hybrid versions of this algorithm that are being used extensively in numerous
    fields. Youguo & Haiyan have developed a clustering algorithm on top of K-means
    clustering that addresses its dependence on the choice of the initial focal point [21].
    Shakil and Alam have devised a method for data management in a cloud-based
    environment on the basis of the K-means clustering algorithm [22]. Alam and
    Kishwar have categorized various clustering techniques that have been applied to
    web search results [23]. Alam and Kishwar have also proposed an algorithm for web-search clustering based on K-means and a heuristic search [24].
    2.5 Conclusion and Future Scope
    In this chapter, we looked into the basics of data analytics along with its applications in the real world. We also looked into various categories of data analytics, along
    with some of the most commonly used data-analytics algorithms and their
    applications to real-world scenarios. Apart from the algorithms discussed in this
    chapter, data scientists all over the world have been working on designing faster and
    more efficient algorithms. The idea of using neural-network-based algorithms has
    also been proposed by data scientists [25, 32]. With the rise of quantum computing in the last couple of years, scientists are also looking forward to the possibility
    of leveraging the power of quantum computers in big-data analytics [26]. Cloud-based big-data analytics is also becoming quite popular, as it can leverage the power
    of cloud computing for big-data analytics [27–31]. With these new technological
    advancements on the horizon, it can be safely assumed that the future of big-data
    analytics is going to be bright and exciting.
    References
    1. Shakil, K. A., Anis, S., & Alam, M. (2015). Dengue disease prediction using weka
    data mining tool. arXiv preprint arXiv:1502.05167.
    2. Khan, M. W., & Alam, M. (2012). A survey of application: Genomics and genetic
    programming, a new frontier. Genomics, 100(2), 65–71.
    3. Shakil, K. A., Zareen, F. J., Alam, M., & Jabin, S. (2020). BAMHealthCloud: A
    biometric authentication and data management system for healthcare data in cloud.
    Journal of King Saud Universi…
