Cloud Computing for Big Data Analytics

Data alone has no significance unless relevant information is extracted from it to support decision-making. Data analysis is the process of extracting useful information from available data; this information, in turn, helps decision-makers take appropriate actions. Traditionally, data analysis was performed through a process known as Extract, Transform and Load (ETL) on relational database management systems (RDBMS), which were designed primarily for vertical scaling, i.e. adding more Central Processing Unit (CPU) and Random Access Memory (RAM) resources to a single system. As the industry has entered the exabyte and zettabyte age of data, traditional approaches such as RDBMS have struggled to store and process data at this scale, because their architectural principles date from the 1970s. Such large amounts of data, structured or unstructured, have been termed "Big data" and are commonly characterized by five properties, i.e. Volume, Velocity, Variety, Veracity and Value. Of these, Value is the most important, since its purpose is to extract relevant information from the data described by the other four V's. To address the storage and analysis of data at this scale, various technologies have emerged over the past decades. "Cloud computing" is a platform on which thousands of servers work together to meet different computing needs, with billing done on a 'pay as you grow' model.

This thesis studies the fundamentals of Cloud computing and Big data, and the benefits of a Cloud computing platform for Big data analytics projects. Since there are many Cloud Service Providers (CSPs), the thesis explores the Big data analytics solutions available from the industry's top three Cloud computing service providers. The study includes a demonstration of a Big data analysis project on a leading Cloud computing platform, using publicly available data sets, to validate the power of Cloud computing.
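As a rough illustration of the kind of analysis such a demonstration involves, the sketch below shows a minimal PySpark job that aggregates a publicly available data set stored in cloud object storage. It is only a sketch under stated assumptions: the bucket path and column names (trip_distance, passenger_count) are hypothetical placeholders, not the actual data set or configuration used in the thesis demo.

    # Minimal sketch of a Big data aggregation job, assuming PySpark runs on a
    # managed cloud Spark cluster. Input path and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("big-data-analytics-demo")  # job name shown in the cluster UI
        .getOrCreate()
    )

    # Read a publicly available CSV data set from cloud object storage.
    trips = spark.read.csv(
        "s3://example-public-bucket/trips/*.csv",  # hypothetical bucket/path
        header=True,
        inferSchema=True,
    )

    # Simple analysis: average trip distance per passenger count,
    # computed in parallel across the cluster's worker nodes.
    summary = (
        trips.groupBy("passenger_count")
             .agg(F.avg("trip_distance").alias("avg_distance"))
             .orderBy("passenger_count")
    )

    summary.show()
    spark.stop()

Because the cluster is provisioned from a Cloud Service Provider, the same job can be scaled out by adding worker nodes on demand and billed under the 'pay as you grow' model described above, rather than by scaling a single RDBMS server vertically.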