Big data processing with Hadoop by Shiqi Wu
Computing technology has changed the way we work, study, and live. Distributed data processing is one of the most popular topics in the IT field. It provides a simple, centralized computing platform while reducing hardware costs, and its characteristics have changed the whole industry. Hadoop, an open source project of the Apache Software Foundation, is the most representative platform for distributed big data processing. The Hadoop distributed framework provides a safe and fast architecture for processing big data, and users can design distributed applications without knowing the details of the system's lower layers. This thesis provides a brief introduction to Hadoop. Because of the complexity of the Hadoop platform, the thesis concentrates only on Hadoop's core technologies: HDFS, MapReduce, and HBase.