Bibliographic record
Content type(s) and mode(s) of access: Written text: electronic
Author(s): Wiktorski, Tomasz
Title(s): Data-intensive systems [Electronic text] : principles and fundamentals using Hadoop and Spark / Tomasz Wiktorski
Publication: Cham : Springer, copyright 2019
Physical description: 1 digital resource
Series: Advanced information and knowledge processing
SpringerBriefs in advanced information and knowledge processing
Link to series: Advanced information and knowledge processing (Internet)
SpringerBriefs in advanced information and knowledge processing (Online)
Note(s): Bibliographical notes.
Subject(s): Hadoop (computing platform)
Big data
Dewey classification:
005.74 (23rd ed.) = Data files and databases (computer science)
Identifiers, prices and characteristics: ISBN 9783030046033. - ISBN 3030046036. - ISBN 9783030046026 (erroneous). - ISBN 3030046028 (erroneous)
Record identifier: ark:/12148/cb45779206m
Record no.: FRBNF45779206
(record imported from an external source)
Table of contents: Intro; Contents; List of Figures; List of Listings; 1 Preface; 1.1 Conventions Used
in this Book; 1.2 Listed Code; 1.3 Terminology; 1.4 Examples and Exercises; 2 Introduction;
2.1 Growing Datasets; 2.2 Hardware Trends; 2.3 The V's of Big Data; 2.4 NOSQL; 2.5
Data as the Fourth Paradigm of Science; 2.6 Example Applications; 2.6.1 Data Hub;
2.6.2 Search and Recommendations; 2.6.3 Retail Optimization; 2.6.4 Healthcare; 2.6.5
Internet of Things; 2.7 Main Tools; 2.7.1 Hadoop; 2.7.2 Spark; 2.8 Exercises; References;
3 Hadoop 101 and Reference Scenario; 3.1 Reference Scenario; 3.2 Hadoop Setup;
3.3 Analyzing Unstructured Data; 3.4 Analyzing Structured Data; 3.5 Exercises; 4 Functional
Abstraction; 4.1 Functional Programming Overview; 4.2 Functional Abstraction for Data
Processing; 4.3 Functional Abstraction and Parallelism; 4.4 Lambda Architecture; 4.5
Exercises; Reference; 5 Introduction to MapReduce; 5.1 Reference Code; 5.2 Map Phase;
5.3 Combine Phase; 5.4 Shuffle Phase; 5.5 Reduce Phase; 5.6 Embarrassingly Parallel
Problems; 5.7 Running MapReduce Programs; 5.8 Exercises; 6 Hadoop Architecture; 6.1
Architecture Overview; 6.2 Data Handling; 6.2.1 HDFS Architecture; 6.2.2 Read Flow;
6.2.3 Write Flow; 6.2.4 HDFS Failovers; 6.3 Job Handling; 6.3.1 Job Flow; 6.3.2 Data
Locality; 6.3.3 Job and Task Failures; 6.4 Exercises; 7 MapReduce Algorithms and Patterns;
7.1 Counting, Summing, and Averaging; 7.2 Search Assist; 7.3 Random Sampling; 7.4
Multiline Input; 7.5 Inverted Index; 7.6 Exercises; References; 8 NOSQL Databases;
8.1 NOSQL Overview and Examples; 8.1.1 CAP and PACELC Theorem; 8.2 HBase Overview;
8.3 Data Model; 8.4 Architecture; 8.4.1 Regions; 8.4.2 HFile, HLog, and Memstore;
8.4.3 Region Server Failover; 8.5 MapReduce and HBase; 8.5.1 Loading Data;
8.5.2 Running Queries; 8.6 Exercises; References; 9 Spark; 9.1 Motivation; 9.2 Data
Model; 9.2.1 Resilient Distributed Datasets and DataFrames; 9.2.2 Other Data Structures;
9.3 Programming Model; 9.3.1 Data Ingestion; 9.3.2 Basic Actions-Count, Take, and
Collect; 9.3.3 Basic Transformations-Filter, Map, and reduceByKey; 9.3.4 Other Operations-flatMap
and Reduce; 9.4 Architecture; 9.5 SparkSQL; 9.6 Exercises