- Book Name: Frontiers in Massive Data Analysis
- Pages: 191
- Size: 1 MB
Contents of Frontiers in Massive Data Analysis PDF
- Massive data in science, technology, commerce, national defense, telecommunications, and other endeavors
- Scaling the infrastructure for data management
- Temporal data and real-time algorithms
- Resources, trade-offs, and limitations
- Building models from massive data
- Sampling and massive data
- Human interaction with data
- The seven computational giants of massive data analysis
Preface of Frontiers in Massive Data Analysis PDF
Experiments, observations, and numerical simulations in many areas of science and business are currently generating terabytes of data, and in some cases are on the verge of generating petabytes and beyond. Analyses of the information contained in these data sets have already led to major breakthroughs in fields ranging from genomics to astronomy and high-energy physics and to the development of new information-based industries. Traditional methods of analysis have been based largely on the assumption that analysts can work with data within the confines of their own computing environment, but the growth of “big data” is changing that paradigm, especially in cases in which massive amounts of data are distributed across locations.
While the scientific community and the defense enterprise have long been leaders in generating and using large data sets, the emergence of e-commerce and massive search engines has led other sectors to confront the challenges of massive data. For example, Google, Yahoo!, Microsoft, and other Internet-based companies have data that is measured in exabytes (10^18 bytes). Social media (e.g., Facebook, YouTube, Twitter) have exploded beyond anyone’s wildest imagination, and today some of these companies have hundreds of millions of users.
Data mining of these massive data sets is transforming the way we think about crisis response, marketing, entertainment, cybersecurity, and national intelligence. It is also transforming how we think about information storage and retrieval. Collections of documents, images, videos, and networks are being thought of not merely as bit strings to be stored, indexed, and retrieved, but also as potential sources of discovery and knowledge, requiring sophisticated analysis techniques that go far beyond classical indexing and keyword counting, aiming to find relational and semantic interpretations of the phenomena underlying the data.
A number of challenges in both data management and data analysis require new approaches to support the big data era. These challenges span generation of the data, preparation for analysis, and policy-related challenges in its sharing and use, including the following:
• Dealing with highly distributed data sources,
• Tracking data provenance, from data generation through data preparation,
• Validating data,
• Coping with sampling biases and heterogeneity,
• Working with different data formats and structures,
• Developing algorithms that exploit parallel and distributed architectures,
• Ensuring data integrity,
• Ensuring data security,
• Enabling data discovery and integration,
• Enabling data sharing,
• Developing methods for visualizing massive data,
• Developing scalable and incremental algorithms, and
• Coping with the need for real-time analysis and decision-making.
To the extent that massive data can be exploited effectively, the hope is that science will extend its reach, and technology will become more adaptive, personalized, and robust. It is appealing to imagine, for example, a health-care system in which increasingly detailed data are maintained for each individual—including genomic, cellular, and environmental data—and in which such data can be combined with data from other individuals and with results from fundamental biological and medical research so that optimized treatments can be designed for each individual. One can also envision numerous business opportunities that combine knowledge of preferences and needs at the level of single individuals with fine-grained descriptions of goods, skills, and services to create new markets.