Sunday, February 05, 2012

The big data era's flux and pulse

Big data is an important contemporary trend, but what does it actually mean?

What is big data?
Big data refers not just to the absolute size of a body of information (currently on the order of terabytes, petabytes, or exabytes), but also to its usability and manageability. Its defining parameters include large size, high-velocity activity (incoming, processing, outgoing), heterogeneous nature (a variety of structured and unstructured data types, such as video and images), and a requirement for real-time analytics.

What is the process of working with big data?
The process of working with big data involves several steps. First there may be an exploration of the data using tools for classification, visualization, and summarization. Then there is the detailed step of data cleaning to make the data consistent and usable. The next step is data reduction, for example defining and extracting attributes, decreasing the dimensions of data, representing the problems to be solved, summarizing the data, and selecting portions of the data for analysis. Then, the steps of predictive analytics, scoring, reporting, publishing, and quality validation and maintenance can be applied.
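The first three steps above (exploration, cleaning, reduction) can be sketched on a toy scale. This is only a minimal illustration, not the method from the talk: the records, the field names (customer, spend, region), and the helper functions are all made up for the example, and a real big data pipeline would run on distributed tooling rather than in-memory lists.

```python
from statistics import mean

# Hypothetical toy records; the field names are illustrative assumptions.
RAW = [
    {"customer": " Alice ", "spend": "120.50", "region": "west"},
    {"customer": "bob", "spend": "", "region": "East"},
    {"customer": "Carol", "spend": "80", "region": "WEST"},
    {"customer": "dave", "spend": "200", "region": "east"},
]


def explore(records):
    """Exploration: count rows and count empty values per field."""
    missing = {}
    for row in records:
        for key, value in row.items():
            if not value.strip():
                missing[key] = missing.get(key, 0) + 1
    return {"rows": len(records), "missing": missing}


def clean(records):
    """Cleaning: normalize text and types; drop rows with no spend value."""
    cleaned = []
    for row in records:
        if not row["spend"].strip():
            continue
        cleaned.append({
            "customer": row["customer"].strip().title(),
            "spend": float(row["spend"]),
            "region": row["region"].strip().lower(),
        })
    return cleaned


def reduce_by_region(records):
    """Reduction: keep a single summary attribute (mean spend) per region."""
    groups = {}
    for row in records:
        groups.setdefault(row["region"], []).append(row["spend"])
    return {region: mean(values) for region, values in groups.items()}


summary = explore(RAW)
tidy = clean(RAW)
by_region = reduce_by_region(tidy)
print(summary)
print(by_region)
```

Dropping the row with a missing spend value is one possible cleaning choice; imputing a value instead would be equally reasonable, depending on the analysis.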

What are the applications of big data analysis?
Big data analysis makes it possible to summarize information, make predictions, identify trends (for example, consumer spending patterns), and rank and prioritize information. Specific algorithms are used for each task: for summarizing, clustering and associations; for making predictions, tree-based methods, neural networks, and k-nearest neighbors; for identification, anomaly detection, similarities and matches, and change detection; and for ranking, logistics and frequency detection.
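To make one of the prediction algorithms concrete, here is a minimal sketch of k-nearest neighbors, written from scratch for illustration. The training points, the "low"/"high" labels, and the function name knn_predict are assumptions for this example, not material from the talk. The idea is simply to label a new point by majority vote among the k closest labeled points.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+

# Hypothetical labeled points: (feature vector, label).
TRAIN = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((8.0, 8.0), "high"),
    ((8.5, 9.0), "high"),
    ((9.0, 8.5), "high"),
]


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(TRAIN, (2.0, 2.0)))
print(knn_predict(TRAIN, (8.0, 9.0)))
```

Sorting every training point for each query is fine for a toy example, but at big data scale one would likely turn to an indexed or approximate nearest-neighbor search instead.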

Excerpted from an Association for Computing Machinery (ACM) talk on Big Data & Predictive Analytics (slides).
