BD – 118
• IBM Definition – Volume, Variety and Velocity
• Oracle Definition – Volume, Variety, Velocity and Value
• Small Data is something which can fit into RAM. Big Data is something which cannot fit into RAM.
• Units in increasing size: Byte < Kilobyte < Megabyte < Gigabyte < Terabyte < Petabyte < Exabyte < Zettabyte < Yottabyte
• Big Data is a concept, not a specific technology, software product or tool.
• Map: The Map component distributes the task across a large number of systems and handles the placement of the tasks in a way that balances the load and manages recovery from failures.
• Reduce: Once distributed computation is completed, another function called “Reduce” aggregates all the elements back together to provide a result.
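The Map and Reduce steps above can be sketched with a word count, the usual teaching example. This is a minimal single-process illustration, not how any particular framework implements it: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and each input string only stands in for a split that a real cluster would process on a separate machine.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group the emitted values by key, between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Aggregate all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # Each document stands in for a split handled by a separate machine.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data is big", "data is a concept"])
print(counts)
```

Load balancing and recovery from failures, which the notes attribute to the Map component, are left out of this sketch; it shows only the data flow from mapped pairs to aggregated results.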
• Bigtable: developed by Google.
• A distributed storage system intended to manage highly scalable structured data.
• Data is organized into tables having rows and columns. Tables can expand horizontally and vertically without any limitations.
• Sparse, distributed, persistent, multidimensional sorted map.
• Intended to store huge volumes of data across commodity servers.
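The "sparse, multidimensional sorted map" idea can be illustrated with a toy in-memory model. This is a rough sketch under stated assumptions, not the real storage system: the class SparseSortedTable and its methods are hypothetical, the key is assumed to be (row, column, timestamp), and distribution and persistence are ignored entirely.

```python
import bisect

class SparseSortedTable:
    """Toy model: (row, column, timestamp) -> value, with keys kept sorted."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, timestamp) keys
        self._cells = {}   # sparse: only cells that were written are stored

    def put(self, row, column, timestamp, value):
        key = (row, column, timestamp)
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = value

    def get_latest(self, row, column):
        """Return the value with the highest timestamp, or None if absent."""
        # Position just past every key belonging to (row, column).
        i = bisect.bisect_right(self._keys, (row, column, float("inf")))
        if i and self._keys[i - 1][:2] == (row, column):
            return self._cells[self._keys[i - 1]]
        return None

    def scan_rows(self, start_row, end_row):
        """Yield cells whose row key lies in [start_row, end_row), in key order."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            yield self._keys[i], self._cells[self._keys[i]]
            i += 1

table = SparseSortedTable()
table.put("com.example", "contents", 1, "v1")
table.put("com.example", "contents", 2, "v2")
table.put("org.sample", "anchor", 1, "link")
latest = table.get_latest("com.example", "contents")
scanned = list(table.scan_rows("com", "d"))
```

Keeping the keys sorted is what makes range scans over adjacent rows cheap, and storing only written cells is what makes the table sparse.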
• Hadoop is an Apache-managed software framework derived from MapReduce and Bigtable.
• Hadoop allows applications based on MapReduce to run on large clusters of commodity hardware.
• Two major components:
i) Hadoop Distributed File System (HDFS) that can support petabytes of data.
ii) MapReduce engine that computes results in batch.
Stages in Big Data
1) Acquisition: The process of sampling signals, measuring real-world physical conditions and converting the resulting samples into digital numeric values that a computer can manipulate.
2) Marshalling: The process of gathering data and transforming it into a standard format before it is transmitted over a network. Data pieces are collected in a message buffer before they are marshalled. Data marshalling is required when passing the output parameters of a program written in one language as input to a program written in another language.
3) Analysis: The process of breaking a complex substance into smaller parts in order to gain a better understanding of it.
4) Action: The final phase in Big Data is implementation, which finalizes the presentation of the data to the end user.
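The Acquisition and Marshalling stages above can be sketched together. This is a minimal illustration under assumptions that are not in the notes: the signal is a 5 Hz sine wave, samples are quantized to 256 levels, and the wire format (a big-endian 4-byte count followed by one unsigned byte per sample) is an invented example of a standard format. The function names acquire, marshal and unmarshal are hypothetical.

```python
import math
import struct

def acquire(signal, sample_rate_hz, sample_count, levels=256):
    """Sample a continuous signal and quantize each sample to an integer level."""
    samples = []
    for n in range(sample_count):
        value = signal(n / sample_rate_hz)      # measure at time t = n / rate
        value = max(-1.0, min(1.0, value))      # clip to the expected range
        samples.append(round((value + 1.0) / 2.0 * (levels - 1)))
    return samples

def marshal(samples):
    """Pack samples into a buffer: big-endian count, then one byte per sample."""
    return struct.pack(f">I{len(samples)}B", len(samples), *samples)

def unmarshal(buffer):
    """Recover the sample list from a buffer produced by marshal()."""
    (count,) = struct.unpack_from(">I", buffer, 0)
    return list(struct.unpack_from(f">{count}B", buffer, 4))

samples = acquire(lambda t: math.sin(2 * math.pi * 5 * t),
                  sample_rate_hz=100, sample_count=10)
message = marshal(samples)
restored = unmarshal(message)
```

Because the byte order and field widths are fixed, a program written in another language could read the same buffer, which is the point of marshalling before transmission.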