Select Page

A treasury from Google Labs: a new programming model

MapReduce: Simplifed Data Processing on Large Clusters

"MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. "
Statistics of jobs in august 2004, as usual impressive amount of data!!!
Number of jobs 29,423
Average job completion time 634 se
Machine days used 79,186 days
Input data read 3,288 TB
Intermediate data produced 758 TB
Output data written 193 TB

"It has been used across a wide range of domains within Google, including:   large-scale machine learning problems,   clustering problems for the Google News and Froogle products, MapReduce has been so successful because it makes it possible to write a simple program and run it ef ciently on a thousand machines in the course of half an hour, greatly speeding up the development and prototyping cycle. Furthermore, it allows programmers who have no experience with distributed and/or parallel systems to exploit large amounts of resources easily." more in the PDF here

About The Author

I worked with various Insurances companies across Switzerland on online applications handling billion premium volumes. I love to continuously spark my creativity in many different and challenging open-source projects fueled by my great passion for innovation and blockchain technology.In my technical role as a senior software engineer and Blockchain consultant, I help to define and implement innovative solutions in the scope of both blockchain and traditional products, solutions, and services. I can support the full spectrum of software development activities, starting from analyzing ideas and business cases and up to the production deployment of the solutions.I'm the Founder and CEO of Disruptr GmbH.