Map/Reduce for Volunteer Computing

Map/Reduce [3] is a programming model, developed by Google, for processing and generating very large distributed data sets. Hadoop [2] is an open-source implementation that supports Map/Reduce programming and provides the Hadoop Distributed File System (HDFS), which was inspired by the Google File System.
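To make the model concrete, the following is a minimal single-process sketch of the three phases (map, group by key, reduce) using word counting as the example. It is an illustration only: the names map_words, reduce_counts and run_map_reduce are hypothetical and do not come from Hadoop or Google's implementation, and nothing here is distributed.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(_doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input record."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(_word: str, counts: Iterable[int]) -> int:
    """Reduce step: combine all values that were emitted for one key."""
    return sum(counts)


def run_map_reduce(
    records: Iterable[Tuple[Any, Any]],
    mapper: Callable[[Any, Any], Iterable[Tuple[Any, Any]]],
    reducer: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    """Run map, group intermediate pairs by key, then reduce, in one process."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return {key: reducer(key, values) for key, values in groups.items()}


if __name__ == "__main__":
    docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog and the fox")]
    print(run_map_reduce(docs, map_words, reduce_counts))
```

In a real framework the map and reduce calls would run on many machines and the grouping step would move data between them; the user-written functions would keep the same shape.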

Volunteer computing is an arrangement in which people (volunteers) provide computing resources to projects, which use the resources to do distributed computing and/or storage. Volunteers are typically members of the general public who own Internet-connected PCs. Organizations such as schools and businesses may also volunteer the use of their computers. Projects are typically academic (university-based) and do scientific research. Because Map/Reduce has been widely accepted as a programming model for distributed computing, this project aims to build a Map/Reduce software framework for a volunteer computing environment such as BOINC [1]. The motivations for this project are:

• Volunteer computing can provide tremendous computing power at limited cost. Because of the huge number (> 1 billion) of PCs in the world, volunteer computing supplies more computing power to science than does any other type of computing. This computing power enables scientific research that could not be done otherwise. This advantage will increase over time, because the laws of economics dictate that consumer products such as PCs and game consoles will advance faster than more specialized products, and that there will be more of them. Volunteer computing power can't be bought; it must be earned. A research project that has limited funding but large public appeal can get huge computing power. In contrast, traditional supercomputers are extremely expensive, and are available only for applications that can afford them (for example, nuclear weapon design and espionage). Volunteer computing encourages public interest in science, and gives the public a voice in determining the directions of scientific research.

• Programming model for volunteer computing. Currently, the master/slave or split/merge model is used for volunteer computing. The master/slave model has limitations that make it hard to carry out high-throughput computing efficiently. Map/Reduce has proven to be an effective computing model for various data-centric computing projects, for example in bioinformatics and high-energy physics. Implementing Map/Reduce in a volunteer computing environment is a promising way to fully harness the power of volunteer computing.

Programming Tasks:

 

  • Design the Map/Reduce software architecture
  • Define the BOINC and Hadoop interfaces to Map/Reduce
  • Design the Map/Reduce programming APIs
  • Implement BOINC interfaces and functions for Map/Reduce support
  • Implement Map/Reduce software libraries in Java with BOINC and Hadoop support
  • Implement a GUI for Map/Reduce on BOINC
  • Evaluate performance with a parallel data-mining benchmark
  • Evaluate performance with real applications, e.g., BLAST or a high-energy physics application
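As a starting point for the API design tasks above, here is one possible shape for a programming API that keeps the job definition separate from the execution environment. This is a sketch under stated assumptions, not a proposed final design: MapReduceJob, Backend, LocalBackend and run_job are hypothetical names, and no BOINC or Hadoop API is called. A BOINC or Hadoop adapter would be another Backend subclass that turns each task into a work unit or a cluster task.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple

KV = Tuple[Any, Any]


@dataclass
class MapReduceJob:
    """User-facing job description (hypothetical API)."""
    mapper: Callable[[Any, Any], Iterable[KV]]
    reducer: Callable[[Any, List[Any]], Any]


class Backend(ABC):
    """Hypothetical execution interface that an adapter would implement."""

    @abstractmethod
    def run_tasks(self, func: Callable[[Any], Any], inputs: List[Any]) -> List[Any]:
        """Apply func to every input and return the results in input order."""


class LocalBackend(Backend):
    """Stand-in backend that runs every task sequentially in this process."""

    def run_tasks(self, func: Callable[[Any], Any], inputs: List[Any]) -> List[Any]:
        return [func(item) for item in inputs]


def run_job(job: MapReduceJob, splits: List[List[KV]], backend: Backend) -> Dict[Any, Any]:
    # Map phase: one task per input split.
    def map_task(split: List[KV]) -> List[KV]:
        return [pair for key, value in split for pair in job.mapper(key, value)]

    map_outputs = backend.run_tasks(map_task, splits)

    # Shuffle: group intermediate values by key.
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            groups[key].append(value)

    # Reduce phase: one task per distinct key.
    def reduce_task(item: Tuple[Any, List[Any]]) -> KV:
        key, values = item
        return key, job.reducer(key, values)

    return dict(backend.run_tasks(reduce_task, list(groups.items())))


if __name__ == "__main__":
    # Example job: maximum reading per sensor, with input given as two splits.
    job = MapReduceJob(
        mapper=lambda sensor, reading: [(sensor, reading)],
        reducer=lambda sensor, readings: max(readings),
    )
    splits = [[("a", 3), ("b", 7)], [("a", 9), ("b", 1)]]
    print(run_job(job, splits, LocalBackend()))
```

Keeping the backend behind a single run_tasks method means the same job could, in principle, be evaluated on both environments, which is what the benchmark tasks would require. Issues specific to volunteer hosts, such as unreliable or slow clients, are outside this sketch.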

 

Programming Languages:

Java, Python.

 

References:

[1] BOINC project: http://boinc.berkeley.edu/

[2] Hadoop project: http://hadoop.apache.org/

[3] Map/Reduce paper: http://labs.google.com/papers/mapreduce-osdi04.pdf