Funded Projects

MapReduce++: Simplified Processing of Unstructured Data on Large Computing Clouds

Principal Investigator’s Organization (PIO):
Information Technology University, Lahore
Principal Investigator (PI):
Dr. Umar Saif
Project Details:
Start Date 05-Apr-2012
Duration 36 months
Budget PKR 23.73 million
Status Ongoing Project
Publications N/A
Thematic Area Telecommunication
Executive Summary

The emergence of cloud computing is rapidly transforming processing techniques to support distributed platforms with large cluster sizes. With the announcement of Microsoft Azure, we are seeing not only a shift of internet-scale services to cloud clusters, but also basic operating system functionality slowly becoming available on such distributed platforms. This fundamental shift in computation requires a distributed framework that consumers and industry alike can use for large-scale computation. Google’s seminal computation framework, MapReduce, provides an ideal base for such distributed processing. The framework has been used for years at Google for large computations, and its open-source implementation is now widely used at other companies such as Facebook. Yet some hurdles remain before it becomes a feasible cloud computation framework. This project plans to address the issues with MapReduce identified in this proposal. The solutions will be implemented as part of an open-source plug-in for Hadoop (Apache’s open-source implementation of MapReduce), which could make it a viable choice as a cloud computation framework.

Since cloud platforms are the next wave of change, the project also aims to train the local software industry to use such frameworks. Three workshops will be organized as part of the project to train IT professionals, educators, and researchers from other universities to use MapReduce.