MapReduce++: Simplified Processing of Unstructured Data on Large Computing Clouds
|Budget||PKR 23.73 million|
|Status||Ongoing Project|
The emergence of cloud computing is rapidly transforming processing techniques to support distributed platforms with large cluster sizes. With the announcement of Microsoft Azure, we are seeing not only a shift of internet-scale services to cloud clusters, but also basic operating system functionality slowly becoming available on such distributed platforms. This shift in computation requires a distributed framework that consumers and industry alike can use for large-scale computation. Google’s seminal computation framework, MapReduce, provides an ideal base for such distributed processing. The framework has been used for years at Google for large computations, and its open-source implementation is now widely used at other companies such as Facebook. Yet some hurdles remain before it becomes a feasible cloud computation framework. This project plans to address these issues in MapReduce, as identified in this proposal. The solutions will be implemented as an open-source plug-in for Hadoop (the Apache implementation of MapReduce), which could make it a viable choice as a cloud computation framework.

Since cloud platforms are the next wave of change, the project also aims to train the local software industry in using such frameworks. Three workshops will be organized as part of the project to train IT professionals, educators, and researchers from other universities to use MapReduce.