DataMPI

Extending MPI for Big Data with Key-Value based Communication

Performance

  • Experimental Testbed: This testbed has 8 nodes. Each node has two 4-core Intel Xeon 2.4 GHz processors, 16GB memory, one 250GB SATA disk, and one 1GigE card. Each node runs CentOS 6.4 x86_64.

  • Configuration: Hadoop version is 1.2.1. The backend MPI is mvapich2-2.0b. Hadoop benchmarks are the native examples. The version of DataMPI benchmarks is 0.6.0. Input/Output data sets are stored on HDFS. The HDFS block size is 256 MB and the replication factor is 3. Each node runs 4 O/A (or Map/Reduce) concurrent tasks for DataMPI (or Hadoop).

  • Result: Compared with Hadoop, the performance improvement of DataMPI ranges from 24% to 53%.

    DataMPI Micro-Benchmarks Performance Figure
  • Experimental Testbed: This testbed has 8 nodes. Each node has two 4-core Intel Xeon 2.4 GHz processors, 16GB memory, one 250GB SATA disk, and one 1GigE card. Each node runs CentOS 6.4 x86_64.

  • Configuration: Hadoop version is 1.2.1. The backend MPI is mvapich2-2.0b. Hadoop benchmarks are the native examples. The version of DataMPI benchmarks is 0.6.0. Input/Output data sets are stored on HDFS. The HDFS block size is 256 MB and the replication factor is 3. Each node runs 4 O/A (or Map/Reduce) concurrent tasks for DataMPI (or Hadoop).

  • Result: Compared with Hadoop, the performance improvement of DataMPI ranges from 30% to 52%.

    DataMPI BigDataBench Performance Figure

Copyright (C) 2011-2014, DataMPI team

fork me on github
Locations of visitors to this page