
condestable2000/mapreduce


MapReduce - Deliverable

This is a set of deliverable exercises on MapReduce.

  • Histogram (com.agartime.utad.mapreduce.histogram) - Computes a distribution histogram with N bars from a list of float numbers.
  • FriendsOfMyFriends (com.agartime.utad.mapreduce.friendsofmyfriends) - Finds the friends of my friends in a social graph given as pairs (FriendOrigin, FriendDestiny).

Requirements:

Compilation:

From the folder containing pom.xml file:

  $ mvn clean install

After a successful build, you will find a .jar file in the target directory (or in your local Maven repository):

  ./target/mapreduce*.jar

You can run the exercises on Hadoop by typing:

   hadoop jar ./target/mapreduce-1.0-SNAPSHOT.jar com.agartime.utad.Driver

Histogram:

Classes:

    * HistogramFlow - Flow

    * HistogramMinMaxJob - First Job

    * HistogramDistributorJob - Second Job

    * RangeWritable - Writable for a Range of two numbers.

    * CkIdRange - Composite key for an id and a range <id,RangeWritable>.

    * NumToRangeMapper - First job mapper for min/max calculation. 

    * MinMaxRangeReducer - First job combiner/reducer for min/max calculation. Writables arrive sorted, so the first value is the min and the last one is the max.

    * BarDistributorMapper - Second job mapper. Calculates the bar for a specific number.

    * BarSummatorReducer - Second job reducer. Emits every bar and its sum.

    * IdRangeComparator - Sort comparator for the range within a composite key.

    * GroupIdComparator - Grouping comparator.

    * IdPartitioner - Partitioner.
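The arithmetic behind the two jobs can be sketched in plain Java, without Hadoop. This is a hypothetical illustration, not the repository's code: the names `HistogramSketch`, `minMax`, `barFor` and `histogram` are made up, it assumes equal-width bars between the global min and max, it assumes a number equal to the max belongs to the last bar, and it takes a bar's "sum" to mean the count of numbers that fall in it.

```java
import java.util.Arrays;

// Hypothetical sketch of the histogram's two phases in plain Java (no Hadoop).
class HistogramSketch {

    // Phase 1 (the role of HistogramMinMaxJob): global min and max.
    // Once the values are sorted, the min is the first and the max the last.
    static double[] minMax(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        return new double[] {sorted[0], sorted[sorted.length - 1]};
    }

    // Phase 2 (the role of BarDistributorMapper): the bar index of one number.
    // Assumption: equal-width bars spanning [min, max].
    static int barFor(double x, double min, double max, int nBars) {
        if (max == min) {
            return 0; // degenerate range: every number falls in the first bar
        }
        double width = (max - min) / nBars;
        int bar = (int) ((x - min) / width);
        return Math.min(bar, nBars - 1); // x == max goes to the last bar
    }

    // The role of BarSummatorReducer, assuming "sum" means a count per bar.
    static long[] histogram(double[] values, int nBars) {
        double[] mm = minMax(values);
        long[] counts = new long[nBars];
        for (double x : values) {
            counts[barFor(x, mm[0], mm[1], nBars)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.5, 3.0, 4.0, 5.0, 7.5, 10.0};
        System.out.println(Arrays.toString(histogram(values, 3))); // prints [3, 2, 2]
    }
}
```

Clamping with `Math.min` matters because `(max - min) / width` is exactly `nBars`, which would otherwise index one bar past the end.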

Usage:

   hadoop jar ./target/mapreduce-1.0-SNAPSHOT.jar com.agartime.utad.Driver histogram input_file output_file n_bars

Example:

   hdfs dfs -put ./src/main/resources/histogram_input_sample.txt                               
   hadoop jar ./target/mapreduce-1.0-SNAPSHOT.jar com.agartime.utad.Driver histogram histogram_input_sample.txt histogram.out 30

Friends Of My Friends:

Classes:

   * FriendsOfMyFriendsJob - Job

   * DirectionalRelationshipWritable - Writable for storing directional relationships.

   * FriendAndReversalMapper - Mapper that emits a friendship and its reversal.

   * MutualRelationshipFinderReducer - Reducer that retrieves the friends of my friends.
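One common way to get friends of friends from a mapper that emits each friendship plus its reversal is sketched below, simulated in memory with plain Java. This is an assumption about the approach, not the repository's code: `FriendsOfFriendsSketch` and its nested `Rel` class (a stand-in for `DirectionalRelationshipWritable`) are hypothetical, output lines use a tab like Hadoop's text output, and the sketch only avoids suggesting someone to themselves; it does not filter out people who are already direct friends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory sketch of a friends-of-my-friends job (not the repository's code).
class FriendsOfFriendsSketch {

    // Assumed stand-in for DirectionalRelationshipWritable: an edge plus a flag
    // telling whether it was emitted reversed (i.e. keyed by its destiny).
    static final class Rel {
        final String origin;
        final String destiny;
        final boolean reversed;

        Rel(String origin, String destiny, boolean reversed) {
            this.origin = origin;
            this.destiny = destiny;
            this.reversed = reversed;
        }
    }

    static Set<String> friendsOfFriends(List<String[]> pairs) {
        // Map phase: emit each friendship under its origin and its reversal
        // under its destiny. The TreeMap stands in for the shuffle's grouping.
        Map<String, List<Rel>> grouped = new TreeMap<>();
        for (String[] p : pairs) {
            grouped.computeIfAbsent(p[0], k -> new ArrayList<>()).add(new Rel(p[0], p[1], false));
            grouped.computeIfAbsent(p[1], k -> new ArrayList<>()).add(new Rel(p[0], p[1], true));
        }
        // Reduce phase, one group per person X: each incoming Y -> X combined with
        // each outgoing X -> Z yields "Y<TAB>Z" (Z is a friend of Y's friend X).
        Set<String> result = new TreeSet<>();
        for (List<Rel> rels : grouped.values()) {
            for (Rel in : rels) {
                if (!in.reversed) continue;
                for (Rel out : rels) {
                    // Use only outgoing edges, and never suggest Y to itself.
                    if (out.reversed || out.destiny.equals(in.origin)) continue;
                    result.add(in.origin + "\t" + out.destiny);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
            new String[] {"A", "B"}, new String[] {"B", "C"},
            new String[] {"B", "D"}, new String[] {"C", "A"});
        friendsOfFriends(pairs).forEach(System.out::println);
    }
}
```

For the sample pairs above, the sketch yields A→C and A→D (through B), B→A (through C) and C→B (through A).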

Usage:

   hadoop jar ./target/mapreduce-1.0-SNAPSHOT.jar com.agartime.utad.Driver friends input_file output_file

Example:

   hdfs dfs -put ./src/main/resources/mutualfriends_input_sample.txt 
   hadoop jar ./target/mapreduce-1.0-SNAPSHOT.jar com.agartime.utad.Driver friends mutualfriends_input_sample.txt friends.out

You can create an Eclipse project by running:

  $ mvn eclipse:clean eclipse:eclipse

Notes:

Histogram could be improved. There is a bug in the mappers (the offset is used as the value instead of the Text). Secondary sort is used, but not in the best way: every single value ends up in the same reducer, so it won't scale well. Still, it is a good example of how to implement secondary sort. Note that the grouping comparator is not applied during the combiner phase, so min/max should still work correctly. An issue about this is already open: https://issues.apache.org/jira/browse/MAPREDUCE-3310
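The secondary-sort pattern mentioned above can be simulated in plain Java to show how the three pieces interact. This is a hypothetical sketch, not Hadoop's API or the repository's code: `SecondarySortSketch` and `CompositeKey` (a stand-in for `CkIdRange`) are made-up names, and combiners are not modelled. In a real job these roles are registered with `Job.setSortComparatorClass`, `Job.setGroupingComparatorClass` and `Job.setPartitionerClass`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical plain-Java simulation of secondary sort (not Hadoop's API).
class SecondarySortSketch {

    // Assumed stand-in for CkIdRange: natural key (id) plus the value to sort on.
    static final class CompositeKey {
        final String id;
        final double value;

        CompositeKey(String id, double value) {
            this.id = id;
            this.value = value;
        }
    }

    // Role of IdRangeComparator: order by id, then by value.
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing((CompositeKey k) -> k.id).thenComparingDouble(k -> k.value);

    // Role of GroupIdComparator: keys with equal ids share one reduce call.
    static final Comparator<CompositeKey> GROUP = Comparator.comparing(k -> k.id);

    // Role of IdPartitioner: route by id only, so a whole group reaches one reducer.
    static int partition(CompositeKey k, int numReducers) {
        return (k.id.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Sorts one partition, then walks it group by group. Because values arrive
    // sorted, the first is the min and the last is the max: "id min max".
    static List<String> minMaxPerGroup(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= sorted.size(); i++) {
            boolean groupEnds = i == sorted.size()
                || GROUP.compare(sorted.get(start), sorted.get(i)) != 0;
            if (groupEnds) {
                out.add(sorted.get(start).id + " " + sorted.get(start).value
                    + " " + sorted.get(i - 1).value);
                start = i;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Histogram-style input: one constant id, so everything is one group.
        List<CompositeKey> keys = Arrays.asList(
            new CompositeKey("all", 5.0), new CompositeKey("all", 1.5),
            new CompositeKey("all", 9.0));
        System.out.println(minMaxPerGroup(keys)); // prints [all 1.5 9.0]
    }
}
```

With a single constant id, `partition` sends every key to the same reducer, which is the scalability limit the note describes.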
