MapReduce - Deliverable

This is a set of deliverable MapReduce exercises.

Histogram (com.agartime.utad.mapreduce.histogram) - Computes the distribution histogram, with a given number of bars, of a set of float numbers.
FriendsOfMyFriends (com.agartime.utad.mapreduce.friendsofmyfriends) - Finds the friends of my friends in a social graph given as pairs (FriendOrigin, FriendDestiny).

Requirements:

maven

Compilation:

From the folder containing pom.xml file:

  $ mvn clean install

After a successful build, you will find a .jar file in the target directory (or in your local repository):

  ./target/mapreduce*.jar

You can run the exercises in Hadoop by typing:

   hadoop jar ./target/mapreduce-1.0-SNAPSHOT.jar com.agartime.utad.Driver

Histogram:

Classes:

    * HistogramFlow - Flow

    * HistogramMinMaxJob - First Job

    * HistogramDistributorJob - Second Job

    * RangeWritable - Writable for a Range of two numbers.

    * CkIdRange - Composed Key for an Id and a Range <id,RangeWritable>.

    * NumToRangeMapper - First job mapper for min/max calculation. 

    * MinMaxRangeReducer - First job combiner/reducer for min/max calculation. Writables are received in sorted order, so the first and last values give the minimum and maximum.

    * BarDistributorMapper - Second job mapper. Calculates the bar for a specific number.

    * BarSummatorReducer - Second job reducer. Emits every bar and its sum.

    * IdRangeComparator - Sort Comparator for Range in a composed key.

    * GroupIdComparator - Grouping comparator.

    * IdPartitioner - Partitioner.
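
The two-job flow listed above (a min/max pass followed by a bar distribution pass) can be illustrated without Hadoop. The following is only a minimal plain-Java sketch of that idea: the class and method names are hypothetical, the bar formula is an assumption, and it counts values per bar, which may differ in detail from what BarSummatorReducer emits.

```java
import java.util.Arrays;

// Plain-Java sketch of a two-pass histogram (no Hadoop dependencies).
// All names here are hypothetical; they do not mirror the project's classes.
final class HistogramSketch {

    // Pass 1: find the minimum and maximum of the input (the role of the min/max job).
    static double[] minMax(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new double[] {min, max};
    }

    // Maps a value to a bar index in [0, nBars - 1]; the maximum falls into the last bar.
    static int barIndex(double value, double min, double max, int nBars) {
        if (max == min) {
            return 0; // all values identical: everything goes into the first bar
        }
        int index = (int) ((value - min) / (max - min) * nBars);
        return Math.min(index, nBars - 1);
    }

    // Pass 2: count how many values land in each bar (the role of the distributor job).
    static long[] histogram(double[] values, int nBars) {
        double[] range = minMax(values);
        long[] counts = new long[nBars];
        for (double v : values) {
            counts[barIndex(v, range[0], range[1], nBars)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] sample = {0.0, 1.0, 2.5, 5.0, 7.5, 10.0};
        System.out.println(Arrays.toString(histogram(sample, 4))); // prints [2, 1, 1, 2]
    }
}
```

The second pass needs the global minimum and maximum before it can assign any value to a bar, which is why the work is split into two jobs.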

Usage:

   hadoop jar ./target/mapreduce-1.0-SNAPSHOT.jar com.agartime.utad.Driver histogram input_file output_file n_bars

Example:

   hdfs dfs -put ./src/main/resources/histogram_input_sample.txt
   hadoop jar ./target/mapreduce-1.0-SNAPSHOT.jar com.agartime.utad.Driver histogram histogram_input_sample.txt histogram.out 30

Friends Of My Friends:

Classes:

   * FriendsOfMyFriendsJob - Job

   * DirectionalRelationshipWritable - Writable for storing directional relationships.

   * FriendAndReversalMapper - Mapper that emits a friendship and its reversal.

   * MutualRelationshipFinderReducer - Reducer that retrieves the friends of my friends.
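
To make the map/reduce split above more concrete, here is a minimal plain-Java sketch of one way to compute friends of friends from (origin, destiny) pairs. It is not the project's implementation: the names are hypothetical, and the choice to skip a person themselves and their direct friends is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java sketch of a friends-of-friends computation (no Hadoop dependencies).
// Names and filtering rules are assumptions, not the project's code.
final class FriendsOfFriendsSketch {

    // "Map" step: record each friendship and its reversal, so every person
    // is associated with everyone they are directly connected to.
    static Map<String, Set<String>> adjacency(List<String[]> pairs) {
        Map<String, Set<String>> friends = new TreeMap<>();
        for (String[] pair : pairs) {
            friends.computeIfAbsent(pair[0], k -> new TreeSet<>()).add(pair[1]);
            friends.computeIfAbsent(pair[1], k -> new TreeSet<>()).add(pair[0]);
        }
        return friends;
    }

    // "Reduce" step: each person is the middle of a two-step path,
    // so any two of their friends are friends of friends of each other.
    static Map<String, Set<String>> friendsOfFriends(List<String[]> pairs) {
        Map<String, Set<String>> friends = adjacency(pairs);
        Map<String, Set<String>> result = new TreeMap<>();
        for (Set<String> neighbours : friends.values()) {
            for (String a : neighbours) {
                for (String b : neighbours) {
                    // Assumption: skip the person themselves and existing direct friends.
                    if (!a.equals(b) && !friends.get(a).contains(b)) {
                        result.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
                new String[] {"ana", "bob"},
                new String[] {"bob", "carl"},
                new String[] {"carl", "dana"});
        // prints {ana=[carl], bob=[dana], carl=[ana], dana=[bob]}
        System.out.println(friendsOfFriends(pairs));
    }
}
```

Emitting the reversal is what lets a single grouping step see both ends of every relationship that passes through a given person.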

Usage:

   hadoop jar ./target/mapreduce-1.0-SNAPSHOT.jar com.agartime.utad.Driver friends input_file output_file

Example:

   hdfs dfs -put ./src/main/resources/mutualfriends_input_sample.txt 
   hadoop jar ./target/mapreduce-1.0-SNAPSHOT.jar com.agartime.utad.Driver friends mutualfriends_input_sample.txt friends.out

You can generate an Eclipse project by running:

  $ mvn eclipse:clean eclipse:eclipse

Notes:

Histogram could be improved. There is a bug in the mappers (the offset is used as the value instead of the Text). We use secondary sort, but not in the best way: every single value ends up in the same reducer, so it won't scale well. Still, it can serve as a nice example of how to implement secondary sort. Note that the grouping comparator is not applied during the combiner phase, so the min/max calculation should still work. There is an open issue about this: https://issues.apache.org/jira/browse/MAPREDUCE-3310
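
The secondary sort idea mentioned in these notes can be sketched in plain Java, without the Hadoop API. In this hypothetical example, CompositeKey stands in for a key such as CkIdRange, SORT plays the role of a sort comparator, and GROUP plays the role of a grouping comparator; none of these names come from the project, and a single double replaces the range for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java sketch of secondary sort (no Hadoop dependencies).
// CompositeKey, SORT and GROUP are hypothetical stand-ins.
final class SecondarySortSketch {

    static final class CompositeKey {
        final int id;
        final double value;

        CompositeKey(int id, double value) {
            this.id = id;
            this.value = value;
        }
    }

    // Sort comparator: order by id first, then by value within the same id.
    static final Comparator<CompositeKey> SORT =
            Comparator.<CompositeKey>comparingInt(k -> k.id)
                    .thenComparingDouble(k -> k.value);

    // Grouping comparator: keys with the same id belong to the same reduce call.
    static final Comparator<CompositeKey> GROUP =
            Comparator.<CompositeKey>comparingInt(k -> k.id);

    // Returns one {id, min, max} row per group. Because each group is already
    // sorted by value, min and max are simply its first and last elements.
    static List<double[]> minMaxPerGroup(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<double[]> result = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= sorted.size(); i++) {
            boolean groupEnds = i == sorted.size()
                    || GROUP.compare(sorted.get(start), sorted.get(i)) != 0;
            if (groupEnds) {
                CompositeKey first = sorted.get(start);
                CompositeKey last = sorted.get(i - 1);
                result.add(new double[] {first.id, first.value, last.value});
                start = i;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = Arrays.asList(
                new CompositeKey(1, 3.5),
                new CompositeKey(1, -2.0),
                new CompositeKey(2, 4.0),
                new CompositeKey(1, 9.0));
        for (double[] row : minMaxPerGroup(keys)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

If every key carries the same id, as the notes describe, there is a single group and all values pass through one reduce call, which is the scaling limit mentioned above.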
