GitHub - hsinghal-wisc/GraphX

Implement PageRank using Spark GraphX and then run some queries on it.

(1)Part-B Application-1 Question-1

Initially we associate degree with each vertices using join operation, based on degree value each edge attribute his initialized with weight 1/outdegree and vertex attributes are initialized with initial page rank values. PageRank stored as vertex attributes is updated using aggregate messages for 20 iterations.

(2)Part-B Application-2 Question-1

VertexRDD is constructed with interval number as vertexId and top-words in the interval as attributes. EdgeRDD is created filtering cartesian of vertexRDD with itself, filtering criteria is both vertices in cartesian should have at least one common attribute and their id’s should not be same (to remove self loop) . Further, we create the graph using VertexRDD and EdgeRDD.

Then we calculate number of edges having number of words in the source vertex strictly larger than the number of words in the destination vertex by applying filter on triplet view of graph based on filtering criteria that size of source vertex attribute should be greater than that of destination vertex.

(3)Part-B Application-2 Question-2

VertexRDD is constructed with interval number as vertexId and top-words in the interval as attributes. EdgeRDD is created filtering cartesian of vertexRDD with itself, filtering criteria is both vertices in cartesian should have at least one common attribute and their id’s should not be same (to remove self loop) . Further, we create the graph using VertexRDD and EdgeRDD.

Then we find list of vertices that have the maximum number of edges if number of such vertices is 1 then the vertex is most popular vertex. If number of vertices having maximum out degree is more than we create a vertexRDD by join operation on vertices having maximum outdegree with vertices on the graph. Resultant vertexRDD is sorted based on size of attribute set. Sorted RDD is collected in an Array of vertices, vertex at end of the array is most popular vertex.

(4)Part-B Application-2 Question-3

VertexRDD is constructed with interval number as vertexId and top-words in the interval as attributes. EdgeRDD is created filtering cartesian of vertexRDD with itself, filtering criteria is both vertices in cartesian should have at least one common attribute and their id’s should not be same (to remove self loop) . Further, we create the graph using VertexRDD and EdgeRDD.

Then we calculate average number of words in neighbour vertices by calculating total number of words in neighbour vertices and count of neighbour vertices using aggregateMessage. Ratio of total number of vertices in neighbour to count of vertices is required average number of words in neighbour vertices.

(5)Part-B Extra Credit-1

VertexRDD is constructed with interval number as vertexId and top-words in the interval as attributes. EdgeRDD is created filtering cartesian of vertexRDD with itself, filtering criteria is both vertices in cartesian should have at least one common attribute and their id’s should not be same (to remove self loop) . Further, we create the graph using VertexRDD and EdgeRDD.

(6)Part-B Extra Credit-2

VertexRDD is constructed with interval number as vertexId and top-words in the interval as attributes. EdgeRDD is created filtering cartesian of vertexRDD with itself, filtering criteria is both vertices in cartesian should have at least one common attribute and their id’s should not be same (to remove self loop) . Further, we create the graph using VertexRDD and EdgeRDD.

(7)Part-B Extra Credit-3

VertexRDD is constructed with interval number as vertexId and top-words in the interval as attributes. EdgeRDD is created filtering cartesian of vertexRDD with itself, filtering criteria is both vertices in cartesian should have at least one common attribute and their id’s should not be same (to remove self loop) . Further, we create the graph using VertexRDD and EdgeRDD.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
PartBApplication1Question1.scala		PartBApplication1Question1.scala
PartBApplication2Question1.scala		PartBApplication2Question1.scala
PartBApplication2Question2.scala		PartBApplication2Question2.scala
PartBApplication2Question3.scala		PartBApplication2Question3.scala
PartBApplication2Question4.scala		PartBApplication2Question4.scala
README.md		README.md
group-34-part-b.pdf		group-34-part-b.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

About

Uh oh!

Releases

Packages

Languages

hsinghal-wisc/GraphX

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages