Project overview

The final result of this project is an interactive force directed graph displaying relationships between various Hollywood movies. Click here to view and interact with the graph.

Each circle (node) represents a movie. Similar movies are linked together by lines (edges). The darker the line connecting two movies, the more similar the two movies are to each other. Colours on the node represent movie genre.

Data Source

I used Wikipedia movie plots from Kaggle.

Identify movie similarity

How do we determine if two movies are similar to each other?

First, map the movie summary to vectors, which are essentially a collection of numbers. To do so, each word in the summary will be mapped to a number. A word will receive a higher score if it is uniquely found in the summary, and has multiple occurences in the summary. For example, the movie Alice in the Wonderland might have a summary like this:

Alice chases a rabbit down a hole.... Alice encounters many strange things and confronts a mad queen....

The word "Alice" will be mapped to a larger number because we can expect multiple mentions of the word "Alice" (she is the main character after all), and summaries of most other movies will not contain the word "Alice".

In Data Science speak, we have converted our texts to tf-idf vectors. Now that we have converted each movie to numbers, we can compute how similar any two movies are to each other. This is done using cosine similarity.

If two movies are represented by the exact same vector (very unlikeley), then the cosine similarity between these two movies is 1. The lowest possible similarity value is 0.

All movies linked together with an edge (i.e. line) in the network graph have a cosine similarity with its counterpart of at least 0.5.

Name		Name	Last commit message	Last commit date
Latest commit History 72 Commits
data-processing		data-processing
image		image
src/textgraph		src/textgraph
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Project overview

Data Source

Identify movie similarity

About

Uh oh!

Releases

Packages

Languages

rouenlee29/text-graph

Folders and files

Latest commit

History

Repository files navigation

Project overview

Data Source

Identify movie similarity

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages