Project Gutenberg is a significant initiative that preserves cultural heritage by converting older literature into eBook format. In this project, we analyzed a corpus of 1176 Project Gutenberg books and extracted key information about each one: author, title, word count, and number of unique words.
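A minimal sketch of how such per-book information could be extracted is shown here. It assumes each book is a plain-text file whose header contains `Title:` and `Author:` lines, as Project Gutenberg plain-text files typically do. The names `header_field` and `describe_book` and the tokenization rule are illustrative assumptions, not the code used in the project.

```python
import re

# Assumption: a "word" is a run of letters, optionally with one internal apostrophe.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def header_field(text, name):
    """Return the value of a 'Name: value' header line, or None if it is absent."""
    match = re.search(rf"^{re.escape(name)}:\s*(.+)$", text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def describe_book(text):
    """Collect author, title, word count, and unique-word count for one book.

    Words are counted over the whole text passed in, so any header or
    license text should be stripped first if it must not be counted.
    """
    words = WORD_RE.findall(text.lower())
    return {
        "title": header_field(text, "Title"),
        "author": header_field(text, "Author"),
        "word_count": len(words),
        "unique_words": len(set(words)),
    }
```

Calling `describe_book` on each file in the corpus would give one record per book, which could then be collected into a table for the later analyses.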
To gain insight into the writing styles of various authors, we identified the 10 authors with the most works in the corpus. We then compared their writing styles with those of the other authors, looking for distinctive characteristics that set them apart.
Additionally, we determined the 10 most frequently used words across all the books in the corpus. This provided information about common themes and language patterns prevalent throughout the collection.
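One way to compute such a corpus-wide ranking is sketched here. The function name `top_words` and the tokenization rule are assumptions made for illustration; the project's actual preprocessing (for example, whether stop words were removed) is not described.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters, optionally with one internal apostrophe.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def top_words(texts, n=10):
    """Return the n most frequent words across all texts as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        # Lowercase first so "The" and "the" are counted as the same word.
        counts.update(WORD_RE.findall(text.lower()))
    return counts.most_common(n)
```

Without stop-word filtering, a ranking like this is likely to be dominated by function words such as "the" and "and", so filtering them out may be needed before the result says much about themes.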
We then focused on the writing style of W.W. Jacobs, the author who appears most often in the corpus. A detailed examination of his works showed that his writing style is characterized by a natural tone.
Furthermore, we attempted to analyze the lexical diversity over the years to understand how language and writing styles evolved. However, we encountered challenges in obtaining precise publishing dates, as the available data represented the date of conversion to eBook format rather than the original publication date. Consequently, we were unable to establish a consistent pattern for tracking lexical changes over time.
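For reference, a year-by-year comparison of this kind could be sketched as follows, using the type-token ratio (unique words divided by total words) as the diversity measure. This is an assumption: the measure actually used is not stated here, and the type-token ratio is sensitive to text length. The record layout and the name `diversity_by_year` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def diversity_by_year(records):
    """Average type-token ratio per year.

    records: iterable of (year, word_count, unique_words) tuples, one per book.
    Books with a word count of zero are skipped to avoid division by zero.
    """
    by_year = defaultdict(list)
    for year, word_count, unique_words in records:
        if word_count:
            by_year[year].append(unique_words / word_count)
    # Sort by year so the result reads chronologically.
    return {year: mean(ratios) for year, ratios in sorted(by_year.items())}
```

The result is only as meaningful as the `year` values passed in. With eBook conversion dates in place of original publication dates, the averages describe when books were digitized, not when they were written.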
In conclusion, this project has provided insights into the literary works available through Project Gutenberg. By analyzing writing styles and frequently used words, we gained a deeper understanding of the corpus's content and its most represented authors, and our closer look at the writing style of W.W. Jacobs added to this exploration. Although the lack of original publication dates prevented a comprehensive analysis of lexical diversity over the years, the project remains a valuable endeavor in preserving and appreciating classic literature.