This a program written in python to carry out analysis on over 1 million log entries in a news site database. The program reports business critical information.
- Makes use of a postgresql database and queries.
- Outputs three critical reports in plain text.
- Executes complex reporting in a single database query.
_You will need to run this program on a Linux machine. If you are on a different operating system, here are some steps you can follow to get software ready. :)
- Download and install python version 2.7, download
- Download and install Oracle Virtual Machine Virtual box for your operating system.
- Download and install Vagrant for your operating system.
- Follow this tutorial to set up your local linux environment using the above downloaded resources.
- Download the news database here - News Site DB
- Use git to clone this respository and save a local copy on your machine.
- Unzip the database file and navigate to the folder location in your terminal.
- Copy the
newsdata.sqlinto the shared folder which your new linux VM can access. This is usually the same folder containing yourVagrantfile - Start the linux VM and
sshinto it, then change directory to the shared folder. - Next, run the following command to create the database and then connect to it.
$ psql -d news -f newsdata.sql
$ psql -d news
- Then, you'll want to run this line of code to create a view within your news database.
CREATE VIEW log_report AS
SELECT TO_CHAR(log.time:: DATE , 'Month dd,yyyy') AS day,
COUNT(*) AS requests,
COUNT(*) FILTER (WHERE status != '200 OK') AS error
FROM log
GROUP BY day;
- I recommend running some commands to verify everything's set up okay.
# list all the tables in the database : expect three: log, authors and articles
news=> \dt
# check view for records
news=> select * from log_report limit 10;
In your terminal navigate to your cloned version of this repo, you should see a file called logdb.py.
That's the file you want to run. Type in the following:
$ python logdb.py
- Ensure you are using Python 2 to run this.
- Did you create the log_report view? If you didn't you will get an error. Run the sql command above.