Analysis of My Books on goodreads.com

 I analyzed my collection of books on the site goodreads.com. I did this using Excel. I used functions and Power Query to clean the data. I used Pivot Tables, Pivot Charts, and Splicers to visualize the data. The data was downloaded from goodreads.com as a csv file.

I first checked for, and removed, any duplicate books. 

You can expand the pics below by clicking on them. 

Power Query

At the site goodreads.com, books are placed into categories called "Bookshelves". Each book is placed by default into a "read" Bookshelf or a "to-read" Bookshelf. Each book can also be placed into additional Bookshelves created by the user. Below is an example (screenshot), for one book, from the Bookshelves column:


I used Power Query to separate that cell into 4 columns:


I  did the same to split the Author column (not shown) into 2 columns: Last Name and First Name.

Functions

The Default GR Shelf and the Main Shelf columns (seen above) were used for the visualizations. I used the following function to ensure that the Main Shelf did not have any leading spaces in any cells:

     =TRIM(MID(H2,FIND(LEFT(H2,1),H2),LEN(H2)))

This was done because Excel treated some Bookshelves with the same name as different entries, because some cells had a leading space while in another cell, with the same Bookshelf, there was not a leading space. For example, "fiction" and " fiction" (with a space) were treated as different entries. The TRIM function above was used to eliminate that leading space. 

Blank Bookshelf cells were investigated and filled individually. 

Pivot Tables, Pivot Charts, and Splicers

Below is the Pivot Table and Pivot Chart for the entire book collection, i.e. for "read" and "to-read" Bookshelves. The Fiction category had the highest number of books, followed by Sciences and Math.





Below is the Pivot Table, Pivot Chart, and Splicers for my "read" books. Remember that these books are further divided into my personally created Bookshelves. 



The Splicers enabled me to modify the chart according to Bookshelf or Rating. Also, I could change the chart to "to-read" with a Splicer. 

The chart shows (top-right corner) I have read 429 books. If I wanted to know how many 5 star books I have read, I would modify the My Rating Splicer:

 


This would also modify the chart:


I have read 152 books that I considered excellent, i.e. 5 star books. 



Comments