Why visualizations are so useful
Jacob Sieber
February 15, 2018
Understanding a nonprofit through visualizations
Visualizations are an incredibly useful tool for communicating and gaining insights about data in a descriptive manner. Besides, who doesn’t love a good-looking picture? One of the simplest and most powerful tools for creating visualizations is ggplot2, an R package. The data used here is about Multiple Sclerosis Society bike rides and was provided by the MS Society. MS Ride is a fundraising program that hosts charity bike rides all over the United States. The first bike ride event was in 1980, and it still goes on today. Through visualizations alone, we can learn important details about Bike MS, and maybe even some details that can only be seen through visual tools. Let’s start by visualizing the annual revenue over the past 5 years to get a general idea of the financial performance.
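The code behind this first chart is not shown, so here is a minimal sketch of how such a revenue chart could be drawn. The data frame `annual_revenue`, its column names, the choice of years, and every number in it are placeholders made up for illustration; they are not the MS Society's figures.

```r
library(ggplot2)

# Hypothetical stand-in for the revenue data: one row per fiscal year.
# The values are placeholders for illustration only, not real revenue figures.
annual_revenue <- data.frame(
  Fiscal_Year = 2013:2017,
  Revenue     = c(5, 4, 3, 2, 1) * 1e6
)

# One bar per year, so a year-over-year drop is easy to spot
revenue_plot <- ggplot(annual_revenue, aes(x = Fiscal_Year, y = Revenue)) +
  geom_col(fill = "blue") +
  labs(x = "Fiscal Year", y = "Annual Revenue")

revenue_plot
```

Swapping in the real yearly totals for the placeholder data frame should be all that is needed to reproduce a chart of this kind.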
Oh no! Looks like we are in a financial downturn. According to the data we received, MS Ride has dropped over 10 million dollars in yearly revenue since 2013. This finding gives us a new question to base our next visualizations on: what is the cause of this 10 million dollar decline?
Why is there such a decline in revenue?
Google Trend insights
Maybe the decline in revenue is due to a decline in MS Society’s popularity. It is reasonable to assume that the popularity of a nonprofit is positively associated with the amount of donations it receives. We can use Google Trends data as a rough baseline for the popularity of MS Society on the web. Google Trends data shows how often a term has been searched over a period of time, on a relative scale rather than as raw counts.
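library(tidyverse)  # loads ggplot2, dplyr, and forcats, which are used throughout

# Search interest over time for the MS Society, colored by year
gt_NMS %>%
  ggplot(aes(x = Date)) +
  geom_line(aes(y = `National Multiple Sclerosis Society: (United States)`, color = Year), show.legend = FALSE) +
  labs(x = 'Year', y = 'National MS Search Popularity')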
Just look at the general decline in search popularity. This is a strong indication that MS Society’s digital presence has declined over time. Let’s see if this trend is widespread among similar disease-focused nonprofits.
# Compare search interest across four health nonprofits
gt_non_profits %>%
  ggplot(aes(x = Date)) +
  geom_line(aes(y = `Leukemia & Lymphoma Society: (United States)`, color = "Leukemia & Lymphoma Society")) +
  geom_line(aes(y = `Arthritis Foundation: (United States)`, color = "Arthritis Foundation")) +
  geom_line(aes(y = `American Lung Association: (United States)`, color = "American Lung Association")) +
  geom_line(aes(y = `National Multiple Sclerosis Society: (United States)`, color = "National Multiple Sclerosis Society")) +
  # Name each color so it is matched to its organization rather than assigned in alphabetical order
  scale_colour_manual("",
    breaks = c("Leukemia & Lymphoma Society", "Arthritis Foundation", "American Lung Association", "National Multiple Sclerosis Society"),
    values = c("Leukemia & Lymphoma Society" = "green", "Arthritis Foundation" = "gold",
               "American Lung Association" = "blue", "National Multiple Sclerosis Society" = "red")) +
  labs(y = 'Relative Popularity', x = 'Year') +
  theme(legend.position = 'bottom')
Yes, it looks like there has been a similar decline in the popularity of comparable nonprofits. Perhaps this is a sign of a changing nonprofit landscape rather than a change in the individual organizations. Further research would be needed to confirm this theory.
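The comparison plot uses one geom_line() call per organization. A more compact alternative is to reshape the data to long format first, so that a single geom_line() handles any number of organizations. The sketch below assumes the Google Trends export has a Date column plus one column per search term; `gt_toy` and its values are placeholders, not real Google Trends numbers.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy stand-in for gt_non_profits: a Date column plus one column per term.
# The values are placeholders, not real Google Trends numbers.
gt_toy <- tibble(
  Date = as.Date(c("2016-01-01", "2017-01-01")),
  `Arthritis Foundation: (United States)` = c(60, 50),
  `National Multiple Sclerosis Society: (United States)` = c(40, 30)
)

gt_long <- gt_toy %>%
  # One row per date and organization instead of one column per organization
  pivot_longer(-Date, names_to = "Organization", values_to = "Popularity") %>%
  # Drop the ": (United States)" suffix so the legend labels stay short
  mutate(Organization = sub(": \\(United States\\)$", "", Organization))

trend_plot <- ggplot(gt_long, aes(x = Date, y = Popularity, color = Organization)) +
  geom_line() +
  labs(x = "Year", y = "Relative Popularity", color = NULL) +
  theme(legend.position = "bottom")
```

With this layout, adding a fifth nonprofit only means adding a column to the data, not another layer to the plot.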
Competition
Another reason for the decline in Bike MS revenue could be a surge of competition in recent years. After all, the service that MS Bike Rides provides is easily imitable and has low barriers to entry. Let’s try to find out where MS Bike Rides expects to make most of its money in 2018.
# Expected 2018 revenue per ride city, with the largest at the top
Texas_Rides %>%
  filter(!is.na(Expected_Revenue)) %>%
  ggplot(aes(x = fct_reorder(City, Expected_Revenue), y = Expected_Revenue, fill = State)) +
  geom_col() +
  coord_flip() +
  xlab('City') + ylab('Expected Revenue')
We can clearly see here that the Bike MS ride in Houston, Texas is the greatest revenue generator, expected to bring in over 15 million dollars. This dwarfs all other bike ride events by an extremely large margin. We should place special importance on this event; we can investigate its history and why it is such an outlier at a later time. Without a doubt, Texas is our most important state, with three rides in 2018. It would be wise to look at all the rival charity bike rides in Texas during 2018. We can do this through a map visualization combined with data from http://www.wheelbrothers.com/Texas-bike-rides/.
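library(ggmap)
# Note: current ggmap versions need a Google API key, set with register_google(), before get_map()
map <- get_map(location = 'Texas', zoom = 6, source = 'google')
# ggmap puts longitude on x and latitude on y; only MS rides have an Expected_Revenue
ggmap(map, base_layer = ggplot(data = Texas_Rides, aes(x = Long, y = Lat))) +
  geom_point(aes(color = ifelse(is.na(Expected_Revenue), 'blue', 'red'))) +
  scale_color_identity() +  # treat 'red' and 'blue' as literal colors
  theme(legend.position = 'none')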
MS Ride Events are in red
Here we can see that there is a plethora of competition for MS Ride to deal with in its most productive state. Additionally, this map doesn’t include the walks, hikes, and runs that take place all over Texas. Unfortunately, it seems as though the nonprofit biking ‘industry’ is heading towards zero economic profit. It is strange to think of a nonprofit as a competitive business entity. However, as this map shows, whether competing in a football league or to treat multiple sclerosis, competition always rears its head when the getting is good.
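To put a number on that competition, the rides could be tallied by organizer. This is a minimal sketch that carries over the assumption made in the map code, namely that only MS rides have an Expected_Revenue value. `rides_toy` and the Organizer labels are hypothetical, and the city names and values are placeholders.

```r
library(dplyr)

# Toy stand-in for Texas_Rides; city names and values are placeholders.
rides_toy <- tibble(
  City = c("A", "B", "C", "D"),
  Expected_Revenue = c(1000, NA, NA, NA)
)

ride_counts <- rides_toy %>%
  # Assumption carried over from the map: rides without Expected_Revenue are competitors
  mutate(Organizer = if_else(is.na(Expected_Revenue), "Other ride", "Bike MS")) %>%
  count(Organizer)

ride_counts
```

Run against the real Texas_Rides data, the same two lines of dplyr would give the ratio of competing rides to MS rides that the map only hints at.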
Diving deeper into the data
Now that we have established some generalities of Bike MS, it’s time to dig deeper into the data. If we choose to view the revenue problem of Bike MS as a function of increasing competitiveness in the nonprofit industry, what might Bike MS want to do to overcome the competition? A differentiation strategy might be a good place to start. One claim that the MS Society makes is that it has strong relationships with corporate partners. This could be a strong source of competitive advantage for Bike MS, as a corporate partnership brings in resources some charities only dream about. Let’s see if we can verify this claim of strong corporate support.
# Summarise gifts by the donor's employer
by_employer <- Donations %>%
  rename(Company = `Donor Employer`) %>%
  group_by(Company) %>%
  summarise(Donation = sum(`Gift Amount($)`),
            Different_people = n(),  # number of gifts; one donor may give more than once
            Avg_donation = mean(`Gift Amount($)`),
            Median_donation = median(`Gift Amount($)`)) %>%
  arrange(desc(Donation)) %>%
  mutate(Donation_log = log(Donation))

# Top 10 employers by total dollars donated
by_employer %>%
  filter(!is.na(Company)) %>%
  top_n(10, Donation) %>%
  ggplot(aes(fct_reorder(Company, Donation), Donation)) +
  geom_col(fill = 'blue') +
  coord_flip() +
  xlab('Source of donation')
We do have clear evidence of dirty data: ‘self’ and ‘Self’ appear as separate entries. This implies that there may be more serious data quality issues under the surface, but for now we can get a good general idea of the top employers behind gifts. It looks like BP takes the lead, with around 1.5 million dollars in donations from its employees. This visualization serves as evidence that Bike MS does in fact have lucrative relationships with large corporate partners.
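One way to deal with the ‘self’ versus ‘Self’ problem is to normalize the employer names before grouping. The sketch below trims whitespace and lowercases the names so that spelling variants collapse into one group. `donations_toy` is a made-up stand-in for the Donations data, and this cleanup rule is only an assumption about what the dirty entries look like.

```r
library(dplyr)

# Toy stand-in for Donations; the employers and amounts are placeholders.
donations_toy <- tibble(
  `Donor Employer` = c("self", "Self", " SELF ", "BP", NA),
  `Gift Amount($)` = c(10, 20, 30, 100, 5)
)

by_employer_clean <- donations_toy %>%
  # Trim stray whitespace and ignore case so variants of a name are merged
  mutate(Company = tolower(trimws(`Donor Employer`))) %>%
  filter(!is.na(Company)) %>%
  group_by(Company) %>%
  summarise(Donation = sum(`Gift Amount($)`), Gifts = n())

by_employer_clean
```

Lowercasing is a blunt tool. It will not merge abbreviations or misspellings, so a real cleanup would probably need a lookup table of known company names as well.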
Insights about donations
We can also take a look at whether more of the donated money comes from smaller gifts or from larger ones.
# Total dollars from gifts above $500
large_donation <- Donations %>%
  filter(`Gift Amount($)` > 500) %>%
  summarise(total = sum(`Gift Amount($)`))
# Total dollars from gifts of $500 or less
small_donation <- Donations %>%
  filter(`Gift Amount($)` <= 500) %>%
  summarise(total = sum(`Gift Amount($)`))
# The label order must match the row order in bind_rows() below
labels <- tibble(label = c('Less than or equal to $500', 'Greater than $500'))
bind_rows(small_donation, large_donation) %>%
  bind_cols(labels) %>%
  ggplot(aes(label, total, fill = label)) +
  geom_col() +
  theme(legend.position = "none") +
  ylab('Total Dollar Value of Donations') +
  xlab('Amount Donated')
It appears as though most of the money donated, while coming from people employed by corporations, arrives in small gifts of 500 dollars or less. We can also look at the distribution of donations under 600 dollars and see that the majority are between 50 and 100 dollars.
# Histogram of gifts under $600 in $25 bins
Donations %>%
  filter(`Gift Amount($)` < 600) %>%
  ggplot(aes(`Gift Amount($)`)) +
  geom_histogram(binwidth = 25, fill = 'darkblue') +
  ylab('Count')
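The bar chart of donation totals compares dollars, so on its own it cannot say how many gifts fall on each side of the 500 dollar line. A small sketch that reports both the number of gifts and the total dollars per bucket might look like this. `donations_toy` and its amounts are placeholders, and the Size labels simply reuse the 500 dollar cutoff from above.

```r
library(dplyr)

# Toy stand-in for Donations; the amounts are placeholders.
donations_toy <- tibble(`Gift Amount($)` = c(25, 50, 75, 500, 750, 2000))

donation_summary <- donations_toy %>%
  # Same cutoff as the bar chart: gifts of exactly $500 count as small
  mutate(Size = if_else(`Gift Amount($)` <= 500,
                        "Less than or equal to $500",
                        "Greater than $500")) %>%
  group_by(Size) %>%
  summarise(Gifts = n(), Total = sum(`Gift Amount($)`))

donation_summary
```

Seeing the count and the total side by side makes it clear whether small gifts dominate by number, by dollar value, or both.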
***
Wrapping it up
We now have a good idea about the data and why the data is the way it is. We can explore further without flying blind, leading to insights that could perhaps turn around MS Society’s decline in revenue.
Who knew these little pictures could tell us so much?