One-fourth of our beer cabinet in the office

Beer & Data

Insights using Untappd, Neo4j, Datawrapper, Pandas & Jupyter

--

Friday drinks are something special on the 3rd floor of RTL Netherlands. That’s where the Data Intelligence team and other colleagues usually gather for the Beer Club. Every week we share a great new beer. Now 134 beers and counting. Great material for a fun data project! In this blog, I use Jupyter notebook to enrich the dataset with Untappd, analyze with Pandas, visualize with Datawrapper, and explore with Neo4j. The Jupyter notebooks are available on GitHub.

Beer Club

The tradition was started by Philip Meyfroyt and Joel Fridriksson in October 2017. Every week a different colleague brings her or his favorite special beer. The dedication and diversity are a result of the club’s 8 rules:

Credits to Joan Chan

Data collection

All featured beers are noted in a shared Google Sheet. With a Jupyter Notebook and the Google Sheets API the data is loaded and stored as a Pandas data frame, containing the date and beer name.

But little about the beers itself. Luckily there are multiple beer platforms, like BeerAdvocate, BreweryDB, Catalog.beer, OpenBeerDB, and Untappd. I like Untappd the best. It is a social platform for checking in, rating, and discovering beers. It has a great database on beers, and just as important: a free API. Only new beers are retrieved via the API, as it has a rate limit of 100 calls per hour. This data is concatenated with the earlier stored data as csv-file.

Excluding home brews, all 130 beers were found on Untappd. A great score! As a check, the beer name in the spreadsheet is compared to the beer name of Untappd. For this comparison, the partial ratio of FuzzyWuzzy is applied. The mean score of 92.2, the standard deviation of 12.5, and the minimum of 58 confirm the correctness of the full match. This data enrichment results in 27 additional columns like beer_name, beer_label, beer_abv, beer_style, brewery_name, brewery_type, brewery_city, lat, lng, and url. Valuable information for analysis and visualization.

Analysis

For the analysis, the Pandas plot method is used. Simple and sufficient for some statistics.

Beer type

We love Belgian beers: Belgian Tripel was brought 17 times, followed by Belgian Strong Golden Ale and IPA-American (both 8x), and Hefeweizen, Belgian Blonde (all 7x). In total there 50 different beer styles featured in our Beer Club.

Brewery

Although we prefer Belgian style beers, Brewery Jopen (6 beers) from Haarlem is featured most. This microbrewery is founded in 1992 and nowadays it is located in the beautiful Jacobskerk (“Jopen church”). Jopen is followed by Abbaye de Leffe and Gulpener Bierbrouwerij (both 5x). Most of the beers come from microbreweries (62 %). Which is also stimulated. But macro breweries (24 %) are also allowed. And a couple of regional breweries, bars, nano breweries, and brewpubs are represented.

Seasonal trends

The alcohol by volume is a well-known metric of beer. Grouped and averaged by month there is a small trend visible. In the winter stronger beers are brought, while in the summer lighter beers are preferred.

Brewery location

Most of our beers are brewed in The Netherlands (54 %) or Belgium (32 %). With Amsterdam (14x), Haarlem and Utrecht (both 9x) as the top originating cities. Most colleagues bring beers from their hometown or birthplace.

With the latitude and longitude information, these breweries can be plotted on a map. Datawrapper is a great tool for that, with an API. There is a Python wrapper available, but I prefer the API. The code below shows how to place a pin.

As most of our beers come from the Netherlands and Belgium, the maps are split between these two countries and the world.

Neo4j

Playing with this beer data shows the interconnectedness between the different beers. They share the style, brewery, or origin for example. This kind of data is perfect for graph database Neo4j: “Neo4j is a highly scalable native graph database, purpose-built to leverage not only data but also data relationships.” It is open source and the most popular graph database. Talking about beer and Neo4j, Belgian Rik van Bruggen is well known for combining these two in demos and blogs. With a simple Docker command, a Neo4j instance is started and available in the browser at localhost:7474.

docker run -d — publish=7474:7474 — publish=7687:7687 — env NEO4J_AUTH=none neo4j

Neo4j uses the query language Cypher which is in some way a combination of SQL and graph theory. I experience it as very straight forward. Neo4j works with nodes and edges, both can hold attributes. To fire queries to the database a driver with the Bolt protocol is initiated. With this first query, its create or update a beer node (b:beer) and connect it using an edge -[:STYLE]-> with its style node (s:style). Additional queries create or update the brewery and its location. This procedure creates 362 nodes with 427 edges.

Neo4j is ideal for walking the path. For example, there are two beers I like, and I want to know their relation to discover other beers that I may prefer. The shortest path operator, an implementation of Dijkstra’s algorithm, easily retrieves this. This query (borrowed from Rik) returns a path that discovers a Dutch brew that also belongs to the California Common style: Op Stoom by Gulpener Bierbrouwerij.

MATCH (d:beer {beer_name:"Kompaan 58 Handlanger"}), (o:beer {beer_name: "Anchor Steam Beer"}), path = shortestpath((d)-[*]-(o)) 
RETURN path
Shortestpath

Let’s say you like the style of Tripel Karmeliet, but you want to give a Dutch version a try. Then you can go for the Helse Engel by Brouwerij de Hemel from Nijmegen.

MATCH (b1:beer {beer_name:”Tripel Karmeliet”})--(s:style)--(b2:beer) --(r:brewery)--(y:city)--(t:state)--(c:country {country_name:”Netherlands”})
RETURN b1,s,b2,r,y,t,c
Similar style

Closing time

In these uncommon times also our Beer Club turned virtual. Which is far from the same. Meanwhile, there is enough data to explore. On GitHub, you find all Jupyter notebooks. The data set is not available and I keep my credentials in a .env file, retrieved using python-dotenv. I hope you stay curious and discover great new beers.

Cheers!

2019 Beer Club edition with our CEO Sven Sauvé

--

--