Our blog posts
17/08

The power of Manchester City: a data analysis

What makes Manchester City such a great team? The infographic below illustrates one of the teams most powerful characteristics: its successful passing capability. The visualisation is based upon the Opta dataset released in August 2011, containing the high detailed Bolton vs Manchester City match statistics. The data has been loaded in the neo4j graph databases

Continue Reading →

27/05

Coalition-Cocktail – Hacking the Elections @ Engagor

Last weekend, Engagor organised their hacktheelections hackaton. The Datablend team (Quentin, Stijn and Davy) was joined by Marc Broos, Tim Coene and Josbert van de Zande with one goal in mind: trying to visualise the (pre-arranged?) political coalition and, if possible, also predict the formation-period. Technically, we extracted over 160K tweets through the Engagor API.

Continue Reading →

12/05

Datablend lanceert vk14-bingo.be

Wordt U ook overladen met informatie in verband met de komende verkiezingen? Bent U, net zoals zo vele andere burgers, op zoek naar een eenvoudig alternatief waarbij U in 1 oogopslag kunt zien waar elke partij voor staat? Zoek niet langer en maak gebruik van vk14-bingo.be. We hebben voor U de verschillende partijprogramma’s woord voor

Continue Reading →

02/12

The power of graphs to analyse biological data

Watch Davy Suvee present at GraphConnect London 2013 on the power of graph databases to analyse biological datasets. The Power of Graphs to Analyze Biological Data – Davy Suvee @ GraphConnect London 2013 from Neo Technology on Vimeo.

01/12

Yelp graph: checkin-based business clustering

Recently, Yelp made available a sample dataset from the greater Phoenix metropolitan area including around 11.000 business, 8000 checkin-sets, 43.000 users and 230.000 user reviews. With the help of this data, data scientists can execute real-life experiments with various data mining/machine learning algorithms. In our case, we are interested in finding out whether it is possible

Continue Reading →

11/02

Counting triangles smarter (or how to beat Big Data vendors at their own game)

A few months ago, I discovered Vertica’s “Counting Triangles”-article through Prismatic. The blog post describes a number of benchmarks on counting triangles in large networks. A triangle is detected whenever a vertex has two adjacent vertices that are also adjacent to each other. Imagine your social network; if two of your friends are also friends

Continue Reading →

04/02

Similr: blazingly fast chemical similarity searches

Today, Datablend announces Similr to be available for beta sign-up. Similr allows scientist (both from academics and enterprise) to quickly search for compounds that exhibit a particular chemical structure. It employs a wide range of fingerprinting algorithms, which combined, allow to identify matching compounds in millisecond time. Similr’s functionalities are available through a flexible and

Continue Reading →

29/01

Redis and Lua: a NoSQL power-horse

Recently, I’ve started implementing a number of Redis-based solutions for a Datablend customer. Redis is frequently referred to as the Swiss Army Knife of NoSQL databases and rightfully deserves that title. At its core, it is an in-memory key-value datastore. Values that are assigned to keys can be ‘structured’ through the use of strings, hashes,

Continue Reading →

16/10

Hubway Data Visualization Challenge Entry: the flow of bikers

Last week, Hubway announced its Data Visualization Challenge. Hubway is a bike sharing system located in the Boston area: you simply pick up a bike at a particular station and drop it off at the closest station near your destination. For this challenge, Hubway released a CSV-file, containing over half a million rides. Each entry

Continue Reading →

29/08

8 things I like about Datomic

Yesterday evening, the 9th BigData.be MeetUp was organised at the offices of NGDATA in Ghent. With 45 people showing up, this was our best attended MeetUp till know, illustrating the growing popularity of Big Data in Belgium. The meeting had a line-up of three presentations: Kenny Helsens, who presented a wrap-up of the zimmo.be project

Continue Reading →

24/07

Graphs, Graphs, Graphs, …

Last week, Datablend open-sourced two new Tinkerpop Blueprints implementations: blueprints-mongodb-graph and blueprints-datomic-graph. Tinkerpop is an open source project that provides an entire stack of technologies within the Graph Database space. At the core of this stack is the Blueprints framework. Blueprints can be considered as the JDBC of Graph Databases. By providing a collection of

Continue Reading →

12/04

Back To The Future with Datomic

At the beginning of March, Rich Hickey and his team released Datomic. Datomic is a novel distributed database system designed to enable scalable, flexible and intelligent applications, running on next-generation cloud architectures. Its launch was surrounded with quite some buzz and skepticism, mainly related to its rather disruptive architectural proposal. Instead of trying to recapitulate

Continue Reading →

11/03

Circle through your Google Analytics data with Neo4J and Circos

Storing massive amounts of data in a NoSQL data store is just one side of the Big Data equation. Being able to visualize your data in such a way that you can easily gain deeper insights, is where things really start to get interesting. Lately, I’ve been exploring various options for visualizing (directed) graphs, including

Continue Reading →

08/02

The joy of algorithms and NoSQL revisited: the MongoDB Aggregation Framework

[information] Part 1 of this article describes the use of MongoDB to implement the computation of molecular similarities. Part 2 discusses the refactoring of this solution by making use of MongoDB’s build-in map-reduce functionality to improve overall performance. Part 3 finally, illustrates the use of the new MongoDB Aggregation Framework, which boosts performance beyond the

Continue Reading →

04/01

Running along the graph using Neo4J Spatial and Gephi

When I started running some years ago, I bought a Garmin Forerunner 405. It’s a nifty little device that tracks GPS coordinates while you are running. After a run, the device can be synchronized by uploading your data to the Garmin Connect website. Based upon the tracked time and GPS coordinates, the Garmin Connect website

Continue Reading →

21/11

Visualizing RDF Schema inferencing through Neo4J, Tinkerpop, Sail and Gephi

Last week, the Neo4J plugin for Gephi was released. Gephi is an open-source visualization and manipulation tool that allows users to interactively browse and explore graphs. The graphs themselves can be loaded through a variety of file formats. Thanks to Martin Škurla, it is now possible to load and lazily explore graphs that are stored

Continue Reading →

11/10

The (non-)sense of NoSQL O(R)M frameworks

NoSQL seems to be ready for prime time. Several NoSQL companies, including 10gen (MongoDB), DataStax (Cassandra) and Neo Technology (Neo4J), recently received millions in funding to expand their (commercial) NoSQL offerings. Even Oracle is now entering the already crowded NoSQL-space with its very own key-value NoSQL Database 11g. No doubt that this type of publicity

Continue Reading →

11/08

The joy of algorithms and NoSQL: a MongoDB example (part 2)

[information] Part 1 of this article describes the use of MongoDB to implement the computation of molecular similarities. Part 2 discusses the refactoring of this solution by making use of MongoDB’s build-in map-reduce functionality to improve overall performance. [/information] In part 1 of this article, I described the use of MongoDB to solve a specific

Continue Reading →

07/08

The joy of algorithms and NoSQL: a MongoDB example (part 1)

[information] Part 1 of this article describes the use of MongoDB to implement the computation of molecular similarities. Part 2 discusses the refactoring of this solution by making use of MongoDB’s build-in map-reduce functionality to improve overall performance. [/information] In one of my previous blog posts, I debated the superficial idea that you should own

Continue Reading →

17/07

RDF data in Neo4J: the Tinkerpop story

[information] My previous blog post discussed the use of Neo4J as a RDF triple store. Michael Hunger however informed me that the neo-rdf-sail component is no longer under active development and advised me to have a look at Tinkerpop’s Sail implementation. [/information] As mentioned in my previous blog post, I recently got asked to implement

Continue Reading →

10/07

Storing and querying RDF data in Neo4J through Sail

[information] This blog post discusses the use of Neo4J as a RDF triple store. Michael Hunger however informed me that the neo-rdf-sail component is no longer under active development and advised me to have a look at Tinkerpop’s Sail implementation. Read the updated version of this article here. [/information] 1. Introduction Recently, I got asked

Continue Reading →

06/07

Should Big Data always be Big?

Yesterday evening the first BigData.be MeetUp was organized at the IBBT in Ghent. The intention of this meeting is to bring together Belgian Big Data and NoSQL enthusiasts. It’s an ideal opportunity to share thoughts and experiences with a mix of people, each having different backgrounds and levels of expertise with Big Data and NoSQL.

Continue Reading →

24/06

Big Data Genomics – How to efficiently store and retrieve mutation data

[information] This blog post is the first one in a series of articles that describe the use of NoSQL databases to efficiently store and retrieve mutation data. 1. Part one introduces the notion of mutation data and describes the conceptual use of the Cassandra NoSQL datastore. [/information] The only way to learn a new technology

Continue Reading →

19/06

Big Data / Concise Articles

Through this blog, we will try to keep you posted on our NoSQL escapades. Our goal is to provide you biweekly with a (technical) article that highlights a particular NoSQL aspect. Each technical article will be a self-contained piece of knowledge that can easily be digested and applied. Code samples will be made available through our github repository. Expect

Continue Reading →