<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Datablend &#187; spatial</title>
	<atom:link href="http://datablend.be/?cat=31&#038;feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://datablend.be</link>
	<description>Big Data Simplified</description>
	<lastBuildDate>Mon, 07 Sep 2015 09:04:17 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.6.1</generator>
		<item>
		<title>Running along the graph using Neo4J Spatial and Gephi</title>
		<link>http://datablend.be/?p=262</link>
		<comments>http://datablend.be/?p=262#comments</comments>
		<pubDate>Wed, 04 Jan 2012 09:48:13 +0000</pubDate>
		<dc:creator>Davy Suvee</dc:creator>
				<category><![CDATA[gephi]]></category>
		<category><![CDATA[neo4j]]></category>
		<category><![CDATA[spatial]]></category>
		<category><![CDATA[visualisation]]></category>

		<guid isPermaLink="false">http://datablend22.lin3.nucleus.be/?p=262</guid>
		<description><![CDATA[When I started running some years ago, I bought a Garmin Forerunner 405. It&#8217;s a nifty little device that tracks GPS coordinates while you are running. After a run, the device can be synchronized by uploading your data to the Garmin Connect website. Based upon the tracked time and GPS coordinates, the Garmin Connect website<p><a href="http://datablend.be/?p=262">Continue Reading →</a></p>]]></description>
				<content:encoded><![CDATA[<p style="text-align: justify;">When I started running some years ago, I bought a <a target='_blank' href="https://buy.garmin.com/shop/shop.do?pID=11039&#038;ra=true#owners">Garmin Forerunner 405</a>. It&#8217;s a nifty little device that tracks GPS coordinates while you are running. After a run, the device can be synchronized by uploading your data to the <a target='_blank' href="http://connect.garmin.com">Garmin Connect website</a>. Based upon the tracked time and GPS coordinates, the Garmin Connect website provides you with a detailed overview of your run, including <span class="highlight"><em>distance</em></span>, <span class="highlight"><em>average pace</em></span>, <span class="highlight"><em>elevation loss/gain</em></span> and <span class="highlight"><em>lap splits</em></span>. It also visualizes your run, by overlaying the tracked course on Bing and/or Google maps. Pretty cool! One of my last runs can be found <a target='_blank' href="http://connect.garmin.com/activity/138373187">here</a>.</p>
<p style="text-align: justify;">Apart from <span class="highlight"><em>simple aggregations</em></span> such as total distance and average speed, the Garmin Connect website provides little or no support to gain deeper insights in all of my runs. As I often run the same course, it would be interesting to calculate my <span class="highlight"><em>average pace at specific locations</em></span>. When combining the data of all of my courses, I could deduct <span class="highlight"><em>frequently encountered locations</em></span>. Finally, could there be a <span class="highlight"><em> correlation</em></span> between my <span class="highlight"><em>average pace</em></span> and my <span class="highlight"><em>distance from home?</em></span> In order to come up with answers to these questions, I will import my running data into a <a target='_blank' href="https://github.com/neo4j/spatial">Neo4J Spatial</a> datastore. Neo4J Spatial extends the <a target='_blank' href="http://neo4j.org/">Neo4J Graph Database</a> with the necessary tools and utilities to store and query spatial data in your graph models. For visualizing my running data, I will make use of <a target='_blank' href="http://gephi.org/">Gephi</a>, an open-source visualization and manipulation tool that allows users to interactively browse and explore graphs.</p>
<p>&nbsp;</p>
<h3>1. Extracting GPX data</h3>
<p style="text-align: justify;">The Garmin Connect website allows you to download running data in various formats, including <span class="highlight"><em>KML</em></span>, <span class="highlight"><em>TCX</em></span> and <span class="highlight"><em>GPX</em></span>. <a target='_blank' href="http://topografix.com/gpx.asp">GPX</a> (the GPS Exchange Format) is a lightweight XML data format used for interchanging GPS data (waypoints, routes and tracks) between applications and web services. Below, you can find a GPX extract enumerating several tracked points. Each of these points contains the <span class="highlight"><em>GPS location</em></span>, the <span class="highlight"><em>elevation</em></span> and the corresponding <span class="highlight"><em>timestamp</em></span>.</p>
<script src="https://gist.github.com/1559458.js"></script>
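<p style="text-align: justify;">To make the structure of such an extract concrete, the sketch below reads the <em>trkpt</em> elements of a GPX document using nothing but the JDK&#8217;s DOM parser. It is not how GPSdings works internally; the class and method names (<em>GpxTrackPointReader</em>, <em>read</em>) are hypothetical, and the sketch assumes every point carries <em>lat</em>/<em>lon</em> attributes and, optionally, <em>ele</em> and <em>time</em> children with UTC timestamps.</p>

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical GPX track-point reader built on the JDK DOM parser (not the GPSdings API). */
public class GpxTrackPointReader {

    /** One tracked point: GPS location (degrees), elevation (meters) and timestamp. */
    public static final class TrackPoint {
        public final double lat;
        public final double lon;
        public final double ele;   // NaN when the point has no ele element
        public final Instant time; // null when the point has no time element

        TrackPoint(double lat, double lon, double ele, Instant time) {
            this.lat = lat;
            this.lon = lon;
            this.ele = ele;
            this.time = time;
        }
    }

    /** Parses every trkpt element, in document order, from a GPX document given as a string. */
    public static TrackPoint[] read(String gpxXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(gpxXml.getBytes(StandardCharsets.UTF_8)));
            // "*" matches any namespace, so both GPX 1.0 and GPX 1.1 documents are accepted.
            NodeList trkpts = doc.getElementsByTagNameNS("*", "trkpt");
            TrackPoint[] points = new TrackPoint[trkpts.getLength()];
            for (int i = 0; i < points.length; i++) {
                Element p = (Element) trkpts.item(i);
                String ele = childText(p, "ele");
                String time = childText(p, "time");
                points[i] = new TrackPoint(
                        Double.parseDouble(p.getAttribute("lat")),
                        Double.parseDouble(p.getAttribute("lon")),
                        ele == null ? Double.NaN : Double.parseDouble(ele),
                        time == null ? null : Instant.parse(time));
            }
            return points;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable GPX document", e);
        }
    }

    /** Text of the first descendant element with the given local name, or null if absent. */
    private static String childText(Element parent, String localName) {
        NodeList nodes = parent.getElementsByTagNameNS("*", localName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

<p style="text-align: justify;">Points come back in document order, which is also the order in which they were tracked, so consecutive array entries correspond to consecutive positions during the run.</p>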
<p>&nbsp;</p>
<p style="text-align: justify;">Based upon this data, one is able to calculate various metrics, including <span class="highlight"><em>pace</em></span>. For this, we will use <a target='_blank' href="http://gpstools.sourceforge.net/">GPSdings</a>, a Java library that provides the required functionality to extract and analyze GPX data. We start by reading in a GPX file. Afterwards, we <span class="highlight"><em>analyze</em></span> the content using the GPSdings <em>TrackAnalyzer</em> which, amongst other metrics, calculates the pace for each point that was tracked during a run. The information we need is stored in the first segment of the first track.</p>
<script src="https://gist.github.com/1559808.js"></script>
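<p style="text-align: justify;">The TrackAnalyzer hides how pace is derived. As an illustration only (GPSdings may well compute it differently, for example with smoothing), the pace at a point can be estimated from the segment ending at that point: take the great-circle distance to the previous point with the haversine formula, assuming a spherical Earth with a mean radius of 6,371 km, and divide the elapsed time by that distance. The <em>PaceCalculator</em> class and its methods are hypothetical.</p>

```java
import java.util.Arrays;

/** Hypothetical sketch: per-point pace from consecutive GPS fixes (not the GPSdings TrackAnalyzer). */
public class PaceCalculator {

    /** Assumed mean Earth radius in meters (spherical Earth). */
    static final double EARTH_RADIUS_M = 6_371_000.0;

    /** Great-circle distance in meters between two lat/lon pairs given in degrees (haversine formula). */
    public static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    /**
     * Pace in minutes per kilometer at each point, from the segment ending at that point.
     * The first point has no predecessor and a zero-length segment has no defined pace: both stay NaN.
     */
    public static double[] paceMinPerKm(double[] lats, double[] lons, long[] epochSeconds) {
        double[] pace = new double[lats.length];
        Arrays.fill(pace, Double.NaN);
        for (int i = 1; i < lats.length; i++) {
            double meters = haversineMeters(lats[i - 1], lons[i - 1], lats[i], lons[i]);
            if (meters > 0) {
                double minutes = (epochSeconds[i] - epochSeconds[i - 1]) / 60.0;
                pace[i] = minutes / (meters / 1000.0);
            }
        }
        return pace;
    }
}
```

<p style="text-align: justify;">Pace in minutes per kilometer is the inverse of speed, so a lower value means a faster stretch of the run.</p>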
<p>&nbsp;</p>
<h3>2. Importing GPS data in Neo4J Spatial</h3>
<p style="text-align: justify;"><span class="highlight"><em>Neo4J Spatial</em></span> is built on top of <span class="highlight"><em>Neo4J</em></span> and adds support for <span class="highlight"><em>spatial data</em></span>. Once your data is stored, you can execute <span class="highlight"><em>spatial operations</em></span>, for instance searching for data within a specified region or within a specified distance of a particular point of interest. We start by setting up a Neo4J <em>EmbeddedGraphDatabase</em>. We then wrap it in a <em>SpatialDatabaseService</em>, which allows us to create an <em>EditableLayer</em>, i.e. a layer to which geometries can be added. Layers are the main abstraction of Neo4J Spatial and are used to define a <span class="highlight"><em>collection of geometries</em></span>. Each layer needs to be initialized with a specific <em>GeometryEncoder</em>, which acts as a kind of adapter that maps from the graph to the geometries and vice versa. In our case, we will employ the <em>SimplePointEncoder</em>.</p>
<script src="https://gist.github.com/1559893.js"></script>
<p>&nbsp;</p>
<p style="text-align: justify;">Adding spatial data to the running layer is straightforward. We start by creating a <em>Coordinate</em> for each point parsed by GPSdings. Next, we add this new coordinate to the running layer. This operation returns a <em>SpatialDatabaseRecord</em> which, under the hood, is just a <span class="highlight"><em>regular Neo4J node</em></span>. Hence, we can add any property we want to this node. In our case, we add two properties: <span class="highlight"><em>speed</em></span>, which holds the (average) pace, and <span class="highlight"><em>occurrences</em></span>, which holds the number of times this particular coordinate was encountered in the overall data set. Once the new coordinate is created, we connect the previous node to the newly created node through the <span class="highlight"><em>NEXT</em></span> relationship type. Hence, our graph is an <span class="highlight"><em>enumeration</em></span> of the encountered coordinates, <span class="highlight"><em>interlinked</em></span> through NEXT edges.</p>
<script src="https://gist.github.com/1559954.js"></script>
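<p style="text-align: justify;">The resulting structure can be mimicked in plain Java to see what the import produces. The sketch below is a hypothetical in-memory stand-in, not the Neo4J Spatial API: each <em>CoordinateNode</em> carries the <em>speed</em> and <em>occurrences</em> properties, and <em>addCoordinate</em> links the previously added node to the new one, the way the NEXT relationship does. The <em>startRun</em> helper, which keeps separate runs from being chained together, is an assumption of this sketch.</p>

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory stand-in for the running layer (not the Neo4J Spatial API):
 * each coordinate is a node with "speed" and "occurrences" properties, and consecutive
 * coordinates of a run are chained through NEXT links.
 */
public class RunningGraph {

    public static final class CoordinateNode {
        final double lon;                 // x
        final double lat;                 // y
        double speed;                     // (average) pace recorded at this coordinate
        int occurrences = 1;              // times this coordinate was encountered
        final List<CoordinateNode> next = new ArrayList<>(); // outgoing NEXT links

        CoordinateNode(double lon, double lat, double speed) {
            this.lon = lon;
            this.lat = lat;
            this.speed = speed;
        }
    }

    private final List<CoordinateNode> nodes = new ArrayList<>();
    private CoordinateNode previous; // last node of the run currently being imported

    /** Hypothetical helper: call before each run so that runs are not linked to each other. */
    public void startRun() {
        previous = null;
    }

    /** Creates a node for the coordinate and connects the previous node to it through NEXT. */
    public CoordinateNode addCoordinate(double lon, double lat, double speed) {
        CoordinateNode node = new CoordinateNode(lon, lat, speed);
        nodes.add(node);
        if (previous != null) {
            previous.next.add(node);
        }
        previous = node;
        return node;
    }

    public List<CoordinateNode> nodes() {
        return nodes;
    }
}
```

<p style="text-align: justify;">A list of outgoing links is used instead of a single field because, once coordinates are reused across runs, one node can be followed by several different nodes.</p>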
<p>&nbsp;</p>
<p style="text-align: justify;">In case a coordinate is encountered multiple times, we <span class="highlight"><em>recalculate the average speed</em></span> and <span class="highlight"><em>increment the number of encounters</em></span>.</p>
<script src="https://gist.github.com/1560142.js"></script>
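<p style="text-align: justify;">Recalculating the average does not require the earlier speed values: the stored average and the number of encounters suffice, since the new average equals (average &#215; occurrences + new speed) / (occurrences + 1). A minimal sketch of that update, with a hypothetical <em>CoordinateStats</em> holder standing in for the two node properties:</p>

```java
/** Hypothetical holder mirroring the speed and occurrences node properties. */
public class CoordinateStats {
    private double averageSpeed;
    private int occurrences;

    public CoordinateStats(double firstSpeed) {
        this.averageSpeed = firstSpeed;
        this.occurrences = 1;
    }

    /**
     * Folds one more encounter into the running mean:
     * newAvg = oldAvg + (speed - oldAvg) / (n + 1), which equals (oldAvg * n + speed) / (n + 1).
     */
    public void recordEncounter(double speed) {
        occurrences++;
        averageSpeed += (speed - averageSpeed) / occurrences;
    }

    public double averageSpeed() {
        return averageSpeed;
    }

    public int occurrences() {
        return occurrences;
    }

    public static void main(String[] args) {
        CoordinateStats stats = new CoordinateStats(10.0);
        stats.recordEncounter(12.0);
        stats.recordEncounter(14.0);
        // prints "12.0 over 3 encounters"
        System.out.println(stats.averageSpeed() + " over " + stats.occurrences() + " encounters");
    }
}
```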
<p>&nbsp;</p>
<p style="text-align: justify;">Unfortunately, the chances of encountering an already existing coordinate are low, as coordinates in a GPX file have a 15-digit precision right of the decimal point. Instead of <span class="highlight"><em>rounding</em></span> these coordinates ourselves, we will use the <span class="highlight"><em>Neo4J Spatial querying API</em></span>. A simple <span class="highlight"><em>nearest-neighbor</em></span> search limited to 20 meters allows us to find matching coordinates. (I chose 20 meters because it is a little above the average distance between two consecutive tracked coordinates.) If we find a coordinate within this 20-meter range, we <span class="highlight"><em>reuse</em></span> it. Otherwise, we just create a <span class="highlight"><em>new coordinate</em></span>. The full algorithm for importing multiple GPX datasets can be found below.</p>
<script src="https://gist.github.com/1560218.js"></script>
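<p style="text-align: justify;">The reuse-or-create decision itself is independent of the spatial index. The sketch below replaces the Neo4J Spatial nearest-neighbor query with a plain linear scan over haversine distances. This is a deliberate simplification: it costs O(n) per lookup, whereas the layer&#8217;s RTree index avoids scanning every coordinate. The 20-meter threshold comes from the text; the class and method names and the assumed Earth radius of 6,371 km are hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of "reuse a coordinate within range, otherwise create one".
 * A linear scan stands in for the Neo4J Spatial nearest-neighbor query.
 */
public class CoordinateMatcher {

    /** Assumed mean Earth radius in meters. */
    static final double EARTH_RADIUS_M = 6_371_000.0;

    public static final class Coordinate {
        final double lat;
        final double lon;
        double speed;        // average speed over all encounters
        int occurrences = 1; // number of encounters

        Coordinate(double lat, double lon, double speed) {
            this.lat = lat;
            this.lon = lon;
            this.speed = speed;
        }
    }

    private final List<Coordinate> coordinates = new ArrayList<>();
    private final double maxDistanceMeters;

    public CoordinateMatcher(double maxDistanceMeters) {
        this.maxDistanceMeters = maxDistanceMeters;
    }

    /** Returns the nearest stored coordinate within range (with updated statistics), or a new one. */
    public Coordinate findOrCreate(double lat, double lon, double speed) {
        Coordinate nearest = null;
        double nearestDistance = Double.POSITIVE_INFINITY;
        for (Coordinate c : coordinates) {
            double d = distanceMeters(lat, lon, c.lat, c.lon);
            if (d < nearestDistance) {
                nearestDistance = d;
                nearest = c;
            }
        }
        if (nearest != null && nearestDistance <= maxDistanceMeters) {
            nearest.occurrences++;
            nearest.speed += (speed - nearest.speed) / nearest.occurrences; // running average
            return nearest;
        }
        Coordinate created = new Coordinate(lat, lon, speed);
        coordinates.add(created);
        return created;
    }

    public int size() {
        return coordinates.size();
    }

    /** Haversine great-circle distance in meters between two lat/lon pairs in degrees. */
    static double distanceMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }
}
```

<p style="text-align: justify;">With <em>new CoordinateMatcher(20.0)</em>, a point about 11 meters north of a stored coordinate is merged into it, while a point about 110 meters away creates a new coordinate.</p>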
<p>&nbsp;</p>
<h3>3. Visualizing running data</h3>
<p style="text-align: justify;">By using the <span class="highlight"><em>Neo4J Spatial querying API</em></span>, we are able to retrieve the set of coordinates that satisfy a particular condition. However, raw coordinates are rather <span class="highlight"><em>abstract</em></span> and hard to interpret. Instead, we will use the excellent Gephi graph visualization and exploration tool. By installing the <a target='_blank' href="http://gephi.org/tag/neo4j/">Gephi Neo4J plugin</a>, we can load and explore graphs stored in a Neo4J (Spatial) datastore. Let&#8217;s start by <span class="highlight"><em>importing</em></span> our dataset into Gephi.</p>
<p align="center"><a target='_blank' href="http://datablend.be/wp-content/uploads/geo1.jpg"><img width="550" src="http://datablend.be/wp-content/uploads/geo1.jpg" alt="gephi" /></a></p>
<p style="text-align: justify;">In addition to the coordinates and NEXT edges that we added ourselves, the displayed graph contains other types of nodes and edges (i.e. <em>Layer</em> and <em>RTree</em> index information). Let&#8217;s get rid of those by <span class="highlight"><em>filtering our graph</em></span> on the NEXT relationship type.</p>
<p align="center"><a target='_blank' href="http://datablend.be/wp-content/uploads/geo2.jpg"><img width="550" src="http://datablend.be/wp-content/uploads/geo2.jpg" alt="gephi" /></a></p>
<p style="text-align: justify;">Only half of the edges remain &#8230; However, this mess still does not give us any novel insights. Let&#8217;s lay out our graph by using the <a target='_blank' href="http://gephi.org/plugins/geolayout/">Gephi GeoLayout plugin</a>. This layout algorithm takes <span class="highlight"><em>geocoded graphs</em></span> as input and positions the nodes according to their geocoded attributes. Make sure to increase the scale, as our coordinates are located close together. Cool! This view clearly outlines the courses I run.</p>
<p align="center"><a target='_blank' href="http://datablend.be/wp-content/uploads/geo3.jpg"><img width="550" src="http://datablend.be/wp-content/uploads/geo3.jpg" alt="gephi" /></a></p>
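<p style="text-align: justify;">Why the scale matters becomes clear once coordinates are projected onto a plane. The sketch below uses a spherical Mercator projection, one common choice for this kind of layout; whether GeoLayout applies exactly this formula, and which scale it defaults to, are assumptions, and the <em>MercatorProjection</em> class is hypothetical. Two points 20 meters apart end up only a few millionths of a unit apart at scale 1, so they overlap on screen unless the scale is increased.</p>

```java
/** Hypothetical helper: spherical Mercator projection of a lat/lon pair, multiplied by a scale factor. */
public class MercatorProjection {

    /** Returns {x, y}; longitude maps linearly to x, latitude is stretched towards the poles. */
    public static double[] project(double latDeg, double lonDeg, double scale) {
        double x = Math.toRadians(lonDeg);
        double y = Math.log(Math.tan(Math.PI / 4 + Math.toRadians(latDeg) / 2));
        return new double[] {x * scale, y * scale};
    }

    public static void main(String[] args) {
        // Two points about 20 meters apart (north-south) near latitude 51 degrees.
        double lat1 = 51.0;
        double lat2 = 51.0 + Math.toDegrees(20.0 / 6_371_000.0);
        for (double scale : new double[] {1.0, 1_000_000.0}) {
            double dy = project(lat2, 4.0, scale)[1] - project(lat1, 4.0, scale)[1];
            System.out.printf("scale %.0f -> vertical distance %.6f%n", scale, dy);
        }
    }
}
```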
<p style="text-align: justify;">Let&#8217;s visualize the coordinates that were <span class="highlight"><em>frequently encountered</em></span> during the four runs imported into the Neo4J Spatial datastore. For this, we use the <span class="highlight"><em>InDegree</em></span> node property, which indicates <span class="highlight"><em>the number of incoming edges</em></span> for each coordinate. We rank <span class="highlight"><em>node weight</em></span> (i.e. node size) by this property, so frequently encountered nodes show up bigger. In my case, frequently encountered coordinates are found around the place where I live (where my runs start) and at street intersections.</p>
<p align="center"><a target='_blank' href="http://datablend.be/wp-content/uploads/geo4.jpg"><img width="550" src="http://datablend.be/wp-content/uploads/geo4.jpg" alt="gephi" /></a></p>
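<p style="text-align: justify;">Gephi computes InDegree for us, but the metric itself is simple: for every edge, the target node gains one. A sketch over a plain edge list, where the edge representation (pairs of node ids) and the <em>InDegree</em> class are hypothetical:</p>

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: in-degree of every node in a directed edge list. */
public class InDegree {

    /** Each edge is {sourceId, targetId}; returns node id -> number of incoming edges. */
    public static Map<Long, Integer> compute(long[][] edges) {
        Map<Long, Integer> inDegree = new HashMap<>();
        for (long[] edge : edges) {
            inDegree.putIfAbsent(edge[0], 0);          // sources without incoming edges still appear, with 0
            inDegree.merge(edge[1], 1, Integer::sum); // one more incoming edge for the target
        }
        return inDegree;
    }
}
```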
<p style="text-align: justify;">Let&#8217;s do one final analysis, namely a visualization that illustrates the <span class="highlight"><em>average pace throughout all runs</em></span>. For this, we rank both <span class="highlight"><em>node weight</em></span> and <span class="highlight"><em>node color</em></span> by the <span class="highlight"><em>speed</em></span> property. Hence, coordinates where I ran fast on average are colored green and show up bigger, while coordinates where I ran slowly are colored red and show up smaller. At a glance, I can now read my average pace across my entire running data set!</p>
<p align="center"><a target='_blank' href="http://datablend.be/wp-content/uploads/geo5.jpg"><img width="550" src="http://datablend.be/wp-content/uploads/geo5.jpg" alt="gephi" /></a></p>
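<p style="text-align: justify;">A ranking of this kind boils down to interpolating between the lowest and highest <em>speed</em> in the graph. The sketch below assumes a plain linear mapping, which may differ from the interpolation Gephi was configured with, and uses a hypothetical <em>SpeedRanking</em> class: the minimum maps to pure red and the smallest size, the maximum to pure green and the largest size.</p>

```java
/** Hypothetical helper: linear ranking of a value to a node size and a red-to-green color. */
public class SpeedRanking {

    /** Position of value within [min, max], clamped to [0, 1]; 0.5 if all values are equal. */
    static double normalize(double value, double min, double max) {
        if (max == min) {
            return 0.5;
        }
        double t = (value - min) / (max - min);
        return Math.max(0.0, Math.min(1.0, t));
    }

    /** Linear size between minSize (slowest coordinate) and maxSize (fastest coordinate). */
    public static double size(double speed, double min, double max, double minSize, double maxSize) {
        return minSize + normalize(speed, min, max) * (maxSize - minSize);
    }

    /** Color as 0xRRGGBB, fading from red (lowest speed) to green (highest speed). */
    public static int color(double speed, double min, double max) {
        double t = normalize(speed, min, max);
        int red = (int) Math.round(255 * (1 - t));
        int green = (int) Math.round(255 * t);
        return (red << 16) | (green << 8);
    }
}
```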
<p>&nbsp;</p>
<h3>4. Conclusion</h3>
<p style="text-align: justify;">This article describes the use of the <span class="highlight"><em>Neo4J Spatial datastore</em></span> and <span class="highlight"><em>Gephi</em></span> to analyze Garmin running data. As always, the complete source code can be found on the <a target='_blank' href="https://github.com/datablend/neo4j-spatial-running">Datablend public GitHub repository</a>. Any ideas for other types of analysis that could be performed on the dataset?</p>
]]></content:encoded>
			<wfw:commentRss>http://datablend.be/?feed=rss2&#038;p=262</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
