<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Datablend &#187; tinkerpop</title>
	<atom:link href="https://datablend.be/?cat=25&#038;feed=rss2" rel="self" type="application/rss+xml" />
	<link>https://datablend.be</link>
	<description>Big Data Simplified</description>
	<lastBuildDate>Mon, 07 Sep 2015 09:04:17 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.6.1</generator>
		<item>
		<title>Graphs, Graphs, Graphs, &#8230;</title>
		<link>https://datablend.be/?p=271</link>
		<comments>https://datablend.be/?p=271#comments</comments>
		<pubDate>Tue, 24 Jul 2012 09:54:22 +0000</pubDate>
		<dc:creator>Davy Suvee</dc:creator>
				<category><![CDATA[blueprints]]></category>
		<category><![CDATA[datomic]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[mongodb]]></category>
		<category><![CDATA[tinkerpop]]></category>

		<guid isPermaLink="false">http://datablend22.lin3.nucleus.be/?p=271</guid>
		<description><![CDATA[Last week, Datablend open-sourced two new Tinkerpop Blueprints implementations: blueprints-mongodb-graph and blueprints-datomic-graph. Tinkerpop is an open source project that provides an entire stack of technologies within the Graph Database space. At the core of this stack is the Blueprints framework. Blueprints can be considered as the JDBC of Graph Databases. By providing a collection of<p><a href="https://datablend.be/?p=271">Continue Reading →</a></p>]]></description>
				<content:encoded><![CDATA[<p style="text-align: justify;">Last week, Datablend open-sourced two new <a href="http://www.tinkerpop.com/" target="_blank">Tinkerpop</a> <a href="https://github.com/tinkerpop/blueprints/wiki" target="_blank">Blueprints</a> implementations: <a href="https://github.com/datablend/blueprints-mongodb" target="_blank">blueprints-mongodb-graph</a> and <a href="https://github.com/datablend/blueprints" target="_blank">blueprints-datomic-graph</a>. Tinkerpop is an open source project that provides an entire stack of technologies within the Graph Database space. At the core of this stack is the Blueprints framework. Blueprints can be considered the <span class="highlight">JDBC</span> of Graph Databases. By providing a collection of generic interfaces, it allows developers to build graph-based applications without introducing explicit dependencies on concrete Graph Database implementations. Additionally, Blueprints provides concrete bindings for the <a href="http://neo4j.org" target="_blank">Neo4J</a>, <a href="http://www.orientechnologies.com" target="_blank">OrientDB</a> and <a href="http://www.sparsity-technologies.com/dex" target="_blank">Dex</a> Graph Databases. On top of Blueprints, the Tinkerpop team developed an entire range of graph technologies, including <a href="https://github.com/tinkerpop/gremlin/wiki" target="_blank">Gremlin</a>, a powerful, domain-specific language designed for traversing graphs. Hence, once a Blueprints binding is available for a particular Graph Database, an entire range of technologies can be leveraged.</p>
<p>&nbsp;</p>
<h3>1. mongoDB Graph</h3>
<p style="text-align: justify;">The <a href="http://www.mongodb.org/" target="_blank">mongoDB</a> Blueprints implementation provides users with a <span class="highlight"><em>scalable</em></span>, <span class="highlight"><em>distributed</em></span> Graph Database implementation. mongoDB graph does not require any information on the underlying physical mongoDB setup, so read/write scalability can easily be achieved through mongoDB&#8217;s natively supported <span class="highlight"><em>replication</em></span> and <span class="highlight"><em>sharding</em></span> functionality. Some initial benchmarks (writing/reading 100,000 vertices, with every two vertices connected by an edge) on a single mongoDB node show the following performance:</p>
<script src="https://gist.github.com/3169096.js"></script>
<p style="text-align: justify;">Not too shabby for running it on a single mongoDB instance. Currently, each write is executed as a single commit to the mongoDB store. Although this process could be improved by performing writes through optimized <span class="highlight"><em>batches</em></span>, we need to ensure that the transactional semantics of the Blueprints stack are still respected. Further performance optimizations will be released soon, so keep an eye on the <a href="https://github.com/datablend/blueprints-mongodb" target="_blank">blueprints-mongodb-graph</a> GitHub project.</p>
<p>&nbsp;</p>
<h3>2. Datomic Graph</h3>
<p style="text-align: justify;"><a href="http://datomic.com" target='_blank'>Datomic</a> is a novel distributed database system designed to enable scalable, flexible and intelligent applications, running on next-generation cloud architectures. Datomic employs a powerful data model (based upon the concept of <span class="highlight"><em>Datoms</em></span>) and an expressive query language (based upon the concept of <span class="highlight"><em>Datalog</em></span>). Additionally, it introduces an explicit notion of <span class="highlight"><em>time</em></span>, which allows for the execution of queries against both the <span class="highlight"><em>previous</em></span> and <span class="highlight"><em>future states</em></span> of the database. The RDF and SPARQL feel of the Datomic data model and query approach makes it an ideal target for implementing a property graph. Hence, the <a href="https://github.com/datablend/blueprints" target="_blank">blueprints-datomic-graph</a> Blueprints implementation.</p>
<p style="text-align: justify;">Clever use of the time-aware nature of the Datomic datastore makes Datomic Graph the very first <span class="highlight"><em>distributed</em></span>, <span class="highlight"><em>temporal graph database</em></span>: users can perform queries against a specific version of the graph in the past. The code sample below illustrates how a time-aware social graph can be created, stored and queried through the Datomic Graph implementation.</p>
<script src="https://gist.github.com/3169442.js"></script>
<p>&nbsp;</p>
<p style="text-align: justify;">As one would expect, this outputs the following information:</p>
<script src="https://gist.github.com/3169500.js"></script>
<p>&nbsp;</p>
<p style="text-align: justify;">Datomic Graph supports versioning not only of vertices and edges, but also of the properties of individual vertices and edges. Pretty slick, no? We are currently enhancing the implementation with additional time-based operations, including the <span class="highlight"><em>easy comparison of subgraphs</em></span> over time. So stay tuned!</p>
<p></p>]]></content:encoded>
			<wfw:commentRss>https://datablend.be/?feed=rss2&#038;p=271</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RDF data in Neo4J: the Tinkerpop story</title>
		<link>https://datablend.be/?p=252</link>
		<comments>https://datablend.be/?p=252#comments</comments>
		<pubDate>Sun, 17 Jul 2011 09:38:11 +0000</pubDate>
		<dc:creator>Davy Suvee</dc:creator>
				<category><![CDATA[neo4j]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[sail]]></category>
		<category><![CDATA[tinkerpop]]></category>

		<guid isPermaLink="false">http://datablend22.lin3.nucleus.be/?p=252</guid>
		<description><![CDATA[[information] My previous blog post discussed the use of Neo4J as an RDF triple store. Michael Hunger, however, informed me that the neo-rdf-sail component is no longer under active development and advised me to have a look at Tinkerpop&#8217;s Sail implementation. [/information] As mentioned in my previous blog post, I recently got asked to implement<p><a href="https://datablend.be/?p=252">Continue Reading →</a></p>]]></description>
				<content:encoded><![CDATA[[information]
<p style="text-align: justify;">My <a target='_blank' href="http://datablend.be/?p=411">previous blog post</a> discussed the use of <a target='_blank' href="http://neo4j.org">Neo4J</a> as an RDF triple store. <a target='_blank' href="http://twitter.com/#!/mesirii">Michael Hunger</a>, however, informed me that the <a target='_blank' href="http://components.neo4j.org/neo4j-rdf-sail/snapshot">neo-rdf-sail</a> component is no longer under active development and advised me to have a look at <a target='_blank' href="http://www.tinkerpop.com/">Tinkerpop&#8217;s</a> Sail implementation.</p>
[/information]
<p style="text-align: justify;">As mentioned in my <a target='_blank' href="http://datablend.be/?p=411">previous blog post</a>, I recently got asked to implement a storage and querying platform for biological <span class="highlight">RDF</span> (Resource Description Framework) data. Traditional RDF stores are not really an option, as my solution should also provide the ability to <span class="highlight">calculate shortest paths</span> between <span class="highlight">random subjects</span>. Calculating shortest paths is, however, one of the strong selling points of <span class="highlight">Graph Databases</span>, and more specifically of <a target='_blank' href="http://www.neo4j.org">Neo4J</a>. Unfortunately, the <a target='_blank' href="http://components.neo4j.org/neo4j-rdf-sail/snapshot">neo-rdf-sail</a> component, which suits my requirements perfectly, is no longer under active development. <a target='_blank' href="http://www.tinkerpop.com/">Tinkerpop&#8217;s</a> Sail implementation, however, fills the void with an even better alternative!</p>
<p>&nbsp;</p>
<h3>1. What is Tinkerpop?</h3>
<p style="text-align: justify;"><a target='_blank' href="http://www.tinkerpop.com/">Tinkerpop</a> is an open source project that provides an entire stack of technologies within the Graph Database space. At the core of this stack is the <a target='_blank' href="https://github.com/tinkerpop/blueprints/wiki">Blueprints</a> framework. Blueprints can be considered the <span class="highlight">JDBC</span> of Graph Databases. By providing a collection of generic interfaces, it allows developers to build graph-based applications without introducing explicit dependencies on concrete Graph Database implementations. Additionally, Blueprints provides concrete bindings for the Neo4J, <a target='_blank' href="http://www.orientechnologies.com">OrientDB</a> and <a target='_blank' href="http://www.sparsity-technologies.com/dex">Dex</a> Graph Databases. On top of Blueprints, the Tinkerpop team developed an entire range of graph technologies, including <a target='_blank' href="https://github.com/tinkerpop/gremlin/wiki">Gremlin</a>, a powerful, domain-specific language designed for traversing graphs. Hence, once a Blueprints binding is available for a particular Graph Database, an entire range of technologies can be leveraged.</p>
<p>&nbsp;</p>
<h3>2. Tinkerpop and Sail</h3>
<p style="text-align: justify;">Last time, I talked about exposing a Neo4J Graph Database (containing RDF triples) through the <em>Sail</em> interface, which is part of the OpenRDF Sesame framework. Once you have your <em>Sail</em> available, storing and querying RDF is analogous to the piece of code shown in my previous blog article.</p>
<script src="https://gist.github.com/1086222.js"></script>
<p>&nbsp;</p>
<p style="text-align: justify;">The first two lines of code require some more clarification. A <em>TransactionalGraph</em> can be run in <em>MANUAL</em> or <em>AUTOMATIC</em> transaction mode. In <em>AUTOMATIC</em> mode, transactions are basically ignored, in the sense that each <em>item</em> that gets created is immediately persisted in the underlying Graph Database. Although this fits my needs, <em>AUTOMATIC</em> mode is extremely slow in the case of Neo4J because of the continuous IO access. <em>MANUAL</em> mode, on the other hand, is very fast: a new transaction is created at the moment the import of the RDF data file starts and is only committed to the Neo4J data store once all RDF triples are parsed and created. Unfortunately, <em>MANUAL</em> mode does not scale in my specific situation either: some of my RDF data files contain over 50 million RDF triples, so a single transaction covering all of them does not fit into memory (resulting in a Java heap space error). Requiring fast imports, I extended the default Neo4J Blueprints binding to support intermediate commits. I based my implementation on Neo4J&#8217;s best practices for <a target='_blank' href="http://wiki.neo4j.org/content/Transactions#Big_transactions">big transactions</a>. The idea is rather simple: you specify the <span class="highlight">maximum number of items</span> that can be kept in memory before they should be committed to the Neo4J data store. Once this number is reached, the current transaction is committed and a new one is automatically started. Simple, but very effective!</p>
<script src="https://gist.github.com/1086243.js"></script>
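<p style="text-align: justify;">For readers who just want the gist of the intermediate-commit idea, here is a minimal, dependency-free sketch. It is not the MyNeo4jGraph code: the class <em>BufferedCommitSketch</em> and all of its methods are hypothetical names, and plain counters stand in for the real Neo4J transaction, so it only illustrates when a commit is triggered.</p>
<pre>
```java
// Dependency-free sketch of "intermediate commits": count the items written in
// the open transaction and commit as soon as a maximum buffer size is reached.
// All names are hypothetical; counters stand in for a real Neo4J transaction.
class BufferedCommitSketch {

    private final int maxBufferSize; // pending items allowed before a commit
    private int pending = 0;         // items written in the open transaction
    private int commits = 0;         // transactions committed so far
    private int persisted = 0;       // items made durable by those commits

    BufferedCommitSketch(int maxBufferSize) {
        if (Integer.signum(maxBufferSize) != 1) {
            throw new IllegalArgumentException("maxBufferSize must be positive");
        }
        this.maxBufferSize = maxBufferSize;
    }

    // Record one created item (for example a vertex or an edge).
    void addItem() {
        pending++;
        if (pending == maxBufferSize) {
            commit(); // buffer is full: commit, the next item opens a new transaction
        }
    }

    // Commit whatever is pending; call once more at the end of an import.
    void commit() {
        if (pending == 0) {
            return; // nothing to flush
        }
        persisted += pending;
        pending = 0;
        commits++;
    }

    int commits() { return commits; }

    int persisted() { return persisted; }

    int pending() { return pending; }

    public static void main(String[] args) {
        BufferedCommitSketch graph = new BufferedCommitSketch(10);
        for (int i = 0; i != 25; i++) {
            graph.addItem();
        }
        graph.commit(); // flush the final, partially filled buffer
        System.out.println(graph.commits() + " commits, "
                + graph.persisted() + " items persisted");
    }
}
```
</pre>
<p style="text-align: justify;">With a buffer size of 10 and 25 items, two commits happen automatically and the final explicit commit flushes the remaining 5 items. The buffer size is the knob to tune: larger values mean fewer commits but more memory held by the open transaction.</p>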
[information]
<p style="text-align: justify;">Based upon the &#8220;MyNeo4jGraph&#8221;-idea described above, the Blueprints team extended their API to support <b>transaction buffers</b>. More information on its use can be found <a target='_blank' href="http://groups.google.com/group/gremlin-users/msg/7b73f2e367ef5de5?pli=1">here</a> and <a target='_blank' href="https://github.com/tinkerpop/blueprints/issues/146">here</a>.</p>
[/information]
<p>&nbsp;</p>
<h3>3. Shortest path calculation</h3>
<p style="text-align: justify;">Although Blueprints allows you to abstract away the Neo4J implementation details, it still provides you with access to the raw Neo4J data store if needed. Hence, one can still use the graph algorithms provided in the <a target='_blank' href="http://components.neo4j.org/neo4j-graph-algo/1.4">neo4j-graph-algo</a> component to calculate shortest paths between random subjects. The complete source code can be found on the <a target='_blank' href="https://github.com/datablend/neo4j-sail-test">Datablend public GitHub repository</a>.</p>
<p></p>]]></content:encoded>
			<wfw:commentRss>https://datablend.be/?feed=rss2&#038;p=252</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
