
Posted to users@jena.apache.org by Adeel Ahmad <aa...@gmail.com> on 2012/07/12 21:06:55 UTC

Fwd: jena/sesame

Hello,

fwd...

Thanks,



---------- Forwarded message ----------
From: Adeel Ahmad <aa...@gmail.com>
Date: 12 July 2012 01:01
Subject: jena/sesame
To: dev@jena.apache.org


Hello,

I wanted to ask about the performance, scalability, and portability
issues with Jena and Sesame.
If I am correct, both can be used with TinkerPop? However, Sesame
provides a direct SAIL interface?

What I wanted your opinion on is whether it is better to load the data using
Jena and then query it using Sesame, or whether the load and extract steps
should be implemented with the same API.
I find that Jena is fast at loading but very slow at querying data, whereas
Sesame shows the opposite behaviour. I have noticed that the adapter bridge
between Jena and Sesame in some respects provides a solution for that. Is
there a roadmap towards merging their functionality, or perhaps making them
equally compatible with each other, at least in how they work with RDF data?
Also, I noticed that Sesame is more portable to languages other than Java,
whereas Jena seems more focused on Java support. Are there any plans to make
the bridge support Python and PHP, as seen in Redland, or to make the
interfaces easier to use across different databases and Lucene/Hadoop streams?

-- 
Thanks,

Adeel Ahmad




Re: Fwd: jena/sesame

Posted by Andy Seaborne <an...@apache.org>.

On 17/07/12 04:23, Adeel Ahmad wrote:
> Hello,
>
> I was not hinting at any graph-related algorithms, as RDF is itself a
> subject-predicate-object graph structure. I was merely trying to
> assess the difference between the Sesame and Jena toolsets and the usability
> of the two in different big-data applications, especially when RDF
> utilizes XML

RDF/XML is not the common syntax at scale.  Large-scale data is usually in
Turtle or N-Triples, and the Jena parsers are quite fast on those.  Jena's
RDF/XML parser is precise, and RDF/XML has lots of features that have to be
checked for correct parsing.

> which can be slow to work with in a huge data set and the
> resolution of predicate-based graphs for a) path searching and b) dynamic
> relationship and association finding. I did some benchmarking on my use of
> the tools and came to the same conclusion. I have found Jena

Can you share the benchmarks?

And which version? Which storage layer?

> to be a bit
> limiting in worst- and average-case scenarios, so I have tended to stick with
> Sesame/AllegroGraph or lower-level rdflibs for optimization and flexibility
> alongside SPARQL queries. I feel Jena still needs to grow in terms of
> language integration support and more stable algorithmic
> performance, at least in the average/worst case, as those tend to be
> the normal case in the real world; the best case rarely happens
> once a system is in production. I understand Jena is able to do a lot more
> with OWL, but I don't see why I need to limit the
> reasoning to handcrafted ontologies. I feel RDF/RDFS/RDFa with SPARQL are
> sufficient. Why add automated inference to logical reasoning,
> which I find is what OWL actually does? It is not very practical to
> handcraft all the ontologies of the web, especially in a changing landscape;
> it makes data more static and error-prone, and not very context-friendly
> for linked data. I am finding that Semantic Web tools are lagging
> behind in supporting large-scale practical applications and are still
> clinging to a scientific approach to experimental development work, where
> scalability, reliability, and performance don't really come into play as
> much as the accuracy of the results.

That's not how Jena TDB needs to be used.

You can use the inference engine and a decent part of OWL but that's not 
the scalable solution as the engine is in-memory.  You can process RDFS 
at scale by inferring on load and running at raw speed at runtime.
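
To illustrate the "infer on load" idea, here is a minimal sketch in plain Python. It is not Jena code: the prefixed names, `load_with_rdfs`, and `instances_of` are hypothetical, and only two RDFS rules are covered (subclass transitivity and type propagation). The point is that the entailed triples are written once at load time, so a query is an ordinary lookup with no reasoner in the loop.

```python
# Hypothetical short names standing in for the full RDF/RDFS IRIs.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def load_with_rdfs(triples):
    """Return the input triples plus the triples entailed by two RDFS rules:
    subClassOf transitivity and rdf:type propagation to superclasses."""
    store = set(triples)

    # Index the direct superclass edges once.
    supers = {}
    for s, p, o in store:
        if p == SUBCLASS_OF:
            supers.setdefault(s, set()).add(o)

    def ancestors(cls):
        """All classes reachable from cls via subClassOf (cycle-safe)."""
        seen = set()
        stack = [cls]
        while stack:
            for parent in supers.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    inferred = set()
    for s, p, o in store:
        if p in (RDF_TYPE, SUBCLASS_OF):
            for parent in ancestors(o):
                inferred.add((s, p, parent))
    return store | inferred


def instances_of(store, cls):
    """Query-time lookup: a plain scan, no inference performed here."""
    return {s for s, p, o in store if p == RDF_TYPE and o == cls}


data = [
    ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
]
store = load_with_rdfs(data)
print(instances_of(store, "ex:Animal"))
```

The trade-off is a larger store and a slower load in exchange for queries that run at the raw speed of the storage layer.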

	Andy


Re: Fwd: jena/sesame

Posted by Adeel Ahmad <aa...@gmail.com>.

Hello,

I was not hinting at any graph-related algorithms, as RDF is itself a
subject-predicate-object graph structure. I was merely trying to
assess the difference between the Sesame and Jena toolsets and the usability
of the two in different big-data applications, especially when RDF
uses XML, which can be slow to work with on a huge data set, and the
resolution of predicate-based graphs for a) path searching and b) dynamic
relationship and association finding. I did some benchmarking on my use of
the tools and came to the same conclusion. I have found Jena to be a bit
limiting in worst- and average-case scenarios, so I have tended to stick with
Sesame/AllegroGraph or lower-level rdflibs for optimization and flexibility
alongside SPARQL queries. I feel Jena still needs to grow in terms of
language integration support and more stable algorithmic
performance, at least in the average/worst case, as those tend to be
the normal case in the real world; the best case rarely happens
once a system is in production. I understand Jena is able to do a lot more
with OWL, but I don't see why I need to limit the
reasoning to handcrafted ontologies. I feel RDF/RDFS/RDFa with SPARQL are
sufficient. Why add automated inference to logical reasoning,
which I find is what OWL actually does? It is not very practical to
handcraft all the ontologies of the web, especially in a changing landscape;
it makes data more static and error-prone, and not very context-friendly
for linked data. I am finding that Semantic Web tools are lagging
behind in supporting large-scale practical applications and are still
clinging to a scientific approach to experimental development work, where
scalability, reliability, and performance don't really come into play as
much as the accuracy of the results.

Thanks,

Adeel


Re: Fwd: jena/sesame

Posted by Andy Seaborne <an...@apache.org>.

I'm not completely clear in what context you're asking this.

SPARQL is the standard RDF query language and is supported (client,
server) by both Jena and Sesame.  The graph systems you're hinting at for
graph analysis problems are of a somewhat different style.

I don't know of a TinkerPop Blueprints implementation for Jena; there is
one for Sesame - but layering other graph languages like this does not lead
to good performance.  They end up using triple-by-triple access to the data
when the query/storage engines want larger chunks to work with so that they
can use large-volume join algorithms.
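
The difference between triple-by-triple access and a bulk join can be sketched with a toy store. This is an illustration only: `CountingStore`, `walk_style`, `join_style`, and the `worksFor`/`locatedIn` predicates are made-up names, not part of Jena, Sesame, or TinkerPop. The walk-style function issues one lookup per intermediate result, while the join-style function fetches each pattern once and combines the results with a hash join.

```python
from collections import defaultdict


class CountingStore:
    """Toy triple store that counts how many pattern lookups it serves."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.lookups = 0

    def find(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        self.lookups += 1
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]


def walk_style(store):
    """One extra lookup per intermediate result (find, then walk)."""
    out = set()
    for person, _, company in store.find(p="worksFor"):
        for _, _, city in store.find(s=company, p="locatedIn"):
            out.add((person, city))
    return out


def join_style(store):
    """Two bulk lookups, then an in-memory hash join on the company."""
    cities_by_company = defaultdict(list)
    for company, _, city in store.find(p="locatedIn"):
        cities_by_company[company].append(city)
    out = set()
    for person, _, company in store.find(p="worksFor"):
        for city in cities_by_company.get(company, ()):
            out.add((person, city))
    return out


triples = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("carol", "worksFor", "initech"),
    ("acme", "locatedIn", "Paris"),
    ("initech", "locatedIn", "Austin"),
]
walker, joiner = CountingStore(triples), CountingStore(triples)
assert walk_style(walker) == join_style(joiner)
print(walker.lookups, joiner.lookups)
```

In the walk style the number of lookups grows with the number of intermediate results, whereas the join style stays at one lookup per triple pattern. A real engine would also use indexes rather than scans, which this sketch leaves out.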

The last time I used neo4j and its RDF incarnation, the performance and
scale were not good.  A few million triples tops.  The indexing is not
suitable for joins - it supports the find-walk style of algorithm.

 > I find that jena is fast in terms of loading but very slow in terms of
 > querying data, whereas sesame works better in the opposite approach. I
 > have noticed the adapter bridge between jena and sesame in some respects
 > provides a solution for that.

To what figures are you referring?

If the data is loaded into system X, it has to be accessed at the lowest 
level by X.

All the adapter does is provide one API over the other - the query
engine used is still the one that belongs to the storage.
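
A rough sketch of what such an adapter amounts to, using hypothetical classes (`NativeStore` and `ForeignApiAdapter` are illustrative names, not the actual Jena/Sesame adapter): the adapter only renames methods and reshapes results, and every call still runs through the native store's own lookup code.

```python
class NativeStore:
    """Stand-in for system X: owns both the storage and the lookup code."""

    def __init__(self):
        self._triples = set()
        self.calls = 0  # how often the native engine was actually used

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Native pattern lookup; None acts as a wildcard."""
        self.calls += 1
        return sorted(t for t in self._triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))


class ForeignApiAdapter:
    """Exposes a different method name and result shape over NativeStore."""

    def __init__(self, store):
        self._store = store

    def get_statements(self, subj=None, pred=None, obj=None):
        # Pure delegation: no storage or query engine of its own.
        return [{"subject": s, "predicate": p, "object": o}
                for s, p, o in self._store.match(subj, pred, obj)]


store = NativeStore()
store.add("ex:a", "ex:knows", "ex:b")
adapter = ForeignApiAdapter(store)
print(adapter.get_statements(pred="ex:knows"), store.calls)
```

Because the adapter adds a layer without replacing the engine underneath, it can only match or add to the cost of the native lookup.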

SPARQL gives you client-server separation, but the Jena query engine
implements SPARQL for Jena storage and Sesame's does so for Sesame storage.
You can't (with performance) load one and then use the other (the file
formats are different).

	Andy