Posted to users@jena.apache.org by Fredah B <fr...@gmail.com> on 2015/05/07 14:29:33 UTC

Requesting clarification on Jena internal structure with regards to compression

Dear Team,


I plan on using your SPARQL engine for my project implementation. I'm
impressed by the tremendous work you have put in to make this engine a
success. However, I noticed that the underlying infrastructure and the
compression technique used are encapsulated. I need to fully understand how
the data is processed from start to finish, especially with regard to
compression. Are there any papers that cover the compression and
decompression used in your engine, or could you refer me to someone who may
be able to explain it to me?


Also, is compression on by default, or is it turned on and off depending on
the system's data load? I was also wondering how you store the data
internally. That is, in what format is the data stored? Is it an internally
created representation or one of the standard RDF representations?


I would really appreciate your assistance in answering these questions and
look forward to hearing from you soon.


Best Regards,


Fredah

Re: Requesting clarification on Jena internal structure with regards to compression

Posted by Andy Seaborne <an...@apache.org>.
Fredah's message was sent to a number of RDF system lists. The text isn't 
directed at Jena specifically; the same text went to all the lists.

Does using a dictionary count as compression?  TDB (and SDB) use a term 
dictionary (= node table), unlike, say, the old RDB system, which stored 
terms inline in the triple table.
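The node-table idea can be sketched roughly like this. This is a toy
illustration of the general technique, not TDB's actual code; the class
and all names in it are invented for the example:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy term dictionary (node table): each RDF term is interned once and
// given a numeric ID, so a triple can be stored as three fixed-width IDs
// instead of three full strings inline in the triple table.
public class TermDictionary {
    private final Map<String, Long> termToId = new HashMap<>();
    private final List<String> idToTerm = new ArrayList<>();

    // Return the existing ID for a term, or assign the next free one.
    public long intern(String term) {
        Long id = termToId.get(term);
        if (id != null) return id;
        long next = idToTerm.size();
        termToId.put(term, next);
        idToTerm.add(term);
        return next;
    }

    // Reverse lookup: ID back to the original term.
    public String lookup(long id) {
        return idToTerm.get((int) id);
    }

    public static void main(String[] args) {
        TermDictionary dict = new TermDictionary();
        long s = dict.intern("http://example.org/alice");
        long p = dict.intern("http://xmlns.com/foaf/0.1/knows");
        long o = dict.intern("http://example.org/bob");
        // The triple is now just three longs, e.g. (0, 1, 2).
        System.out.println(s + " " + p + " " + o); // prints 0 1 2
        System.out.println(dict.lookup(o));        // prints http://example.org/bob
    }
}
```

Repeated terms (common predicates, shared IRIs) are stored once, which is
where the dictionary acts like compression.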

Compression in TDB is interesting at scale, but it is a tradeoff.  By 
using memory-mapped files, the on-disk form and the form accessed in Java 
are the same, so the "decompress on read" style does not apply: memory-mapped 
files avoid that copy/decompress step, and only the necessary bytes are 
touched.  The OS does the work of caching, and it is quite well tuned 
for that.
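A minimal sketch of that memory-mapped access pattern, using only standard
java.nio (not TDB code; the file and names are invented for the example):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// The on-disk bytes ARE the bytes the program reads: mapping the file
// gives a view of it, and only the pages actually touched are faulted
// in by the OS. There is no decompress-on-read copy step.
public class MappedRead {
    public static byte readByteAt(Path file, long offset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Only the page containing this offset needs to be resident.
            return buf.get((int) offset);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped-demo", ".dat");
        Files.write(tmp, new byte[] {10, 20, 30, 40});
        System.out.println(readByteAt(tmp, 2)); // prints 30
        Files.deleteIfExists(tmp);
    }
}
```

If the file were stored compressed instead, every read would first have to
inflate a block into a separate buffer; the mapping makes that copy
unnecessary.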

	Andy

On 10/05/15 17:32, Claude Warren wrote:
> Fredah,
>
> As far as I know, the only compression in the system is in the interaction
> with remote systems, where the compression flag can be enabled to compress
> HTTP/S responses from Fuseki and from federated queries.
>
> I suppose some storage engines could implement compression, but that would
> be on an engine-by-engine basis.
>
> How the data are stored is also determined on an engine-by-engine basis.
> From what I can tell, most implementations use TDB (a native storage
> engine); Andy Seaborne would be able to speak to how that stores data, but
> it is a native format with several indexes.  Another possible storage
> engine is SDB, but that is mostly retired.  It uses a relational database
> to store the data in several tables with several indexes.  There is an
> in-memory engine, and I have implemented a Bloom-filter-based engine built
> on top of a relational storage model.  I suspect there are other storage
> engines available, but I don't know what they are or how they are
> implemented.
>
> Claude
>


Re: Requesting clarification on Jena internal structure with regards to compression

Posted by Claude Warren <cl...@xenei.com>.
Fredah,

As far as I know, the only compression in the system is in the interaction
with remote systems, where the compression flag can be enabled to compress
HTTP/S responses from Fuseki and from federated queries.
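For illustration, this is the step that wire-level compression performs:
the server gzips the HTTP response body and the client inflates it on
receipt. Below is a local round-trip of just that step, using only the
JDK's gzip classes; it is not Jena client code, and the payload string is
made up:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Gzip round-trip of an HTTP response body: compress on the "server"
// side, decompress on the "client" side, recover the identical text.
public class WireGzip {
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    public static String gunzip(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // A repetitive SPARQL results payload compresses well on the wire.
        String body = "{\"results\":{\"bindings\":["
                + "{\"s\":{\"type\":\"uri\"}},".repeat(200) + "]}}";
        byte[] compressed = gzip(body);
        System.out.println(body.length() + " -> " + compressed.length);
        System.out.println(gunzip(compressed).equals(body)); // prints true
    }
}
```

SPARQL result sets are highly repetitive (the same variable names and type
markers in every row), which is why gzipping the response pays off.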

I suppose some storage engines could implement compression, but that would
be on an engine-by-engine basis.

How the data are stored is also determined on an engine-by-engine basis.
From what I can tell, most implementations use TDB (a native storage
engine); Andy Seaborne would be able to speak to how that stores data, but
it is a native format with several indexes.  Another possible storage
engine is SDB, but that is mostly retired.  It uses a relational database
to store the data in several tables with several indexes.  There is an
in-memory engine, and I have implemented a Bloom-filter-based engine built
on top of a relational storage model.  I suspect there are other storage
engines available, but I don't know what they are or how they are
implemented.
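The "several indexes" idea can be sketched like this: the same triples
kept in multiple sorted orders so that any lookup pattern gets a range
scan. TDB's real indexes are B+trees over fixed-width node-ID tuples;
this toy version uses sorted strings, and all names are invented for the
example:

```java
import java.util.TreeSet;

// Three sorted views of the same triples (SPO, POS, OSP order), so a
// query with only some components bound can pick the index whose sort
// order turns the lookup into a contiguous range scan.
public class TripleIndexes {
    private final TreeSet<String> spo = new TreeSet<>();
    private final TreeSet<String> pos = new TreeSet<>();
    private final TreeSet<String> osp = new TreeSet<>();

    public void add(String s, String p, String o) {
        spo.add(s + "|" + p + "|" + o);
        pos.add(p + "|" + o + "|" + s);
        osp.add(o + "|" + s + "|" + p);
    }

    // Triples with a given predicate: a prefix range scan on the POS index.
    public int countByPredicate(String p) {
        return pos.subSet(p + "|", p + "|\uffff").size();
    }

    public static void main(String[] args) {
        TripleIndexes idx = new TripleIndexes();
        idx.add("alice", "knows", "bob");
        idx.add("alice", "knows", "carol");
        idx.add("bob", "name", "Bob");
        System.out.println(idx.countByPredicate("knows")); // prints 2
    }
}
```

The cost of the extra indexes is paid at load time; the benefit is that no
lookup pattern ever needs a full scan.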

Claude




-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren