Posted to users@jena.apache.org by "Kimball, Adam" <ad...@thermofisher.com> on 2021/01/29 17:48:22 UTC

Efficient text storage

Hi everyone,

I’ve got what I think might be a very basic question.  Looking forward to learning the answer!

I need to store some text in my graph.  I would expect that text to be quite repetitive such that many triples will want to point to the same bit of text:

     ex:myExample ex:hasText "admin" .
     ex:yourExample ex:hasText "admin" .
     ex:theirExample ex:hasText "admin" .

It would be no problem to create a node for that specific bit of text, such that we could rewrite this as:

     ex:myExample ex:hasText ex:someText-admin .
     ex:yourExample ex:hasText ex:someText-admin .
     ex:theirExample ex:hasText ex:someText-admin .
     ex:someText-admin ex:value "admin" .

It complicates the model a tiny bit, but I’m fine doing it.  I just want to make sure that it makes good sense from a performance or footprint perspective.  Maybe Jena is already doing this behind the scenes?  If that is the case, I’d probably prefer to keep it simple and use the former method.

Any thoughts?
Adam


Re: Efficient text storage

Posted by Andy Seaborne <an...@apache.org>.

On 29/01/2021 17:48, Kimball, Adam wrote:
> Hi everyone,
> 
> I’ve got what I think might be a very basic question.  Looking forward to learning the answer!
> 
> I need to store some text in my graph.  I would expect that text to be quite repetitive such that many triples will want to point to the same bit of text:
> 
>       ex:myExample ex:hasText "admin" .
>       ex:yourExample ex:hasText "admin" .
>       ex:theirExample ex:hasText "admin" .

In TDB, there is only one "admin" literal stored.

In a smaller in-memory graph, it probably isn't going to make a
performance difference.
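
To make the "only one literal stored" point concrete, here is a minimal conceptual sketch of dictionary encoding: each distinct term is stored once in a node table, and triples are kept as tuples of small integer ids. This is not Jena or TDB code; the names NodeTable and TripleStore are hypothetical and exist only for this illustration.

```python
# Conceptual sketch of dictionary encoding. NOT Jena/TDB code:
# NodeTable and TripleStore are hypothetical names for illustration.


class NodeTable:
    """Maps each distinct RDF term to an integer id, storing the term once."""

    def __init__(self):
        self._id_of = {}   # term -> id
        self._terms = []   # id -> term

    def intern(self, term):
        """Return the id for a term, allocating a new id only on first sight."""
        node_id = self._id_of.get(term)
        if node_id is None:
            node_id = len(self._terms)
            self._id_of[term] = node_id
            self._terms.append(term)
        return node_id

    def lookup(self, node_id):
        """Return the term stored under an id."""
        return self._terms[node_id]

    def __len__(self):
        return len(self._terms)


class TripleStore:
    """Stores triples as (subject_id, predicate_id, object_id) tuples."""

    def __init__(self):
        self.nodes = NodeTable()
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add(tuple(self.nodes.intern(t) for t in (s, p, o)))


store = TripleStore()
store.add("ex:myExample", "ex:hasText", '"admin"')
store.add("ex:yourExample", "ex:hasText", '"admin"')
store.add("ex:theirExample", "ex:hasText", '"admin"')

print(len(store.triples))  # 3 triples
print(len(store.nodes))    # 5 distinct terms; "admin" is stored once
```

In this sketch all three triples share one object id, so repeating the literal costs one extra id per triple rather than another copy of the string.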

     Andy

> 
> It would be no problem to create a node for that specific bit of text, such that we could rewrite this as:
> 
>       ex:myExample ex:hasText ex:someText-admin .
>       ex:yourExample ex:hasText ex:someText-admin .
>       ex:theirExample ex:hasText ex:someText-admin .
>       ex:someText-admin ex:value "admin" .

In-memory (actually, in any parser run), URI nodes are cached and reused.
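
As a rough illustration of what "cached and reused" means here, the sketch below interns URI nodes so that the same URI string always yields the same object. It is a simplified assumption about the general technique, not Jena's parser code; URINode and NodeCache are hypothetical names.

```python
# Sketch of node caching (interning). Hypothetical names, not Jena's API.


class URINode:
    """A minimal stand-in for a URI node."""

    def __init__(self, uri):
        self.uri = uri


class NodeCache:
    """Returns one shared URINode object per distinct URI string."""

    def __init__(self):
        self._cache = {}

    def uri_node(self, uri):
        node = self._cache.get(uri)
        if node is None:
            node = URINode(uri)
            self._cache[uri] = node
        return node

    def __len__(self):
        return len(self._cache)


cache = NodeCache()
a = cache.uri_node("http://example.org/someText-admin")
b = cache.uri_node("http://example.org/someText-admin")

print(a is b)      # True: the second request reuses the cached node
print(len(cache))  # 1
```

With a cache like this, the three triples pointing at ex:someText-admin would share a single node object instead of allocating three.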

> 
> It complicates the model a tiny bit, but I’m fine doing it.  I just want to make sure that it makes good sense from a performance or footprint perspective.  Maybe Jena is already doing this behind the scenes?  If that is the case, I’d probably prefer to keep it simple and use the former method.
> 
> Any thoughts?

Don't optimize by changing the data model.

> Adam
> 
>