Posted to users@jena.apache.org by George News <ge...@gmx.net> on 2017/10/11 09:11:50 UTC

Re: Backup TDB and named model

Hi,

Sorry for the late reply on this issue; I was busy sorting out other
things ;)

On 2017-09-25 16:29, ajs6f@apache.org wrote:
> 
> George News wrote on 9/19/17 6:19 AM:
>> Hi,
>>
> <snipped>
>>
>> Then the questions are:
>> - Is there any other way of moving data from one graph to another
>> without having to copy it? Would it be possible to implement a rename
>> option that only changes the graph name instead of copying/creating a
>> new graph with the whole set of triples?
> 
> Did you try using MOVE, as Andy already suggested?
> 
>>
>> - Is there any way of doing the backup using the shell instead of
>> having to do it programmatically? I don't know if this option would
>> avoid loading everything into memory.
> 
> Did you try using tdbdump, as I already suggested?

Yes, I did. It works for taking an offline backup. Nice feature, although
I have to stop WildFly to get access to the TDB data, because the web
service I'm developing holds a reference to the dataset.

sudo /opt/apache-jena-3.4.0/bin/tdbdump --loc=.  > /home/mine/test_all.quad

Then I use grep to keep only the graphs I want.
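
For anyone who prefers to do the filtering step in Java rather than with
grep, here is a minimal sketch. It assumes the dump is N-Quads with one
quad per line and the graph IRI as the last term before the final ".".
The class and method names (QuadFilter, inGraph, keepGraph) are made up
for illustration and are not part of Jena. Like grep, it is a purely
textual match, so a default-graph triple whose object happens to be the
same IRI would also be kept.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not a Jena API: keeps the lines of an N-Quads dump
// whose last term is a given graph IRI.
final class QuadFilter {

    // True if the line is a statement whose last term is <graphIri>.
    // Textual check only: it does not parse the quad.
    static boolean inGraph(String line, String graphIri) {
        String s = line.trim();
        if (s.isEmpty() || s.startsWith("#") || !s.endsWith(".")) {
            return false;
        }
        // Drop the terminating "." and any whitespace before it.
        String body = s.substring(0, s.length() - 1).trim();
        String term = "<" + graphIri + ">";
        if (!body.endsWith(term)) {
            return false;
        }
        // The graph term must be preceded by whitespace (space or tab).
        int before = body.length() - term.length() - 1;
        return before >= 0 && Character.isWhitespace(body.charAt(before));
    }

    // Returns only the lines that belong to the given graph.
    static List<String> keepGraph(List<String> lines, String graphIri) {
        return lines.stream()
                .filter(l -> inGraph(l, graphIri))
                .collect(Collectors.toList());
    }
}
```

For a large dump, the same inGraph check can be applied line by line (for
example over Files.lines) so the whole file is never held in memory.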
> 
>>
>> - I'm thinking of duplicating data when I store it, so instead of
>> having to copy and delete, I only have to delete one graph, as the
>> other has been created in parallel. This way I will avoid the problem
>> with Java heap space. Do you think this is sensible? Performance-wise,
>> I guess in the end I will be spending the same amount of time, but
>> spreading it across every new entity registration.
>>
>> - For big TDB, what can I do to get better performance? Will it be
>> better if I move to Virtuoso as the triple store engine? Any other
>> option? I really need to improve performance :(
>
> Please give much more detail about what your data looks like, what kind
> of queries you are making, how they are arriving at the db, etc.

The solution I finally adopted, which is also the quickest, avoids
copying the graph and changing its name: I create a graph with one name
and then a new graph. This way there is no need to copy data or perform
any operation on the TDB other than creating the new graph.
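
One possible reading of this is that each graph gets a time-stamped name
when it is created, so starting a fresh graph only requires a new name.
The sketch below shows that naming step only. It reuses the
"yyyyMMddHHmm" pattern from the reset() code quoted further down. The
GraphNames class and datedName method are hypothetical names chosen for
illustration, and the approach is an assumption rather than the exact
code in use.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds a time-stamped graph name so that a new
// graph can be started without copying or renaming the old one.
final class GraphNames {

    // Same pattern as in the quoted reset() method.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    // Returns baseName + "-" + timestamp, e.g. for use as a named model name.
    static String datedName(String baseName, LocalDateTime start) {
        return baseName + "-" + start.format(STAMP);
    }
}
```

New data would then be written to the graph named by datedName(base,
start) for the current period, and older graphs stay untouched as the
backups.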

I'm opening a new thread to discuss the performance of SPARQL queries
and how I can speed up the system, so that we don't mix topics.


> 
> ajs6f
> 
> 
> 
>>
>> Thanks a lot for your help
>>
>> Regards,
>> George
>>
>>
>>
>>
>>
>>
>> On 2017-08-30 12:20, Andy Seaborne wrote:
>>>
>>>
>>> On 30/08/17 08:36, george.news@gmx.net wrote:
>>>> *From: *Andy Seaborne <ma...@apache.org>
>>>> *Sent: *Tuesday, 29 August 2017 19:31
>>>> *To: *users@jena.apache.org <ma...@jena.apache.org>
>>>> *Subject: *Re: Backup TDB and named model
>>>>
>>>> On 29/08/17 15:45, ajs6f@apache.org wrote:
>>>>  > tdbdump (along with all of the TDB shell utilities) is available in
>>>>  > the Jena full distribution:
>>>>  >
>>>>  > https://jena.apache.org/download/index.cgi
>>>>  >
>>>>  > ajs6f
>>>>  >
>>>>  > George News wrote on 8/29/17 2:30 AM:
>>>>  >> Hi,
>>>>  >>
>>>>  >> I have a named graph that is becoming very big, and therefore
>>>>  >> searches on it are quite slow. I'm planning to make a backup from
>>>>  >> time to time and reset the data in the original.
>>>>  >>
>>>>  >> The code that I'm currently using is the one below, which, in short,
>>>>  >> consists of creating a new graph based on the original one, deleting
>>>>  >> the original and creating it again from scratch.
>>>>  >>
>>>>  >> public void reset() {
>>>>  >>   dataset.begin(ReadWrite.WRITE);
>>>>  >>   try {
>>>>  >>     LocalDateTime date = LocalDateTime.now();
>>>>  >>     DateTimeFormatter formatter =
>>>>  >>         DateTimeFormatter.ofPattern("yyyyMMddHHmm");
>>>>  >>     String dateString = date.format(formatter);
>>>>  >>     String backupModelName = modelName + "-" + dateString;
>>>>  >>     dataset.addNamedModel(backupModelName, getModel());
>>>>
>>>> A SPARQL UPDATE using "MOVE" is neater.
>>>>
>>>> For TDB, there is little choice but to do some kind of copy to rename.
>>>> It is a change to the quads for the graph with no indirection to flip
>>>> the name in the storage.
>>>>
>>>>       Andy
>>>>
>>>> Thanks. Can you provide an example please? When you say it’s neater,
>>>> is it also quicker and more robust?
>>>
>>>
>>> Personal preference:
>>>
>>> Txn.executeWrite(dataset, () ->
>>>              UpdateAction.parseExecute("MOVE <g1> TO <g2>", dataset)
>>>                 );
>>>
>>> You need to sort out the <g1> and <g2>
>>> (untested)
>>>
>>> MOVE works remotely.
>>> You could use the RDFConnection as well.
>>>
>>> It's not likely to be quicker - it's got to do the same amount of work
>>> and there is no TDB magic for this.
>>>
>>>     Andy
>>>
>>>>
>>>
>