Posted to users@jena.apache.org by Dominique Vandensteen <do...@cogni.zone> on 2016/03/18 10:16:00 UTC

arq:spillToDiskThreshold issue

Hi,
I'm having problems handling "big" graphs (50M to 100M triples at the
current stage) in my Fuseki servers using SPARQL.
The 2 actions I need to do are "DROP GRAPH <...>" and "MOVE <...> TO <...>".
Doing these actions with these graphs I get OutOfMemory errors. Some
investigation pointed me to http://markmail.org/message/hjisrglx4eicrxyt
and
http://mail-archives.apache.org/mod_mbox/jena-users/201504.mbox/%3CCAJ+MTwad1vfcnjArO37xKiwgYj7mRniLLZVmSx1_nrJ+RRf56Q@mail.gmail.com%3E
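
For illustration, the updates are of this general form (the graph IRIs
here are placeholders, not my real ones):

DROP GRAPH <http://example.org/graph/old> ;
MOVE <http://example.org/graph/staging> TO <http://example.org/graph/live>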

Using this config:
<#yourdatasetname> rdf:type tdb:DatasetTDB ;
    ja:context [ ja:cxtName "tdb:transactionJournalWriteBlockMode" ;
                 ja:cxtValue "mapped" ] ;
    ja:context [ ja:cxtName "arq:spillToDiskThreshold" ;
                 ja:cxtValue 10000 ] .
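
(For reference, the fragment above assumes the usual assembler
prefixes. A minimal complete file would look roughly like this; the
tdb:location value is a placeholder:)

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:  <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .

<#yourdatasetname> rdf:type tdb:DatasetTDB ;
    tdb:location "/path/to/DB" ;
    ja:context [ ja:cxtName "tdb:transactionJournalWriteBlockMode" ;
                 ja:cxtValue "mapped" ] ;
    ja:context [ ja:cxtName "arq:spillToDiskThreshold" ;
                 ja:cxtValue 10000 ] .
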
This solves my problem but brings up another one. My temp folder fills
up with JenaTempByteBuffer-...UUID...tmp files until my disk is full. These
files remain locked, so I cannot delete them.
The files seem to be created
by org.apache.jena.tdb.base.file.BufferAllocatorMapped but are for some
reason not released.
Is there any way to work around this issue?

I'm using
- Fuseki 2.3.1
- JVM 1.8.0_25 64-bit
- Windows 10




Dominique Vandensteen
Head of development

+ 32 474 870856
domi.vds@cogni.zone
skype: domi.vds

Re: arq:spillToDiskThreshold issue

Posted by Dominique Vandensteen <do...@cogni.zone>.
I don't think my solution is one that could be used in production.
Anyway, the patched file is in the attachment.

D.

On 27/03/2016 13:56, Andy Seaborne wrote:
> On 19/03/16 13:35, Dominique Vandensteen wrote:
>> [...]
>> By implementing these 2 fixes I was able to use the
>> arq:spillToDiskThreshold option in Windows.
>
> Great - do you have a patch or pull request for that?
>
>     Andy


Re: arq:spillToDiskThreshold issue

Posted by Andy Seaborne <an...@apache.org>.
On 19/03/16 13:35, Dominique Vandensteen wrote:
> [...]
> By implementing these 2 fixes I was able to use the
> arq:spillToDiskThreshold option in Windows.

Great - do you have a patch or pull request for that?

	Andy



Re: arq:spillToDiskThreshold issue

Posted by Dominique Vandensteen <do...@cogni.zone>.
I don't think adding enough memory is a workable solution, because we
would need that large amount of memory only on rare occasions, so most
of the time the memory would be "wasted".

During my investigation I came up with 2 causes of the problem:
1. The close method of 
org.apache.jena.tdb.base.file.BufferAllocatorMapped is never called.
I quickly fixed this by adding a ThreadLocal which is used to close all 
instances at transaction end. I will clean this up and use a
WeakReference, which in my opinion is a cleaner solution.
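
In sketch form, the bookkeeping for (1) looks like this (class and
method names are simplified/hypothetical, not the actual patch):

import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch only: track allocators opened on this thread so the
// transaction code can close them all when it finishes.
final class SpillAllocatorTracker {
    private static final ThreadLocal<List<Closeable>> OPEN =
            ThreadLocal.withInitial(ArrayList::new);

    static void register(Closeable allocator) {
        OPEN.get().add(allocator);
    }

    // Call at transaction end: closing each allocator releases its
    // temp-file handle so the file can actually be deleted on Windows.
    static void closeAll() {
        for (Closeable c : OPEN.get()) {
            try {
                c.close();
            } catch (IOException e) {
                // best effort: keep closing the rest
            }
        }
        OPEN.remove();
    }
}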

2. An issue in the JVM that is described here: 
http://stackoverflow.com/questions/13065358/java-7-filechannel-not-closing-properly-after-calling-a-map-method#32062298
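
The workaround described there amounts to forcing the unmap through the
JDK-internal cleaner (a sketch; JDK 8 internal API, not supported usage):

import java.nio.MappedByteBuffer;

// Force the OS-level unmap so the Windows file lock is released
// immediately instead of waiting for the buffer to be GC'd.
final class Unmapper {
    static void unmap(MappedByteBuffer buffer) {
        ((sun.nio.ch.DirectBuffer) buffer).cleaner().clean();
    }
}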

By implementing these 2 fixes I was able to use the
arq:spillToDiskThreshold option in Windows.

Dominique

On 18/03/2016 22:27, Stephen Allen wrote:
> [...]
> You can use the off-JVM memory that Andy mentions by changing the "mapped"
> to "direct" in your config file.  That is similar to using a memory mapped
> file, except that you are limited by the amount of memory that you have
> (but if you have enough virtual memory, then there should be no problem).
> [...]
>
> -Stephen


Re: arq:spillToDiskThreshold issue

Posted by Andy Seaborne <an...@apache.org>.
On 18/03/16 21:27, Stephen Allen wrote:

> You can use the off-JVM memory that Andy mentions by changing the "mapped"
> to "direct" in your config file. [...]
> This is where the spillToDisk comes in, it serializes those temporary
> tuples on disk in a regular file instead of holding them in an in-memory
> array. [...]
>
> -Stephen

Stephen,

(I got confused by the multiple use of "direct")

Would it make sense to implement the spill space using a BlockMgr? We
have several implementations of BlockMgr, including one that uses
traditional file I/O as opposed to memory mapped files.

	Andy


Re: arq:spillToDiskThreshold issue

Posted by Stephen Allen <sa...@apache.org>.
On Fri, Mar 18, 2016 at 2:20 PM, Andy Seaborne <an...@apache.org> wrote:

> On 18/03/16 09:16, Dominique Vandensteen wrote:
>> [...]
>
> mapped + Windows => files don't go away until the JVM exits [1] and even
> then it does not seem to be reliable according to some reports.
>
> I thought BufferAllocatorDirect was supposed to get round this but it
> allocates on direct memory (AKA malloc).
>
> It would need a spill to plain file implementation of BufferAllocator
> which we don't seem to have.
>
>         Andy
>
> [1]
> http://bugs.java.com/view_bug.do?bug_id=4724038
> and others.
>
You can use the off-JVM memory that Andy mentions by changing the "mapped"
to "direct" in your config file.  That is similar to using a memory mapped
file, except that you are limited by the amount of memory that you have
(but if you have enough virtual memory, then there should be no problem).
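
In the assembler fragment from the original post, that is just a change
of the context value (sketch):

   ja:context [ ja:cxtName "tdb:transactionJournalWriteBlockMode" ;
                ja:cxtValue "direct" ] ;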

That first setting is only for TDB's storage of unwritten blocks.  But when
you do large updates, Jena needs to temporarily store all of the tuples
generated by the WHERE clause in memory before applying them in the update.
This is where the spillToDisk comes in: it serializes those temporary
tuples on disk in a regular file instead of holding them in an in-memory
array.  That file is not memory mapped, so there should be no problem with
removing it after the update is complete.

So basically, if "direct" works for you, then go with that (or use a
different OS like Linux for the memory mapped approach).

-Stephen

Re: arq:spillToDiskThreshold issue

Posted by Andy Seaborne <an...@apache.org>.
On 18/03/16 09:16, Dominique Vandensteen wrote:
> [...]
> My temp folder gets filled
> up with JenaTempByteBuffer-...UUID...tmp files until my disk is full. These
> files remain locked so I cannot delete them.
> [...]
> I'm using
> -fuseki 2.3.1
> -jvm 1.8.0_25 64bit
> -windows 10

mapped + Windows => files don't go away until the JVM exits [1] and even 
then it does not seem to be reliable according to some reports.

I thought BufferAllocatorDirect was supposed to get round this but it 
allocates on direct memory (AKA malloc).

It would need a spill to plain file implementation of BufferAllocator 
which we don't seem to have.
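
A rough sketch of what such a spill-to-plain-file store could look like
(hypothetical class, not existing Jena code): ordinary positional
read/write instead of FileChannel.map(), so closing the channel releases
the Windows lock and the temp file can be deleted.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical spill store backed by plain file I/O, not mmap.
final class PlainFileSpill implements AutoCloseable {
    private final Path tmp;
    private final FileChannel channel;

    PlainFileSpill() throws IOException {
        tmp = Files.createTempFile("JenaSpill-", ".tmp");
        channel = FileChannel.open(tmp,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    void write(long position, ByteBuffer data) throws IOException {
        channel.write(data, position);    // plain positional write
    }

    ByteBuffer read(long position, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        channel.read(buf, position);
        buf.flip();
        return buf;
    }

    @Override
    public void close() throws IOException {
        channel.close();                  // releases the file handle
        Files.deleteIfExists(tmp);        // deletion now succeeds on Windows
    }
}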

	Andy

[1]
http://bugs.java.com/view_bug.do?bug_id=4724038
and others.
