Posted to users@jena.apache.org by Dominique Vandensteen <do...@cogni.zone> on 2016/04/06 20:26:07 UTC

Re: arq:spillToDiskThreshold issue

I don't think my solution is something that could be used in production.
Anyway, the patched file is in the attachment.

D.

On 27/03/2016 13:56, Andy Seaborne wrote:
> On 19/03/16 13:35, Dominique Vandensteen wrote:
>> I don't think simply having enough memory is a workable solution, because
>> we will need the large amount of memory only on rare occasions, so most of
>> the time the memory will be "wasted".
>>
>> During my investigation I identified two causes of the problem:
>> 1. The close method of
>> org.apache.jena.tdb.base.file.BufferAllocatorMapped is never called.
>> I quickly fixed this by adding a ThreadLocal that is used to close all
>> instances at transaction end (a rough sketch of the idea follows below).
>> I will clean this up and use a WeakReference, which in my opinion is a
>> cleaner solution.
>>
>> 2. An issue in the JVM that is described here:
>> http://stackoverflow.com/questions/13065358/java-7-filechannel-not-closing-properly-after-calling-a-map-method#32062298 
>>
>>
>>
>> By implementing these two fixes I was able to use the
>> arq:spillToDiskThreshold option on Windows.
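
For what it's worth, here is a rough sketch of the ThreadLocal idea in
isolation (hypothetical names, not the actual patch, which is in the
attachment): spill resources opened while a transaction runs are registered
per thread and closed at transaction end.

import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SpillResourceTracker {
    // One list of open spill resources per thread; in this sketch a
    // transaction is assumed to run entirely on a single thread.
    private static final ThreadLocal<List<Closeable>> OPEN =
            ThreadLocal.withInitial(ArrayList::new);

    // Call wherever a spill buffer / temp file is created.
    public static <T extends Closeable> T register(T resource) {
        OPEN.get().add(resource);
        return resource;
    }

    // Call at transaction end: close everything the current thread opened,
    // then drop the thread-local list.
    public static void closeAll() {
        List<Closeable> resources = OPEN.get();
        for (Closeable c : resources) {
            try {
                c.close();
            } catch (IOException e) {
                // Log and keep going; one failed close should not leak the rest.
                System.err.println("Failed to close spill resource: " + e);
            }
        }
        resources.clear();
        OPEN.remove();
    }
}

The WeakReference variant mentioned above would avoid relying on the
transaction code always calling closeAll(), at the cost of depending on GC
timing.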
>
> Great - do you have a patch or pull request for that?
>
>     Andy
>
>>
>> Dominique
>>
>> On 18/03/2016 22:27, Stephen Allen wrote:
>>> On Fri, Mar 18, 2016 at 2:20 PM, Andy Seaborne <an...@apache.org> wrote:
>>>
>>>> On 18/03/16 09:16, Dominique Vandensteen wrote:
>>>>
>>>>> Hi,
>>>>> I'm having problems handling "big" graphs (50M to 100M triples at the
>>>>> current stage) in my Fuseki servers using SPARQL.
>>>>> The two actions I need to do are "DROP GRAPH <...>" and "MOVE <...> TO
>>>>> <...>".
>>>>> Doing these actions with these graphs I get OutOfMemory errors. Some
>>>>> investigation pointed me to
>>>>> http://markmail.org/message/hjisrglx4eicrxyt
>>>>> and
>>>>>
>>>>> http://mail-archives.apache.org/mod_mbox/jena-users/201504.mbox/%3CCAJ+MTwad1vfcnjArO37xKiwgYj7mRniLLZVmSx1_nrJ+RRf56Q@mail.gmail.com%3E 
>>>>>
>>>>>
>>>>>
>>>>> Using this config:
>>>>> <#yourdatasetname> rdf:type tdb:DatasetTDB ;
>>>>>     ja:context [ ja:cxtName "tdb:transactionJournalWriteBlockMode" ;
>>>>>                  ja:cxtValue "mapped" ] ;
>>>>>     ja:context [ ja:cxtName "arq:spillToDiskThreshold" ;
>>>>>                  ja:cxtValue 10000 ] .
>>>>> This solves my problem but brings up another one: my temp folder gets
>>>>> filled up with JenaTempByteBuffer-...UUID...tmp files until my disk is
>>>>> full. These files remain locked, so I cannot delete them.
>>>>> The files seem to be created by
>>>>> org.apache.jena.tdb.base.file.BufferAllocatorMapped but for some
>>>>> reason are not released.
>>>>> Is there any way to work around this issue?
>>>>>
>>>>> I'm using:
>>>>> - Fuseki 2.3.1
>>>>> - JVM 1.8.0_25, 64-bit
>>>>> - Windows 10
>>>>>
>>>> mapped + Windows => files don't go away until the JVM exits [1], and even
>>>> then it does not seem to be reliable, according to some reports.
>>>>
>>>> I thought BufferAllocatorDirect was supposed to get round this, but it
>>>> allocates in direct memory (AKA malloc).
>>>>
>>>> It would need a spill-to-plain-file implementation of BufferAllocator,
>>>> which we don't seem to have.
>>>>
>>>>          Andy
>>>>
>>>> [1]
>>>> http://bugs.java.com/view_bug.do?bug_id=4724038
>>>> and others.
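
For reference, a small self-contained demonstration of the behaviour behind
[1], using only standard NIO calls (nothing Jena-specific). On Windows the
delete typically fails while the mapping is still reachable; on Linux it
succeeds.

import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedDeleteDemo {
    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("JenaTempByteBuffer-demo", ".tmp");

        MappedByteBuffer buf;
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map 8 KiB; the mapping stays valid after the channel is closed
            // because it is tied to the buffer, not to the channel.
            buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 8 * 1024);
            buf.put(0, (byte) 1);
        }

        // Usually false on Windows while 'buf' is still reachable;
        // usually true on Linux.
        boolean deleted = tmp.toFile().delete();
        System.out.println("deleted = " + deleted + ", first byte = " + buf.get(0));
    }
}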
>>>>
>>>>>
>>> You can use the off-JVM memory that Andy mentions by changing the
>>> "mapped" to "direct" in your config file.  That is similar to using a
>>> memory-mapped file, except that you are limited by the amount of memory
>>> that you have (but if you have enough virtual memory, then there should
>>> be no problem).
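
Roughly, the difference between the two block modes in plain Java/NIO terms
(a sketch only, not Jena's actual allocator code):

import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BlockModes {
    static final int BLOCK_SIZE = 8 * 1024;

    // "direct": off-heap memory obtained straight from the OS (malloc-style).
    // No file is involved, so nothing is left behind on disk, but you are
    // limited by the (virtual) memory available to the process.
    static ByteBuffer directBlock() {
        return ByteBuffer.allocateDirect(BLOCK_SIZE);
    }

    // "mapped": a block backed by a temporary file.  On Windows that file
    // stays locked until the mapping is garbage collected or the JVM exits.
    static MappedByteBuffer mappedBlock() throws Exception {
        Path tmp = Files.createTempFile("block-", ".tmp");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, BLOCK_SIZE);
        }
    }
}

The config change itself is just swapping the ja:cxtValue "mapped" for
"direct" in the dataset description shown earlier in the thread.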
>>>
>>> That first setting is only for TDB's storage of unwritten blocks.  But
>>> when you do large updates, Jena needs to temporarily store all of the
>>> tuples generated by the WHERE clause in memory before applying them in
>>> the update.  This is where spillToDisk comes in: it serializes those
>>> temporary tuples to a regular file on disk instead of holding them in an
>>> in-memory array.  That file is not memory mapped, so there should be no
>>> problem with removing it after the update is complete.
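
A minimal sketch of that spill-to-a-regular-file pattern (the idea only,
not the actual TDB/ARQ code):

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SpillFileSketch {
    public static void main(String[] args) throws Exception {
        Path spill = Files.createTempFile("update-spill-", ".dat");
        try (OutputStream os = Files.newOutputStream(spill);
             DataOutputStream out = new DataOutputStream(new BufferedOutputStream(os))) {
            // Stream intermediate rows out to disk instead of holding them
            // in an in-memory array.
            for (long i = 0; i < 1_000_000; i++) {
                out.writeLong(i);
            }
        }
        // An ordinary (non-mapped) file can be deleted as soon as the
        // update has been applied; no lingering lock on Windows.
        Files.delete(spill);
    }
}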
>>>
>>> So basically, if "direct" works for you, then go with that (or use a
>>> different OS like Linux for the memory-mapped approach).
>>>
>>> -Stephen
>>>
>>
>>
>
>