Posted to users@jena.apache.org by Maria Jackson <ma...@gmail.com> on 2015/04/09 15:57:50 UTC
Loading quads in Jena
Hi,
I am trying to load dbpedia.nq into Jena using tdbloader. Loading seems
to be pretty slow, and it is consuming only 1% of the 96GB of RAM which my
machine has. I have also set the heap size to 2GB. Is there some way I can
speed up the loading of quads in Jena?
Re: Loading quads in Jena
Posted by Andy Seaborne <an...@apache.org>.
TDB uses memory mapped files. Increasing heap takes space away from
that. It shows up in VIRT under top(1).
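One way to look at this from the shell, as a minimal sketch for Linux only. The helper name mem_summary is made up for illustration. It reads /proc/<pid>/status, where VmSize is the figure top(1) shows as VIRT and VmRSS is the one it shows as RES. It is run here against the current shell; for the loader you would pass the PID of the Java process instead.

```shell
# Hypothetical helper: print virtual and resident size of a process in MB.
# Assumes a Linux /proc filesystem; values in /proc/<pid>/status are in kB.
mem_summary() {
  pid="$1"
  # VmSize matches VIRT in top(1); VmRSS matches RES.
  awk '/^VmSize:|^VmRSS:/ { printf "%s %d MB\n", $1, $2 / 1024 }' "/proc/${pid}/status"
}

# Example: inspect the current shell. Replace "$$" with the loader's PID.
mem_summary "$$"
```

If VmSize is large while VmRSS stays small, the mapped files are there but few of their pages are resident.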
I don't know about that distribution of Linux I'm afraid.
You will see the occasional very low batch rate (the GC runs) but it
looks like this is more continuous.
Is this disk or SSD?
Is this a shared or virtualised machine in any way?
A possibility is that the I/O path is being slowed down.
Have you tried tdbloader2? It works in a different way and so can have
different characteristics on the same hardware.
Andy
On 11/04/15 14:46, Maria Jackson wrote:
> I am able to set heap size to 4GB using: export JAVA_TOOL_OPTIONS='-Xmx2G'
>
> But I still see Jena using just 1% of RAM, as shown by the top command,
> and it is still loading quads at the rate: (Batch: 979 / Avg: 7,422).
Re: Loading quads in Jena
Posted by Maria Jackson <ma...@gmail.com>.
I am able to set heap size to 4GB using: export JAVA_TOOL_OPTIONS='-Xmx2G'
But I still see Jena using just 1% of RAM, as shown by the top command, and
it is still loading quads at the rate: (Batch: 979 / Avg: 7,422).
Re: Loading quads in Jena
Posted by Maria Jackson <ma...@gmail.com>.
Also here are the statistics of Jena which I obtained:
$ jstat -gccapacity 46306
    NGCMN    NGCMX      NGC      S0C      S1C       EC    OGCMN    OGCMX      OGC       OC    PGCMN    PGCMX      PGC       PC  YGC  FGC
 343552.0 349696.0  74752.0   3072.0   3584.0  67584.0 686592.0 698880.0 518656.0 518656.0  21504.0 169984.0  21504.0  21504.0  748   22
$ jstat -gc 46306
      S0C      S1C      S0U      S1U       EC       EU       OC       OU       PC       PU  YGC    YGCT  FGC    FGCT     GCT
   3584.0   3584.0      0.0   3008.0  67584.0  54298.8 518656.0 249814.2  21504.0  13605.7  771  71.630   22   5.364  76.995
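As a rough reading aid, not a diagnosis: jstat reports capacities in KB, and NGCMX and OGCMX are the maximum new and old generation capacities, so their sum approximates the maximum heap of the running process (PermGen excluded). The sketch below only redoes that arithmetic on the two values copied from the -gccapacity output; the variable names are illustrative.

```shell
# Values copied from the jstat -gccapacity output (KB).
ngcmx_kb=349696.0   # NGCMX: maximum new generation capacity
ogcmx_kb=698880.0   # OGCMX: maximum old generation capacity

# Maximum heap (excluding PermGen) is roughly new + old generation maxima.
heap_max_mb=$(awk -v n="$ngcmx_kb" -v o="$ogcmx_kb" 'BEGIN { printf "%d", (n + o) / 1024 }')
echo "max heap (new + old): ${heap_max_mb} MB"
```

This comes to 1024 MB, which suggests the loader process was running with about a 1 GB maximum heap at that point, so the 2 GB setting may not have reached it.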
Re: Loading quads in Jena
Posted by Maria Jackson <ma...@gmail.com>.
Dear Andy,
In the beginning I was getting the following message:
17:06:12 INFO loader :: Add: 15,200,000 quads (Batch: 21,114 / Avg: 23,491)
But now I am getting the following message:
17:43:14 INFO loader :: Add: 21,300,000 quads (Batch: 979 / Avg: 7,422)
The top command shows Jena using "1%" of RAM, and my actual RAM size is
96 GB. I think this does not include the space required by the heap.
I also increased the heap space to the following (as a non-root user):
$ export JVM_ARGS="-Xmx2048m"
After doing so I am getting the following heap information:
$ java -XX:+PrintFlagsFinal -version | grep -iE 'HeapSize|PermSize|ThreadStackSize'
uintx AdaptivePermSizeWeight      = 20           {product}
intx  CompilerThreadStackSize     = 0            {pd product}
uintx ErgoHeapSizeLimit           = 0            {product}
uintx HeapSizePerGCThread         = 87241520     {product}
uintx InitialHeapSize            := 1054841216   {product}
uintx LargePageHeapSizeThreshold  = 134217728    {product}
uintx MaxHeapSize                := 16877879296  {product}
uintx MaxPermSize                 = 174063616    {pd product}
uintx PermSize                    = 21757952     {pd product}
intx  ThreadStackSize             = 1024         {pd product}
intx  VMThreadStackSize           = 1024         {pd product}
java version "1.7.0_75"
OpenJDK Runtime Environment (rhel-2.5.4.0.el6_6-x86_64 u75-b13)
OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)
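A note on reading this output, offered as a sketch rather than a definitive explanation. `java -XX:+PrintFlagsFinal -version` starts a separate, short-lived JVM and prints that JVM's flags. The java launcher does not read JVM_ARGS itself, so this output probably reflects the JVM's own defaults on this machine and not the running loader. The arithmetic below just converts the MaxHeapSize value shown into MB and compares it with the 2048 MB requested; the variable names are illustrative.

```shell
# MaxHeapSize as printed in the PrintFlagsFinal output, in bytes.
max_heap_bytes=16877879296

# Convert to MB (1 MB = 1024 * 1024 bytes) with integer shell arithmetic.
max_heap_mb=$((max_heap_bytes / 1024 / 1024))
echo "MaxHeapSize: ${max_heap_mb} MB"

# The 2 GB that -Xmx2048m asks for, in the same unit, for comparison.
requested_mb=2048
if [ "$max_heap_mb" -ne "$requested_mb" ]; then
  echo "reported maximum differs from the requested ${requested_mb} MB"
fi
```

To see the value in effect for the loader itself, `jinfo -flag MaxHeapSize <pid>` from the same JDK, run against the loader's PID, is one option.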
Also, does the OS setup restrict the amount of space for mmap files? I am using
Distributor ID: Scientific
Description: Scientific Linux release 6.6 (Carbon)
If yes, can you please tell me how I can increase the space available for
memory mapped files?
Re: Loading quads in Jena
Posted by Andy Seaborne <an...@apache.org>.
On 09/04/15 14:57, Maria Jackson wrote:
> Hi,
>
> I am trying to load dbpedia.nq into Jena using tdbloader. Loading seems
> to be pretty slow, and it is consuming only 1% of the 96GB of RAM which my
> machine has. I have also set the heap size to 2GB. Is there some way I can
> speed up the loading of quads in Jena?
>
What does the log say? What is the starting rate, and how does it change as
the load progresses?
When you say "1% of 96GB RAM", does that include all the process RAM (not
just heap)? TDB uses out-of-heap RAM for memory mapped files.
The other factor is whether your OS setup restricts the amount of space
for mmap files.
Andy
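Two places where such a restriction could show up on Linux, as a minimal sketch and not an exhaustive list: the per-process virtual memory limit reported by `ulimit -v`, and the system-wide cap on the number of memory map areas per process in /proc/sys/vm/max_map_count. Whether either one is the limiting factor for a given load is an assumption to check, not something this thread establishes.

```shell
# Per-process virtual memory limit for this shell, in KB, or "unlimited".
vm_limit=$(ulimit -v)
echo "ulimit -v: ${vm_limit}"

# System-wide cap on the number of memory map areas per process (Linux).
if [ -r /proc/sys/vm/max_map_count ]; then
  echo "vm.max_map_count: $(cat /proc/sys/vm/max_map_count)"
fi
```

Run this as the same user and in the same shell setup that starts the loader, since ulimit values are inherited per session.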