Posted to users@jena.apache.org by Maria Jackson <ma...@gmail.com> on 2015/04/09 15:57:50 UTC

Loading quads in Jena

Hi,

I am trying to load dbpedia.nq into Jena using tdbloader. The loading seems
to be pretty slow, and it is consuming only 1% of the 96GB of RAM which my
machine has. I have also set the heap size to 2GB. Is there some way to
speed up the loading of quads in Jena?

Re: Loading quads in Jena

Posted by Andy Seaborne <an...@apache.org>.
TDB uses memory mapped files. Increasing heap takes space away from 
that.  It shows up in VIRT under top(1).
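
One way to compare mapped and resident memory for the loader process is sketched below. It assumes a Linux host with a procps-style `ps`, and the `show_mem` name is only illustrative. VSZ corresponds to VIRT in top(1) and RSS to RES, both reported in KiB.

```shell
#!/bin/sh
# Print PID, virtual size (VSZ), resident size (RSS) and command name
# for one process. Defaults to the current shell if no PID is given.
show_mem() {
    ps -o pid=,vsz=,rss=,comm= -p "${1:-$$}"
}

show_mem "$@"
```

Run it with the PID of the loader's JVM as the argument, for example the PID passed to jstat elsewhere in this thread. A large VSZ alongside a small RSS would be expected while the memory-mapped files are mostly not yet touched.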

I don't know about that distribution of Linux I'm afraid.

You will see the occasional very low batch rate (the GC runs) but it 
looks like this is more continuous.

Is this disk or SSD?
Is this a shared or virtualised machine in any way?

A possibility is that the I/O path is being slowed down.

Have you tried tdbloader2?  It works in a different way and so can have 
different characteristics on the same hardware.
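
A minimal sketch of a tdbloader2 run, assuming the Jena `bin` directory is on the PATH. The database directory and data file below are hypothetical placeholders. tdbloader2 is meant for building a new database from scratch, so the target directory should be empty. The script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical locations; adjust to the actual setup.
DB_DIR="${DB_DIR:-/data/tdb/dbpedia}"
DATA_FILE="${DATA_FILE:-dbpedia.nq}"

# Print the command instead of running it unless RUN=1 is set.
load_with_tdbloader2() {
    if [ "${RUN:-0}" = "1" ]; then
        tdbloader2 --loc "$DB_DIR" "$DATA_FILE"
    else
        echo "tdbloader2 --loc $DB_DIR $DATA_FILE"
    fi
}

load_with_tdbloader2
```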

	Andy



On 11/04/15 14:46, Maria Jackson wrote:
> I am able to set heap size to 4GB using: export JAVA_TOOL_OPTIONS='-Xmx2G'
>
> But I still see that Jena is using just 1% of RAM, as shown by the top
> command, and it is still loading quads at the rate: (Batch: 979 / Avg: 7,422).


Re: Loading quads in Jena

Posted by Maria Jackson <ma...@gmail.com>.
I am able to set heap size to 4GB using: export JAVA_TOOL_OPTIONS='-Xmx2G'

But I still see that Jena is using just 1% of RAM, as shown by the top
command, and it is still loading quads at the rate: (Batch: 979 / Avg: 7,422).


Re: Loading quads in Jena

Posted by Maria Jackson <ma...@gmail.com>.
Also here are the statistics of Jena which I obtained:

$ jstat -gccapacity 46306
 NGCMN    NGCMX     NGC     S0C   S1C       EC      OGCMN      OGCMX       OGC         OC      PGCMN    PGCMX     PGC       PC     YGC    FGC
343552.0 349696.0  74752.0 3072.0 3584.0  67584.0   686592.0   698880.0   518656.0   518656.0  21504.0 169984.0  21504.0  21504.0    748    22
$ jstat -gc 46306
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       PC     PU    YGC     YGCT    FGC    FGCT     GCT
3584.0 3584.0  0.0   3008.0 67584.0  54298.8   518656.0   249814.2  21504.0 13605.7    771   71.630  22      5.364   76.995



Re: Loading quads in Jena

Posted by Maria Jackson <ma...@gmail.com>.
Dear Andy,

In the beginning I was getting the following message:
17:06:12 INFO  loader               :: Add: 15,200,000 quads (Batch: 21,114 / Avg: 23,491)

But now I am getting the following message:
17:43:14 INFO  loader               :: Add: 21,300,000 quads (Batch: 979 / Avg: 7,422)
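
For reference, the two log lines above also give the average rate over the interval between them. A minimal sketch of that arithmetic, assuming both timestamps fall on the same day and use two-digit HH:MM:SS fields (the function names are illustrative):

```shell
#!/bin/sh
# Seconds since midnight for a two-digit HH:MM:SS timestamp.
# A single leading zero is stripped so "08" and "09" are not read as octal.
to_seconds() {
    h=${1%%:*}; rest=${1#*:}; m=${rest%%:*}; s=${rest#*:}
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# avg_rate START_TIME START_COUNT END_TIME END_COUNT
# Prints whole quads per second over the interval.
avg_rate() {
    elapsed=$(( $(to_seconds "$3") - $(to_seconds "$1") ))
    echo $(( ($4 - $2) / elapsed ))
}

avg_rate 17:06:12 15200000 17:43:14 21300000   # prints 2745
```

That is 6,100,000 quads in 2,222 seconds, or roughly 2,700 quads per second, against more than 20,000 per second earlier in the load.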

The top command shows that Jena is using "1%" of RAM, and my actual RAM
size is 96 GB. I think this figure does not include the space required by
the heap.

I also increased the heap space as follows (as a non-root user):
    $ export JVM_ARGS="-Xmx2048m"

After doing so, I am getting the following heap information:

    $ java -XX:+PrintFlagsFinal -version | grep -iE 'HeapSize|PermSize|ThreadStackSize'
        uintx AdaptivePermSizeWeight                    = 20               {product}
         intx CompilerThreadStackSize                   = 0                {pd product}
        uintx ErgoHeapSizeLimit                         = 0                {product}
        uintx HeapSizePerGCThread                       = 87241520         {product}
        uintx InitialHeapSize                          := 1054841216       {product}
        uintx LargePageHeapSizeThreshold                = 134217728        {product}
        uintx MaxHeapSize                              := 16877879296      {product}
        uintx MaxPermSize                               = 174063616        {pd product}
        uintx PermSize                                  = 21757952         {pd product}
         intx ThreadStackSize                           = 1024             {pd product}
         intx VMThreadStackSize                         = 1024             {pd product}
    java version "1.7.0_75"
    OpenJDK Runtime Environment (rhel-2.5.4.0.el6_6-x86_64 u75-b13)
    OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)
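
The MaxHeapSize reported above is not the 2048m that was requested. This is consistent with a plain `java` invocation not reading `JVM_ARGS`, which, as far as I know, is picked up by the Jena command scripts rather than by the JVM itself. A minimal sketch for converting the reported byte counts and for asking the JVM directly what `-Xmx2048m` resolves to (the `bytes_to_mib` helper is illustrative):

```shell
#!/bin/sh
# Convert a byte count from -XX:+PrintFlagsFinal into whole MiB.
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# MaxHeapSize value from the output above.
bytes_to_mib 16877879296   # prints 16096

# Pass the option on the command line and read back the resolved value,
# but only if a java binary is available on the PATH.
if command -v java >/dev/null 2>&1; then
    java -Xmx2048m -XX:+PrintFlagsFinal -version 2>/dev/null | grep -w MaxHeapSize || true
fi
```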


Also, does the OS setup restrict the amount of space for mmap files? I am
using:

    Distributor ID: Scientific
    Description: Scientific Linux release 6.6 (Carbon)

If yes, can you please tell me how I can increase the space available for
memory-mapped files?




Re: Loading quads in Jena

Posted by Andy Seaborne <an...@apache.org>.
On 09/04/15 14:57, Maria Jackson wrote:
> Hi,
>
> I am trying to load dbpedia.nq into Jena using tdbloader. The loading seems
> to be pretty slow, and it is consuming only 1% of the 96GB of RAM which my
> machine has. I have also set the heap size to 2GB. Is there some way to
> speed up the loading of quads in Jena?
>

What does the log say? What is the starting rate, and how does it change
as the load progresses?

When you say "1% of 96GB RAM", does that include all the process RAM (not
just heap)? TDB uses out-of-heap RAM for memory mapped files.

The other factor is whether your OS setup restricts the amount of space
for mmap files.

	Andy
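
A minimal sketch of two places to look for such restrictions on a Linux host. Which limits actually matter depends on the system, so this is a starting point rather than a complete check, and the function name is illustrative. A finite `ulimit -v` caps the address space of the process, which includes memory-mapped files.

```shell
#!/bin/sh
show_mmap_limits() {
    # Per-process virtual memory limit for this shell (KiB, or "unlimited").
    echo "ulimit -v: $(ulimit -v)"

    # Linux only: maximum number of memory map areas a process may have.
    if [ -r /proc/sys/vm/max_map_count ]; then
        echo "vm.max_map_count: $(cat /proc/sys/vm/max_map_count)"
    fi
}

show_mmap_limits
```

Run it as the same user, and from the same shell, that starts the loader, since limits set with ulimit are inherited per session.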