Posted to dev@jena.apache.org by "Erman Korkut (BLOOMBERG/ 120 PARK)" <ek...@bloomberg.net> on 2018/01/19 21:14:56 UTC

riot/jena - limiting CPU and memory usage

Hi all,

I was running riot to generate TTL files with 50+ million triples on a production machine, and it took over all the CPU resources of that box, basically taking down the entire machine. Is there a way to limit the parallelization it uses, and/or its CPU and memory usage, and the like?

I am going to use cgroups (a Linux kernel feature) to limit the resources allocated to riot processes but, although I know it is a long shot, I wanted to check whether there is a similar setting in riot/jena itself.

Thanks,
Erman 

Re: riot/jena - limiting CPU and memory usage

Posted by Andy Seaborne <an...@apache.org>.
Parsing is single-threaded - if Java is taking over multiple CPU cores, 
it could be because it is short of RAM and the parallel garbage 
collector is cutting in.  Increasing the RAM given to the JVM, even by 
small amounts, usually fixes this.
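One way to check whether GC is what is eating the cores is to ask the JVM directly. The Java sketch below is illustrative only (the class and method names are made up, not part of Jena): it prints the processor count and maximum heap the JVM is working with, plus the cumulative time spent in each garbage collector. If GC time is a large share of the run time, the heap is probably too small. The standard HotSpot options that bear on this are -Xmx (maximum heap), -XX:ParallelGCThreads=N (GC worker threads) and, on newer JDKs, -XX:ActiveProcessorCount=N; Jena's command-line scripts pick up extra JVM options from the JVM_ARGS environment variable (worth confirming against the script shipped with your Jena version). On container-aware JVMs (JDK 10+, 8u191+), the processor count and default heap also reflect cgroup limits.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch: report what the JVM thinks it has to work with,
// and how much time garbage collection has taken so far.
public class JvmResources {

    // Processors visible to the JVM; default GC thread counts are derived
    // from this unless overridden (e.g. with -XX:ParallelGCThreads).
    static int processors() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Maximum heap in MiB (controlled by -Xmx).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Cumulative milliseconds spent in all collectors; a collector that
    // cannot report its time returns -1, which is skipped here.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("processors: " + processors());
        System.out.println("max heap:   " + maxHeapMiB() + " MiB");
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("total GC:   " + totalGcMillis() + " ms");
    }
}
```

Called from inside an application that uses Jena, this reports on that JVM; for a separate riot process, the JDK's jstat tool (jstat -gcutil <pid>) gives similar GC figures.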

In addition, TTL, the pretty form, is computationally expensive as well 
as needing working memory, so try --stream.
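The memory cost of the pretty form can be illustrated with a toy Java sketch. This is not Jena's writer code: the Triple class and both methods are hypothetical, and terms are assumed to be already-serialized strings. The streaming writer emits each triple as soon as it arrives, so its memory use stays flat; the pretty writer groups triples by subject, so it has to hold every triple before it can write the first one.

```java
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Jena code: streaming output versus
// subject-grouped ("pretty") output of the same triples.
public class WriterSketch {

    // A triple whose terms are already serialized, e.g. "<http://ex/s>".
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // Streaming: write each triple as soon as it is read. Memory use does
    // not grow with the number of triples.
    static void writeStreaming(Iterator<Triple> in, PrintWriter out) {
        while (in.hasNext()) {
            Triple t = in.next();
            out.print(t.s + " " + t.p + " " + t.o + " .\n");
        }
    }

    // Pretty: group triples by subject before writing anything, so every
    // triple has to be held in memory until the input is exhausted.
    static void writePretty(Iterator<Triple> in, PrintWriter out) {
        Map<String, List<Triple>> bySubject = new LinkedHashMap<>();
        while (in.hasNext()) {
            Triple t = in.next();
            bySubject.computeIfAbsent(t.s, k -> new ArrayList<>()).add(t);
        }
        for (Map.Entry<String, List<Triple>> e : bySubject.entrySet()) {
            List<Triple> group = e.getValue();
            out.print(e.getKey());
            for (int i = 0; i < group.size(); i++) {
                Triple t = group.get(i);
                String end = (i == group.size() - 1) ? " .\n" : " ;\n";
                out.print((i == 0 ? " " : "    ") + t.p + " " + t.o + end);
            }
        }
    }

    public static void main(String[] args) {
        // <s> reappears after <t>, so grouping needs the whole input first.
        List<Triple> ts = Arrays.asList(
                new Triple("<s>", "<p1>", "<o1>"),
                new Triple("<t>", "<p1>", "<o3>"),
                new Triple("<s>", "<p2>", "<o2>"));
        PrintWriter out = new PrintWriter(System.out);
        writeStreaming(ts.iterator(), out);
        out.print("\n");
        writePretty(ts.iterator(), out);
        out.flush();
    }
}
```

A real pretty Turtle writer does more work still (prefixes, nested blank nodes, lists), which adds CPU cost on top of the buffering.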

Parsing for most formats is CPU-bound - the I/O tends not to be the 
limitation because the file is read sequentially in large chunks, and 
files are often originally written sequentially, so the blocks on disk 
tend to be laid out contiguously.
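The access pattern described here, large sequential reads with the CPU work done on each chunk in memory, looks roughly like this Java sketch. It only counts newlines (one per triple in N-Triples) rather than parsing anything, the class name is made up, and the 1 MiB chunk size is an arbitrary choice; a real parser's per-byte work (tokenizing, building nodes) is much heavier than this loop, which is why parsing tends to be CPU-bound rather than I/O-bound.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch: scan a file front to back in large chunks.
// Only newlines are counted; a real parser would tokenize each chunk,
// which costs far more CPU than this loop.
public class ChunkedScan {

    // 1 MiB per read call; an arbitrary size chosen for illustration.
    static final int CHUNK = 1 << 20;

    static long countLines(Path file) throws IOException {
        byte[] buf = new byte[CHUNK];
        long lines = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // Few, large, sequential reads; the per-byte work happens in memory.
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        lines++;
                    }
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        long lines = countLines(Paths.get(args[0]));
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(lines + " lines in " + ms + " ms");
    }
}
```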

As ajs6f asks: what's the machine? What's the input?

     Andy

On 19/01/18 21:17, ajs6f wrote:
> Can you please tell us a bit about the machine? How many CPUs does it have / how many cores? What are you actually doing with riot? (riot can't generate RDF; it only translates it, so what are the original files you are working from?)
> 
> NTriples is almost always the least resource-intensive output format.
> 
> ajs6f
> 

Re: riot/jena - limiting CPU and memory usage

Posted by ajs6f <aj...@apache.org>.
Can you please tell us a bit about the machine? How many CPUs does it have / how many cores? What are you actually doing with riot? (riot can't generate RDF; it only translates it, so what are the original files you are working from?)

NTriples is almost always the least resource-intensive output format.

ajs6f
