Posted to user@accumulo.apache.org by Massimilian Mattetti <MA...@il.ibm.com> on 2016/11/22 15:36:50 UTC
how to reduce re-seeking rate?
Hi all,
I developed an iterator that pre-loads a set of results every time it
jumps to a new row. Checking the logs, I noticed that Accumulo re-seeks
after each key returned by my iterator. This is killing the performance of
my system. Is there a way to reduce the rate at which Accumulo tears down
and re-seeks the iterators?
By the way, I am using log4j2 as my logging library, but it is not able to
locate the log4j2.xml that is inside the iterator jar. Should I put it in
a different place?
Thanks
Regards,
Massimiliano Mattetti
Re: how to reduce re-seeking rate?
Posted by Massimilian Mattetti <MA...@il.ibm.com>.
I am not using the HDFS class loader, so I need to look elsewhere to find
out what the problem with log4j2 is.
I increased table.scan.max.memory to 1MB and it worked. My Key-Values are
not big, but I am using a kind of document-partitioned table in which my
iterator pre-loads a SortedSet of Keys from the index section of the row
and then uses them to go over the data section one key-value pair at a
time. To achieve good performance, I need my iterator to recalculate the
set of Keys from the index as few times as possible.
Thanks.
Regards,
Max
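To make the cost described above concrete, here is a minimal Java sketch of the "pre-load once per row" pattern. It does not use the Accumulo API: RowKeyCache, keysFor and the loader function are hypothetical names invented for this illustration, and plain strings stand in for rows and Keys. It only shows that a long-lived instance reads the index once per row, while an instance that is rebuilt before every lookup (as after a tear-down and re-seek) reads it every time.

```java
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical helper: remembers the key set pre-loaded for the current row. */
class RowKeyCache {
    private final Function<String, SortedSet<String>> indexLoader;
    private String cachedRow;
    private SortedSet<String> cachedKeys;
    int loads = 0; // number of times the index section was read

    RowKeyCache(Function<String, SortedSet<String>> indexLoader) {
        this.indexLoader = indexLoader;
    }

    /** Returns the keys for the row, reloading only when the row changes. */
    SortedSet<String> keysFor(String row) {
        if (!row.equals(cachedRow)) {
            cachedKeys = indexLoader.apply(row);
            cachedRow = row;
            loads++;
        }
        return cachedKeys;
    }

    public static void main(String[] args) {
        // Stand-in for reading the index section of a row (assumption).
        Function<String, SortedSet<String>> loader = row -> {
            SortedSet<String> keys = new TreeSet<>();
            keys.add(row + ":doc1");
            keys.add(row + ":doc2");
            return keys;
        };

        // One long-lived instance: three lookups in the same row, one load.
        RowKeyCache longLived = new RowKeyCache(loader);
        for (int i = 0; i < 3; i++) {
            longLived.keysFor("row1");
        }
        System.out.println("long-lived loads: " + longLived.loads); // 1

        // Instance rebuilt before every lookup: three lookups, three loads.
        int total = 0;
        for (int i = 0; i < 3; i++) {
            RowKeyCache fresh = new RowKeyCache(loader);
            fresh.keysFor("row1");
            total += fresh.loads;
        }
        System.out.println("rebuilt loads: " + total); // 3
    }
}
```

In this model the only way to cut the number of index reads is to make rebuilds rarer, which is what a larger scan buffer does.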
From: Josh Elser <jo...@gmail.com>
To: user@accumulo.apache.org
Date: 22/11/2016 20:39
Subject: Re: how to reduce re-seeking rate?
There isn't any funny classloading happening in the normal case, so
having the log4j2.xml file in your jar should be sufficient. The caveat is
if you're using the HDFS classloading stuff, but that's something you
would have enabled by hand if you're using it.
I think the scan max memory that Dave pointed out is the only knob for
this one. I don't think we have any other sort of policy that governs the
lifecycle of iterators. The framework is not designed with the expectation
that re-instantiating an iterator between batches of results is costly.
Making a guess: are you returning very large Key-Values?
Re: how to reduce re-seeking rate?
Posted by Josh Elser <jo...@gmail.com>.
There isn't any funny classloading happening in the normal case, so
having the log4j2.xml file in your jar should be sufficient. The caveat is
if you're using the HDFS classloading stuff, but that's something you
would have enabled by hand if you're using it.
I think the scan max memory that Dave pointed out is the only knob for
this one. I don't think we have any other sort of policy that governs the
lifecycle of iterators. The framework is not designed with the expectation
that re-instantiating an iterator between batches of results is costly.
Making a guess: are you returning very large Key-Values?
RE: how to reduce re-seeking rate?
Posted by dl...@comcast.net.
In one case, the tserver will send data back to the client when it fills its
buffer. When this happens, it's possible that the iterator could be torn
down and re-seeked to the last key returned. You could increase the size of
this buffer to see if that helps
(http://accumulo.apache.org/1.8/accumulo_user_manual.html#_table_scan_max_memory)
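As a rough illustration of why the buffer size matters, the following Java sketch models a scan that closes a batch whenever the accumulated entry sizes reach the buffer limit. It is a simplified model, not Accumulo's actual code: ScanBatchModel and countBatches are names made up for this example, the entry sizes are arbitrary, and it assumes that every batch boundary may cost one tear-down and re-seek of the iterator.

```java
import java.util.Arrays;

/** Simplified model of scan batching; not Accumulo's implementation. */
class ScanBatchModel {

    /**
     * Counts the batches a scan is split into when a batch is closed as soon
     * as the accumulated entry sizes reach maxMemoryBytes. Each batch boundary
     * is a point where the iterator could be torn down and re-seeked.
     */
    static int countBatches(int[] entrySizes, long maxMemoryBytes) {
        int batches = 0;
        long used = 0;
        for (int size : entrySizes) {
            used += size;
            if (used >= maxMemoryBytes) {
                batches++;   // buffer full: send the batch to the client
                used = 0;
            }
        }
        if (used > 0) {
            batches++;       // final, partially filled batch
        }
        return batches;
    }

    public static void main(String[] args) {
        // 1000 entries of 2048 bytes each (arbitrary numbers for the example).
        int[] sizes = new int[1000];
        Arrays.fill(sizes, 2048);

        // Buffer smaller than one entry: every entry ends a batch.
        System.out.println(countBatches(sizes, 1024));          // 1000

        // 1 MB buffer: 512 entries fill a batch, the remaining 488 form a second.
        System.out.println(countBatches(sizes, 1024L * 1024L)); // 2
    }
}
```

Under these assumptions, a buffer that holds only one entry produces a batch boundary per key, which matches the "re-seeking after each key" symptom, while a larger buffer reduces the number of boundaries.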