Posted to user@accumulo.apache.org by Sanjay Deshmukh <sa...@gmail.com> on 2021/05/03 00:39:03 UTC

Iterator Parameter Length/Format

Is there a limit to how long an Iterator parameter can be? And does it have
to be a String, or is there a way to send arbitrary binary without encoding
it in a string?


-- 
Sanjay Deshmukh
sanjayd@gmail.com

Re: Iterator Parameter Length/Format

Posted by Sanjay Deshmukh <sa...@gmail.com>.
Looks like passing it in through the client is working. This is huge for
our system: it brought a 30 s operation down to a couple hundred ms. The
only problem I have left to solve is that the iterator parameters are all
being written to the log file with every query, including the 17k
base64-encoded string. Is there a good way to prevent that, so the log
files stay a reasonable size? Can I hook into the logging to truncate
that parameter, or is there some other way?
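
If those log lines come from code you control (the thread doesn't say which component writes them), one option is to log a summary of the options map instead of the raw values. Below is a minimal plain-Java sketch that assumes a hypothetical 64-character cutoff. Long values are cut to a 16-character prefix plus their length and a short SHA-256 fingerprint, so two queries carrying the same payload can still be matched up. It does not change anything Accumulo's own servers log.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

public class OptionLogSummary {
  // Hypothetical cutoff: values longer than this are abbreviated before logging.
  public static final int MAX_LOGGED_CHARS = 64;

  /** Returns a sorted copy of the options with long values replaced by a short summary. */
  public static Map<String, String> summarize(Map<String, String> options) {
    Map<String, String> out = new TreeMap<>();
    for (Map.Entry<String, String> e : options.entrySet()) {
      String v = e.getValue();
      out.put(e.getKey(), v.length() <= MAX_LOGGED_CHARS ? v : abbreviate(v));
    }
    return out;
  }

  // Keeps a 16-character prefix, the full length, and an 8-hex-digit SHA-256 fingerprint.
  static String abbreviate(String v) {
    return v.substring(0, 16) + "...(" + v.length() + " chars, sha256=" + sha256Prefix(v) + ")";
  }

  static String sha256Prefix(String v) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256").digest(v.getBytes(StandardCharsets.UTF_8));
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < 4; i++) {
        sb.append(String.format("%02x", digest[i]));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException ex) {
      // Every Java platform is required to support SHA-256, so this should not happen.
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args) {
    Map<String, String> opts = new TreeMap<>();
    opts.put("mode", "score");              // hypothetical short option
    opts.put("payload", "A".repeat(17000)); // stand-in for the 17k Base64 string
    System.out.println(summarize(opts));
  }
}
```

You would pass `summarize(options)` to your logger in place of the raw map. The fingerprint is only for correlating log lines. It is not a way to recover the value.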

On Mon, May 3, 2021 at 2:16 PM Christopher <ct...@apache.org> wrote:

> It really depends on your specific system. I think you could try
> passing it in via the client as you are now, but if you want to
> experiment storing it elsewhere (like in ZooKeeper or on HDFS, or on
> an external REST endpoint or whatever), you could modify your iterator
> to accept a String containing the location of the content, rather than
> the content itself. It's just an idea to experiment with, if your
> first attempt doesn't meet your needs.
>
> On Mon, May 3, 2021 at 8:27 AM Sanjay Deshmukh <sa...@gmail.com> wrote:
> >
> > Ok, that makes sense. This is a scan-time iterator. The data will be
> > about 18kb in length. The data is a new row that was just inserted into the
> > table, that's going to be used as input in a computation with a bunch of
> > other rows. Are you saying it'd be better to read that row in the init
> > method of the Iterator on the tablet servers, vs passing the data in from
> > the client? What's the best way to do that?
> >
> > On Sun, May 2, 2021 at 10:32 PM Christopher <ct...@apache.org> wrote:
> >>
> >> Iterator parameters are passed as strings, so you have to encode
> >> binary data if you need to send that. The limit should be reasonable,
> >> but there's no hard-coded limit. If you are storing options for an
> >> iterator configured on a table, one would expect it to be able to be
> >> small enough to be stored easily in a ZooKeeper node, possibly
> >> alongside other options and table configuration, so it shouldn't be
> >> big enough to make ZooKeeper have trouble. If you are passing the
> >> option as part of a scan-time iterator, it should fit in an RPC
> >> message over Thrift.
> >>
> >> I would keep them small (a few hundred characters or less), no more
> >> than a few thousand, if you really need to. If you need to pass large
> >> parameters, consider passing them indirectly, like passing the name of
> >> a file in HDFS that stores the binary data that the iterator reads
> >> when initialized.
> >>
> >> Your experience will vary depending on the configuration of your
> >> system's components and the hardware resources your machine has
> >> available.
> >>
> >> On Sun, May 2, 2021 at 8:39 PM Sanjay Deshmukh <sa...@gmail.com> wrote:
> >> >
> >> > Is there a limit to how long an Iterator parameter can be? And does
> >> > it have to be a String, or is there a way to send arbitrary binary
> >> > without encoding it in a string?
> >> >
> >> >
> >> > --
> >> > Sanjay Deshmukh
> >> > sanjayd@gmail.com
> >
> >
> >
> > --
> > Sanjay Deshmukh
> > sanjayd@gmail.com
>


-- 
Sanjay Deshmukh
sanjayd@gmail.com

Re: Iterator Parameter Length/Format

Posted by Christopher <ct...@apache.org>.
It really depends on your specific system. I think you could try
passing it in via the client as you are now, but if you want to
experiment with storing it elsewhere (like in ZooKeeper or on HDFS, or
on an external REST endpoint or whatever), you could modify your
iterator to accept a String containing the location of the content,
rather than the content itself. It's just an idea to experiment with
if your first attempt doesn't meet your needs.
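
One way to read this suggestion is that the iterator receives only a location string and loads the bytes itself when it is initialized. The sketch below is minimal and rests on a few assumptions. The option name payload.location is hypothetical. Only file: URIs are handled, with a local path standing in for HDFS; an hdfs:// location would need Hadoop's FileSystem API, and ZooKeeper or a REST endpoint would need their own clients. In an Accumulo iterator, loadPayload would be called from init with the options map that init receives.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;

public class PayloadLocation {
  // Hypothetical option name; not an Accumulo built-in.
  public static final String LOCATION_OPTION = "payload.location";

  /** Reads the payload named by the location option instead of carrying it inline. */
  public static byte[] loadPayload(Map<String, String> options) throws IOException {
    String location = options.get(LOCATION_OPTION);
    if (location == null) {
      throw new IllegalArgumentException("missing option " + LOCATION_OPTION);
    }
    URI uri = URI.create(location);
    if ("file".equals(uri.getScheme())) {
      return Files.readAllBytes(Paths.get(uri));
    }
    // hdfs://, ZooKeeper, or REST locations would each need their own client;
    // this sketch deliberately supports only file: URIs.
    throw new UnsupportedOperationException("unsupported scheme: " + uri.getScheme());
  }
}
```

The option value stays a short URI no matter how large the content is. The trade-off is that every tablet server running the scan must be able to reach that location.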


Re: Iterator Parameter Length/Format

Posted by Sanjay Deshmukh <sa...@gmail.com>.
OK, that makes sense. This is a scan-time iterator. The data will be about
18 KB in length. The data is a new row that was just inserted into the
table, and it will be used as input to a computation over a bunch of
other rows. Are you saying it'd be better to read that row in the init
method of the iterator on the tablet servers, versus passing the data in
from the client? What's the best way to do that?
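
For a sense of scale, padded Base64 turns n bytes into 4 * ceil(n / 3) characters. Assuming "about 18kb" means 18 KiB of raw bytes (an assumption; the post does not say exactly), the encoded option is about 24,000 characters. That is several times Christopher's suggested ceiling of a few thousand characters. A small sketch of the arithmetic:

```java
public class Base64Size {
  /** Length of padded Base64 output for n input bytes: 4 * ceil(n / 3). */
  public static long encodedLength(long n) {
    return 4 * ((n + 2) / 3);
  }

  public static void main(String[] args) {
    long bytes = 18 * 1024; // "about 18kb", read here as 18 KiB (assumption)
    System.out.println(bytes + " bytes -> " + encodedLength(bytes) + " Base64 characters");
  }
}
```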


-- 
Sanjay Deshmukh
sanjayd@gmail.com

Re: Iterator Parameter Length/Format

Posted by Christopher <ct...@apache.org>.
Iterator parameters are passed as strings, so you have to encode
binary data if you need to send it. There's no hard-coded limit, but
the size should be reasonable. If you are storing options for an
iterator configured on a table, they should be small enough to store
easily in a ZooKeeper node, possibly alongside other options and table
configuration, without giving ZooKeeper trouble. If you are passing the
option to a scan-time iterator, it should fit in an RPC message over
Thrift.
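
Here is a minimal sketch of the "encode binary as a string" step using the JDK's java.util.Base64. In Accumulo, the client would add the encoded value with IteratorSetting.addOption(String, String), and the iterator would read it from the Map<String, String> passed to init. This sketch uses a plain HashMap to stand in for both sides, and the option name "payload" is hypothetical.

```java
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class BinaryOptionCodec {
  // Hypothetical option name, chosen for illustration.
  public static final String PAYLOAD_OPTION = "payload";

  /** Client side: encode raw bytes into a String option value. */
  public static void putBinary(Map<String, String> options, String name, byte[] data) {
    options.put(name, Base64.getEncoder().encodeToString(data));
  }

  /** Iterator side (e.g. inside init): decode the String back to the original bytes. */
  public static byte[] getBinary(Map<String, String> options, String name) {
    String value = options.get(name);
    if (value == null) {
      throw new IllegalArgumentException("missing option " + name);
    }
    return Base64.getDecoder().decode(value);
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    byte[] data = {0, (byte) 0xFF, 42, 7};
    putBinary(options, PAYLOAD_OPTION, data);
    System.out.println(options.get(PAYLOAD_OPTION));
    System.out.println(java.util.Arrays.equals(data, getBinary(options, PAYLOAD_OPTION)));
  }
}
```

Standard Base64 output uses only ASCII letters, digits, `+`, `/` and `=`, so it is safe in a String option. The cost is roughly a one-third size increase over the raw bytes.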

I would keep them small (a few hundred characters or fewer), and no
more than a few thousand if you really need to. If you need to pass
large parameters, consider passing them indirectly, for example by
passing the name of a file in HDFS that stores the binary data, which
the iterator reads when it is initialized.
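
The advice to keep options small but pass large payloads indirectly can be sketched as a size check on the client. The sketch is minimal, and everything in it is an assumption for illustration. The 2,000-character cutoff and both option names are hypothetical. A local temp directory stands in for HDFS, but on a real cluster the file must live somewhere every tablet server can read, and someone has to clean it up afterwards.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class PayloadOptions {
  // Hypothetical cutoff reflecting "no more than a few thousand" characters.
  public static final int MAX_INLINE_CHARS = 2000;
  public static final String INLINE_OPTION = "payload";    // hypothetical name
  public static final String PATH_OPTION = "payload.path"; // hypothetical name

  /**
   * Client side: inline small payloads as Base64; write large ones to a shared
   * location and pass only the path. sharedDir stands in for an HDFS directory.
   */
  public static Map<String, String> buildOptions(byte[] data, Path sharedDir) throws IOException {
    Map<String, String> options = new HashMap<>();
    String encoded = Base64.getEncoder().encodeToString(data);
    if (encoded.length() <= MAX_INLINE_CHARS) {
      options.put(INLINE_OPTION, encoded);
    } else {
      Path file = Files.createTempFile(sharedDir, "payload-", ".bin");
      Files.write(file, data);
      options.put(PATH_OPTION, file.toString());
    }
    return options;
  }

  /** Iterator side (e.g. called from init): recover the bytes from whichever option is set. */
  public static byte[] readPayload(Map<String, String> options) throws IOException {
    if (options.containsKey(INLINE_OPTION)) {
      return Base64.getDecoder().decode(options.get(INLINE_OPTION));
    }
    String path = options.get(PATH_OPTION);
    if (path == null) {
      throw new IllegalArgumentException("no payload option present");
    }
    return Files.readAllBytes(Paths.get(path));
  }
}
```

With this cutoff, an 18,000-byte payload (about 24,000 Base64 characters) takes the file path, and only a short path string ends up in the option.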

Your experience will vary depending on the configuration of your
system's components and the hardware resources available on your
machines.
