Posted to user@accumulo.apache.org by "Slater, David M." <Da...@jhuapl.edu> on 2013/03/28 16:45:41 UTC

Distributed Cache - for iterators?

Hey everyone,

In Hadoop Map Reduce, the Configuration class can pass String parameters (via the Context argument to map and reduce). Likewise, the Map<String, String> options argument in Iterator init allows the same functionality for Accumulo iterators.

However, for more complex parameters, Hadoop has a DistributedCache which is available to all of the mappers and reducers. Is there any similar functionality for Accumulo iterators, or does all of the information need to be sent as a String through options?

Also, are there any problems with sending exceptionally long Strings in the options argument?

Thanks,
David

Re: Distributed Cache - for iterators?

Posted by Eric Newton <er...@gmail.com>.
He might.  I know users who send a lot of configuration data to their
iterators.  It's quite ugly when viewed with "listscans" in the shell.  If
you are thinking of passing more than a megabyte, maybe it's better to send
it through a side channel like HDFS.
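
To make the side-channel idea concrete, here is a minimal sketch of the pattern: only a short path travels in the options map, and the large payload is read from that path during init. Everything named here is hypothetical and not part of Accumulo: the class SideChannelConfig, the option key "sideChannelPath", and the one-argument init(), which only mirrors the Map<String, String> options parameter of an iterator's init. The local filesystem (java.nio) stands in for HDFS so that the sketch runs without a cluster; a real iterator would presumably open the path with the Hadoop FileSystem API instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not an Accumulo class: loads bulky configuration
 * from a file whose location is passed as a short option string.
 */
class SideChannelConfig {
    // Hypothetical option key; any name agreed on by client and iterator works.
    static final String PATH_OPTION = "sideChannelPath";

    private List<String> lines;

    // Mirrors the options half of an iterator's init: the map carries only
    // the path, never the payload itself.
    void init(Map<String, String> options) throws IOException {
        String location = options.get(PATH_OPTION);
        if (location == null) {
            throw new IllegalArgumentException("missing option: " + PATH_OPTION);
        }
        // Stand-in for an HDFS read (assumption: local file instead of HDFS).
        lines = Files.readAllLines(Paths.get(location), StandardCharsets.UTF_8);
    }

    List<String> lines() {
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Client side: write the large payload once, then pass only its path.
        Path file = Files.createTempFile("iterator-config", ".txt");
        Files.write(file, Arrays.asList("alpha", "beta"), StandardCharsets.UTF_8);

        Map<String, String> options = new HashMap<>();
        options.put(PATH_OPTION, file.toString());

        // Server side: the consumer resolves the path during init.
        SideChannelConfig config = new SideChannelConfig();
        config.init(options);
        System.out.println(config.lines());

        Files.delete(file);
    }
}
```

With this shape, "listscans" would show only the short path rather than the payload. The trade-off is that every init pays for a file read, so caching the parsed data may be worth considering if the iterator is constructed often.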




On Thu, Mar 28, 2013 at 12:03 PM, Keith Turner <ke...@deenlo.com> wrote:

> On Thu, Mar 28, 2013 at 11:45 AM, Slater, David M.
> <Da...@jhuapl.edu> wrote:
> > Hey everyone,
> >
> > In Hadoop Map Reduce, the Configuration class can pass String parameters
> > (via the Context argument to map and reduce). Likewise, the Map<String,
> > String> options argument in Iterator init allows the same functionality
> > for Accumulo iterators.
> >
> > However, for more complex parameters, Hadoop has a DistributedCache
> > which is available to all of the mappers and reducers. Is there any
> > similar functionality for Accumulo iterators, or does all of the
> > information need to be sent as a String through options?
>
> Accumulo does not provide anything out of the box.  I wonder if
> putting a file in HDFS w/ a high replication factor would be a good
> way to pass this info.
>
> > Also, are there any problems with sending exceptionally long Strings in
> > the options argument?
>
> Does anyone know if David would run into issues similar to ACCUMULO-1141?
>
> > Thanks,
> > David
>

Re: Distributed Cache - for iterators?

Posted by Keith Turner <ke...@deenlo.com>.
On Thu, Mar 28, 2013 at 11:45 AM, Slater, David M.
<Da...@jhuapl.edu> wrote:
> Hey everyone,
>
> In Hadoop Map Reduce, the Configuration class can pass String parameters
> (via the Context argument to map and reduce). Likewise, the Map<String,
> String> options argument in Iterator init allows the same functionality for
> Accumulo iterators.
>
> However, for more complex parameters, Hadoop has a DistributedCache which is
> available to all of the mappers and reducers. Is there any similar
> functionality for Accumulo iterators, or does all of the information need to
> be sent as a String through options?

Accumulo does not provide anything out of the box.  I wonder if
putting a file in HDFS w/ a high replication factor would be a good
way to pass this info.

> Also, are there any problems with sending exceptionally long Strings in the
> options argument?

Does anyone know if David would run into issues similar to ACCUMULO-1141?

> Thanks,
> David