Posted to dev@accumulo.apache.org by Josh Elser <jo...@gmail.com> on 2013/11/05 16:24:49 UTC
MultipleInputs with AccumuloInputFormat
In executing some MapReduce over Accumulo with the AccumuloInputFormat,
I came to the realization that AIF fundamentally doesn't work with
concepts like MultipleInputs in Hadoop
(http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html).
Given that you can only write one set of AIF configuration into a
Configuration object, there is no mechanism to support multiple inputs. This
appears to be the case across all versions.
Is this correct? Have I overlooked something?
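
To make the limitation concrete, here is a minimal sketch of the collision. It assumes a plain Map as a stand-in for the Hadoop Configuration object, and the key name and the setInputTable helper are hypothetical, not the real AccumuloInputFormat API. The only point is that a static setter writing under one fixed key cannot hold settings for two inputs at once.

```java
import java.util.HashMap;
import java.util.Map;

public class SingleConfigCollision {
    // Hypothetical key name; the real AccumuloInputFormat uses its own internal keys.
    static final String TABLE_KEY = "example.aif.inputTableName";

    // Stand-in for a static, stateless setter that writes a job setting
    // into the shared configuration under a fixed key.
    static void setInputTable(Map<String, String> conf, String table) {
        conf.put(TABLE_KEY, table);
    }

    public static void main(String[] args) {
        // A plain Map stands in for org.apache.hadoop.conf.Configuration (assumption).
        Map<String, String> conf = new HashMap<>();
        setInputTable(conf, "left_table");
        setInputTable(conf, "right_table"); // overwrites the first value
        System.out.println(conf.get(TABLE_KEY)); // prints "right_table"
    }
}
```

Because both calls write to the same key, only the last table survives, which is why a second input cannot be described in the same Configuration without some extra scoping.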
Re: MultipleInputs with AccumuloInputFormat
Posted by Christopher <ct...@apache.org>.
Are there any other analogous InputFormats that use multiple static
methods in a stateless way to configure a job?
--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
On Tue, Nov 5, 2013 at 12:15 PM, Josh Elser <jo...@gmail.com> wrote:
> Heh, ok.
>
> I'm currently working through a bit of a prototype to see how it works.
>
> I'm not a mapred/mapreduce expert, but I *think* I have an approach that
> will work. Keep an eye out for a Jira -- would love feedback.
Re: MultipleInputs with AccumuloInputFormat
Posted by Josh Elser <jo...@gmail.com>.
Heh, ok.
I'm currently working through a bit of a prototype to see how it works.
I'm not a mapred/mapreduce expert, but I *think* I have an approach that
will work. Keep an eye out for a Jira -- would love feedback.
On 11/5/13, 12:13 PM, Kevin Faro wrote:
> I recently looked into that and came to the same realization.
>
> I ended up writing a new input format that did the cartesian product of two
> tables. But to do that I had to store values for the left configuration
> and right configuration and then copy over whichever config settings I
> wanted to use for the AIF depending on which split I needed in the
> RecordReader.
>
> It would have been awesome if I could have just used the MultipleInputs ...
>
> --Kevin
Re: MultipleInputs with AccumuloInputFormat
Posted by Kevin Faro <ke...@gmail.com>.
I recently looked into that and came to the same realization.
I ended up writing a new input format that did the cartesian product of two
tables. But to do that I had to store values for the left configuration
and right configuration and then copy over whichever config settings I
wanted to use for the AIF depending on which split I needed in the
RecordReader.
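
The code for this workaround is not shown in the thread, so the following is only a rough guess at its general shape. It assumes a plain Map in place of the Hadoop Configuration, and the prefixes, key name, and the store/activate helpers are all hypothetical. Each side's settings are stored under its own prefix, and the chosen side is copied onto the un-prefixed keys before it is read.

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixedConfigSketch {
    // Hypothetical prefixes and key; not the real AccumuloInputFormat key names.
    static final String LEFT = "example.left.";
    static final String RIGHT = "example.right.";
    static final String TABLE_KEY = "example.aif.inputTableName";

    // Store one side's setting under its own prefix so the two sides do not collide.
    static void store(Map<String, String> conf, String prefix, String key, String value) {
        conf.put(prefix + key, value);
    }

    // Return a copy of conf in which the chosen side's settings are also written
    // to the un-prefixed keys that a single-input format would read.
    static Map<String, String> activate(Map<String, String> conf, String prefix) {
        Map<String, String> copy = new HashMap<>(conf);
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                copy.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        // A plain Map stands in for org.apache.hadoop.conf.Configuration (assumption).
        Map<String, String> conf = new HashMap<>();
        store(conf, LEFT, TABLE_KEY, "left_table");
        store(conf, RIGHT, TABLE_KEY, "right_table");
        // A record reader would pick the prefix based on the split it was handed.
        System.out.println(activate(conf, LEFT).get(TABLE_KEY));  // prints "left_table"
        System.out.println(activate(conf, RIGHT).get(TABLE_KEY)); // prints "right_table"
    }
}
```

Copying into a fresh map rather than mutating the shared one keeps the two sides from leaking into each other when both are read in the same task.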
It would have been awesome if I could have just used the MultipleInputs ...
--Kevin
On Tue, Nov 5, 2013 at 10:24 AM, Josh Elser <jo...@gmail.com> wrote:
> In executing some MapReduce over Accumulo with the AccumuloInputFormat, I
> came to the realization that AIF fundamentally doesn't work with concepts
> like MultipleInputs in Hadoop
> (http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html).
> Given that you can only write one set of configuration for AIF into a
> Configuration object, there's not a mechanism to support multiple. This
> appears to be the case across all versions.
>
> Is this correct? Have I overlooked something?
>