Posted to user@phoenix.apache.org by Flavio Pompermaier <po...@okkam.it> on 2014/09/09 10:36:03 UTC

Mapreduce job

Hi to all,

I'd like to know what the correct way is to run a MapReduce job on a table
managed by Phoenix to put data into another table (also managed by Phoenix).
Is it sufficient to read the data contained in column family 0 (like 0:id, 0:value)
and create insert statements in the reducer to put things correctly into the
output table?
Should I filter rows containing some special value for column 0:_0?

Best,
FP
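
A minimal sketch of one way to do this, assuming hypothetical table names, a
made-up zookeeper host, and ID/VALUE columns: scan the source table's cells in
a TableMapper and issue Phoenix UPSERTs over JDBC in the reducer.

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CopyPhoenixTableJob {

  // Reads the raw cells that Phoenix stored in the default column family "0".
  public static class SourceMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
        throws IOException, InterruptedException {
      byte[] id = columns.getValue(Bytes.toBytes("0"), Bytes.toBytes("ID"));
      byte[] value = columns.getValue(Bytes.toBytes("0"), Bytes.toBytes("VALUE"));
      if (id != null && value != null) {
        context.write(new Text(Bytes.toString(id)), new Text(Bytes.toString(value)));
      }
    }
  }

  // Writes into the target table through Phoenix, so rows are encoded exactly
  // as Phoenix expects (including the _0 qualifier discussed below).
  public static class UpsertReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text id, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
           PreparedStatement ps = conn.prepareStatement(
               "UPSERT INTO TARGET_TABLE (ID, VALUE) VALUES (?, ?)")) {
        for (Text value : values) {
          ps.setString(1, id.toString());
          ps.setString(2, value.toString());
          ps.executeUpdate();
        }
        conn.commit();
      } catch (Exception e) {
        throw new IOException(e);
      }
    }
  }
}

The driver would wire these up with TableMapReduceUtil.initTableMapperJob
against the source table; in a real job the JDBC connection would be opened
once in the reducer's setup() rather than per reduce() call.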

Re: Mapreduce job

Posted by Vikas Agarwal <vi...@infoobjects.com>.
I have a question regarding MapReduce jobs reading/writing a Phoenix
table. How does data locality come into the picture when we run
mappers/reducers against Phoenix/HBase instead of reading data directly
from raw HDFS?


On Thu, Sep 11, 2014 at 8:27 PM, Flavio Pompermaier <po...@okkam.it>
wrote:

> Gustavo Anatoly helped me a lot here :)
> Sorry for chatting off the mailing list...
>
> Summarizing:
> James Taylor answered why there's a (non-intuitive) _0 column qualifier in
> the default column family here:
> https://groups.google.com/forum/#!topic/phoenix-hbase-user/wCeljAvLekc.
> Thus, if I understood correctly, I can avoid caring about that field
> (unless I perform inserts bypassing Phoenix).
>
> The other doubt I had was about how to read arrays from byte[] in
> MapReduce jobs. Fortunately, that is not very difficult if I know the type
> in advance (e.g., a VARCHAR array). For example:
>
> public void map(final ImmutableBytesWritable rowKey, final Result columns,
>     final Context context) throws IOException, InterruptedException {
>   ...
>   // raw cell bytes for the configured column family/qualifier
>   final byte[] bytes =
>       columns.getValue(Bytes.toBytes(columnFamily), Bytes.toBytes(columnQual));
>   // decode them as a Phoenix VARCHAR ARRAY
>   final PhoenixArray resultArr = (PhoenixArray)
>       PDataType.VARCHAR_ARRAY.toObject(bytes, 0, bytes.length);
>
> Thanks again to Gustavo for the help,
> Flavio
>
> On Thu, Sep 11, 2014 at 4:31 PM, Krishna <re...@gmail.com> wrote:
>
>> I assume you are referring to the bulk loader. The "-a" option allows
>> you to pass the array delimiter.
>>
>>
>> On Thursday, September 11, 2014, Flavio Pompermaier <po...@okkam.it>
>> wrote:
>>
>>> Any help on this?
>>> What if I save a field as an array? How could I read it from a MapReduce
>>> job? Is there a separator char to use for splitting, or what?
>>>
>>> On Tue, Sep 9, 2014 at 10:36 AM, Flavio Pompermaier <
>>> pompermaier@okkam.it> wrote:
>>>
>>>> Hi to all,
>>>>
>>>> I'd like to know what the correct way is to run a MapReduce job on a
>>>> table managed by Phoenix to put data into another table (also managed by
>>>> Phoenix).
>>>> Is it sufficient to read the data contained in column family 0 (like
>>>> 0:id, 0:value) and create insert statements in the reducer to put things
>>>> correctly into the output table?
>>>> Should I filter rows containing some special value for column 0:_0?
>>>>
>>>> Best,
>>>> FP
>>>>
>>>


-- 
Regards,
Vikas Agarwal
91 – 9928301411

InfoObjects, Inc.
Execution Matters
http://www.infoobjects.com
2041 Mission College Boulevard, #280
Santa Clara, CA 95054
+1 (408) 988-2000 Work
+1 (408) 716-2726 Fax

Re: Mapreduce job

Posted by Flavio Pompermaier <po...@okkam.it>.
Gustavo Anatoly helped me a lot here :)
Sorry for chatting off the mailing list...

Summarizing:
James Taylor answered why there's a (non-intuitive) _0 column qualifier in
the default column family here:
https://groups.google.com/forum/#!topic/phoenix-hbase-user/wCeljAvLekc.
Thus, if I understood correctly, I can avoid caring about that field
(unless I perform inserts bypassing Phoenix).

The other doubt I had was about how to read arrays from byte[] in MapReduce
jobs. Fortunately, that is not very difficult if I know the type in advance
(e.g., a VARCHAR array). For example:

public void map(final ImmutableBytesWritable rowKey, final Result columns,
    final Context context) throws IOException, InterruptedException {
  ...
  // raw cell bytes for the configured column family/qualifier
  final byte[] bytes =
      columns.getValue(Bytes.toBytes(columnFamily), Bytes.toBytes(columnQual));
  // decode them as a Phoenix VARCHAR ARRAY
  final PhoenixArray resultArr = (PhoenixArray)
      PDataType.VARCHAR_ARRAY.toObject(bytes, 0, bytes.length);
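
The elements can then be pulled out through the java.sql.Array interface that
PhoenixArray implements; a short, hypothetical continuation of the snippet
above:

  // PhoenixArray implements java.sql.Array, so the decoded values can be
  // retrieved as a plain Java array (String[] for a VARCHAR ARRAY).
  final java.sql.Array sqlArray = resultArr;
  try {
      final String[] elements = (String[]) sqlArray.getArray();
      for (String element : elements) {
          // use each element here, e.g. emit it from the mapper
      }
  } catch (java.sql.SQLException e) {
      throw new IOException(e);
  }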

Thanks again to Gustavo for the help,
Flavio

On Thu, Sep 11, 2014 at 4:31 PM, Krishna <re...@gmail.com> wrote:

> I assume you are referring to the bulk loader. The "-a" option allows you
> to pass the array delimiter.
>
>
> On Thursday, September 11, 2014, Flavio Pompermaier <po...@okkam.it>
> wrote:
>
>> Any help on this?
>> What if I save a field as an array? How could I read it from a MapReduce
>> job? Is there a separator char to use for splitting, or what?
>>
>> On Tue, Sep 9, 2014 at 10:36 AM, Flavio Pompermaier <pompermaier@okkam.it
>> > wrote:
>>
>>> Hi to all,
>>>
>>> I'd like to know what the correct way is to run a MapReduce job on a
>>> table managed by Phoenix to put data into another table (also managed by
>>> Phoenix).
>>> Is it sufficient to read the data contained in column family 0 (like 0:id,
>>> 0:value) and create insert statements in the reducer to put things correctly
>>> into the output table?
>>> Should I filter rows containing some special value for column 0:_0?
>>>
>>> Best,
>>> FP
>>>
>>

Re: Mapreduce job

Posted by Krishna <re...@gmail.com>.
I assume you are referring to the bulk loader. The "-a" option allows you
to pass the array delimiter.
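
For example, an invocation of the CSV bulk loader could look roughly like the
following (the table name, input path, and delimiter are made-up values; check
the exact options against your Phoenix version):

hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    --table EXAMPLE_TABLE --input /data/example.csv -a ';'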

On Thursday, September 11, 2014, Flavio Pompermaier <po...@okkam.it>
wrote:

> Any help on this?
> What if I save a field as an array? How could I read it from a MapReduce
> job? Is there a separator char to use for splitting, or what?
>
> On Tue, Sep 9, 2014 at 10:36 AM, Flavio Pompermaier <pompermaier@okkam.it>
> wrote:
>
>> Hi to all,
>>
>> I'd like to know what the correct way is to run a MapReduce job on a
>> table managed by Phoenix to put data into another table (also managed by
>> Phoenix).
>> Is it sufficient to read the data contained in column family 0 (like 0:id,
>> 0:value) and create insert statements in the reducer to put things correctly
>> into the output table?
>> Should I filter rows containing some special value for column 0:_0?
>>
>> Best,
>> FP
>>
>

Re: Mapreduce job

Posted by Flavio Pompermaier <po...@okkam.it>.
Any help on this?
What if I save a field as an array? How could I read it from a MapReduce
job? Is there a separator char to use for splitting, or what?

On Tue, Sep 9, 2014 at 10:36 AM, Flavio Pompermaier <po...@okkam.it>
wrote:

> Hi to all,
>
> I'd like to know what the correct way is to run a MapReduce job on a
> table managed by Phoenix to put data into another table (also managed by
> Phoenix).
> Is it sufficient to read the data contained in column family 0 (like 0:id,
> 0:value) and create insert statements in the reducer to put things correctly
> into the output table?
> Should I filter rows containing some special value for column 0:_0?
>
> Best,
> FP
>