Posted to user@hbase.apache.org by Jean-Daniel Cryans <jd...@apache.org> on 2011/02/22 21:05:29 UTC

Re: mapreduce streaming with hbase as a source

(moving to the hbase user ML)

I think streaming used to work correctly in hbase 0.19, since the
RowResult class included the value in its string form (which you had
to parse out), but now that Result is made of KeyValues, and their
toString doesn't include the values, I don't see how TableInputFormat
could be used directly. You could write your own InputFormat that
wraps TIF and returns a specific text format for each cell, though.
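
Something like this could do it. A minimal, untested sketch: the class
name TextTableInputFormat is made up, and it assumes the 0.90-era
org.apache.hadoop.hbase.mapred API, where the record reader hands back
ImmutableBytesWritable/Result pairs:

import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.TableInputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

// Hypothetical wrapper around TableInputFormat that renders every cell,
// value included, as text that streaming can hand to a script on stdin.
public class TextTableInputFormat
    implements InputFormat<Text, Text>, JobConfigurable {

  private final TableInputFormat inner = new TableInputFormat();

  public void configure(JobConf job) {
    inner.configure(job);  // picks up hbase.mapred.tablecolumns etc.
  }

  public InputSplit[] getSplits(JobConf job, int numSplits)
      throws IOException {
    return inner.getSplits(job, numSplits);
  }

  public RecordReader<Text, Text> getRecordReader(InputSplit split,
      JobConf job, Reporter reporter) throws IOException {
    final RecordReader<ImmutableBytesWritable, Result> rr =
        inner.getRecordReader(split, job, reporter);
    return new RecordReader<Text, Text>() {
      public boolean next(Text key, Text value) throws IOException {
        ImmutableBytesWritable k = rr.createKey();
        Result r = rr.createValue();
        if (!rr.next(k, r)) return false;
        key.set(Bytes.toString(k.get()));
        // Render one family:qualifier=value token per cell.
        StringBuilder sb = new StringBuilder();
        for (KeyValue kv : r.raw()) {
          if (sb.length() > 0) sb.append(' ');
          sb.append(Bytes.toString(kv.getFamily())).append(':')
            .append(Bytes.toString(kv.getQualifier())).append('=')
            .append(Bytes.toString(kv.getValue()));
        }
        value.set(sb.toString());
        return true;
      }
      public Text createKey() { return new Text(); }
      public Text createValue() { return new Text(); }
      public long getPos() throws IOException { return rr.getPos(); }
      public float getProgress() throws IOException { return rr.getProgress(); }
      public void close() throws IOException { rr.close(); }
    };
  }
}

You'd pass it with -inputformat TextTableInputFormat (the class has to
be on the job's classpath, e.g. via -libjars), and each row should then
reach your mapper's stdin as the row key, a tab, and the cells with
their values.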

Hope that somehow helps,

J-D

2011/2/19 Ondrej Holecek <on...@holecek.eu>:
> I don't think you understood me correctly,
>
> I get this line:
>
> 72 6f 77 31     keyvalues={row1/family1:a/1298037737154/Put/vlen=1,
> row1/family1:b/1298037744658/Put/vlen=1, row1/family1:c/1298037748020/Put/vlen=1}
>
> I know "72 6f 77 31" is the key and the rest is value, let's call it
> mapreduce-value. In this mapreduce-value there is
> "row1/family1:a/1298037737154/Put/vlen=1" that is hbase-row name, hbase-column
> name and hbase-timestamp.  But I expect also hbase-value.
>
> So my question is what to do to make TableInputFormat to send also this hbase-value.
>
>
> Ondrej
>
>
> On 02/19/11 16:41, ShengChang Gu wrote:
>> By default, the prefix of a line up to the first tab character is the
>> key, and the rest of the line (excluding the tab character) is the
>> value. If there is no tab character in the line, the entire line is
>> considered the key and the value is null. However, this can be
>> customized. Use:
>>
>> -D stream.map.output.field.separator=.
>> -D stream.num.map.output.key.fields=4
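>>
>> With those two settings, for example, a map output line like
>> "a.b.c.d.rest" is split into the key "a.b.c.d" and the value "rest":
>> everything up to the fourth separator becomes the key.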
>>
>> 2011/2/19 Ondrej Holecek <ondrej@holecek.eu>
>>
>>     Thank you, I've spent a lot of time debugging but didn't notice
>>     this typo :(
>>
>>     Now it works, but I don't understand one thing: On stdin I get this:
>>
>>     72 6f 77 31     keyvalues={row1/family1:a/1298037737154/Put/vlen=1,
>>     row1/family1:b/1298037744658/Put/vlen=1,
>>     row1/family1:c/1298037748020/Put/vlen=1}
>>     72 6f 77 32     keyvalues={row2/family1:a/1298037755440/Put/vlen=2,
>>     row2/family1:b/1298037758241/Put/vlen=2,
>>     row2/family1:c/1298037761198/Put/vlen=2}
>>     72 6f 77 33     keyvalues={row3/family1:a/1298037767127/Put/vlen=3,
>>     row3/family1:b/1298037770111/Put/vlen=3,
>>     row3/family1:c/1298037774954/Put/vlen=3}
>>
>>     I see everything there except the value. What should I do to get
>>     the value on stdin too?
>>
>>     Ondrej
>>
>>     On 02/18/11 20:01, Jean-Daniel Cryans wrote:
>>     > You have a typo: it's hbase.mapred.tablecolumns, not
>>     > hbase.mapred.tablecolumn.
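>>     >
>>     > That would also explain the NullPointerException from
>>     > TableInputFormat.configure in your stack trace: with the property
>>     > name misspelled, the column list read from the job conf comes back
>>     > null, and configure presumably dies while trying to parse it.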
>>     >
>>     > J-D
>>     >
>>     > On Fri, Feb 18, 2011 at 6:05 AM, Ondrej Holecek
>>     > <ondrej@holecek.eu> wrote:
>>     >> Hello,
>>     >>
>>     >> I'm testing hadoop and hbase. I can run mapreduce streaming or
>>     >> pipes jobs against text files on hadoop, but I have a problem when
>>     >> I try to run the same job against an hbase table.
>>     >>
>>     >> The table looks like this:
>>     >> hbase(main):015:0> scan 'table1'
>>     >> ROW    COLUMN+CELL
>>     >>  row1  column=family1:a, timestamp=1298037737154, value=1
>>     >>  row1  column=family1:b, timestamp=1298037744658, value=2
>>     >>  row1  column=family1:c, timestamp=1298037748020, value=3
>>     >>  row2  column=family1:a, timestamp=1298037755440, value=11
>>     >>  row2  column=family1:b, timestamp=1298037758241, value=22
>>     >>  row2  column=family1:c, timestamp=1298037761198, value=33
>>     >>  row3  column=family1:a, timestamp=1298037767127, value=111
>>     >>  row3  column=family1:b, timestamp=1298037770111, value=222
>>     >>  row3  column=family1:c, timestamp=1298037774954, value=333
>>     >> 3 row(s) in 0.0240 seconds
>>     >>
>>     >>
>>     >> And the command I use, with the exception I get:
>>     >>
>>     >> # hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2+737.jar \
>>     >>     -D hbase.mapred.tablecolumn=family1: -input table1 -output /mtestout45 \
>>     >>     -mapper test-map -numReduceTasks 1 -reducer test-reduce \
>>     >>     -inputformat org.apache.hadoop.hbase.mapred.TableInputFormat
>>     >>
>>     >> packageJobJar: [/var/lib/hadoop/cache/root/hadoop-unjar8960137205806573426/] []
>>     >> /tmp/streamjob8218197708173702571.jar tmpDir=null
>>     >> 11/02/18 14:45:48 INFO mapred.JobClient: Cleaning up the staging area
>>     >> hdfs://oho-nnm.dev.chservices.cz/var/lib/hadoop/cache/mapred/mapred/staging/root/.staging/job_201102151449_0035
>>     >> Exception in thread "main" java.lang.RuntimeException: Error in configuring object
>>     >>        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>>     >>        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>>     >>        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>     >>        at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:597)
>>     >>        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:926)
>>     >>        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:918)
>>     >>        at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
>>     >>        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:834)
>>     >>        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:793)
>>     >>        at java.security.AccessController.doPrivileged(Native Method)
>>     >>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>     >>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
>>     >>        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:793)
>>     >>        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:767)
>>     >>        at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:922)
>>     >>        at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:123)
>>     >>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     >>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>     >>        at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
>>     >>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     >>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     >>        at java.lang.reflect.Method.invoke(Method.java:597)
>>     >>        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>>     >> Caused by: java.lang.reflect.InvocationTargetException
>>     >>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     >>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     >>        at java.lang.reflect.Method.invoke(Method.java:597)
>>     >>        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>     >>        ... 23 more
>>     >> Caused by: java.lang.NullPointerException
>>     >>        at org.apache.hadoop.hbase.mapred.TableInputFormat.configure(TableInputFormat.java:51)
>>     >>        ... 28 more
>>     >>
>>     >>
>>     >> Can anyone tell me what I am doing wrong?
>>     >>
>>     >> Regards,
>>     >> Ondrej
>>     >>
>>
>>
>>
>>
>> --
>> 阿昌
>
>