Posted to user@pig.apache.org by byambajargal <by...@gmail.com> on 2011/04/25 14:32:39 UTC

How to store data into HBase using Pig

Hello guys

I am running the Cloudera distribution CDH3u0 on my cluster with Pig and HBase.
I can read data from HBase using the following Pig query:

my_data = LOAD 'hbase://table1' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:1');
dump my_data;

but when I try to store data into HBase the same way, the job fails:

store my_data into 'hbase://table2' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:1');

table1 and table2 have the same structure and the same column.


The table I have:

hbase(main):029:0* scan 'table1'
ROW                 COLUMN+CELL
  row1               column=cf:1, timestamp=1303731834050, value=value1
  row2               column=cf:1, timestamp=1303731849901, value=value2
  row3               column=cf:1, timestamp=1303731858637, value=value3
3 row(s) in 0.0470 seconds


thanks

Byambajargal



Re: How to store data into HBase using Pig

Posted by byambajargal <by...@gmail.com>.
Thank you Dmitriy

I have tried what you suggested:

  my_data = LOAD 'hbase://table1' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:1', '-loadKey');
  store my_data into 'hbase://table2' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:1', '-loadKey');

The first part of the query now reads the data together with the row key, but I still cannot store the data into HBase.
When I run the second part, I get the following error message in the log file:


Pig Stack Trace
---------------
ERROR 2017: Internal error creating job configuration.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 1
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1569)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:523)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:868)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:388)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
        at org.apache.pig.Main.run(Main.java:465)
        at org.apache.pig.Main.main(Main.java:107)
Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:673)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
        at org.apache.pig.PigServer.execute(PigServer.java:1190)
        at org.apache.pig.PigServer.access$100(PigServer.java:128)
        at org.apache.pig.PigServer$Graph.execute(PigServer.java:1517)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1564)
        ... 8 more
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hbase://table2_logs
        at org.apache.hadoop.fs.Path.initialize(Path.java:148)
        at org.apache.hadoop.fs.Path.<init>(Path.java:71)
        at org.apache.hadoop.fs.Path.<init>(Path.java:45)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:476)
        ... 16 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: hbase://table2_logs
        at java.net.URI.checkPath(URI.java:1787)
        at java.net.URI.<init>(URI.java:735)
        at org.apache.hadoop.fs.Path.initialize(Path.java:145)
        ... 19 more
================================================================================
thank you for your help


On 4/25/11 18:26, Dmitriy Ryaboy wrote:
> The first element of the relation you store must be the row key. You aren't
> loading the row key, so load > store isn't working.
> Try
> my_data = LOAD 'hbase://table1' using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:1', '-loadKey') ;


Re: How to store data into HBase using Pig

Posted by byambajargal <by...@gmail.com>.
I have just removed the 'hbase://' prefix from the STORE path and now it works fine.
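
Putting it together, the working script looks roughly like this (a sketch pieced together from this thread; it assumes table2 already exists in HBase, e.g. created with create 'table2', 'cf' in the HBase shell):

my_data = LOAD 'hbase://table1' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:1', '-loadKey');
-- '-loadKey' makes the row key the first field of every tuple in my_data
store my_data into 'table2' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:1');
-- note: no 'hbase://' prefix on the STORE path, and no '-loadKey' (it only affects LOAD)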


thanks

byambajargal



Re: How to store data into HBase using Pig

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
1)
2011-04-27 10:29:32,953 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201104251150_0071
2011-04-27 10:29:32,954 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://haisen11:50030/jobdetails.jsp?jobid=job_201104251150_0071
2011-04-27 10:29:52,654 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201104251150_0071 has failed! Stop running all dependent jobs

^^^ Look at the job error logs.

2)

generate $0, $2 -- there is no $2, you only loaded two columns ($0 and $1).
Those are the ones you're going to be wanting.

3) loadKey, as the name implies, only applies to loading data, not to
storing it. It doesn't hurt anything to have it there, but it's not actually
doing anything.
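
Following points 2 and 3, the script in question would become something like this (a sketch; the /passwd input and the cf:a column family come from the other messages in this thread):

A = load '/passwd' using PigStorage(':');
-- project the two fields to keep; the first one becomes the HBase row key on STORE
B = foreach A generate $0 as id, $1 as value;
-- '-loadKey' dropped from the store, since it has no effect there
store B into 'table2' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a');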


D


Re: How to store data into HBase using Pig

Posted by byambajargal <by...@gmail.com>.
Hello

I am using Pig 0.8.0.

A = load '/passwd' using PigStorage(':');
B = foreach A generate $0 as id, $2 as value;
dump B;

The result of the first part is:

(twilli,6259)
(saamodt,6260)
(hailu268,6261)
(oddsen,6262)
(neuhaus,6263)
(zoila,6264)
(elinmn,6265)
(diego,6266)
(fsudmann,6267)
(yanliang,6268)
(nestor,6269)

As I understand it, the problem is in the second part:

store B into 'table2' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a','-loadKey');

I suspect the problem is with the row key; I am not sure how HBaseStorage manages it.
What I want is for the first field to be the row key and the second field to be the column of the HBase table.

When I run the query, I get the following output:

grunt> A = load '/passwd' using PigStorage(':');B = foreach A generate $0 as id, $2 as value;store B into 'table2' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a','-loadKey');
2011-04-27 10:29:29,785 [main] INFO  
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the 
script: UNKNOWN
2011-04-27 10:29:29,785 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - 
pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-04-27 10:29:29,913 [main] INFO  org.apache.zookeeper.ZooKeeper - 
Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 
22:27 GMT
2011-04-27 10:29:29,913 [main] INFO  org.apache.zookeeper.ZooKeeper - 
Client environment:host.name=haisen10.ux.uis.no
2011-04-27 10:29:29,913 [main] INFO  org.apache.zookeeper.ZooKeeper - 
Client environment:java.version=1.6.0_23
2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper - 
Client environment:java.vendor=Sun Microsystems Inc.
2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper - 
Client environment:java.home=/opt/jdk1.6.0_23/jre
2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper - 
Client 
environment:java.class.path=/etc/hbase/conf:/usr/lib/pig/bin/../conf:/opt/jdk/lib/tools.jar:/usr/lib/pig/bin/../pig-0.8.0-cdh3u0-core.jar:/usr/lib/pig/bin/../build/pig-*-SNAPSHOT.jar:/usr/lib/pig/bin/../lib/ant-contrib-1.0b3.jar:/usr/lib/pig/bin/../lib/automaton.jar:/usr/lib/pig/bin/../build/ivy/lib/Pig/*.jar:/usr/lib/hadoop/hadoop-core-0.20.2-cdh3u0.jar:/usr/lib/hadoop/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/lib/commons-net-1.4.1.jar:/usr/lib/hadoop/lib/core-3.1.1.jar:/usr/lib/hadoop/lib/hadoop-fairscheduler-0.20.2-cdh3u0.jar:/usr/lib/hadoop/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop/lib/hsqldb-1.8.0.10.LICENSE.txt:/usr/lib/hadoop/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop/lib/jdiff:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jetty-6.1.26.jar:/usr/lib/hadoop/lib/jetty-servlet-tester-6.1.26.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/jsp-2.1:/usr/lib/hadoop/lib/junit-4.5.jar:/usr/lib/hadoop/lib/kfs-0.2.2.jar:/usr/lib/hadoop/lib/kfs-0.2.LICENSE.txt:/usr/lib/hadoop/lib/log4j-1.2.15.jar:/usr/lib/hadoop/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop/lib/oro-2.0.8.jar:/usr/lib/hadoop/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/etc/hbase/conf::/usr/lib/hadoop/conf
2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper - 
Client 
environment:java.library.path=/opt/jdk1.6.0_23/jre/lib/amd64/server:/opt/jdk1.6.0_23/jre/lib/amd64:/opt/jdk1.6.0_23/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper - 
Client environment:java.io.tmpdir=/tmp
2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper - 
Client environment:java.compiler=<NA>
2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper - 
Client environment:os.name=Linux
2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper - 
Client environment:os.arch=amd64
2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper - 
Client environment:os.version=2.6.18-194.32.1.el5.centos.plus
2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper - 
Client environment:user.name=haisen
2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper - 
Client environment:user.home=/home/ekstern/haisen
2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper - 
Client environment:user.dir=/import/br1raid6a1c1/haisen
2011-04-27 10:29:29,915 [main] INFO  org.apache.zookeeper.ZooKeeper - 
Initiating client connection, connectString=haisen11:2181 
sessionTimeout=180000 watcher=hconnection
2011-04-27 10:29:29,923 [main-SendThread()] INFO  
org.apache.zookeeper.ClientCnxn - Opening socket connection to server 
haisen11/152.94.1.130:2181
2011-04-27 10:29:29,926 [main-SendThread(haisen11:2181)] INFO  
org.apache.zookeeper.ClientCnxn - Socket connection established to 
haisen11/152.94.1.130:2181, initiating session
2011-04-27 10:29:29,936 [main-SendThread(haisen11:2181)] INFO  
org.apache.zookeeper.ClientCnxn - Session establishment complete on 
server haisen11/152.94.1.130:2181, sessionid = 0x12f8c18a1340177, 
negotiated timeout = 40000
2011-04-27 10:29:29,972 [main] DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation 
- Lookedup root region location, 
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@67f31652; 
hsa=haisen10.ux.uis.no:60020
2011-04-27 10:29:30,018 [main] DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation 
- Cached location for .META.,,1.1028785192 is haisen10.ux.uis.no:60020
2011-04-27 10:29:30,020 [main] DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation 
- Cache hit for row <> in tableName .META.: location server 
haisen10.ux.uis.no:60020, location region name .META.,,1.1028785192
2011-04-27 10:29:30,024 [main] DEBUG 
org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at 
row=table2,,00000000000000 for max=10 rows
2011-04-27 10:29:30,028 [main] DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation 
- Cached location for 
table2,,1303809998908.0a8a5a1a398c449de8f29a2cf082f30e. is 
haisen6.ux.uis.no:60020
2011-04-27 10:29:30,030 [main] DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation 
- Cache hit for row <> in tableName table2: location server 
haisen6.ux.uis.no:60020, location region name 
table2,,1303809998908.0a8a5a1a398c449de8f29a2cf082f30e.
2011-04-27 10:29:30,031 [main] INFO  
org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table 
instance for table2
2011-04-27 10:29:30,068 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: 
B: 
Store(table2:org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a','-loadKey')) 
- scope-6 Operator Key: scope-6)
2011-04-27 10:29:30,085 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler 
- File concatenation threshold: 100 optimistic? false
2011-04-27 10:29:30,122 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer 
- MR plan size before optimization: 1
2011-04-27 10:29:30,122 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer 
- MR plan size after optimization: 1
2011-04-27 10:29:30,187 [main] INFO  
org.apache.pig.tools.pigstats.ScriptState - Pig script settings are 
added to the job
2011-04-27 10:29:30,204 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler 
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-04-27 10:29:31,684 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler 
- Setting up single store job
2011-04-27 10:29:31,709 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 1 map-reduce job(s) waiting for submission.
2011-04-27 10:29:32,059 [Thread-7] INFO  org.apache.zookeeper.ZooKeeper 
- Initiating client connection, connectString=haisen11:2181 
sessionTimeout=180000 watcher=hconnection
2011-04-27 10:29:32,060 [Thread-7-SendThread()] INFO  
org.apache.zookeeper.ClientCnxn - Opening socket connection to server 
haisen11/152.94.1.130:2181
2011-04-27 10:29:32,061 [Thread-7-SendThread(haisen11:2181)] INFO  
org.apache.zookeeper.ClientCnxn - Socket connection established to 
haisen11/152.94.1.130:2181, initiating session
2011-04-27 10:29:32,063 [Thread-7-SendThread(haisen11:2181)] INFO  
org.apache.zookeeper.ClientCnxn - Session establishment complete on 
server haisen11/152.94.1.130:2181, sessionid = 0x12f8c18a1340178, 
negotiated timeout = 40000
2011-04-27 10:29:32,070 [Thread-7] DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation 
- Lookedup root region location, 
connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@1f248f2b; 
hsa=haisen10.ux.uis.no:60020
2011-04-27 10:29:32,074 [Thread-7] DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation 
- Cached location for .META.,,1.1028785192 is haisen10.ux.uis.no:60020
2011-04-27 10:29:32,074 [Thread-7] DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation 
- Cache hit for row <> in tableName .META.: location server 
haisen10.ux.uis.no:60020, location region name .META.,,1.1028785192
2011-04-27 10:29:32,076 [Thread-7] DEBUG 
org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at 
row=table2,,00000000000000 for max=10 rows
2011-04-27 10:29:32,080 [Thread-7] DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation 
- Cached location for 
table2,,1303809998908.0a8a5a1a398c449de8f29a2cf082f30e. is 
haisen6.ux.uis.no:60020
2011-04-27 10:29:32,081 [Thread-7] DEBUG 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation 
- Cache hit for row <> in tableName table2: location server 
haisen6.ux.uis.no:60020, location region name 
table2,,1303809998908.0a8a5a1a398c449de8f29a2cf082f30e.
2011-04-27 10:29:32,082 [Thread-7] INFO  
org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table 
instance for table2
2011-04-27 10:29:32,102 [Thread-7] INFO  
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input 
paths to process : 1
2011-04-27 10:29:32,102 [Thread-7] INFO  
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total 
input paths to process : 1
2011-04-27 10:29:32,110 [Thread-7] INFO  
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total 
input paths (combined) to process : 1
2011-04-27 10:29:32,211 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 0% complete
2011-04-27 10:29:32,953 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- HadoopJobId: job_201104251150_0071
2011-04-27 10:29:32,954 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- More information at: 
http://haisen11:50030/jobdetails.jsp?jobid=job_201104251150_0071
2011-04-27 10:29:52,654 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- job job_201104251150_0071 has failed! Stop running all dependent jobs
2011-04-27 10:29:52,666 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 100% complete
2011-04-27 10:29:52,674 [main] ERROR 
org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2011-04-27 10:29:52,677 [main] INFO  
org.apache.pig.tools.pigstats.PigStats - Script Statistics:

HadoopVersion    PigVersion      UserId   StartedAt            FinishedAt           Features
0.20.2-cdh3u0    0.8.0-cdh3u0    haisen   2011-04-27 10:29:30  2011-04-27 10:29:52  UNKNOWN

Failed!

Failed Jobs:
JobId                   Alias   Feature    Message                           Outputs
job_201104251150_0071   A,B     MAP_ONLY   Message: Job failed! Error - NA   table2,

Input(s):
Failed to read data from "/passwd"

Output(s):
Failed to produce result in "table2"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201104251150_0071


2011-04-27 10:29:52,677 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Failed!


thank you

Byambajargal



Re: How to store data into HBase using Pig

Posted by Bill Graham <bi...@gmail.com>.
What version of Pig are you running and what errors are you seeing on
the task trackers?


Re: How to store data into HBase using Pig

Posted by byambajargal <by...@gmail.com>.
Hello ...
I have a question for you.

I am running the following Pig job, which simply reads from HDFS and stores into HBase.
When I start the job, the first part works fine but the second part fails.
Could you give me a direction on how to move data from HDFS to HBase?


A = load '/passwd' using PigStorage(':');
B = foreach A generate $0 as id, $2 as value;
dump B;
store B into 'table2' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a', '-loadKey');

thank you for your help

Byambajargal



On 4/25/11 18:26, Dmitriy Ryaboy wrote:
> The first element of the relation you store must be the row key. You aren't
> loading the row key, so load>  store isn't working.
> Try
> my_data = LOAD 'hbase://table1' using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:1', '-loadKey') ;
>
> On Mon, Apr 25, 2011 at 5:32 AM, byambajargal<by...@gmail.com>wrote:
>
>> Hello guys
>>
>> I am running cloudere distribution cdh3u0 on my cluster with Pig and Hbase.
>> i can read data from hbase using the following pig query:
>>
>> my_data = LOAD 'hbase://table1' using
>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:1') ;dump my_data
>>
>> but when i try to store data into hbase as same way the job was failure.
>>
>> store my_data into 'hbase://table2' using
>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:1');
>>
>> the table1 and the table2 has same structure and same column.
>>
>>
>> the table i have:
>>
>> hbase(main):029:0* scan 'table1'
>> ROW                 COLUMN+CELL
>>   row1               column=cf:1, timestamp=1303731834050, value=value1
>>   row2               column=cf:1, timestamp=1303731849901, value=value2
>>   row3               column=cf:1, timestamp=1303731858637, value=value3
>> 3 row(s) in 0.0470 seconds
>>
>>
>> thanks
>>
>> Byambajargal
>>
>>
>>


Re: How to store data into HBase using Pig

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
The first element of the relation you store must be the row key. You aren't
loading the row key, so load > store isn't working.
Try
my_data = LOAD 'hbase://table1' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:1', '-loadKey') ;
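
With '-loadKey' the row key arrives as the first field of each tuple, so a quick way to check is (a sketch; the schema names are illustrative, not part of the table):

my_data = LOAD 'hbase://table1' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:1', '-loadKey') AS (id:chararray, val:chararray);
dump my_data;
-- expect tuples like (row1,value1): the row key first, then the cf:1 value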
