Posted to user@pig.apache.org by Alex WANG <AW...@mdacorporation.com> on 2010/09/21 22:48:51 UTC
Problem with Pig Store command

Hi,

I am using Pig 0.7.0 in Hadoop MapReduce mode.

The problem I have is that I simply can't use

STORE alias INTO 'directory' USING PigStorage();

I can load a dataset and write UDFs to manipulate it, but I can't store it. The output is a directory in HDFS with 0 bytes.

As an example, I've been testing with a simple script:

W = load 'wordbag' using PigStorage(' ') as (f1:int, f2:int, name:chararray, type:chararray);
store W into 'wordtesting' using PigStorage(' ');

I ran the script in grunt, and the output of hadoop fs -ls is:

drwxr-xr-x   - awang supergroup          0 2010-09-21 13:45 /user/awang/wordtesting

The grunt messages are:

grunt> store filteredW into 'wordtesting' using PigStorage(' ');
2010-09-21 13:45:35,210 [main] INFO  org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for W
2010-09-21 13:45:35,210 [main] INFO  org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for W
2010-09-21 13:45:35,440 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(hdfs://pineal:9000/user/awang/wordtesting:PigStorage(' ')) - 1-46 Operator Key: 1-46)
2010-09-21 13:45:35,498 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2010-09-21 13:45:35,498 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2010-09-21 13:45:35,549 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-09-21 13:45:38,100 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-09-21 13:45:38,166 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-09-21 13:45:38,173 [Thread-15] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-09-21 13:45:38,307 [Thread-15] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-09-21 13:45:38,307 [Thread-15] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-09-21 13:45:38,670 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201009211320_0002
2010-09-21 13:45:38,670 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://pineal:50030/jobdetails.jsp?jobid=job_201009211320_0002
2010-09-21 13:45:38,673 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2010-09-21 13:45:48,755 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2010-09-21 13:45:53,835 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-09-21 13:45:53,835 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Successfully stored result in: "hdfs://pineal:9000/user/awang/wordtesting"
2010-09-21 13:45:53,846 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written : 1
2010-09-21 13:45:53,846 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written : 20
2010-09-21 13:45:53,846 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
2010-09-21 13:45:53,847 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
2010-09-21 13:45:53,847 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!


I've been struggling with this for a long time. It works if I have a single bytearray in my tuple, but once I define a schema, it no longer works.

Does anyone have any idea? Please help! Thanks!

Best regards,
Alex