Posted to user@pig.apache.org by Chris Diehl <cp...@gmail.com> on 2012/05/16 20:47:35 UTC

Problem loading sequence files with Elephant Bird

Hi All,

I'm attempting to load sequence files for the first time using Elephant Bird's
sequence file loader and having absolutely no luck.

I ran hadoop fs -text on one of the sequence files and noticed all the
keys are (null). Not sure if that is throwing things off here.

Here are various approaches I've tried that all have failed.

REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
%declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter'

raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
    USING $SEQFILE_LOADER ('-c $NULL_CONVERTER','-c $TEXT_CONVERTER')
    AS (key: bytearray, value: chararray);
--raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
--    USING $SEQFILE_LOADER ('-c $TEXT_CONVERTER','-c $TEXT_CONVERTER')
--    AS (key: chararray, value: chararray);
--raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
--    USING $SEQFILE_LOADER ();

STORE raw_logs INTO '/data/SearchLogJSON/';
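
One quick sanity check, sketched below with the same declares as above, is to drop the AS clause and ask Pig what schema the loader itself reports; DESCRIBE runs on the client, so it answers even when the MapReduce job fails (the LIMIT/DUMP step does launch a small job). The aliases here are only illustrative.

checked = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
    USING $SEQFILE_LOADER ('-c $NULL_CONVERTER','-c $TEXT_CONVERTER');
DESCRIBE checked;       -- print the schema SequenceFileLoader provides
few = LIMIT checked 5;  -- spot-check a handful of records
DUMP few;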

Any thoughts on what might be the problem? Anything else I should try? I'm
totally out of ideas.

Appreciate any pointers!

Chris

Re: Problem loading sequence files with Elephant Bird

Posted by Raghu Angadi <an...@gmail.com>.
'AS' is almost always dangerous. The loader already has a schema. Use a
projection if you want to rename the fields.
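
A minimal sketch of that suggestion, reusing the $SEQFILE_LOADER, $NULL_CONVERTER and $TEXT_CONVERTER declares from earlier in the thread: load without AS, then rename and cast in a FOREACH projection.

raw_logs = LOAD '$INPUT_LOCATION'
    USING $SEQFILE_LOADER ('-c $NULL_CONVERTER','-c $TEXT_CONVERTER');
-- rename and cast via a projection instead of forcing a schema with AS
logs = FOREACH raw_logs GENERATE $0 AS key, (chararray) $1 AS value;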

On Fri, May 18, 2012 at 4:07 PM, Chris Diehl <cp...@gmail.com> wrote:

> With a little bit of luck, we managed to find an answer.
>
> Turns out we needed to remove the cast from the key and run the script in Pig
> 0.10. I was running the script with Pig 0.8.1 up until today.
>
> raw_logs = LOAD '$INPUT_LOCATION' USING $SEQFILE_LOADER ('-c
> $NULL_CONVERTER','-c $TEXT_CONVERTER')
>     AS (key, value: chararray);
>
> Chris

Re: Problem loading sequence files with Elephant Bird

Posted by Chris Diehl <cp...@gmail.com>.
With a little bit of luck, we managed to find an answer.

Turns out we needed to remove the cast from the key and run the script in Pig
0.10. I was running the script with Pig 0.8.1 up until today.

raw_logs = LOAD '$INPUT_LOCATION'
    USING $SEQFILE_LOADER ('-c $NULL_CONVERTER','-c $TEXT_CONVERTER')
    AS (key, value: chararray);
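
Pieced together from the snippets in this thread, the full working script would look roughly like this (assuming Pig 0.10 plus the jar path, declares, and output location quoted elsewhere in the thread):

REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
%declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter';

raw_logs = LOAD '$INPUT_LOCATION'
    USING $SEQFILE_LOADER ('-c $NULL_CONVERTER','-c $TEXT_CONVERTER')
    AS (key, value: chararray);  -- key left untyped, per the fix described above

STORE raw_logs INTO '/data/SearchLogJSON/';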

Chris

On Fri, May 18, 2012 at 2:27 PM, Chris Diehl <cp...@gmail.com> wrote:

> Hi Andy,
>
> Here's what is in the log file.
>
> Pig Stack Trace
> ---------------
> ERROR 2244: Job failed, hadoop does not return any error message
>
> org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job
> failed, hadoop does not return any error message
> at
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:119)
>  at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
> at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
>  at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
> at org.apache.pig.Main.run(Main.java:500)
>  at org.apache.pig.Main.main(Main.java:107)
>
> ================================================================================
>
> I am running it on the cluster. I could not find any additional
> information on the job tracker.
>
> The keys in the sequence files are all null. The values are all JSON
> strings. Given that information, I tried configuring the SequenceFileLoader
> this way to no avail.
>
> %declare SEQFILE_LOADER
> 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
> %declare NULL_CONVERTER
> 'com.twitter.elephantbird.pig.util.NullWritableConverter'
>
> raw_logs = LOAD '$INPUT_LOCATION' USING $SEQFILE_LOADER ('-c
> $NULL_CONVERTER','-c $TEXT_CONVERTER') AS (key: chararray, value:
> chararray);
>
> Is there another way I should be configuring it?
>
> Chris

Re: Problem loading sequence files with Elephant Bird

Posted by Chris Diehl <cp...@gmail.com>.
Hi Andy,

Here's what is in the log file.

Pig Stack Trace
---------------
ERROR 2244: Job failed, hadoop does not return any error message

org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job
failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:119)
 at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:500)
 at org.apache.pig.Main.main(Main.java:107)
================================================================================

I am running it on the cluster. I could not find any additional information
on the job tracker.

The keys in the sequence files are all null. The values are all JSON
strings. Given that information, I tried configuring the SequenceFileLoader
this way to no avail.

%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
%declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter'

raw_logs = LOAD '$INPUT_LOCATION'
    USING $SEQFILE_LOADER ('-c $NULL_CONVERTER','-c $TEXT_CONVERTER')
    AS (key: chararray, value: chararray);

Is there another way I should be configuring it?

Chris

On Fri, May 18, 2012 at 11:24 AM, Andy Schlaikjer <
andrew.schlaikjer@gmail.com> wrote:

> Chris, the console output mentions file "/opt/shared_storage/log_
> analysis_pig_python_scripts/pig_1337299061301.log". Does this contain any
> kind of stack trace? Were you running the script in local mode or on a
> cluster? If the latter, there should be at least map task log output
> someplace that may also have some clues.
>
> Does path
> '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
> contain SequenceFile<Text, Text> data? If not, you'll have to configure
> SequenceFileLoader further to properly deserialize the key-value pairs.
>
> Andy

Re: Problem loading sequence files with Elephant Bird

Posted by Andy Schlaikjer <an...@gmail.com>.
Chris, the console output mentions file "/opt/shared_storage/log_
analysis_pig_python_scripts/pig_1337299061301.log". Does this contain any
kind of stack trace? Were you running the script in local mode or on a
cluster? If the latter, there should be at least map task log output
someplace that may also have some clues.

Does path '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
contain SequenceFile<Text, Text> data? If not, you'll have to configure
SequenceFileLoader further to properly deserialize the key-value pairs.
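
As a rough sketch of that kind of further configuration: SequenceFileLoader takes two argument strings, one for the key and one for the value (as in the LOAD statements elsewhere in this thread), and '-c' names the WritableConverter class to use for each slot. The path and key converter class below are placeholders, not anything from this thread.

-- hypothetical example: a custom key type alongside Text values
pairs = LOAD '/path/to/some-other-data.seq'
    USING com.twitter.elephantbird.pig.load.SequenceFileLoader (
        '-c com.example.MyKeyWritableConverter',  -- placeholder converter for the key type
        '-c com.twitter.elephantbird.pig.util.TextConverter');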

Andy


On Thu, May 17, 2012 at 5:07 PM, Chris Diehl <cp...@gmail.com> wrote:

> Andy,
>
> Here's what I'm seeing when I run the following script. There's no
> information beyond what is here in the log file.
>
> Chris
>
> REGISTER
> '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
> %declare SEQFILE_LOADER
> 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
> %declare NULL_CONVERTER
> 'com.twitter.elephantbird.pig.util.NullWritableConverter'
>
> rmf /data/SearchLogJSON;
>
> -- Load raw log data
> raw_logs = LOAD
> '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq' USING
> $SEQFILE_LOADER ();
>
> -- Store the JSON
> STORE raw_logs INTO '/data/SearchLogJSON/';
>
> -------------------
>
> -sh-3.2$ pig dump_log_json.pig
> 2012-05-17 23:57:41,304 [main] INFO  org.apache.pig.Main - Logging error
> messages to:
> /opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log
> 2012-05-17 23:57:41,586 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> Connecting to hadoop file system at: XXX
> 2012-05-17 23:57:41,932 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> Connecting to map-reduce job tracker at: XXX
> 2012-05-17 23:57:42,204 [main] INFO
>  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
> script: UNKNOWN
> 2012-05-17 23:57:42,204 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> pig.usenewlogicalplan is set to true. New logical plan will be used.
> 2012-05-17 23:57:42,301 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name:
> raw_logs: Store(/data/SearchLogJSON:org.apache.pig.builtin.PigStorage) -
> scope-1 Operator Key: scope-1)
> 2012-05-17 23:57:42,317 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
> File concatenation threshold: 100 optimistic? false
> 2012-05-17 23:57:42,349 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size before optimization: 1
> 2012-05-17 23:57:42,349 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size after optimization: 1
> 2012-05-17 23:57:42,529 [main] INFO
>  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
> to the job
> 2012-05-17 23:57:42,545 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2012-05-17 23:57:44,706 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - Setting up single store job
> 2012-05-17 23:57:44,734 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 1 map-reduce job(s) waiting for submission.
> 2012-05-17 23:57:45,053 [Thread-4] INFO
>  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
> to process : 1
> 2012-05-17 23:57:45,057 [Thread-4] INFO
>  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
> input paths (combined) to process : 1
> 2012-05-17 23:57:45,236 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
> 2012-05-17 23:57:45,849 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - HadoopJobId: job_201205170527_0003
> 2012-05-17 23:57:45,849 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - More information at: XXX
> 2012-05-17 23:58:25,816 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - job job_201205170527_0003 has failed! Stop running all dependent jobs
> 2012-05-17 23:58:25,821 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 100% complete
> 2012-05-17 23:58:25,824 [main] ERROR
> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2012-05-17 23:58:25,825 [main] INFO  org.apache.pig.tools.pigstats.PigStats
> - Script Statistics:
>
> HadoopVersion PigVersion UserId StartedAt FinishedAt Features
> 0.20.2-cdh3u2 0.8.1-cdh3u2 chris.diehl 2012-05-17 23:57:42 2012-05-17
> 23:58:25 UNKNOWN
>
> Failed!
>
> Failed Jobs:
> JobId Alias Feature Message Outputs
> job_201205170527_0003 raw_logs MAP_ONLY Message: Job failed! Error - NA
> /data/SearchLogJSON,
>
> Input(s):
> Failed to read data from
> "/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq"
>
> Output(s):
> Failed to produce result in "/data/SearchLogJSON"
>
> Counters:
> Total records written : 0
> Total bytes written : 0
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
>
> Job DAG:
> job_201205170527_0003
>
>
> 2012-05-17 23:58:25,825 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Failed!
> 2012-05-17 23:58:25,831 [main] ERROR org.apache.pig.tools.grunt.GruntParser
> - ERROR 2244: Job failed, hadoop does not return any error message
> Details at logfile:
> /opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log

Re: Problem loading sequence files with Elephant Bird

Posted by Chris Diehl <cp...@gmail.com>.
Andy,

Here's what I'm seeing when I run the following script. There's no
information beyond what is here in the log file.

Chris

REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
%declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter'

rmf /data/SearchLogJSON;

-- Load raw log data
raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
    USING $SEQFILE_LOADER ();

-- Store the JSON
STORE raw_logs INTO '/data/SearchLogJSON/';

-------------------

-sh-3.2$ pig dump_log_json.pig
2012-05-17 23:57:41,304 [main] INFO  org.apache.pig.Main - Logging error
messages to:
/opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log
2012-05-17 23:57:41,586 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: XXX
2012-05-17 23:57:41,932 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to map-reduce job tracker at: XXX
2012-05-17 23:57:42,204 [main] INFO
 org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
script: UNKNOWN
2012-05-17 23:57:42,204 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
pig.usenewlogicalplan is set to true. New logical plan will be used.
2012-05-17 23:57:42,301 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name:
raw_logs: Store(/data/SearchLogJSON:org.apache.pig.builtin.PigStorage) -
scope-1 Operator Key: scope-1)
2012-05-17 23:57:42,317 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
File concatenation threshold: 100 optimistic? false
2012-05-17 23:57:42,349 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2012-05-17 23:57:42,349 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2012-05-17 23:57:42,529 [main] INFO
 org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
to the job
2012-05-17 23:57:42,545 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-05-17 23:57:44,706 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2012-05-17 23:57:44,734 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2012-05-17 23:57:45,053 [Thread-4] INFO
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
to process : 1
2012-05-17 23:57:45,057 [Thread-4] INFO
 org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths (combined) to process : 1
2012-05-17 23:57:45,236 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2012-05-17 23:57:45,849 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_201205170527_0003
2012-05-17 23:57:45,849 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at: XXX
2012-05-17 23:58:25,816 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job job_201205170527_0003 has failed! Stop running all dependent jobs
2012-05-17 23:58:25,821 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2012-05-17 23:58:25,824 [main] ERROR
org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2012-05-17 23:58:25,825 [main] INFO  org.apache.pig.tools.pigstats.PigStats
- Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2-cdh3u2 0.8.1-cdh3u2 chris.diehl 2012-05-17 23:57:42 2012-05-17
23:58:25 UNKNOWN

Failed!

Failed Jobs:
JobId Alias Feature Message Outputs
job_201205170527_0003 raw_logs MAP_ONLY Message: Job failed! Error - NA
/data/SearchLogJSON,

Input(s):
Failed to read data from
"/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq"

Output(s):
Failed to produce result in "/data/SearchLogJSON"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201205170527_0003


2012-05-17 23:58:25,825 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2012-05-17 23:58:25,831 [main] ERROR org.apache.pig.tools.grunt.GruntParser
- ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile:
/opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log



On Thu, May 17, 2012 at 1:20 PM, Andy Schlaikjer <
andrew.schlaikjer@gmail.com> wrote:

> Chris, could you send us any of your error logs? What kind of failures are
> you running into?
>
> Andy

Re: Problem loading sequence files with Elephant Bird

Posted by Andy Schlaikjer <an...@gmail.com>.
Chris, could you send us any of your error logs? What kind of failures are
you running into?

Andy


On Wed, May 16, 2012 at 11:47 AM, Chris Diehl <cp...@gmail.com> wrote:

> Hi All,
>
> > I'm attempting to load sequence files for the first time using Elephant Bird's
> > sequence file loader and having absolutely no luck.
>
> > I ran hadoop fs -text on one of the sequence files and noticed all the
> > keys are (null). Not sure if that is throwing things off here.
>
> Here are various approaches I've tried that all have failed.
>
> REGISTER
> '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
> %declare SEQFILE_LOADER
> 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
> %declare NULL_CONVERTER
> 'com.twitter.elephantbird.pig.util.NullWritableConverter'
>
> raw_logs = LOAD
> '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq' USING
> $SEQFILE_LOADER ('-c $NULL_CONVERTER','-c $TEXT_CONVERTER') AS (key:
> bytearray, value: chararray);
> --raw_logs = LOAD
> '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq' USING
> $SEQFILE_LOADER ('-c $TEXT_CONVERTER','-c $TEXT_CONVERTER') AS (key:
> chararray, value: chararray);
> --raw_logs = LOAD
> '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq' USING
> $SEQFILE_LOADER ();
>
> STORE raw_logs INTO '/data/SearchLogJSON/';
>
> Any thoughts on what might be the problem? Anything else I should try? I'm
> totally out of ideas.
>
> Appreciate any pointers!
>
> Chris
>