Posted to user@pig.apache.org by Chris Diehl <cp...@gmail.com> on 2012/05/16 20:47:35 UTC
Problem loading sequence files with Elephant Bird
Hi All,
I'm attempting to load sequence files for the first time using Elephant Bird's
sequence file loader and having absolutely no luck.
I ran hadoop fs -text on one of the sequence files and noticed all the
keys are (null). Not sure if that is throwing things off here.
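For reference, the check above looks something like this (illustrative; hadoop fs -text prints each record as key<TAB>value, so NullWritable keys show up as "(null)"):

```
hadoop fs -text /logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq | head -n 3
```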
Here are various approaches I've tried that all have failed.
REGISTER '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';

%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
%declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter';

raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
    USING $SEQFILE_LOADER ('-c $NULL_CONVERTER', '-c $TEXT_CONVERTER')
    AS (key: bytearray, value: chararray);

--raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
--    USING $SEQFILE_LOADER ('-c $TEXT_CONVERTER', '-c $TEXT_CONVERTER')
--    AS (key: chararray, value: chararray);

--raw_logs = LOAD '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
--    USING $SEQFILE_LOADER ();

STORE raw_logs INTO '/data/SearchLogJSON/';

(Note: the original post was missing the semicolon after the NULL_CONVERTER declare; added here for consistency with the other declares.)
Any thoughts on what might be the problem? Anything else I should try? I'm
totally out of ideas.
Appreciate any pointers!
Chris
Re: Problem loading sequence files with Elephant Bird
Posted by Raghu Angadi <an...@gmail.com>.
'AS' is almost always dangerous here: the loader already provides a schema. Use a
projection if you want to rename the fields.
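A rename via projection instead of AS might look like this (a sketch reusing the $SEQFILE_LOADER and converter parameters declared elsewhere in this thread; the field names key and value come from the loader's own schema):

```pig
-- Load WITHOUT an AS clause; the loader supplies the schema (key, value).
raw_logs = LOAD '$INPUT_LOCATION'
    USING $SEQFILE_LOADER ('-c $NULL_CONVERTER', '-c $TEXT_CONVERTER');

-- Rename fields via projection rather than overriding the loader's schema.
logs = FOREACH raw_logs GENERATE key AS log_key, value AS json_line;
```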
Re: Problem loading sequence files with Elephant Bird
Posted by Chris Diehl <cp...@gmail.com>.
With a little bit of luck, we managed to find an answer.
Turns out we needed to remove the cast from key and run the script in Pig
0.10. I was running the script with Pig 0.8.1 up until today.
raw_logs = LOAD '$INPUT_LOCATION'
    USING $SEQFILE_LOADER ('-c $NULL_CONVERTER', '-c $TEXT_CONVERTER')
    AS (key, value: chararray);
Chris
Re: Problem loading sequence files with Elephant Bird
Posted by Chris Diehl <cp...@gmail.com>.
Hi Andy,
Here's what is in the log file.
Pig Stack Trace
---------------
ERROR 2244: Job failed, hadoop does not return any error message
org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job
failed, hadoop does not return any error message
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:119)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:500)
at org.apache.pig.Main.main(Main.java:107)
================================================================================
I am running it on the cluster. I could not find any additional information
on the job tracker.
The keys in the sequence files are all null. The values are all JSON
strings. Given that information, I tried configuring the SequenceFileLoader
this way to no avail.
%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
%declare NULL_CONVERTER 'com.twitter.elephantbird.pig.util.NullWritableConverter';

raw_logs = LOAD '$INPUT_LOCATION'
    USING $SEQFILE_LOADER ('-c $NULL_CONVERTER', '-c $TEXT_CONVERTER')
    AS (key: chararray, value: chararray);
Is there another way I should be configuring it?
Chris
On Fri, May 18, 2012 at 11:24 AM, Andy Schlaikjer <
andrew.schlaikjer@gmail.com> wrote:
> Chris, the console output mentions file
> "/opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log". Does this contain any
> kind of stack trace? Were you running the script in local mode or on a
> cluster? If the latter, there should be at least map task log output
> someplace that may also have some clues.
>
> Does path
> '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
> contain SequenceFile<Text, Text> data? If not, you'll have to configure
> SequenceFileLoader further to properly deserialize the key-value pairs.
>
> Andy
>
>
> On Thu, May 17, 2012 at 5:07 PM, Chris Diehl <cp...@gmail.com> wrote:
>
> > Andy,
> >
> > Here's what I'm seeing when I run the following script. There's no
> > information beyond what is here in the log file.
> >
> > Chris
> >
> > REGISTER
> >
> '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
> > %declare SEQFILE_LOADER
> > 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> > %declare TEXT_CONVERTER
> 'com.twitter.elephantbird.pig.util.TextConverter';
> > %declare NULL_CONVERTER
> > 'com.twitter.elephantbird.pig.util.NullWritableConverter'
> >
> > rmf /data/SearchLogJSON;
> >
> > -- Load raw log data
> > raw_logs = LOAD
> > '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
> USING
> > $SEQFILE_LOADER ();
> >
> > -- Store the JSON
> > STORE raw_logs INTO '/data/SearchLogJSON/';
> >
> > -------------------
> >
> > -sh-3.2$ pig dump_log_json.pig
> > 2012-05-17 23:57:41,304 [main] INFO org.apache.pig.Main - Logging error
> > messages to:
> > /opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log
> > 2012-05-17 23:57:41,586 [main] INFO
> > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> > Connecting to hadoop file system at: XXX
> > 2012-05-17 23:57:41,932 [main] INFO
> > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> > Connecting to map-reduce job tracker at: XXX
> > 2012-05-17 23:57:42,204 [main] INFO
> > org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
> > script: UNKNOWN
> > 2012-05-17 23:57:42,204 [main] INFO
> > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> > pig.usenewlogicalplan is set to true. New logical plan will be used.
> > 2012-05-17 23:57:42,301 [main] INFO
> > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name:
> > raw_logs: Store(/data/SearchLogJSON:org.apache.pig.builtin.PigStorage) -
> > scope-1 Operator Key: scope-1)
> > 2012-05-17 23:57:42,317 [main] INFO
> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler
> -
> > File concatenation threshold: 100 optimistic? false
> > 2012-05-17 23:57:42,349 [main] INFO
> >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> > - MR plan size before optimization: 1
> > 2012-05-17 23:57:42,349 [main] INFO
> >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> > - MR plan size after optimization: 1
> > 2012-05-17 23:57:42,529 [main] INFO
> > org.apache.pig.tools.pigstats.ScriptState - Pig script settings are
> added
> > to the job
> > 2012-05-17 23:57:42,545 [main] INFO
> >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> > - mapred.job.reduce.markreset.buffer.percent is not set, set to default
> 0.3
> > 2012-05-17 23:57:44,706 [main] INFO
> >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> > - Setting up single store job
> > 2012-05-17 23:57:44,734 [main] INFO
> >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - 1 map-reduce job(s) waiting for submission.
> > 2012-05-17 23:57:45,053 [Thread-4] INFO
> > org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
> paths
> > to process : 1
> > 2012-05-17 23:57:45,057 [Thread-4] INFO
> > org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
> > input paths (combined) to process : 1
> > 2012-05-17 23:57:45,236 [main] INFO
> >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - 0% complete
> > 2012-05-17 23:57:45,849 [main] INFO
> >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - HadoopJobId: job_201205170527_0003
> > 2012-05-17 23:57:45,849 [main] INFO
> >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - More information at: XXX
> > 2012-05-17 23:58:25,816 [main] INFO
> >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - job job_201205170527_0003 has failed! Stop running all dependent jobs
> > 2012-05-17 23:58:25,821 [main] INFO
> >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - 100% complete
> > 2012-05-17 23:58:25,824 [main] ERROR
> > org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> > 2012-05-17 23:58:25,825 [main] INFO
> org.apache.pig.tools.pigstats.PigStats
> > - Script Statistics:
> >
> > HadoopVersion PigVersion UserId StartedAt FinishedAt Features
> > 0.20.2-cdh3u2 0.8.1-cdh3u2 chris.diehl 2012-05-17 23:57:42 2012-05-17
> > 23:58:25 UNKNOWN
> >
> > Failed!
> >
> > Failed Jobs:
> > JobId Alias Feature Message Outputs
> > job_201205170527_0003 raw_logs MAP_ONLY Message: Job failed! Error - NA
> > /data/SearchLogJSON,
> >
> > Input(s):
> > Failed to read data from
> > "/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq"
> >
> > Output(s):
> > Failed to produce result in "/data/SearchLogJSON"
> >
> > Counters:
> > Total records written : 0
> > Total bytes written : 0
> > Spillable Memory Manager spill count : 0
> > Total bags proactively spilled: 0
> > Total records proactively spilled: 0
> >
> > Job DAG:
> > job_201205170527_0003
> >
> >
> > 2012-05-17 23:58:25,825 [main] INFO
> >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - Failed!
> > 2012-05-17 23:58:25,831 [main] ERROR
> org.apache.pig.tools.grunt.GruntParser
> > - ERROR 2244: Job failed, hadoop does not return any error message
> > Details at logfile:
> > /opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log
> >
> >
> >
> > On Thu, May 17, 2012 at 1:20 PM, Andy Schlaikjer <
> > andrew.schlaikjer@gmail.com> wrote:
> >
> > > Chris, could you send us any of your error logs? What kind of failures
> > are
> > > you running into?
> > >
> > > Andy
> > >
> > >
Re: Problem loading sequence files with Elephant Bird
Posted by Andy Schlaikjer <an...@gmail.com>.
Chris, the console output mentions file
"/opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log". Does this contain any
kind of stack trace? Were you running the script in local mode or on a
cluster? If the latter, there should be at least map task log output
someplace that may also have some clues.
Does path '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq'
contain SequenceFile<Text, Text> data? If not, you'll have to configure
SequenceFileLoader further to properly deserialize the key-value pairs.
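Independent of Pig, you can check which key/value classes a SequenceFile actually records: the class names are written near the start of every file. Below is a minimal Python sketch of that header layout, not Hadoop's own reader; it assumes the common case of class names short enough to use single-byte vint lengths, and the NullWritable/Text pairing in the example bytes is just the combination this thread suggests:

```python
import io
import struct

def read_vint(stream):
    # Hadoop writes string lengths as vints; class names are short, so in
    # practice this is a single byte in the range 0..127. Longer encodings
    # are not handled in this sketch.
    first = struct.unpack("b", stream.read(1))[0]
    if 0 <= first <= 127:
        return first
    raise ValueError("multi-byte vints not handled in this sketch")

def seqfile_header_classes(stream):
    """Return (key_class, value_class) recorded in a SequenceFile header."""
    magic = stream.read(3)
    if magic != b"SEQ":
        raise ValueError("not a SequenceFile: bad magic %r" % magic)
    version = stream.read(1)[0]  # format version byte (e.g. 6)
    key_class = stream.read(read_vint(stream)).decode("utf-8")
    val_class = stream.read(read_vint(stream)).decode("utf-8")
    return key_class, val_class

# Example against a synthetic header (hypothetical file contents):
header = (b"SEQ\x06"
          + bytes([33]) + b"org.apache.hadoop.io.NullWritable"
          + bytes([25]) + b"org.apache.hadoop.io.Text")
print(seqfile_header_classes(io.BytesIO(header)))
# -> ('org.apache.hadoop.io.NullWritable', 'org.apache.hadoop.io.Text')
```

Against a real file you would pass an open binary handle, e.g. `seqfile_header_classes(open("part-00000.seq", "rb"))` on a local copy; if the reported classes aren't Text/Text, the converters passed to SequenceFileLoader need to match what the header says.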
Andy
On Thu, May 17, 2012 at 5:07 PM, Chris Diehl <cp...@gmail.com> wrote:
> Andy,
>
> Here's what I'm seeing when I run the following script. There's no
> information beyond what is here in the log file.
>
> Chris
>
> REGISTER
> '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
> %declare SEQFILE_LOADER
> 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
> %declare NULL_CONVERTER
> 'com.twitter.elephantbird.pig.util.NullWritableConverter';
>
> rmf /data/SearchLogJSON;
>
> -- Load raw log data
> raw_logs = LOAD
> '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq' USING
> $SEQFILE_LOADER ();
>
> -- Store the JSON
> STORE raw_logs INTO '/data/SearchLogJSON/';
Re: Problem loading sequence files with Elephant Bird
Posted by Chris Diehl <cp...@gmail.com>.
Andy,
Here's what I'm seeing when I run the following script. There's no
information beyond what is here in the log file.
Chris
REGISTER
'/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
%declare SEQFILE_LOADER
'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
%declare NULL_CONVERTER
'com.twitter.elephantbird.pig.util.NullWritableConverter';
rmf /data/SearchLogJSON;
-- Load raw log data
raw_logs = LOAD
'/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq' USING
$SEQFILE_LOADER ();
-- Store the JSON
STORE raw_logs INTO '/data/SearchLogJSON/';
-------------------
-sh-3.2$ pig dump_log_json.pig
2012-05-17 23:57:41,304 [main] INFO org.apache.pig.Main - Logging error
messages to:
/opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log
2012-05-17 23:57:41,586 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: XXX
2012-05-17 23:57:41,932 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to map-reduce job tracker at: XXX
2012-05-17 23:57:42,204 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
script: UNKNOWN
2012-05-17 23:57:42,204 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
pig.usenewlogicalplan is set to true. New logical plan will be used.
2012-05-17 23:57:42,301 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name:
raw_logs: Store(/data/SearchLogJSON:org.apache.pig.builtin.PigStorage) -
scope-1 Operator Key: scope-1)
2012-05-17 23:57:42,317 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
File concatenation threshold: 100 optimistic? false
2012-05-17 23:57:42,349 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2012-05-17 23:57:42,349 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2012-05-17 23:57:42,529 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
to the job
2012-05-17 23:57:42,545 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-05-17 23:57:44,706 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2012-05-17 23:57:44,734 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2012-05-17 23:57:45,053 [Thread-4] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
to process : 1
2012-05-17 23:57:45,057 [Thread-4] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths (combined) to process : 1
2012-05-17 23:57:45,236 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2012-05-17 23:57:45,849 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_201205170527_0003
2012-05-17 23:57:45,849 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at: XXX
2012-05-17 23:58:25,816 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job job_201205170527_0003 has failed! Stop running all dependent jobs
2012-05-17 23:58:25,821 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2012-05-17 23:58:25,824 [main] ERROR
org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2012-05-17 23:58:25,825 [main] INFO org.apache.pig.tools.pigstats.PigStats
- Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2-cdh3u2 0.8.1-cdh3u2 chris.diehl 2012-05-17 23:57:42 2012-05-17
23:58:25 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201205170527_0003 raw_logs MAP_ONLY Message: Job failed! Error - NA
/data/SearchLogJSON,
Input(s):
Failed to read data from
"/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq"
Output(s):
Failed to produce result in "/data/SearchLogJSON"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201205170527_0003
2012-05-17 23:58:25,825 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2012-05-17 23:58:25,831 [main] ERROR org.apache.pig.tools.grunt.GruntParser
- ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile:
/opt/shared_storage/log_analysis_pig_python_scripts/pig_1337299061301.log
On Thu, May 17, 2012 at 1:20 PM, Andy Schlaikjer <
andrew.schlaikjer@gmail.com> wrote:
> Chris, could you send us any of your error logs? What kind of failures are
> you running into?
>
> Andy
Re: Problem loading sequence files with Elephant Bird
Posted by Andy Schlaikjer <an...@gmail.com>.
Chris, could you send us any of your error logs? What kind of failures are
you running into?
Andy
On Wed, May 16, 2012 at 11:47 AM, Chris Diehl <cp...@gmail.com> wrote:
> Hi All,
>
> I'm attempting to load sequence files for the first time using Elephant Bird's
> sequence file loader and having absolutely no luck.
>
> I ran a hadoop fs -text on one of the sequence files and noticed all the
> keys are (null). Not sure if that is throwing things off here.
>
> Here are various approaches I've tried that all have failed.
>
> REGISTER
> '/opt/shared_storage/elephant-bird/build/elephant-bird-2.2.3-SNAPSHOT.jar';
> %declare SEQFILE_LOADER
> 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
> %declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
> %declare NULL_CONVERTER
> 'com.twitter.elephantbird.pig.util.NullWritableConverter';
>
> raw_logs = LOAD
> '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq' USING
> $SEQFILE_LOADER ('-c $NULL_CONVERTER','-c $TEXT_CONVERTER') AS (key:
> bytearray, value: chararray);
> --raw_logs = LOAD
> '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq' USING
> $SEQFILE_LOADER ('-c $TEXT_CONVERTER','-c $TEXT_CONVERTER') AS (key:
> chararray, value: chararray);
> --raw_logs = LOAD
> '/logs/jive/internal/raw/2012/05/07/2012050795652.0627-720078349.seq' USING
> $SEQFILE_LOADER ();
>
> STORE raw_logs INTO '/data/SearchLogJSON/';
>
> Any thoughts on what might be the problem? Anything else I should try? I'm
> totally out of ideas.
>
> Appreciate any pointers!
>
> Chris
>