Posted to dev@crunch.apache.org by Dave Beech <da...@paraliatech.com> on 2013/06/28 17:07:44 UTC
Misleading error message
Hi all,
Please take a look at the following pipeline:
read(From.textFile(args[0])).write(To.textFile(args[1] + "-text"));
run();
read(From.textFile(args[0])).write(To.sequenceFile(args[1] + "-seq"));
run();
read(From.textFile(args[0])).write(To.avroFile(args[1] + "-avro"));
done();
The first two jobs are fine, and give correct output types of text and
sequence files respectively. The text to avro conversion fails. This is no
great surprise, knowing a little about the internals of Crunch, but when
put alongside the other examples it feels like it should work.
Even if it can't work - no big deal, it's just a toy example. The main
problem for me was the error message:
13/06/28 14:11:40 INFO jobcontrol.CrunchControlledJob:
org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:872)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
I think the job should have been killed somewhere before this point. There
must be a bit of logic (though I haven't properly looked for it) which
decides the requested target is no good for the PCollection provided, so
the exception should be raised there with a message explaining this.
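To make the fail-fast idea concrete, here is a minimal, self-contained sketch of the kind of check I mean. It does not use Crunch's real classes: DataKind, SketchTarget and checkWrite are hypothetical names made up for illustration, and the rule that an Avro target only accepts Avro data is an assumption about what the real check would decide.

```java
import java.util.EnumSet;
import java.util.Set;

public class FailFastSketch {

    // Hypothetical stand-in for the kind of data a collection holds.
    enum DataKind { TEXT, WRITABLE, AVRO }

    // Hypothetical stand-in for an output target; NOT Crunch's Target interface.
    static final class SketchTarget {
        final String path;
        final Set<DataKind> accepted;

        SketchTarget(String path, Set<DataKind> accepted) {
            this.path = path;
            this.accepted = accepted;
        }

        boolean accepts(DataKind kind) {
            return accepted.contains(kind);
        }
    }

    // Validate when the write is requested, before any job is configured or
    // submitted, so the message names the real problem.
    static void checkWrite(DataKind kind, SketchTarget target) {
        if (!target.accepts(kind)) {
            throw new IllegalArgumentException(
                "Target " + target.path + " cannot accept data of kind " + kind
                + "; supported kinds: " + target.accepted);
        }
    }

    public static void main(String[] args) {
        // Assumption for this sketch: an Avro target only takes Avro data.
        SketchTarget avro = new SketchTarget("out-avro", EnumSet.of(DataKind.AVRO));
        try {
            checkWrite(DataKind.TEXT, avro);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected early: " + e.getMessage());
        }
    }
}
```

The point is only that the mismatch is reported where it is detected, with the target and the data type in the message, instead of surfacing later as a missing output directory.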
What do you think?
I'm sure there's a JIRA ticket lurking somewhere in all this - I'm just not
sure what it is! :)
Thanks,
Dave
Re: Misleading error message
Posted by Josh Wills <jw...@cloudera.com>.
Hey Dave,
Agree with your assessment, we should fail fast in this case. Adding a JIRA
issue w/a patch for it.
J
--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>