Posted to dev@crunch.apache.org by Dave Beech <da...@paraliatech.com> on 2013/06/28 17:07:44 UTC

Misleading error message

Hi all,

Please take a look at the following pipeline:

read(From.textFile(args[0])).write(To.textFile(args[1] + "-text"));
run();
read(From.textFile(args[0])).write(To.sequenceFile(args[1] + "-seq"));
run();
read(From.textFile(args[0])).write(To.avroFile(args[1] + "-avro"));
done();

The first two jobs are fine, and produce the expected text and sequence
file outputs respectively. The text-to-Avro conversion fails. This is no
great surprise, knowing a little about the internals of Crunch, but when
put alongside the other examples it feels like it should work.

Even if it can't work - no big deal, it's just a toy example. The main
problem for me was the error message:

13/06/28 14:11:40 INFO jobcontrol.CrunchControlledJob:
org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:872)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)

I think the job should have been killed somewhere before this point. There
must be a bit of logic (though I haven't properly looked for it) which
decides the requested target is no good for the PCollection provided, so
the exception should be raised there with a message explaining this.
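
To make the suggestion concrete, here is a rough, standalone sketch of the
kind of fail-fast check I have in mind. None of these names (FailFastSketch,
Family, SketchTarget, checkWrite) are Crunch classes or methods; they are
hypothetical stand-ins. It also assumes the failure comes from a mismatch
between how the collection is serialized and what the target can store, and
that the text target accepts everything, neither of which I have verified
against the Crunch source.

```java
import java.util.EnumSet;
import java.util.Set;

// Standalone sketch: none of these types are Crunch classes.
final class FailFastSketch {

    // Hypothetical serialization families a collection might use.
    enum Family { WRITABLE, AVRO }

    // Hypothetical target that knows which families it can store.
    static final class SketchTarget {
        final String description;
        final Set<Family> accepted;

        SketchTarget(String description, Set<Family> accepted) {
            this.description = description;
            this.accepted = accepted;
        }

        boolean accepts(Family family) {
            return accepted.contains(family);
        }
    }

    // Meant to run when the write is requested, before any job is submitted.
    static void checkWrite(Family collectionFamily, SketchTarget target) {
        if (!target.accepts(collectionFamily)) {
            throw new IllegalArgumentException(
                "Target " + target.description
                    + " cannot store a collection serialized as " + collectionFamily
                    + "; it accepts only " + target.accepted);
        }
    }

    public static void main(String[] args) {
        // Assumption for illustration: the text target accepts every family.
        SketchTarget text = new SketchTarget("text file", EnumSet.allOf(Family.class));
        SketchTarget avro = new SketchTarget("avro file", EnumSet.of(Family.AVRO));

        checkWrite(Family.WRITABLE, text); // compatible, returns normally
        try {
            checkWrite(Family.WRITABLE, avro); // incompatible, rejected immediately
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected before submission: " + e.getMessage());
        }
    }
}
```

The point is only that the exception names the target and the offending
collection type, and that it is thrown at the write call rather than
surfacing later as "Output directory not set".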

What do you think?

I'm sure there's a JIRA ticket lurking somewhere in all this - I'm just not
sure what it is! :)

Thanks,
Dave

Re: Misleading error message

Posted by Josh Wills <jw...@cloudera.com>.

Hey Dave,

I agree with your assessment; we should fail fast in this case. I'm adding
a JIRA issue with a patch for it.

J

-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>