Posted to user@crunch.apache.org by Josh Wills <jw...@cloudera.com> on 2014/09/09 17:02:43 UTC

Re: Getting exception for a crunch job when running on Elastic Map Reduce, which runs fine locally

Hey,

Sorry I missed this; I only really monitor the user@crunch.apache.org
mailing list closely. Which version of Crunch were you running when you got
the exception? The usual explanation for that kind of error is that an
upstream job failed, in which case you should see the error in the
JobHistory server. It's also possible that Crunch isn't handling the S3/EMR
FileSystem correctly, which has happened before, e.g.,

http://mail-archives.apache.org/mod_mbox/crunch-user/201310.mbox/%3CCAHCsPn8pvqJ6aJWcEqk4R3YZ8gu_MYVig+pqFEgw-AfSAD277w@mail.gmail.com%3E
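
To illustrate the first point, one way to tell an upstream job failure apart from a path-handling problem is to run the pipeline explicitly and check its outcome before materializing anything. The sketch below is not from the original thread and does not use Crunch itself: RunOutcome and materializeAfterRun are hypothetical stand-ins so that the example compiles on its own. In Crunch, the corresponding calls would presumably be Pipeline.run(), which returns a PipelineResult exposing succeeded(), followed by tables.asCollection().getValue().

```java
import java.util.List;
import java.util.function.Supplier;

/**
 * Sketch: run the pipeline explicitly and check the outcome before
 * materializing, so an upstream failure is reported as such.
 * RunOutcome is a hypothetical stand-in, not a Crunch type.
 */
final class MaterializeGuard {

    /** Hypothetical stand-in for the result of running a pipeline. */
    interface RunOutcome {
        boolean succeeded();
    }

    /**
     * Runs the pipeline first and materializes only if the run succeeded.
     * On failure, throws IllegalStateException without calling materialize.
     */
    static <T> T materializeAfterRun(Supplier<RunOutcome> run, Supplier<T> materialize) {
        RunOutcome outcome = run.get();
        if (!outcome.succeeded()) {
            throw new IllegalStateException(
                "Pipeline run failed; check the JobHistory server for the failed job "
                + "before looking at the materialize step");
        }
        return materialize.get();
    }

    public static void main(String[] args) {
        // Successful run: the value is materialized and returned.
        List<String> tables = materializeAfterRun(() -> () -> true, () -> List.of("a", "b"));
        System.out.println(tables);

        // Failed run: the failure is reported before any materialization.
        try {
            materializeAfterRun(() -> () -> false, () -> List.of("unreachable"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the explicit run reports success and the materialize step still finds no files, that would point toward the FileSystem handling rather than a failed upstream job.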

Josh

On Mon, Sep 8, 2014 at 8:19 AM, <fa...@gmail.com> wrote:

> I am trying to get some direction on how to go about debugging this issue. I run my Crunch job on a local Hadoop setup, and it works fine. I understand that the data there is much smaller, but the files that I am trying to diff have the same structure (dumps from different sources). The stack trace is as follows:
>
>
> No files found to materialize at: /tmp/crunch-1412622901/p4
> 	at org.apache.crunch.materialize.MaterializableIterable.materialize(MaterializableIterable.java:79)
> 	at org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:69)
> 	at org.apache.crunch.materialize.pobject.CollectionPObject.process(CollectionPObject.java:49)
> 	at org.apache.crunch.materialize.pobject.CollectionPObject.process(CollectionPObject.java:34)
> 	at org.apache.crunch.materialize.pobject.PObjectImpl.getValue(PObjectImpl.java:70)
> 	at <redacted>(BulkDiffCommand.java:126)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>
>
> The line in my code that invokes the above calls the following method:
>
>
> tables.asCollection().getValue()
>
>
> Also, this error occurs after the pipeline has run for a few hours, when it fails on one of its later jobs. I've looked at the Crunch source code, and the CompositePath.create() method basically couldn't find the path noted above. I am trying a few things out, but do you have any ideas on how to go about debugging this?



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>