Posted to dev@crunch.apache.org by "Das, Mridul" <mr...@news.com.au> on 2014/07/22 09:26:59 UTC

Ignoring crunch.disable.combine.file

Hi,
  While trying to read text files from S3 using Crunch (version 0.10.0),
I get the error "INFO jobcontrol.CrunchControlledJob:
java.io.FileNotFoundException: File does not exist:". This is probably
because of https://issues.apache.org/jira/browse/MAPREDUCE-2704.
However, when I try to disable file combining
(crunch.disable.combine.file=true), Crunch ignores the setting
and still creates a CrunchCombineFileInputFormat.




-- 
 Mridul Das
 Data Engineer
 Level 4, 2 Holt Street Surry Hills NSW 2010
T +61 2 8114 7621 M +61 478 977 665
E mridul.das@news.com.au W www.NewsCorpAustralia.com
 Proudly supporting 1 degree <http://www.1degree.net.au>, A News Corp
Australia initiative.
 [image: News Corp Australia]

-- 
This message and its attachments may contain legally privileged or 
confidential information. It is intended solely for the named addressee. If 
you are not the addressee indicated in this message or responsible for 
delivery of the message to the addressee, you may not copy or deliver this 
message or its attachments to anyone. Rather, you should permanently delete 
this message and its attachments and kindly notify the sender by reply 
e-mail. Any content of this message and its attachments which does not 
relate to the official business of the sending company must be taken not to 
have been sent or endorsed by that company or any of its related entities. 
No warranty is made that the e-mail or attachments are free from computer 
virus or other defect.

Re: Ignoring crunch.disable.combine.file

Posted by Josh Wills <jw...@cloudera.com>.
Hey man,

You should be able to override that setting on the TextFileSource
you create by calling its inputConf method, like so:

From.textFile(somePath).inputConf("crunch.disable.combine.file", "true");

Let me know if that does the trick,
J


On Tue, Jul 22, 2014 at 12:39 AM, Das, Mridul <mr...@news.com.au>
wrote:

> To add, the constructor of TextFileSource (and of all the other sources, for
> that matter) sets crunch.disable.combine.file to false, thereby overriding
> any value supplied by the user.



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>

Re: Ignoring crunch.disable.combine.file

Posted by "Das, Mridul" <mr...@news.com.au>.
To add, the constructor of TextFileSource (and of all the other sources, for
that matter) sets crunch.disable.combine.file to false, thereby overriding
any value supplied by the user.
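A minimal, self-contained Java model of the behaviour described above may help. It assumes, as stated in this message, that the source's constructor writes the flag into the source's own per-source settings, and that those settings are then layered over the job configuration. ModelSource, effectiveConf and combinesFiles are hypothetical names; this is a simplified illustration, not Crunch's actual TextFileSource code. It also shows why the inputConf suggestion works: a value set after construction replaces the constructor's default.

```java
import java.util.HashMap;
import java.util.Map;

public class CombineFileSettingDemo {
    static final String DISABLE_COMBINE = "crunch.disable.combine.file";

    // Hypothetical stand-in for a Crunch file source (NOT the real TextFileSource).
    static class ModelSource {
        // Per-source settings, applied on top of the job configuration.
        private final Map<String, String> extraConf = new HashMap<>();

        ModelSource() {
            // Models the reported behaviour: the constructor forces the flag to false.
            extraConf.put(DISABLE_COMBINE, "false");
        }

        // Models inputConf: a per-source setting made after construction wins.
        ModelSource inputConf(String key, String value) {
            extraConf.put(key, value);
            return this;
        }

        // Job-level values first, then per-source values overwrite them.
        Map<String, String> effectiveConf(Map<String, String> jobConf) {
            Map<String, String> merged = new HashMap<>(jobConf);
            merged.putAll(extraConf);
            return merged;
        }
    }

    // True when input files would still be combined (flag absent or "false").
    static boolean combinesFiles(Map<String, String> conf) {
        return !Boolean.parseBoolean(conf.getOrDefault(DISABLE_COMBINE, "false"));
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        jobConf.put(DISABLE_COMBINE, "true"); // user disables combining job-wide

        ModelSource plain = new ModelSource();
        // Job-level "true" is clobbered by the constructor's "false".
        System.out.println(combinesFiles(plain.effectiveConf(jobConf))); // prints "true"

        ModelSource overridden = new ModelSource().inputConf(DISABLE_COMBINE, "true");
        // The per-source override survives, so combining is disabled.
        System.out.println(combinesFiles(overridden.effectiveConf(jobConf))); // prints "false"
    }
}
```

In this model the job-wide setting can never win, because per-source values are merged last; only a per-source override set after construction changes the outcome.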


On 22 July 2014 17:26, Das, Mridul <mr...@news.com.au> wrote:

> Hi,
>   While trying to read text files from S3 using Crunch (version 0.10.0),
> I get the error "INFO jobcontrol.CrunchControlledJob:
> java.io.FileNotFoundException: File does not exist:". This is probably
> because of https://issues.apache.org/jira/browse/MAPREDUCE-2704.
> However, when I try to disable file combining
> (crunch.disable.combine.file=true), Crunch ignores the setting
> and still creates a CrunchCombineFileInputFormat.


