Posted to user@mahout.apache.org by Lance Norskog <go...@gmail.com> on 2011/03/10 10:17:17 UTC
Coding gotcha in WikipediaToSequenceFile.java
In WikipediaToSequenceFile there is a coding mistake: new Job(conf)
copies the Configuration, so any changes made to conf after the Job is
constructed are not reflected in the job's configuration. If you ever
uncomment the second block of code below, it will have no effect.
conf.set(stuff)
conf.set(more stuff)
Job job = new Job(conf)
......
/*
 * conf.set("mapred.compress.map.output", "true");
 * conf.set("mapred.map.output.compression.type", "BLOCK");
 * conf.set("mapred.output.compress", "true");
 * conf.set("mapred.output.compression.type", "BLOCK");
 * conf.set("mapred.output.compression.codec", "org.apache.hadoop.io.compress.GzipCodec");
 */
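The copy semantics behind the gotcha, and the usual fix (setting properties through job.getConfiguration() after the Job is constructed, or making all conf.set(...) calls before new Job(conf)), can be sketched without Hadoop on the classpath. This is a minimal sketch assuming only that Job's constructor copies the configuration it is given; FakeJob and ConfCopyDemo are hypothetical names, not Mahout or Hadoop classes:

```java
import java.util.Properties;

// Hypothetical stand-in for org.apache.hadoop.mapreduce.Job: its constructor
// copies the configuration it is given, just as new Job(conf) does.
class FakeJob {
    private final Properties conf;

    FakeJob(Properties conf) {
        // Defensive copy -- the source of the gotcha described above.
        this.conf = (Properties) conf.clone();
    }

    String get(String key) {
        return conf.getProperty(key);
    }

    // Analogous to job.getConfiguration().set(...): mutates the job's own copy.
    void set(String key, String value) {
        conf.setProperty(key, value);
    }
}

public class ConfCopyDemo {
    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("mapred.compress.map.output", "false");

        FakeJob job = new FakeJob(conf);

        // Too late: the job already holds its own copy, so this is silently lost.
        conf.setProperty("mapred.compress.map.output", "true");
        System.out.println(job.get("mapred.compress.map.output")); // prints false

        // Correct: set the property on the job's own configuration instead.
        job.set("mapred.compress.map.output", "true");
        System.out.println(job.get("mapred.compress.map.output")); // prints true
    }
}
```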
--
Lance Norskog
goksron@gmail.com
Re: Coding gotcha in WikipediaToSequenceFile.java
Posted by Vasil Vasilev <va...@gmail.com>.
I had a similar issue with seq2sparse: the LLR parameter was not taken
into account. The place in the code was the CollocDriver.
On 3/10/11, Robin Anil <ro...@gmail.com> wrote:
> The compression part shouldn't be in Mahout code. It should be supplied
> externally by the user, based on his Hadoop setup. Can remove that.
>
>
> On Thu, Mar 10, 2011 at 2:47 PM, Lance Norskog <go...@gmail.com> wrote:
>
>> In WikipediaToSequenceFile there is a coding mistake: new Job(conf)
>> copies the Configuration, so any changes made to conf after the Job is
>> constructed are not reflected in the job's configuration. If you ever
>> uncomment the second block of code, it will have no effect.
Re: Coding gotcha in WikipediaToSequenceFile.java
Posted by Robin Anil <ro...@gmail.com>.
The compression part shouldn't be in Mahout code. It should be supplied
externally by the user, based on his Hadoop setup. Can remove that.
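One way a user could supply these settings externally at submit time, assuming the driver is run through Hadoop's ToolRunner so that generic -D options are parsed (an assumption about this particular Mahout job). The property names are the ones from the commented-out block in the original post; the jar and class names here are illustrative, not exact:

```shell
# Hypothetical invocation -- adjust jar and class names to your Mahout build.
hadoop jar mahout-examples-job.jar \
  org.apache.mahout.text.WikipediaToSequenceFile \
  -D mapred.compress.map.output=true \
  -D mapred.map.output.compression.type=BLOCK \
  -D mapred.output.compress=true \
  -D mapred.output.compression.type=BLOCK \
  -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
```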
On Thu, Mar 10, 2011 at 2:47 PM, Lance Norskog <go...@gmail.com> wrote:
> In WikipediaToSequenceFile there is a coding mistake: new Job(conf)
> copies the Configuration, so any changes made to conf after the Job is
> constructed are not reflected in the job's configuration. If you ever
> uncomment the second block of code, it will have no effect.