Posted to dev@nutch.apache.org by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2007/06/01 09:53:15 UTC

[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable

    [ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500603 ] 

Doğacan Güney commented on NUTCH-392:
-------------------------------------

From what I understand of the MapFile.Writer code in Hadoop, if you pass a CompressionType as an argument to its constructor, that argument overrides the compression value set in the config. Since Nutch explicitly sets parse_text and parse_data to RECORD compression (and crawl_parse to NONE), we will not get the advantages of BLOCK compression even if we set it in the config.

BLOCK compression seems to work really well if you have the native libraries in place, so IMHO it would be better not to set CompressionType manually and to let people set it to whatever they want in the config.
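
To illustrate the precedence described above, here is a minimal, self-contained sketch. It is a simplified model and not the actual Hadoop or Nutch code: the class CompressionChoice, its methods, and the use of a plain Map as the config are hypothetical stand-ins. The key name "io.seqfile.compression.type" and the RECORD fallback are assumptions about how Hadoop resolves the setting.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of "explicit constructor argument beats config".
// Everything here is a hypothetical stand-in, not the Hadoop API.
public class CompressionChoice {
    enum CompressionType { NONE, RECORD, BLOCK }

    // Assumed name of the config property that selects the compression type.
    static final String KEY = "io.seqfile.compression.type";

    // What a writer would use when no explicit type is passed:
    // the config value, falling back to RECORD (assumed default).
    static CompressionType fromConfig(Map<String, String> conf) {
        String value = conf.get(KEY);
        return value == null ? CompressionType.RECORD : CompressionType.valueOf(value);
    }

    // An explicit argument wins; the config is consulted only when it is null.
    static CompressionType effective(Map<String, String> conf, CompressionType explicit) {
        return explicit != null ? explicit : fromConfig(conf);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(KEY, "BLOCK");

        // Current situation: the caller hardcodes RECORD, so the config is ignored.
        System.out.println(effective(conf, CompressionType.RECORD)); // prints RECORD

        // Proposed situation: no hardcoded type, so the config decides.
        System.out.println(effective(conf, null)); // prints BLOCK
    }
}
```

In this model, dropping the hardcoded argument is the whole change: the second call picks up BLOCK from the config, which is the behavior suggested above for parse_text and parse_data.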

> OutputFormat implementations should pass on Progressable
> --------------------------------------------------------
>
>                 Key: NUTCH-392
>                 URL: https://issues.apache.org/jira/browse/NUTCH-392
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>            Reporter: Doug Cutting
>            Assignee: Andrzej Bialecki 
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-392.patch
>
>
> OutputFormat implementations should pass the Progressable they are passed to underlying SequenceFile implementations.  This will keep reduce tasks from timing out when block writes are slow.  This issue depends on http://issues.apache.org/jira/browse/HADOOP-636.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.