Posted to dev@nutch.apache.org by "Markus Jelsma (Updated) (JIRA)" <ji...@apache.org> on 2011/12/27 16:34:30 UTC

[jira] [Updated] (NUTCH-1104) Port issues from trunk NutchGora branch

     [ https://issues.apache.org/jira/browse/NUTCH-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1104:
---------------------------------

    Description: 
Umbrella issue for tracking issues that should be ported from 1.x trunk to the NutchGora branch. Please mark ported issues by modifying this description.

NOT YET PORTED:

* NUTCH-987 Support HTTP auth for Solr communication
* NUTCH-1028 Log parser keys
* NUTCH-1036 Solr jobs should increment counters in Reporter
* NUTCH-1057 Make fetcher thread time out configurable
* NUTCH-1067 Configure minimum throughput for fetcher
* NUTCH-1101 Options to purge db_gone records in updatedb
* NUTCH-1102 Fetcher, rely on fetcher.parse directive only
* NUTCH-1105 MaxContentLength option for index-basic
* NUTCH-940 Static field plugin
* NUTCH-1094 create comprehensive documentation for Nutch 2.0 trunk
* NUTCH-1207 ParserChecker to output signature
* NUTCH-1090 InvertLinks should inform when ignoring internal links
* NUTCH-1174 Outlinks are not properly normalized
* NUTCH-1203 ParseSegment to show number of milliseconds per parse
* NUTCH-1173 DomainStats doesn't count db_not_modified
* NUTCH-1155 Host/domain limit in generator is generate.max.count+1
* NUTCH-1061 Migrate MoreIndexingFilter from Apache ORO to java.util.regex
* NUTCH-1142 Normalization and filtering in WebGraph
* NUTCH-1153 LinkRank not to log all keys and not to write Hadoop _SUCCESS file
* NUTCH-1195 Add Solr 4x (trunk) example schema
* NUTCH-1141 Configurable Fetcher queue depth
* NUTCH-1214 DomainStats tool should be named for what it's doing
* NUTCH-1213 Pass additional SolrParams when indexing to Solr
* NUTCH-1211 URLFilterChecker command line help doesn't inform user of STDIN requirements
* NUTCH-1231 Upgrade to Tika 1.0
* NUTCH-1230 MimeType API deprecated and breaks with Tika 1.0
* NUTCH-1235 Upgrade to new Hadoop 0.20.205.0
* NUTCH-1184 Fetcher to parse and follow Nth degree outlinks
* NUTCH-1214 DomainStats tool should be named for what it's doing
* NUTCH-1207 ParserChecker to output signature
* NUTCH-1174 Outlinks are not properly normalized
* NUTCH-1173 DomainStats doesn't count db_not_modified
* NUTCH-1142 Normalization and filtering in WebGraph

PORTED:
* No issues yet


NOT GOING TO BE PORTED:
* No issues, explain why it should not be ported



  was:
Umbrella issue for tracking issues that should be ported from 1.x trunk to the NutchGora branch. Please mark ported issues by modifying this description.

NOT YET PORTED:

* NUTCH-987 Support HTTP auth for Solr communication
* NUTCH-1028 Log parser keys
* NUTCH-1036 Solr jobs should increment counters in Reporter
* NUTCH-1057 Make fetcher thread time out configurable
* NUTCH-1067 Configure minimum throughput for fetcher
* NUTCH-1101 Options to purge db_gone records in updatedb
* NUTCH-1102 Fetcher, rely on fetcher.parse directive only
* NUTCH-1105 MaxContentLength option for index-basic
* NUTCH-940 Static field plugin
* NUTCH-1094 create comprehensive documentation for Nutch 2.0 trunk
* NUTCH-1207 ParserChecker to output signature
* NUTCH-1090 InvertLinks should inform when ignoring internal links
* NUTCH-1174 Outlinks are not properly normalized
* NUTCH-1203 ParseSegment to show number of milliseconds per parse
* NUTCH-1173 DomainStats doesn't count db_not_modified
* NUTCH-1155 Host/domain limit in generator is generate.max.count+1
* NUTCH-1061 Migrate MoreIndexingFilter from Apache ORO to java.util.regex
* NUTCH-1142 Normalization and filtering in WebGraph
* NUTCH-1153 LinkRank not to log all keys and not to write Hadoop _SUCCESS file
* NUTCH-1195 Add Solr 4x (trunk) example schema
* NUTCH-1141 Configurable Fetcher queue depth
* NUTCH-1214 DomainStats tool should be named for what it's doing
* NUTCH-1213 Pass additional SolrParams when indexing to Solr
* NUTCH-1211 URLFilterChecker command line help doesn't inform user of STDIN requirements


PORTED:
* No issues yet


NOT GOING TO BE PORTED:
* No issues, explain why it should not be ported



    
> Port issues from trunk NutchGora branch
> ---------------------------------------
>
>                 Key: NUTCH-1104
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1104
>             Project: Nutch
>          Issue Type: Task
>    Affects Versions: nutchgora
>            Reporter: Markus Jelsma
>             Fix For: nutchgora
>
>
> Umbrella issue for tracking issues that should be ported from 1.x trunk to the NutchGora branch. Please mark ported issues by modifying this description.
> NOT YET PORTED:
> * NUTCH-987 Support HTTP auth for Solr communication
> * NUTCH-1028 Log parser keys
> * NUTCH-1036 Solr jobs should increment counters in Reporter
> * NUTCH-1057 Make fetcher thread time out configurable
> * NUTCH-1067 Configure minimum throughput for fetcher
> * NUTCH-1101 Options to purge db_gone records in updatedb
> * NUTCH-1102 Fetcher, rely on fetcher.parse directive only
> * NUTCH-1105 MaxContentLength option for index-basic
> * NUTCH-940 Static field plugin
> * NUTCH-1094 create comprehensive documentation for Nutch 2.0 trunk
> * NUTCH-1207 ParserChecker to output signature
> * NUTCH-1090 InvertLinks should inform when ignoring internal links
> * NUTCH-1174 Outlinks are not properly normalized
> * NUTCH-1203 ParseSegment to show number of milliseconds per parse
> * NUTCH-1173 DomainStats doesn't count db_not_modified
> * NUTCH-1155 Host/domain limit in generator is generate.max.count+1
> * NUTCH-1061 Migrate MoreIndexingFilter from Apache ORO to java.util.regex
> * NUTCH-1142 Normalization and filtering in WebGraph
> * NUTCH-1153 LinkRank not to log all keys and not to write Hadoop _SUCCESS file
> * NUTCH-1195 Add Solr 4x (trunk) example schema
> * NUTCH-1141 Configurable Fetcher queue depth
> * NUTCH-1214 DomainStats tool should be named for what it's doing
> * NUTCH-1213 Pass additional SolrParams when indexing to Solr
> * NUTCH-1211 URLFilterChecker command line help doesn't inform user of STDIN requirements
> * NUTCH-1231 Upgrade to Tika 1.0
> * NUTCH-1230 MimeType API deprecated and breaks with Tika 1.0
> * NUTCH-1235 Upgrade to new Hadoop 0.20.205.0
> * NUTCH-1184 Fetcher to parse and follow Nth degree outlinks
> * NUTCH-1214 DomainStats tool should be named for what it's doing
> * NUTCH-1207 ParserChecker to output signature
> * NUTCH-1174 Outlinks are not properly normalized
> * NUTCH-1173 DomainStats doesn't count db_not_modified
> * NUTCH-1142 Normalization and filtering in WebGraph
> PORTED:
> * No issues yet
> NOT GOING TO BE PORTED:
> * No issues, explain why it should not be ported
