Posted to dev@nutch.apache.org by "Mathijs Homminga (Commented) (JIRA)" <ji...@apache.org> on 2012/03/06 14:04:57 UTC

[jira] [Commented] (NUTCH-1290) crawlId not supported by all Tools

    [ https://issues.apache.org/jira/browse/NUTCH-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223222#comment-13223222 ] 

Mathijs Homminga commented on NUTCH-1290:
-----------------------------------------

Actually, the IndexerReducer is only used by the IndexerJob, which in turn is currently implemented only by the SolrIndexerJob.
The SolrIndexerJob appears to support the crawlId, but because it uses the createDataStore method (instead of the createWebStore method), the crawlId ends up being ignored.
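
To illustrate the difference, here is a minimal, self-contained sketch. It does not use the real StorageUtils or Gora classes: the class SchemaNameSketch, its two methods, the "storage.crawl.id" key, the "webpage" default and the "crawlId_schema" naming are all assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in; NOT the real o.a.n.storage.StorageUtils. */
public class SchemaNameSketch {

    // storage.schema is the property named in the issue; the crawl id key
    // and the Gora default schema name below are assumptions.
    static final String SCHEMA_KEY = "storage.schema";
    static final String CRAWL_ID_KEY = "storage.crawl.id";
    static final String GORA_DEFAULT_SCHEMA = "webpage";

    /** Mimics the createDataStore behaviour described above:
     *  the configuration is not consulted, only the Gora default is used. */
    static String schemaForDataStore(Map<String, String> conf) {
        return GORA_DEFAULT_SCHEMA;
    }

    /** Mimics the createWebStore behaviour described above:
     *  storage.schema is honoured and the crawlId is applied as a prefix
     *  (the prefix format itself is an assumption). */
    static String schemaForWebStore(Map<String, String> conf) {
        String schema = conf.getOrDefault(SCHEMA_KEY, GORA_DEFAULT_SCHEMA);
        String crawlId = conf.getOrDefault(CRAWL_ID_KEY, "");
        return crawlId.isEmpty() ? schema : crawlId + "_" + schema;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(CRAWL_ID_KEY, "crawl42");
        System.out.println(schemaForDataStore(conf)); // prints "webpage"
        System.out.println(schemaForWebStore(conf));  // prints "crawl42_webpage"
    }
}
```

With a crawlId set, the two paths resolve to different schemas, so a job that takes the first path would read a different table than the one the other tools wrote to.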

                
> crawlId not supported by all Tools
> ----------------------------------
>
>                 Key: NUTCH-1290
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1290
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: nutchgora
>            Reporter: Mathijs Homminga
>            Priority: Minor
>             Fix For: nutchgora
>
>
> See also: https://issues.apache.org/jira/browse/NUTCH-907
> The StorageUtils class exposes a createDataStore method which uses the default schema for a persistent class specified in the Gora configuration. 
> This method ignores Nutch's storage.schema property and the notion of a crawlId.
> Two tools use this method instead of the createWebStore method (which does support the storage.schema property and a crawlId):
> o.a.n.indexer.IndexerReducer (IndexerJob)
> o.a.n.util.domain.DomainStatistics
>  
> I propose that these two start using the createWebStore method and that we remove the createDataStore method from StorageUtils.
> Also, these two tools should support the crawlId command line parameter.
