Posted to dev@nutch.apache.org by "Patrick Mézard (Jira)" <ji...@apache.org> on 2020/06/10 08:51:00 UTC
[jira] [Updated] (NUTCH-2792) nutch index -params is only used in Solr indexer
[ https://issues.apache.org/jira/browse/NUTCH-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Mézard updated NUTCH-2792:
----------------------------------
Description:
`nutch index` help displays:
{code:java}
General options:
...
-params k1=v1&k2=v2... parameters passed to indexer plugins
(via property indexer.additional.params){code}
The option does nothing when used with CSV or dummy indexers. Looking at the code, the property is defined in:
[https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/indexer/IndexerMapReduce.java#L78]
which is only used in:
[https://github.com/apache/nutch/blob/master/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrIndexWriter.java#L141]
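To make the `k1=v1&k2=v2` format from the help text concrete, here is a minimal sketch of how such a string could be split into key/value pairs. This is not the Nutch code: the class `NaiveParams` and its `parse` method are hypothetical and only illustrate the syntax. It also shows why a value containing '&' cannot be expressed in this format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NaiveParams {
  // Hypothetical helper, not taken from Nutch: splits "k1=v1&k2=v2" on '&',
  // then splits each fragment on its first '='.
  static Map<String, String> parse(String raw) {
    Map<String, String> out = new LinkedHashMap<>();
    if (raw == null || raw.isEmpty()) {
      return out;
    }
    for (String pair : raw.split("&")) {
      int eq = pair.indexOf('=');
      if (eq <= 0) {
        continue; // fragment has no key or no '=': ignore it
      }
      out.put(pair.substring(0, eq), pair.substring(eq + 1));
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(parse("k1=v1&k2=v2")); // prints {k1=v1, k2=v2}
    // The value is cut at '&'; the leftover "b.csv" has no '=' and is dropped.
    System.out.println(parse("path=a&b.csv")); // prints {path=a}
  }
}
```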
Several possibilities:
* Drop the parameter from the help. Does not break backward compatibility.
* Move the -params handling into IndexWriters.java and add the parameters to the IndexWriterParams of every indexer. Not too impactful but not super clean either: the parameters are not "namespaced" per indexer, so if someone uses multiple indexers there may be parameter collisions.
* Refactor the way these parameters are passed, to prefix them with the target indexer. Would break backward compatibility. In that case, it would be good to change the format completely: turn -params into -param, allow it to be passed multiple times and forget the '=/&' syntax (which does not handle escaping anyway).
Not sure how much this parameter is used. I would have used it to configure the output path for indexer-csv or indexer-dummy.
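As a rough illustration of the third possibility, the sketch below parses repeated `-param` values of the assumed form `<indexer>.<key>=<value>` into one map per indexer. The format, the class `NamespacedParams` and the key `outpath` are assumptions made for this example, not an existing Nutch API. Because each value arrives as its own argument, it may contain '&' or '=' without any escaping.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NamespacedParams {
  // Hypothetical sketch: each entry is "<indexer>.<key>=<value>",
  // one entry per -param occurrence on the command line.
  static Map<String, Map<String, String>> parse(List<String> entries) {
    Map<String, Map<String, String>> byIndexer = new LinkedHashMap<>();
    for (String entry : entries) {
      int dot = entry.indexOf('.');
      int eq = entry.indexOf('=');
      // Require a non-empty indexer name and a non-empty key before '='.
      if (dot <= 0 || eq <= dot + 1) {
        throw new IllegalArgumentException(
            "expected <indexer>.<key>=<value>: " + entry);
      }
      String indexer = entry.substring(0, dot);
      String key = entry.substring(dot + 1, eq);
      String value = entry.substring(eq + 1); // kept verbatim, no escaping needed
      byIndexer.computeIfAbsent(indexer, k -> new LinkedHashMap<>()).put(key, value);
    }
    return byIndexer;
  }

  public static void main(String[] args) {
    // Equivalent to: -param indexer-csv.outpath=/tmp/a&b -param indexer-solr.commit=true
    System.out.println(parse(Arrays.asList(
        "indexer-csv.outpath=/tmp/a&b",
        "indexer-solr.commit=true")));
  }
}
```

Splitting on the first '.' and the first '=' keeps the parameters of each indexer separate, which avoids the collisions mentioned in the second possibility.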
was:
`nutch index` help displays:
```
General options:
...
-params k1=v1&k2=v2... parameters passed to indexer plugins
(via property indexer.additional.params)
```
The option does nothing when used with CSV or dummy indexers. Looking at the code, the property is defined in:
[https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/indexer/IndexerMapReduce.java#L78]
which is only used in:
[https://github.com/apache/nutch/blob/master/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrIndexWriter.java#L141]
Several possibilities:
* Drop the parameter from the help. Does not break backward compatibility.
* Move the -params handling in IndexWriters.java and add them to IndexWriterParams of every indexer. Not too impactful but not super clean either: the parameters are not "namespaced" per indexer, if someone uses multiple indexers there may be parameter collisions.
* Refactor the way these parameters are passed, to prefix them with target indexer. Would break backward compatibility. In that case, it would be good to change the format completely: turn -params into -param, allow multiple values to be passed and forget the '=/&' syntax (which does not handle escaping anyway).
Not sure how much this parameter is used. I would have used it to configure the output path for indexer-csv or indexer-dummy.
> nutch index -params is only used in Solr indexer
> ------------------------------------------------
>
> Key: NUTCH-2792
> URL: https://issues.apache.org/jira/browse/NUTCH-2792
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Affects Versions: 1.17
> Reporter: Patrick Mézard
> Priority: Minor