Posted to dev@nutch.apache.org by "nutch.newbie (JIRA)" <ji...@apache.org> on 2007/01/02 12:03:27 UTC

[jira] Commented: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic

    [ http://issues.apache.org/jira/browse/NUTCH-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461710 ] 

nutch.newbie commented on NUTCH-422:
------------------------------------

I have got it to work, but it took me a while to index the fields properly. It's a rather complex plugin and definitely needs more documentation and examples from a newbie's perspective. I can see my indexed fields using Luke. However, I don't have the necessary query plugin to search them, e.g. to find 'xyz' in the 'author' metadata field. Are there any plans for a query-extra plugin, where you define the query items via query-extra-conf.xml or something similar?

The boost feature is also important. Do you have a patch that solves known issue 2?

Good work on getting a complex plugin to work not so complexly :-0)

> index-extra plugin creates additional fields in the index, based on configurable logic
> --------------------------------------------------------------------------------------
>
>                 Key: NUTCH-422
>                 URL: http://issues.apache.org/jira/browse/NUTCH-422
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>    Affects Versions: 0.8.1
>         Environment: All environments
>            Reporter: Alan Tanaman
>         Attachments: index-extra-v1.0-bin-java1.5.zip, index-extra-v1.0-source.zip
>
>
> Extract from the Readme file:
> A.  Introduction
>     The index-extra plugin allows you to configure additional fields that you wish to be added to the index, based on one of the following sources:
>       - The parsed text
>       - Meta data fields
>       - Previously created document-to-be-indexed fields
>       - Plain constant string
>       - Java expression combining one or more of the above, and resolving to a string
>     A regex can also be applied to any of the above, allowing fields to be created based on patterns extracted from the source.
> B.  Installation
>     1)  Binaries only:  Copy the 'index-extra' folder within index-extra-v1.0-bin-java1.5.zip to NUTCHDIR/build
>                         Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure
>                         Enable the plugin by updating the nutch-site.xml file
>     2)  Source code:    Always refer to the Nutch wiki for detailed instructions on building Nutch.  In short:
>                         Copy the 'index-extra' folder within index-extra-v1.0-source.zip to NUTCHDIR/src/plugin
>                         Update the build.xml in NUTCHDIR/src/plugin to include the plugin
>                         Update the NUTCHDIR/default.properties file to include the plugin
>                         Run ant to build
>                         Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure
>                         Enable the plugin by updating the nutch-site.xml file
> C.  Known Issues
>     1)  For this plugin to work correctly on any document field, the other index filters must run
>     first, so that all basic document fields have already been generated.  To do this, configure the
>     indexingfilter.order property.  (Please see patch NUTCH-421 to enable the indexingfilter.order
>     property. If this patch is not applied, the plugin will still work, but will not be able to use
>     document fields created by other index filter plugins.)
>     2)  At this stage, field boost cannot be used, as Nutch scoring overrides the field boost with its own
>     document-level boost calculation.  This occurs at the end of org.apache.nutch.indexer.Indexer's reduce method.
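
To illustrate Readme sections A and C.1, here is a minimal Java sketch of why
filter order matters when a regex-derived field depends on a field created by
another filter. It is not the plugin's actual code. Everything in it is an
assumption made for illustration: the class and method names, the use of a
plain Map as a stand-in for the document being indexed, and the field names
"content" and "author".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-ins only: a Map plays the role of the document to be
// indexed, and each Consumer plays the role of one index filter.
class FilterOrderSketch {

    // Stand-in for a basic index filter: stores the parsed text in "content".
    static Consumer<Map<String, String>> basicFilter(String parsedText) {
        return doc -> doc.put("content", parsedText);
    }

    // Stand-in for a configurable "extra" filter: applies a regex to a field
    // that an earlier filter is expected to have created, and stores the
    // first capture group in a new field.
    static Consumer<Map<String, String>> regexFilter(String source, String target, Pattern p) {
        return doc -> {
            String value = doc.get(source);
            if (value == null) {
                return; // source field has not been created yet
            }
            Matcher m = p.matcher(value);
            if (m.find()) {
                doc.put(target, m.group(1).trim());
            }
        };
    }

    // Applies the filters in list order to a fresh, empty document.
    static Map<String, String> run(List<Consumer<Map<String, String>>> filters) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (Consumer<Map<String, String>> filter : filters) {
            filter.accept(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        String text = "Title: Crawling notes\nAuthor: Jane Doe\nBody text follows.";
        Pattern author = Pattern.compile("(?m)^Author:\\s*(.+)$");
        Consumer<Map<String, String>> basic = basicFilter(text);
        Consumer<Map<String, String>> extra = regexFilter("content", "author", author);

        // Basic filter first: the derived field can be created.
        System.out.println(run(Arrays.asList(basic, extra)).containsKey("author")); // true
        // Extra filter first: "content" does not exist yet, so nothing is derived.
        System.out.println(run(Arrays.asList(extra, basic)).containsKey("author")); // false
    }
}
```

In this sketch the second ordering silently produces no "author" field, which
mirrors the behaviour described in known issue 1 when the filter order is not
configured.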

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

RE: [jira] Commented: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic

Posted by Alan Tanaman <al...@idna-solutions.com>.
Many thanks for your feedback.

Do you have any specifics in mind regarding examples?  I will try to
include any additional ones that we implement.  I know there are a lot of
options, but it is a little hard to see from my end what is unclear.  As I
am so involved in the development, another point of view on this is welcome.
;)

Regarding query-extra, we are not currently using the Nutch bean, so the
need has not arisen for us yet, but I can see how that would be useful.  I
guess you could adapt one of the existing query-xxxx plugins fairly easily
by having it read the XML configuration file to see what fields are
potentially available in the index.
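
A minimal Java sketch of that idea, using only the JDK's XML parser, might
look like the following.  The layout of index-extra-conf.xml is not shown in
this thread, so the element name "field" and the attribute "name" are
hypothetical, as are the class and method names; a real adaptation would have
to follow the actual file format and the Nutch query plugin interfaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ConfiguredFieldsSketch {

    // Collects the "name" attribute of every <field> element in the XML.
    // ASSUMPTION: "field" and "name" are hypothetical; the real
    // index-extra-conf.xml schema may differ.
    static List<String> fieldNames(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("field");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element field = (Element) nodes.item(i);
                String name = field.getAttribute("name");
                if (!name.isEmpty()) {
                    names.add(name);
                }
            }
            return names;
        } catch (Exception e) {
            // Parsing or configuration failure: report it as bad input.
            throw new IllegalArgumentException("Could not read field configuration", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<fields>"
            + "<field name=\"author\"/>"
            + "<field name=\"department\"/>"
            + "</fields>";
        // A query plugin could accept only the field names listed here.
        System.out.println(fieldNames(xml)); // [author, department]
    }
}
```

A query plugin built this way would stay in step with the index
configuration, since both would read their field list from one file.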

As for the boost, I included it because being able to control the boost of
a single field seems useful, although we don't need it at this very
moment.  The line of code in org.apache.nutch.indexer.Indexer's reduce
method could be overridden, but I'm not yet sure how that would affect the
overall scoring (scoring is one of my really weak points).  Perhaps one of
the scoring experts could give some guidance on this?

Best regards,
Alan
_________________________
Alan Tanaman
iDNA Solutions
