Posted to user@nutch.apache.org by Max Stricker <st...@gmail.com> on 2011/08/13 11:02:42 UTC

Multi-Value metadata missing in ParseResult

Hi,

I am developing a Nutch plugin in which my Parser extracts multi-value data and puts it in the Metadata, which is then
added using the ParseResult.put() method.
Debugging it, I can see that my data is actually there, but in the Indexer all multi-value data is gone.
How can this be? Should I file a bug report?
Removing the multi-value attribute in schema.xml gives back the data in the Indexer, but of course the field is then
single-valued, so only the last added value is present.

Any ideas?


Re: Multi-Value metadata missing in ParseResult

Posted by jasimop <st...@gmail.com>.
I also add the fields using a simple indexing filter plugin.
In the filter method of my HtmlParseFilter plugin implementation I add data extracted from the page
to the metadata using the ParseResult.put method.
Logging the metadata shows that it is added correctly.
However, in my IndexingFilter implementation the metadata does not contain the multi-value fields, although the other
fields I added are present.
So in the indexer I cannot add the fields to the document, and therefore cannot send them to Solr.
What code between parsing and indexing can remove the multi-value fields from the metadata?
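
For readers following the thread, the flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not Nutch code: SketchMetadata, SketchDocument, MultiValueSketch and the field name "tag" are hypothetical stand-ins, and the add/set/getValues method names are only assumed to resemble a multi-valued metadata container. It shows the parse side appending several values under one name and the index side copying every value to the document, which is the behaviour expected in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multi-valued metadata container.
// It is NOT the Nutch class; it only models "one name, many values".
class SketchMetadata {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Appends a value and keeps whatever is already stored under the name.
    void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Replaces everything stored under the name with one value.
    void set(String name, String value) {
        List<String> single = new ArrayList<>();
        single.add(value);
        values.put(name, single);
    }

    // Returns a copy of all values stored under the name (empty if none).
    List<String> getValues(String name) {
        List<String> stored = values.get(name);
        return stored == null ? new ArrayList<String>() : new ArrayList<String>(stored);
    }
}

// Hypothetical stand-in for an index document that accepts repeated fields.
class SketchDocument {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    // Adds one more value to the named field.
    void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Returns a copy of all values of the named field (empty if none).
    List<String> getFieldValues(String name) {
        List<String> stored = fields.get(name);
        return stored == null ? new ArrayList<String>() : new ArrayList<String>(stored);
    }
}

class MultiValueSketch {
    // Assumed field name, used for illustration only.
    static final String FIELD = "tag";

    // Parse side: append each extracted value under the same name.
    static SketchMetadata parseSide(List<String> extracted) {
        SketchMetadata meta = new SketchMetadata();
        for (String value : extracted) {
            meta.add(FIELD, value); // add() appends; set() would overwrite
        }
        return meta;
    }

    // Index side: copy every stored value, not just one, to the document.
    static void indexSide(SketchMetadata meta, SketchDocument doc) {
        for (String value : meta.getValues(FIELD)) {
            doc.add(FIELD, value);
        }
    }

    public static void main(String[] args) {
        SketchMetadata meta = parseSide(Arrays.asList("a", "b", "c"));
        SketchDocument doc = new SketchDocument();
        indexSide(meta, doc);
        System.out.println(doc.getFieldValues(FIELD));
    }
}
```

Logging the equivalent of getValues on both the parse side and the index side is one way to narrow down where the values disappear; the sketch does not claim to identify the cause in Nutch 1.2.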




--
View this message in context: http://lucene.472066.n3.nabble.com/Multi-Value-metadata-missing-in-ParseResult-tp3251186p3256344.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Multi-Value metadata missing in ParseResult

Posted by Markus Jelsma <ma...@openindex.io>.
No, this will not fix your issue, but the indexchecker utility will show you 
what Nutch will actually index to Solr. With the patch it will also show 
multi-valued fields, which is handy for debugging.

The question was: how do you add the fields? If I test using a simple indexing 
filter plugin, I can easily add multi-valued fields. With the indexchecker 
utility I can confirm it works.

> Markus Jelsma-2 wrote:
> > This should just work fine. Do you use an indexing filter to actually add
> > those new fields? I've tested in 1.4 using an indexing filter that adds
> > multiple values. Using the NUTCH-1082 patched indexing filters checker I
> > can see the values as expected.
> 
> I currently use Nutch 1.2, so I assume checking out
> http://svn.apache.org/repos/asf/nutch/branches/branch-1.4/, building it,
> then applying patch NUTCH-1082 will fix my issues. I will report back.
> Is there a (simple) way to get it to work in 1.2 as well, as we currently
> have this version in production?

Re: Multi-Value metadata missing in ParseResult

Posted by jasimop <st...@gmail.com>.
Markus Jelsma-2 wrote:
> 
> 
> This should just work fine. Do you use an indexing filter to actually add 
> those new fields? I've tested in 1.4 using an indexing filter that adds 
> multiple values. Using the NUTCH-1082 patched indexing filters checker I
> can see the values as expected.
> 

I currently use Nutch 1.2, so I assume checking out 
http://svn.apache.org/repos/asf/nutch/branches/branch-1.4/, building it,
then applying patch NUTCH-1082 will fix my issues. I will report back.
Is there a (simple) way to get it to work in 1.2 as well, as we currently
have this version in production?




--
View this message in context: http://lucene.472066.n3.nabble.com/Multi-Value-metadata-missing-in-ParseResult-tp3251186p3256028.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Multi-Value metadata missing in ParseResult

Posted by Markus Jelsma <ma...@openindex.io>.

On Saturday 13 August 2011 11:02:42 Max Stricker wrote:
> Hi,
> 
> I am developing a Nutch plugin in which my Parser extracts multi-value data and
> puts it in the Metadata, which is then added using the ParseResult.put() method.
> Debugging it, I can see that my data is actually there, but in the Indexer
> all multi-value data is gone. How can this be? Should I file a bug report?

This should just work fine. Do you use an indexing filter to actually add 
those new fields? I've tested in 1.4 using an indexing filter that adds 
multiple values. Using the NUTCH-1082 patched indexing filters checker I can 
see the values as expected.

> Removing the multi-value attribute in schema.xml gives back the data in
> the Indexer, but of course the field is then single-valued, so only the last
> added value is present.
> 
> Any ideas?

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350