Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2010/09/06 18:44:33 UTC

[jira] Created: (NUTCH-898) Multi valued subcollection is not multi valued

Multi valued subcollection is not multi valued
----------------------------------------------

                 Key: NUTCH-898
                 URL: https://issues.apache.org/jira/browse/NUTCH-898
             Project: Nutch
          Issue Type: Bug
          Components: indexer
         Environment: nutch-2010-07-07_04-49-04
            Reporter: Markus Jelsma
             Fix For: 1.2


NUTCH-716 concatenates multiple values into a single string instead of adding single values to a multi-valued field. For a test crawl I have defined the following two subcollection definitions:

<subcollection>
<name>asdf</name>
<id>asdf-site</id>
<whitelist>http://asdf/</whitelist>
<blacklist/>
</subcollection>

<subcollection>
<name>news</name>
<id>asdf-news</id>
<whitelist>http://asdf/news/</whitelist>
<blacklist/>
</subcollection>

Reindexing the segments by sending them to Solr yields the following results for the home URL and a news URL:

<doc>
<arr name="subcollection">
<str>asdf</str>
</arr>
<str name="url">http://asdf/home/</str>
</doc>
<doc>
<arr name="subcollection">
<str>asdf news</str>
</arr>
<str name="url">http://asdf/news/</str>
</doc>

Instead, I expected the following result for the second document:

<doc>
<arr name="subcollection">
<str>asdf</str>
<str>news</str>
</arr>
<str name="url">http://asdf/news/</str>
</doc>

My Solr schema.xml has the following declaration for the subcollection field:

<field name="subcollection" type="string" stored="true" indexed="true" multiValued="true" />
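
To make the difference concrete, here is a minimal standalone sketch in Java of the observed versus the expected behaviour. It does not use the Nutch or Solr APIs: the class SubcollectionSketch and its methods are hypothetical names, and the whitelist check is simplified to a plain prefix match, which is an assumption and may not be how the subcollection plugin actually matches URLs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Nutch subcollection plugin code.
public class SubcollectionSketch {

    // Stand-in for the two definitions above: subcollection name -> whitelist prefix.
    static Map<String, String> definitions() {
        Map<String, String> defs = new LinkedHashMap<>();
        defs.put("asdf", "http://asdf/");
        defs.put("news", "http://asdf/news/");
        return defs;
    }

    // Names of all subcollections whose whitelist prefix matches the URL
    // (prefix matching is an assumption made for this sketch).
    static List<String> matching(String url) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> e : definitions().entrySet()) {
            if (url.startsWith(e.getValue())) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    // Observed behaviour: a single field value with all names joined by a space.
    static List<String> concatenated(String url) {
        List<String> values = new ArrayList<>();
        values.add(String.join(" ", matching(url)));
        return values;
    }

    // Expected behaviour: one field value per matching subcollection name.
    static List<String> multiValued(String url) {
        return matching(url);
    }

    public static void main(String[] args) {
        String url = "http://asdf/news/";
        System.out.println(concatenated(url)); // [asdf news]
        System.out.println(multiValued(url));  // [asdf, news]
    }
}
```

With the concatenated form, a query on the subcollection string field for "news" alone would not match the second document, because the stored value is the single string "asdf news".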


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Closed: (NUTCH-898) Multi valued subcollection is not multi valued

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma closed NUTCH-898.
-------------------------------

    Resolution: Won't Fix

The old (and only) nightly build I was using did allow multiple values but concatenated them. The current branch-1.2 already stores the values in a multi-valued field.

It was already fixed! 

