Posted to dev@nutch.apache.org by "Felix Z. (JIRA)" <ji...@apache.org> on 2008/10/31 14:55:44 UTC
[jira] Issue Comment Edited: (NUTCH-442) Integrate Solr/Nutch

    [ https://issues.apache.org/jira/browse/NUTCH-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644288#action_12644288 ] 

felizimm edited comment on NUTCH-442 at 10/31/08 6:54 AM:
----------------------------------------------------------

Hi everybody,

1. In Solr, the field "cache" is empty. If the Nutch/Solr integration does not populate it, how can I store the full (unparsed) HTML content in my Solr index? I am using patch v8.

2. Is it possible to index a single character, specifically "§" (the section sign), with Solr? Is this purely a Solr matter, or do I also have to change something in the Nutch parser?

Thanks for your help,
Felix.

      was (Author: felizimm):
    Hi everybody,

In Solr, the field "cache" is empty. If the Nutch/Solr integration does not populate it, how can I store the full (unparsed) HTML content in my Solr index? I am using patch v8.

Thanks! 
Felix.
  
> Integrate Solr/Nutch
> --------------------
>
>                 Key: NUTCH-442
>                 URL: https://issues.apache.org/jira/browse/NUTCH-442
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer, searcher
>         Environment: Ubuntu linux
>            Reporter: rubdabadub
>            Assignee: Doğacan Güney
>             Fix For: 1.0.0
>
>         Attachments: Crawl.patch, Indexer.patch, NUTCH-442_v4.patch, NUTCH-442_v5.patch, NUTCH-442_v6.patch.txt, NUTCH-442_v7.patch.txt, NUTCH-442_v7a.patch.txt, NUTCH-442_v8.patch, NUTCH_442_v3.patch, RFC_multiple_search_backends.patch, schema.xml
>
>
> Hi:
> After trying out Sami's patch for Solr/Nutch integration (http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html), I can confirm that it worked :-) That led me to request the following:
> I would be very grateful if this could be included in Nutch 0.9, as I am trying to eliminate my Python-based crawler, which posts documents to Solr. As I work in a corporate environment, I can't install the trunk version in production, so I am asking for this to be included in the 0.9 release. I hope my wish will be granted.
> I look forward to getting some feedback.
> Thank you.
