Posted to user@nutch.apache.org by Rich d'Rich <ri...@gmail.com> on 2011/10/09 22:35:10 UTC

Re: solrdedup crashes if digest-field not compiled

>>Dedup will not work without digest field. Perhaps we can extend solrdedup so
>>it skips all documents with a digest field. Will that work for you?
>You mean skip all documents *without* a digest field?
>Yes, that would work.
>But wouldn't it be better for performance reasons to query only against
>documents with the field already compiled?

I'm getting this issue as well. We've got a heterogeneous Solr index with
various sources apart from Nutch, and the lack of a digest field crashes
dedup when it hits a non-Nutch doc, as described by Matthias.

Is there an issue logged for this? I might be making a patch just to keep us
going.
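A minimal sketch of the core logic such a patch might contain, written in Java because Nutch is. It does not use the real SolrJ or Nutch `SolrDeleteDuplicates` classes. Documents are modelled as plain field maps, the names `DigestDedupSketch` and `findDuplicateIds` are hypothetical, the `id` field name is an assumption, and the "keep the first document seen per digest" rule is a hypothetical tie-break that the real job may not use. The `HAS_DIGEST_QUERY` constant addresses the performance point quoted above: the Lucene range query `digest:[* TO *]` matches only documents that have a value in the `digest` field, so Solr can skip non-Nutch documents before they reach the client.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of digest-based dedup that tolerates documents without a digest
 * field (e.g. non-Nutch documents in a shared Solr index). Hypothetical:
 * documents are plain field maps, not SolrJ or Nutch types.
 */
public class DigestDedupSketch {

    // Assumed field names; "digest" comes from the thread, "id" is an assumption.
    public static final String ID_FIELD = "id";
    public static final String DIGEST_FIELD = "digest";

    // Lucene query syntax matching only documents that have any digest value.
    // Using it as the query (or a filter query) lets Solr do the skipping.
    public static final String HAS_DIGEST_QUERY = DIGEST_FIELD + ":[* TO *]";

    /**
     * Returns the ids of documents to delete. Documents with no digest are
     * skipped instead of causing a failure. Within each digest group the
     * first document seen is kept (hypothetical tie-break rule).
     */
    public static Set<String> findDuplicateIds(List<Map<String, String>> docs) {
        Set<String> seenDigests = new HashSet<>();
        Set<String> toDelete = new LinkedHashSet<>(); // preserves input order
        for (Map<String, String> doc : docs) {
            String digest = doc.get(DIGEST_FIELD);
            if (digest == null || digest.isEmpty()) {
                continue; // non-Nutch document: leave it untouched
            }
            // add() returns false when this digest was already seen,
            // i.e. the current document duplicates an earlier one.
            if (!seenDigests.add(digest)) {
                toDelete.add(doc.get(ID_FIELD));
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("id", "http://a/", "digest", "d1"),
            Map.of("id", "http://a/index.html", "digest", "d1"),
            Map.of("id", "external-42"), // no digest field
            Map.of("id", "http://b/", "digest", "d2"));
        System.out.println(findDuplicateIds(docs)); // [http://a/index.html]
        System.out.println(HAS_DIGEST_QUERY);        // digest:[* TO *]
    }
}
```

Skipping on the client keeps the job from crashing on mixed indexes, while restricting the query itself avoids fetching the non-Nutch documents at all; the two can be combined.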

--
Richard

Re: solrdedup crashes if digest-field not compiled

Posted by lewis john mcgibbney <le...@gmail.com>.
Hi Richard,

Yes, is the simple answer. We have been aware for some time that dedup is
broken in nutchgora [1]; however, Markus also reported an issue with current
trunk development [2]. Could you please review it and comment if you can
reproduce it, or alternatively browse through our indexer issues [3] and
comment accordingly? A patch would be excellent by all means. Thank you.

[1] https://issues.apache.org/jira/browse/NUTCH-992
[2] https://issues.apache.org/jira/browse/NUTCH-1100
[3] https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+NUTCH+AND+resolution+%3D+Unresolved+AND+component+%3D+indexer+ORDER+BY+priority+DESC&mode=hide

-- 
*Lewis*