Posted to user@nutch.apache.org by mike c <mc...@gmail.com> on 2006/03/30 21:20:59 UTC

Using Nutch with Ferret (ruby)

Hi all,
I was wondering if anyone is using Nutch (for crawling) with Ferret
(for indexing / searching).  My front end is built with Ruby on Rails,
which is why I'm asking.  I have the Nutch crawler up and running fine,
but I can't figure out how to integrate the two.
Any help is appreciated.

Regards,
Mike

Re: [Nutch-general] Re: Using Nutch with Ferret (ruby)

Posted by Bruno Patini Furtado <bp...@gmail.com>.
Is there an easy link to the bug report for this Lucene UTF-8 issue?

On 3/31/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>
> On Mar 30, 2006, at 4:10 PM, mike c wrote:
> > Hi Erik,
> > Thanks for pointing this out - as I just got Ferret working with
> > indexes created using Nutch.  Any recommendations on how to address
> > this issue?
>
> This is a particularly insidious issue.  Java Lucene is not using
> pure UTF-8, whereas ports like Ferret are.  But changing Java Lucene
> is a big deal and does introduce a (slight) performance hit
> apparently.  The plan is for Java Lucene to be corrected in this
> regard at some point in the future, perhaps as soon as Lucene 2.0.
>
> But for now, I don't know of a way to address this issue.  I gave up
> on Ferret for the time being because of this incompatibility and am
> now prototyping with Solr while still using my custom XML-RPC search
> server for now.
>
>         Erik


--
"Minds are like parachutes, they work best when open."

Bruno Patini Furtado
Software Developer
webpage: http://bpfurtado.net
software development blog: http://bpfurtado.livejournal.com

Re: [Nutch-general] Re: Using Nutch with Ferret (ruby)

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Mar 30, 2006, at 4:10 PM, mike c wrote:
> Hi Erik,
> Thanks for pointing this out - as I just got Ferret working with
> indexes created using Nutch.  Any recommendations on how to address
> this issue?

This is a particularly insidious issue.  Java Lucene is not using  
pure UTF-8, whereas ports like Ferret are.  But changing Java Lucene  
is a big deal and apparently introduces a (slight) performance hit.
The plan is for Java Lucene to be corrected in this
regard at some point in the future, perhaps as soon as Lucene 2.0.
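
To make "not pure UTF-8" concrete, here is a minimal Ruby sketch.  It
assumes the deviation in question is Java's "modified UTF-8" (the
scheme used by java.io.DataOutputStream#writeUTF); the thread itself
does not spell out the details, so treat that as an assumption.  The
helper names modified_utf8 and hex are made up for this illustration.

```ruby
# Sketch: encode a string the way Java's "modified UTF-8" does, minus the
# two-byte length prefix that writeUTF adds. Assumption: this is the
# deviation from standard UTF-8 discussed in this thread.
def modified_utf8(str)
  bytes = []
  # Walk UTF-16 code units, since Java strings are sequences of 16-bit chars.
  str.encode("UTF-16BE").unpack("n*").each do |unit|
    if unit >= 0x0001 && unit <= 0x007F
      bytes << unit # plain ASCII, one byte
    elsif unit <= 0x07FF
      # Two bytes; note that U+0000 lands here and becomes C0 80.
      bytes << (0xC0 | (unit >> 6)) << (0x80 | (unit & 0x3F))
    else
      # Three bytes per code unit, so each half of a surrogate pair is
      # encoded separately instead of as one four-byte sequence.
      bytes << (0xE0 | (unit >> 12)) <<
        (0x80 | ((unit >> 6) & 0x3F)) <<
        (0x80 | (unit & 0x3F))
    end
  end
  bytes.pack("C*")
end

# Render a byte string as space-separated hex for easy comparison.
def hex(bytes)
  bytes.unpack("C*").map { |b| format("%02X", b) }.join(" ")
end

smiley = [0x1F600].pack("U") # a character outside the Basic Multilingual Plane
puts hex(smiley)                # standard UTF-8: F0 9F 98 80
puts hex(modified_utf8(smiley)) # modified UTF-8: ED A0 BD ED B8 80
# A strict UTF-8 reader rejects the modified form:
puts modified_utf8(smiley).force_encoding("UTF-8").valid_encoding? # false
```

Under that assumption, ASCII and most other text encode identically in
both schemes, which would fit the observation that everything works
until certain characters show up in the index.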

But for now, I don't know of a way to address this issue.  I gave up
on Ferret for the time being because of this incompatibility and am
now prototyping with Solr while still using my custom XML-RPC search
server.

	Erik





Re: [Nutch-general] Re: Using Nutch with Ferret (ruby)

Posted by mike c <mc...@gmail.com>.
Hi Erik,
Thanks for pointing this out, as I had just gotten Ferret working with
indexes created by Nutch.  Do you have any recommendations on how to
address this issue?

-Mike

On 3/30/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> There is one incompatibility between Ferret and Java Lucene of note.
> It is the "UTF-8" issue that has surfaced with regards to Java
> Lucene.  All can be well between Java Lucene and Ferret, until
> characters in another range are indexed, and then Ferret will blow up
> trying to search the index.  Maybe this has been worked around in a
> more recent version of Ferret than I've tried?
>
>         Erik

Re: [Nutch-general] Re: Using Nutch with Ferret (ruby)

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
There is one notable incompatibility between Ferret and Java Lucene:
the "UTF-8" issue that has surfaced with regard to Java Lucene.  All
can be well between Java Lucene and Ferret until characters in another
range are indexed, and then Ferret will blow up trying to search the
index.  Maybe this has been worked around in a more recent version of
Ferret than the one I've tried?

	Erik


On Mar 30, 2006, at 2:50 PM, mike c wrote:

> Thanks.  I'll try it out.  In the mean time, if I get Ferret working
> I'll post an update.
>
> -Mike


Re: Using Nutch with Ferret (ruby)

Posted by mike c <mc...@gmail.com>.
Thanks.  I'll try it out.  In the meantime, if I get Ferret working,
I'll post an update.

-Mike

On 3/30/06, Steven Yelton <st...@missiondata.com> wrote:
> I use WEBrick instead of tomcat to query and serve search results.  I
> used ruby's 'rjb' to bridge the gap.
>
> http://raa.ruby-lang.org/project/rjb/
>
> There may be more direct ways (ruby<->lucene), but this was quick and
> easy and still has decent performance.
>
> Steven

Re: Using Nutch with Ferret (ruby)

Posted by Steven Yelton <st...@missiondata.com>.
I use WEBrick instead of Tomcat to query and serve search results.  I
used Ruby's 'rjb' to bridge the gap.

http://raa.ruby-lang.org/project/rjb/

There may be more direct ways (Ruby<->Lucene), but this was quick and
easy and still has decent performance.

Steven

mike c wrote:

>Hi all,
>I was wondering if anyone is using Nutch (for crawling) with Ferret
>(indexing / searching).  Basically, my front-end is built using Ruby
>on Rails that's why I'm asking.  I have the Nutch crawler up and
>running fine, but can't seem to figure out how to integrate the two. 
>Any help is appreciated.
>
>Regards,
>Mike
>  
>