Posted to user@nutch.apache.org by John Mitchell <jm...@collabralink.com> on 2016/03/15 23:56:21 UTC

I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0

Hi,

I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0.

I have Solr set up correctly via "bin/solr Start -c cloud -noprompt", and I
have even crawled data with the Norconex web crawler and successfully
committed that data to Solr. Now I want to see whether I can commit data
crawled by Apache Nutch to Solr as well.

I tried the tutorial "Integrate Solr with Nutch" at
https://wiki.apache.org/nutch/NutchTutorial#Integrate_Solr_with_Nutch but
the locations and files it refers to don't match my Solr 5.3.0 setup.


Thanks,

John Mitchell

RE: I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0

Posted by Markus Jelsma <ma...@openindex.io>.
No worries. Solr 5.5.0 works fine. But you should keep an eye on 6.0.0 which is about to be released this month.
Markus
 
 
-----Original message-----
> From:Victor D'agostino <vi...@fiducial.net>
> Sent: Thursday 17th March 2016 10:50
> To: user@nutch.apache.org
> Cc: Markus Jelsma <ma...@openindex.io>
> Subject: Re: I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0
> 
> One last question (sorry for hijacking this thread):
> Would you recommend using Solr 5.5.0, or staying with an older
> Solr 5.4.x version?
> 
> Victor
> 
> ________________
> This message and any attached documents may contain confidential information. If you are not the intended recipient, please delete it and notify the sender immediately. Any use of this message not in accordance with its purpose, and any dissemination or publication, in whole or in part and by any means, is strictly prohibited. As internet communications are not secure, the integrity of this message cannot be guaranteed and the sending company cannot be held liable for its content.

RE: I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0

Posted by Markus Jelsma <ma...@openindex.io>.
Hi - no, not within a few days, and probably not within the next two months.
Markus

 
 
-----Original message-----
> From:Victor D'agostino <vi...@fiducial.net>
> Sent: Thursday 17th March 2016 10:29
> To: user@nutch.apache.org
> Cc: Markus Jelsma <ma...@openindex.io>
> Subject: Re: I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0
> 
> Hi
> 
> Does anyone know whether Nutch 1.12 will be released soon, within a few
> days, or only months from now?
> 
> Regards
> 
> Victor

Re: Is nutch suitable with postgresql as datasource

Posted by Binoy Dalal <bi...@gmail.com>.
Nutch is a web crawler.

Just use the Data Import Handler (DIH) that comes with Solr. It's really easy to set up and use.
Check here:
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler
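
For illustration only, here is a rough sketch of what a DIH configuration for PostgreSQL might look like. The database name, table, columns, credentials and collection name are all made-up placeholders, and it assumes the /dataimport handler is already registered in solrconfig.xml and the PostgreSQL JDBC driver jar is on Solr's classpath. The script only writes the file and prints the import URL; it does not contact Solr.

```shell
#!/bin/sh
# Hypothetical sketch: every name below (database, table, columns,
# credentials, collection) is a placeholder, not taken from this thread.
OUT_DIR="${TMPDIR:-/tmp}/dih-sketch"
mkdir -p "$OUT_DIR"

# Write a minimal DIH data-config.xml that reads rows over JDBC.
cat > "$OUT_DIR/data-config.xml" <<'EOF'
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/mydb"
              user="solr_reader"
              password="change-me"/>
  <document>
    <entity name="doc" query="SELECT id, title, body FROM documents">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="body" name="content"/>
    </entity>
  </document>
</dataConfig>
EOF

# URL that would start a full import once the handler is configured.
IMPORT_URL="http://localhost:8983/solr/mycollection/dataimport?command=full-import"
echo "Wrote $OUT_DIR/data-config.xml"
echo "Trigger with: curl \"$IMPORT_URL\""
```

The generated file would normally be copied into the collection's conf directory before the import is triggered.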

How did you get the idea of crawling a database with Nutch, anyhow?

On Thu, 17 Mar 2016, 17:18 Victor D'agostino, <
victor.d.agostino@fiducial.net> wrote:

> Hi guys
>
> I have a PostgreSQL database containing the data I would like to
> index into Solr.
>
> I couldn't find any PostgreSQL site configuration file.
>
> Can Nutch use PostgreSQL as a data source, or does it only crawl
> websites?
>
> Best regards
> Victor
>
-- 
Regards,
Binoy Dalal

Re: Is nutch suitable with postgresql as datasource

Posted by Victor D'agostino <vi...@fiducial.net>.
Damn!

We are currently using DIH at my company, and I am building a Solr 5 
architecture with 3 Solr nodes and 1 ZooKeeper node.

We need real-time indexing and multi-node indexing because we have more 
than 100 GB of new data per day.

Do you know whether a "PostgreSQL crawler" exists?

- Victor



-------- Original message --------
*Subject: *Re: Is nutch suitable with postgresql as datasource
*From: *Markus Jelsma <ma...@openindex.io>
*To: *user@nutch.apache.org <us...@nutch.apache.org>
*Date: *17/03/2016 12:54
> Hi - no, Nutch cannot do that. But Solr has a Data Import Handler, which should read data from PostgreSQL just fine.
> Markus

RE: Is nutch suitable with postgresql as datasource

Posted by Markus Jelsma <ma...@openindex.io>.
Hi - no, Nutch cannot do that. But Solr has a Data Import Handler, which should read data from PostgreSQL just fine.
Markus

 
 
-----Original message-----
> From:Victor D'agostino <vi...@fiducial.net>
> Sent: Thursday 17th March 2016 12:48
> To: user@nutch.apache.org
> Subject: Is nutch suitable with postgresql as datasource
> 
> Hi guys
> 
> I have a PostgreSQL database containing the data I would like to 
> index into Solr.
> 
> I couldn't find any PostgreSQL site configuration file.
> 
> Can Nutch use PostgreSQL as a data source, or does it only crawl 
> websites?
> 
> Best regards
> Victor

Is nutch suitable with postgresql as datasource

Posted by Victor D'agostino <vi...@fiducial.net>.
Hi guys

I have a PostgreSQL database containing the data I would like to 
index into Solr.

I couldn't find any PostgreSQL site configuration file.

Can Nutch use PostgreSQL as a data source, or does it only crawl 
websites?

Best regards
Victor



Re: I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0

Posted by Victor D'agostino <vi...@fiducial.net>.
One last question (sorry for hijacking this thread):
Would you recommend using Solr 5.5.0, or staying with an older 
Solr 5.4.x version?

Victor

-------- Original message --------
*Subject: *Re: I am having trouble connecting the Nutch 1.10 web crawler 
with Solr 5.3.0
*From: *Victor D'agostino <vi...@fiducial.net>
*To: *user@nutch.apache.org
*Cc: *Markus Jelsma <ma...@openindex.io>
*Date: *17/03/2016 10:47
> Hi
>
> Thanks a lot I will do that.

Re: I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0

Posted by Victor D'agostino <vi...@fiducial.net>.
Hi

Thanks a lot I will do that.



-------- Original message --------
*Subject: *Re: I am having trouble connecting the Nutch 1.10 web crawler 
with Solr 5.3.0
*From: *Markus Jelsma <ma...@openindex.io>
*To: *user@nutch.apache.org <us...@nutch.apache.org>
*Date: *17/03/2016 10:45
> But you can patch 1.11 with NUTCH-2197

RE: I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0

Posted by Markus Jelsma <ma...@openindex.io>.
But you can patch 1.11 with NUTCH-2197
https://issues.apache.org/jira/browse/NUTCH-2197

 
 
-----Original message-----
> From:Victor D'agostino <vi...@fiducial.net>
> Sent: Thursday 17th March 2016 10:29
> To: user@nutch.apache.org
> Cc: Markus Jelsma <ma...@openindex.io>
> Subject: Re: I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0
> 
> Hi
> 
> Does anyone know whether Nutch 1.12 will be released soon, within a few 
> days, or only months from now?
> 
> Regards
> 
> Victor

Re: I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0

Posted by Victor D'agostino <vi...@fiducial.net>.
Hi

Does anyone know whether Nutch 1.12 will be released soon, within a few 
days, or only months from now?

Regards

Victor


-------- Original message --------
*Subject: *Re: I am having trouble connecting the Nutch 1.10 web crawler 
with Solr 5.3.0
*From: *Markus Jelsma <ma...@openindex.io>
*To: *user@nutch.apache.org <us...@nutch.apache.org>
*Date: *16/03/2016 10:40
> Sure it works. You can always just index to a single server, and Solr will then distribute the documents itself. This works for both Nutch branches (1.x and 2.x). Nutch 1.12 will have full support for Solr 5.x.
> Markus

RE: I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0

Posted by Markus Jelsma <ma...@openindex.io>.
Sure it works. You can always just index to a single server, and Solr will then distribute the documents itself. This works for both Nutch branches (1.x and 2.x). Nutch 1.12 will have full support for Solr 5.x.
Markus
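
As a rough sketch of indexing to a single node, assuming the "bin/nutch index" form shown in the Nutch 1.x tutorial: the host, port, collection name "nutch" and the "crawl" directory below are placeholders, not values from this thread, and the exact options should be checked against your Nutch version. The script only builds and prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical values: adjust host, port, collection and crawl directory.
SOLR_HOST="localhost"
SOLR_PORT="8983"
COLLECTION="nutch"     # assumed collection name
CRAWL_DIR="crawl"      # assumed Nutch crawl directory

# Any one node of the cluster can receive the documents; Solr then
# routes them to the appropriate shard itself.
SOLR_URL="http://${SOLR_HOST}:${SOLR_PORT}/solr/${COLLECTION}"

# Build the indexing command as a string and print it (dry run).
INDEX_CMD="bin/nutch index -D solr.server.url=${SOLR_URL} ${CRAWL_DIR}/crawldb -linkdb ${CRAWL_DIR}/linkdb -dir ${CRAWL_DIR}/segments"
echo "$INDEX_CMD"
```

Running the printed command from the Nutch runtime directory would send the crawled segments to that one Solr URL.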

-----Original message-----
> From:Luis Magaña <lu...@euphorica.com>
> Sent: Wednesday 16th March 2016 0:13
> To: user@nutch.apache.org
> Subject: Re: I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0
> 
> As far as I know, Solr 5 is not proven to be compatible with Nutch. Try
> downgrading to Solr 4, which I know works with Nutch 2.x; I'm not sure
> about 1.x.
> 
> Cheers.

Re: I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0

Posted by Luis Magaña <lu...@euphorica.com>.
As far as I know, Solr 5 is not proven to be compatible with Nutch. Try
downgrading to Solr 4, which I know works with Nutch 2.x; I'm not sure
about 1.x.

Cheers.

On 03/15/2016 04:56 PM, John Mitchell wrote:
> Hi,
>
> I am having trouble connecting the Nutch 1.10 web crawler with Solr 5.3.0.
>
> I have Solr set up correctly via "bin/solr Start -c cloud -noprompt", and I
> have even crawled data with the Norconex web crawler and successfully
> committed that data to Solr. Now I want to see whether I can commit data
> crawled by Apache Nutch to Solr as well.
>
> I tried the tutorial "Integrate Solr with Nutch" at
> https://wiki.apache.org/nutch/NutchTutorial#Integrate_Solr_with_Nutch but
> the locations and files it refers to don't match my Solr 5.3.0 setup.
>
>
> Thanks,
>
> John Mitchell
>

-- 
Luis Magaña
www.euphorica.com