Posted to user@nutch.apache.org by Timeka Cobb <co...@gmail.com> on 2018/10/01 01:38:42 UTC

Nutch integration with Solr

Hello! I've installed Nutch 1.15 and Solr 7.4 very recently. I've looked at
the section on connecting the two but am having an extremely hard time
understanding it. Can someone help me connect the two? I want to crawl entire
websites and add a search engine to my site. Thank ya kindly 😊💗

Blessings,
Timeka Cobb

Re: Nutch integration with Solr

Posted by Timeka Cobb <co...@gmail.com>.
Also, I totally agree, Sir Sebastian: the directions could be much more
detailed so that newbies like me can understand them better.

Re: Nutch integration with Solr

Posted by Timeka Cobb <co...@gmail.com>.
I'm actually using Ubuntu to configure it all, so that is not the issue.

For example, the copy resources command: I'm already in the Solr home
folder, so would the command in the terminal be cp -r solr-7.4.0 /server/../../../ ?

Where I see {APACHE_SOLR_HOME} or {APACHE_NUTCH_HOME}, am I supposed to
type solr-7.4.0 or nutch-1.15 on the command line in place of these?

I copy and paste what I see and get an error back. I'm just trying to
figure out the proper commands to enter in the terminal to get both connected
and the core running properly. Thanks again for all your help, it's greatly
appreciated 😊
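For reference, placeholders such as ${APACHE_SOLR_HOME} in the tutorial are written as shell variables: once a variable is set, the shell substitutes its value, so the commands can be pasted as written. A minimal sketch, assuming both archives were unpacked into the home directory. The Nutch directory name apache-nutch-1.15 and the variable name NUTCH_RUNTIME_HOME are assumptions here; use the names the tutorial and your own system actually show.

```shell
#!/bin/sh
# ASSUMPTION: solr-7.4.0 and the Nutch directory sit in your home directory.
export APACHE_SOLR_HOME="$HOME/solr-7.4.0"
# Hypothetical location and variable name for the Nutch directory.
export NUTCH_RUNTIME_HOME="$HOME/apache-nutch-1.15"

# The shell now expands ${APACHE_SOLR_HOME} inside any pasted command.
# Print a path first to check the substitution before running cp:
echo "${APACHE_SOLR_HOME}/server/solr/configsets/nutch"
```

Note that export only lasts for the current terminal session. Also, a path that starts with / is absolute (it begins at the filesystem root), so /server/../../../ does not point inside the Solr folder, no matter which directory you are in.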

Timeka

Re: Nutch integration with Solr

Posted by Sebastian Nagel <wa...@googlemail.com.INVALID>.
Hi Timeka,

you mean the steps given in
  https://wiki.apache.org/nutch/NutchTutorial#Setup_Solr_for_search ?

The "nutch" core is defined only by a directory
   ${APACHE_SOLR_HOME}/server/solr/configsets/nutch
which must contain the correct schema in the conf/ subfolder.

Commands to set up Solr and copy the schema are given in the tutorial
as Unix/Linux commands.
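The configset steps can be sketched as a small shell function. This is a minimal sketch, assuming Solr 7.x's stock _default configset and that Nutch 1.15 keeps its Solr schema at conf/schema.xml in the Nutch directory (an assumption; check your copy). The function name make_nutch_configset is hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: build <solr-home>/server/solr/configsets/nutch
# from Solr's _default configset plus the Nutch schema.
# Usage: make_nutch_configset <solr-home> <nutch-home>
make_nutch_configset() {
  local solr_home="$1" nutch_home="$2"
  local sets="$solr_home/server/solr/configsets"

  # Start from a copy of Solr's stock configuration (solrconfig.xml etc.).
  cp -r "$sets/_default" "$sets/nutch"
  # ASSUMPTION: the Nutch schema lives at conf/schema.xml in the Nutch directory.
  cp "$nutch_home/conf/schema.xml" "$sets/nutch/conf/"
  # While managed-schema exists, Solr uses it instead of schema.xml.
  rm -f "$sets/nutch/conf/managed-schema"
}
```

For example, call make_nutch_configset "$APACHE_SOLR_HOME" /path/to/nutch (the second path is a placeholder). Then start Solr with "$APACHE_SOLR_HOME/bin/solr start" and create the core with "$APACHE_SOLR_HOME/bin/solr create -c nutch -d $APACHE_SOLR_HOME/server/solr/configsets/nutch/conf/". Run the function only once: if configsets/nutch already exists, cp -r copies _default inside it instead of replacing it.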

Could you tell us what is confusing? Agreed, the description could be
more detailed.

Thanks,
Sebastian

Re: Nutch integration with Solr

Posted by Timeka Cobb <co...@gmail.com>.
Thank you for the answer, but I still think I'm missing things. On the
wiki, where it says to install Solr, I don't understand the directions that
lead up to creating a nutch core. How do I copy resources and manage the
schema, etc.? The breakdown confuses me. Thank you again 😊💜

Timeka

Re: Nutch integration with Solr

Posted by Sebastian Nagel <wa...@googlemail.com.INVALID>.
Hi Timeka,

well, the really short answer is: Nutch sends "documents" to Solr using
the SolrJ client library. A "document" is a single web page that has been
fetched, parsed, and split into indexable fields, e.g., "title", "keywords",
"content".
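To see which documents and fields actually arrived in Solr after indexing, you can query Solr's HTTP select API directly (Nutch itself talks to Solr through SolrJ; this is only for inspection). A minimal sketch, assuming Solr runs locally on its default port 8983 with a core named nutch. The helper solr_select_url is hypothetical, and field names such as url and title depend on the Nutch schema in use.

```shell
#!/bin/sh
# Hypothetical helper: build a Solr "select" URL that lists indexed documents.
# ASSUMPTIONS: Solr on localhost:8983; core name and field list passed in.
solr_select_url() {
  core="$1"; fields="$2"; rows="${3:-10}"
  # q=*:* matches every document; fl restricts the returned fields;
  # rows caps how many documents come back (default 10 here).
  printf 'http://localhost:8983/solr/%s/select?q=*:*&fl=%s&rows=%s\n' \
    "$core" "$fields" "$rows"
}
```

For example, curl "$(solr_select_url nutch url,title 5)" would return up to five indexed pages with their URL and title, provided the core exists and Nutch has already indexed into it.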

For further information you may look into

  https://wiki.apache.org/nutch/NutchTutorial#Step-by-Step:_Indexing_into_Apache_Solr

  https://wiki.apache.org/nutch/IndexWriters

  https://wiki.apache.org/nutch/Presentations
  https://www.slideshare.net/search/slideshow?searchfrom=header&q=nutch

For the tiny details, you may need to inspect the Nutch source code directly.

Best,
Sebastian
