Posted to user@nutch.apache.org by Timeka Cobb <co...@gmail.com> on 2018/10/01 01:38:42 UTC
Nutch integration with Solr
Hello! I've installed Nutch 1.15 and Solr 7.4 very recently. I've looked at
the section on connecting the two but am having an extremely hard time
understanding it. Can someone help me connect the two? I want to crawl entire
websites and add a search engine to my site. Thank ya kindly!
Blessings,
Timeka Cobb
Re: Nutch integration with Solr
Posted by Timeka Cobb <co...@gmail.com>.
Also, I totally agree, Sir Sebastian: the description could be much more
detailed so that newbies like me can understand it better.
On Mon, Oct 1, 2018, 10:35 AM Sebastian Nagel
<wa...@googlemail.com.invalid> wrote:
> Hi Timeka,
>
> you mean the steps given in
> https://wiki.apache.org/nutch/NutchTutorial#Setup_Solr_for_search ?
>
> The "nutch" core is defined only by a directory
> ${APACHE_SOLR_HOME}/server/solr/configsets/nutch
> which must contain the correct schema in the conf/ subfolder.
>
> Commands to set up Solr and copy the schema are given in the tutorial
> as Unix/Linux commands.
>
> Could you tell us what is confusing? Agreed, the description could be
> more detailed.
>
> Thanks,
> Sebastian
Re: Nutch integration with Solr
Posted by Timeka Cobb <co...@gmail.com>.
I'm actually using Ubuntu to configure it all, so that is not the issue.
Example: the copy resources command. I'm already in the Solr home folder, so
would the command be cp -r solr-7.4.0 /server/../../../ in the terminal?
Where I see {APACHE_SOLR_HOME} or {APACHE_NUTCH_HOME}, am I supposed to type
solr-7.4.0 or nutch-1.15 on the command line in place of these?
I copy and paste what I see and I get an error back. I'm just trying to
figure out the proper commands to enter in the terminal to get both connected
and the core running properly. Thanks again for all your help, it's greatly
appreciated.
Timeka
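One way to read the placeholders, as a minimal sketch: ${APACHE_SOLR_HOME} and
${APACHE_NUTCH_HOME} look like shell variables that stand for the directories
where Solr and Nutch were unpacked. The locations $HOME/solr-7.4.0 and
$HOME/nutch-1.15 below are assumptions, and NUTCH_CONFIGSET is a name made up
for this illustration; adjust all of them to your machine.

```shell
# Assumed install locations (hypothetical; change them to wherever you
# unpacked the Solr and Nutch archives).
export APACHE_SOLR_HOME="${APACHE_SOLR_HOME:-$HOME/solr-7.4.0}"
export APACHE_NUTCH_HOME="${APACHE_NUTCH_HOME:-$HOME/nutch-1.15}"

# With the variables set, a path copied from the tutorial expands by itself,
# no matter which directory the terminal is currently in.
NUTCH_CONFIGSET="${APACHE_SOLR_HOME}/server/solr/configsets/nutch"
echo "Nutch configset directory: ${NUTCH_CONFIGSET}"

# Warn early if a variable points at a directory that does not exist.
for dir in "$APACHE_SOLR_HOME" "$APACHE_NUTCH_HOME"; do
  if [ -d "$dir" ]; then
    echo "found:   $dir"
  else
    echo "missing: $dir (adjust the variable above)"
  fi
done
```

Once the variables are exported in the current terminal session, commands that
contain ${APACHE_SOLR_HOME} can be pasted as they are. If a variable is not
set, the shell expands it to an empty string, so a path such as
${APACHE_SOLR_HOME}/server/solr becomes /server/solr, which usually does not
exist.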
Re: Nutch integration with Solr
Posted by Sebastian Nagel <wa...@googlemail.com.INVALID>.
Hi Timeka,
you mean the steps given in
https://wiki.apache.org/nutch/NutchTutorial#Setup_Solr_for_search ?
The "nutch" core is defined only by a directory
${APACHE_SOLR_HOME}/server/solr/configsets/nutch
which must contain the correct schema in the conf/ subfolder.
Commands to set up Solr and copy the schema are given in the tutorial
as Unix/Linux commands.
Could you tell us what is confusing? Agreed, the description could be
more detailed.
Thanks,
Sebastian
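The setup described above can be sketched roughly as follows. This is an
approximation for Solr 7.x, not a copy of the tutorial: the install location,
the NUTCH_SCHEMA path (the location of Nutch's Solr schema differs between
Nutch releases) and the variable names are assumptions. The block only acts if
both Solr's _default configset and the schema file are found.

```shell
# Assumed locations (hypothetical; adjust to your installation).
APACHE_SOLR_HOME="${APACHE_SOLR_HOME:-$HOME/solr-7.4.0}"
NUTCH_SCHEMA="${NUTCH_SCHEMA:-$HOME/nutch-1.15/conf/schema.xml}"

CONFIGSETS="${APACHE_SOLR_HOME}/server/solr/configsets"
NUTCH_CONFIGSET="${CONFIGSETS}/nutch"

if [ -d "${CONFIGSETS}/_default" ] && [ -f "${NUTCH_SCHEMA}" ]; then
  # Create the "nutch" configset as a copy of Solr's default configset.
  mkdir -p "${NUTCH_CONFIGSET}"
  cp -r "${CONFIGSETS}/_default/"* "${NUTCH_CONFIGSET}/"

  # Put Nutch's schema into the conf/ subfolder and remove the default
  # managed-schema so that Solr reads schema.xml instead.
  cp "${NUTCH_SCHEMA}" "${NUTCH_CONFIGSET}/conf/schema.xml"
  rm -f "${NUTCH_CONFIGSET}/conf/managed-schema"

  # Start Solr and create a core named "nutch" from that configuration.
  "${APACHE_SOLR_HOME}/bin/solr" start
  "${APACHE_SOLR_HOME}/bin/solr" create -c nutch -d "${NUTCH_CONFIGSET}/conf"
  SETUP_STATUS="done"
else
  echo "Solr's _default configset or the Nutch schema was not found;"
  echo "check APACHE_SOLR_HOME and NUTCH_SCHEMA."
  SETUP_STATUS="skipped"
fi
```

If the core was created, it should show up in the Solr admin UI, which listens
on port 8983 by default.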
On 10/01/2018 03:29 PM, Timeka Cobb wrote:
> Thank you for the answer, but I still think I'm missing things. On the
> wiki, where it says to install Solr, I don't understand the directions that
> lead up to creating a nutch core. How do I copy resources, manage the
> schema, etc.? The breakdown confuses me. Thank you again!
>
> Timeka
Re: Nutch integration with Solr
Posted by Timeka Cobb <co...@gmail.com>.
Thank you for the answer, but I still think I'm missing things. On the wiki,
where it says to install Solr, I don't understand the directions that lead up
to creating a nutch core. How do I copy resources, manage the schema, etc.?
The breakdown confuses me. Thank you again!
Timeka
On Mon, Oct 1, 2018, 7:12 AM Sebastian Nagel
<wa...@googlemail.com.invalid> wrote:
> Hi Timeka,
>
> well, the really short answer is: Nutch sends "documents" to Solr using
> the SolrJ client library. A "document" is a single web page fetched, parsed
> and split into indexable fields, e.g., "title", "keywords", "content".
>
> For further information you may look into
>
>
> https://wiki.apache.org/nutch/NutchTutorial#Step-by-Step:_Indexing_into_Apache_Solr
>
> https://wiki.apache.org/nutch/IndexWriters
>
> https://wiki.apache.org/nutch/Presentations
> https://www.slideshare.net/search/slideshow?searchfrom=header&q=nutch
>
> For the tiny details, you may need to inspect the Nutch source code
> directly.
>
> Best,
> Sebastian
Re: Nutch integration with Solr
Posted by Sebastian Nagel <wa...@googlemail.com.INVALID>.
Hi Timeka,
well, the really short answer is: Nutch sends "documents" to Solr using
the SolrJ client library. A "document" is a single web page fetched, parsed
and split into indexable fields, e.g., "title", "keywords", "content".
For further information you may look into
https://wiki.apache.org/nutch/NutchTutorial#Step-by-Step:_Indexing_into_Apache_Solr
https://wiki.apache.org/nutch/IndexWriters
https://wiki.apache.org/nutch/Presentations
https://www.slideshare.net/search/slideshow?searchfrom=header&q=nutch
For the tiny details, you may need to inspect the Nutch source code directly.
Best,
Sebastian
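To make the idea of a "document" concrete, here is a rough sketch that sends
one hand-written document to Solr's JSON update handler with curl. Nutch
itself does this through SolrJ, not curl; the URL, the core name "nutch" and
the field values are assumptions, and the fields must exist in the core's
schema for Solr to accept them.

```shell
# Assumed Solr location and core name (hypothetical; adjust as needed).
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/nutch}"

# One "document": a single web page split into indexable fields.
DOC='[{"id":"http://example.com/","title":"Example page","content":"Text extracted from the page."}]'

if command -v curl >/dev/null 2>&1; then
  # Post the document as JSON and commit so it becomes searchable right away.
  curl -s --max-time 5 \
       -H 'Content-Type: application/json' \
       --data-binary "$DOC" \
       "${SOLR_URL}/update?commit=true" \
    || echo "Solr not reachable at ${SOLR_URL}"
else
  echo "curl is not installed; the document would have been: $DOC"
fi
```

After a successful post, the page should be returned by a query such as
${SOLR_URL}/select?q=title:Example.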
On 10/01/2018 03:38 AM, Timeka Cobb wrote:
> Hello! I've installed Nutch 1.15 and Solr 7.4 very recently. I've looked at
> the section on connecting the two but am having an extremely hard time
> understanding it. Can someone help me connect the two? I want to crawl entire
> websites and add a search engine to my site. Thank ya kindly!
>
> Blessings,
> Timeka Cobb
>