Posted to user@nutch.apache.org by Marek Bachmann <m....@uni-kassel.de> on 2011/07/11 15:46:37 UTC
The solrindex command
Hello there,
where can I find information about the Solr document structure that
the solrindex command sends to Solr for indexing?
As far as I know, you add data to the solr index by sending a document
with specific fields to the engine.
I would like to know how nutch creates these documents and which fields
these documents contain.
In other words, what kind of information about a website is transferred
to solr?
Thank you very much.
Re: The solrindex command
Posted by Marek Bachmann <m....@uni-kassel.de>.
On 11.07.2011 16:15, Markus Jelsma wrote:
>
>
> On Monday 11 July 2011 16:11:47 Marek Bachmann wrote:
>> Thank you very much
>>
>> On 11.07.2011 15:48, Markus Jelsma wrote:
>>> Hi,
>>>
>>> Using the brand-new IndexingFiltersChecker in 1.4-dev you can see exactly
>>> what Nutch is going to send. It comes down to the plugins you have
>>> defined. See the schema config for a list of fields per plug-in:
>>>
>>> http://svn.apache.org/viewvc/nutch/branches/branch-1.4/conf/schema.xml?view=markup
>>>
>>> Cheers
>>
>> So, as there is no "score" field in the schema.xml I guess the score for
>> a webpage in the crawl db has no effect in solr by default, am I right? :)
>
> There is no score field indeed but there is a boost field. This contains the
> score. Nutch will also set the Lucene document boost and field boost weights
> with this value.
>
Ahh! This is really important information for me! :-) Thanks!
Re: The solrindex command
Posted by Markus Jelsma <ma...@openindex.io>.
On Monday 11 July 2011 16:11:47 Marek Bachmann wrote:
> Thank you very much
>
> On 11.07.2011 15:48, Markus Jelsma wrote:
> > Hi,
> >
> > Using the brand-new IndexingFiltersChecker in 1.4-dev you can see exactly
> > what Nutch is going to send. It comes down to the plugins you have
> > defined. See the schema config for a list of fields per plug-in:
> >
> > http://svn.apache.org/viewvc/nutch/branches/branch-1.4/conf/schema.xml?view=markup
> >
> > Cheers
>
> So, as there is no "score" field in the schema.xml I guess the score for
> a webpage in the crawl db has no effect in solr by default, am I right? :)
There is no score field indeed but there is a boost field. This contains the
score. Nutch will also set the Lucene document boost and field boost weights
with this value.
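
To illustrate, here is a minimal sketch of what such a document could look like in Solr's XML update format, with the score carried both in a boost field and as the document boost. This is not Nutch's actual code. The helper name, the score value and the field values are hypothetical, and the fields you really get depend on the plugins you have enabled and on schema.xml. Only the document-level boost is shown, not per-field boosts.

```python
import xml.etree.ElementTree as ET


def build_add_message(fields, boost):
    """Build a Solr XML <add> message holding a single document.

    Hypothetical helper for illustration only; it is not part of Nutch or Solr.
    """
    add = ET.Element("add")
    # The document boost is an attribute on <doc>.
    doc = ET.SubElement(add, "doc", {"boost": str(boost)})
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Assumed example values; a real document carries whatever the enabled
# indexing plugins add.
score = 1.5
fields = {
    "id": "http://example.com/",
    "url": "http://example.com/",
    "title": "Example page",
    "content": "Example page text",
    "boost": score,  # the score is also stored as a regular field
}

message = build_add_message(fields, boost=score)
print(message)
```

Printing the message shows one field element per entry plus the boost attribute on the doc element, which is the "document with specific fields" described in the original question.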
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: The solrindex command
Posted by Marek Bachmann <m....@uni-kassel.de>.
Thank you very much
On 11.07.2011 15:48, Markus Jelsma wrote:
> Hi,
>
> Using the brand-new IndexingFiltersChecker in 1.4-dev you can see exactly what
> Nutch is going to send. It comes down to the plugins you have defined. See the
> schema config for a list of fields per plug-in:
>
> http://svn.apache.org/viewvc/nutch/branches/branch-1.4/conf/schema.xml?view=markup
>
> Cheers
So, as there is no "score" field in the schema.xml I guess the score for
a webpage in the crawl db has no effect in solr by default, am I right? :)
Re: The solrindex command
Posted by Markus Jelsma <ma...@openindex.io>.
Hi,
Using the brand-new IndexingFiltersChecker in 1.4-dev you can see exactly what
Nutch is going to send. It comes down to the plugins you have defined. See the
schema config for a list of fields per plug-in:
http://svn.apache.org/viewvc/nutch/branches/branch-1.4/conf/schema.xml?view=markup
Cheers