Posted to user@nutch.apache.org by al...@aim.com on 2013/02/15 02:05:43 UTC

fields in solrindex-mapping.xml

Hello,

I see that there are 

                <field dest="segment" source="segment"/>
                <field dest="boost" source="boost"/>
                <field dest="digest" source="digest"/>
                <field dest="tstamp" source="tstamp"/>

fields, in addition to the title, host and content ones, in Nutch 2.x's solrindex-mapping.xml. I thought tstamp might be needed for sorting documents. What about the other fields,
segment, boost and digest? Can someone explain why these fields are included in solrindex-mapping.xml?


Thanks.
Alex.



Re: fields in solrindex-mapping.xml

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Alex,
So we can track this one.
https://issues.apache.org/jira/browse/NUTCH-1532
Thanks
Lewis

On Fri, Feb 15, 2013 at 4:21 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi Alex,
> OK so we can certainly remove segment from 2.x solrindex-mapping.xml. It
> would however be nice to replace this with the appropriate batchId.
> Can someone advise where the 'segment' field currently comes from in trunk?
> That way we can at least map the field to the batchId equivalent in 2.x.
>
> Thank you
> Lewis


-- 
*Lewis*

Re: fields in solrindex-mapping.xml

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Well look, sharding is for distributed queries, right?
Regardless of what fields you store in a Nutch document, the reason you
store them is to have 'some' level of structure in your data.
Boost, id, tstamp etc. are merely fields which enable you to do so...
nothing more.
Nutch plugins (and some of the indexing core) enable you to add or remove
such fields in your index depending on what you want, how you feel, what
the weather is like etc. The id, boost, segment and digest fields are no
different.


On Saturday, February 16, 2013,  <al...@aim.com> wrote:
> Do you mean they help when sharding?
>
> Thanks.
> Alex.

-- 
*Lewis*

Re: fields in solrindex-mapping.xml

Posted by al...@aim.com.
Do you mean they help when sharding?

Thanks.
Alex.

 

 

 

-----Original Message-----
From: Lewis John Mcgibbney <le...@gmail.com>
To: user <us...@nutch.apache.org>
Sent: Sat, Feb 16, 2013 10:58 am
Subject: Re: fields in solrindex-mapping.xml


In short, it helps with searching when you can slice your data using these
fields


-- 
*Lewis*

 

Re: fields in solrindex-mapping.xml

Posted by Lewis John Mcgibbney <le...@gmail.com>.
In short, it helps with searching when you can slice your data using these
fields
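
As a rough illustration of slicing by these fields, the Python sketch below builds a Solr select URL that filters on tstamp and sorts on boost. The base URL, the query term and the choice of filters are assumptions made for the example, and filtering or sorting only works if the fields are declared as indexed in your schema.xml.

```python
from urllib.parse import urlencode


def build_select_url(base_url, params):
    """Return a Solr /select URL with the given query parameters encoded."""
    return base_url + "/select?" + urlencode(params)


# Hypothetical slice: documents fetched in the last week, best-scored first.
recent_by_score = {
    "q": "content:nutch",               # illustrative query term
    "fq": "tstamp:[NOW-7DAYS TO NOW]",  # restrict by fetch time
    "sort": "boost desc",               # order by the stored score
    "fl": "id,title,tstamp,boost",      # fields to return
    "wt": "json",
}

# Assumed local Solr instance; adjust host, port and core to your setup.
url = build_select_url("http://localhost:8983/solr", recent_by_score)
print(url)
```

The same pattern applies to digest, for example as a filter query to find every URL that shares one content signature.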

On Saturday, February 16, 2013, Markus Jelsma <ma...@openindex.io> wrote:
> Those are added by IndexerMapReduce (or 2.x equivalent) and index-basic.
> They contain the crawl datum's signature, the time stamp (see index-basic)
> and crawl datum score. If you think you don't need them, you can safely
> omit them.

-- 
*Lewis*

RE: fields in solrindex-mapping.xml

Posted by Markus Jelsma <ma...@openindex.io>.
Those are added by IndexerMapReduce (or 2.x equivalent) and index-basic. They contain the crawl datum's signature, the time stamp (see index-basic) and crawl datum score. If you think you don't need them, you can safely omit them. 
 
-----Original message-----
> From:alxsss@aim.com <al...@aim.com>
> Sent: Sat 16-Feb-2013 19:21
> To: user@nutch.apache.org
> Subject: Re: fields in solrindex-mapping.xml
> 
> Hi Lewis,
> 
> Why do we need to include digest, tstamp, boost and batchid fields in solrindex?
> 
> Thanks.
> Alex.

Re: fields in solrindex-mapping.xml

Posted by al...@aim.com.
Hi Lewis,

Why do we need to include digest, tstamp, boost and batchid fields in solrindex?

Thanks.
Alex.

 

 

 

-----Original Message-----
From: Lewis John Mcgibbney <le...@gmail.com>
To: user <us...@nutch.apache.org>
Sent: Fri, Feb 15, 2013 4:21 pm
Subject: Re: fields in solrindex-mapping.xml


Hi Alex,
OK so we can certainly remove segment from 2.x solrindex-mapping.xml. It
would however be nice to replace this with the appropriate batchId.
Can someone advise where the 'segment' field currently comes from in trunk?
That way we can at least map the field to the batchId equivalent in 2.x.

Thank you
Lewis


-- 
*Lewis*

 

Re: fields in solrindex-mapping.xml

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Alex,
OK so we can certainly remove segment from 2.x solrindex-mapping.xml. It
would however be nice to replace this with the appropriate batchId.
Can someone advise where the 'segment' field currently comes from in trunk?
That way we can at least map the field to the batchId equivalent in 2.x.

Thank you
Lewis

On Fri, Feb 15, 2013 at 2:23 PM, <al...@aim.com> wrote:

> Hi Lewis,
>
> If I exclude any one of the fields tstamp, digest and boost from
> solrindex-mapping.xml and schema.xml, solrindex gives the error
>
> SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=com.yahoo:http/]
> unknown field 'tstamp'
>
> for each of the above fields, except segment.
>
> Alex.

-- 
*Lewis*

Re: fields in solrindex-mapping.xml

Posted by al...@aim.com.
Hi Lewis,

If I exclude any one of the fields tstamp, digest and boost from solrindex-mapping.xml and schema.xml, solrindex gives the error

SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=com.yahoo:http/] unknown field 'tstamp'

for each of the above fields, except segment.

Alex.
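
One possible reading of this error, sketched under stated assumptions: if the indexer sends a field under its own name whenever solrindex-mapping.xml has no entry for it, then deleting the mapping line does not stop the field from being sent, and Solr rejects it once schema.xml no longer declares it. The Python sketch below compares the fields a document would still carry with the fields a schema declares. The pass-through rule, the helper names and the trimmed-down XML are illustrative assumptions, not Nutch or Solr code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down solrindex-mapping.xml with tstamp removed.
MAPPING_XML = """
<mapping>
  <fields>
    <field dest="content" source="content"/>
    <field dest="title" source="title"/>
    <field dest="host" source="host"/>
    <field dest="boost" source="boost"/>
    <field dest="digest" source="digest"/>
  </fields>
</mapping>
"""

# Hypothetical, trimmed-down schema.xml with tstamp removed as well.
SCHEMA_XML = """
<schema name="nutch" version="1.5">
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <field name="content" type="text" stored="false" indexed="true"/>
    <field name="title" type="text" stored="true" indexed="true"/>
    <field name="host" type="string" stored="false" indexed="true"/>
    <field name="boost" type="float" stored="true" indexed="false"/>
    <field name="digest" type="string" stored="true" indexed="false"/>
  </fields>
</schema>
"""


def load_mapping(xml_text):
    """Return {source: dest} for every <field> entry in a mapping file."""
    root = ET.fromstring(xml_text)
    return {f.get("source"): f.get("dest") for f in root.iter("field")}


def load_schema_fields(xml_text):
    """Return the set of field names declared in a schema file."""
    root = ET.fromstring(xml_text)
    return {f.get("name") for f in root.iter("field")}


def map_document(doc, mapping):
    """Rename mapped fields. ASSUMPTION: unmapped fields keep their own name."""
    return {mapping.get(name, name): value for name, value in doc.items()}


def unknown_fields(doc, mapping, schema_fields):
    """List the fields Solr would not recognise for this document."""
    return sorted(set(map_document(doc, mapping)) - schema_fields)


# Illustrative document; the indexer still emits tstamp in this scenario.
nutch_doc = {
    "id": "com.yahoo:http/",
    "title": "Yahoo",
    "host": "yahoo.com",
    "content": "page text",
    "boost": 1.0,
    "digest": "abc123",
    "tstamp": "2013-02-15T02:05:43Z",
}

missing = unknown_fields(
    nutch_doc, load_mapping(MAPPING_XML), load_schema_fields(SCHEMA_XML)
)
print(missing)  # ['tstamp']
```

Under the same assumption, segment could be removed without an error simply because 2.x never puts that field into the document.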

 

 

 

-----Original Message-----
From: Lewis John Mcgibbney <le...@gmail.com>
To: user <us...@nutch.apache.org>
Sent: Thu, Feb 14, 2013 8:34 pm
Subject: Re: fields in solrindex-mapping.xml


Hi Alex,
Tstamp represents fetch time, used for deduplication.
Boost is for scoring-opic and link. This is required in 2.x as well.
I don't have the code right now, but you can try removing digest and
segment. To me they both look legacy.
There is a wiki page on index structure which you can consult and/or add to
should you wish.
Thank you
Lewis

-- 
*Lewis*

 

Re: fields in solrindex-mapping.xml

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Alex,
Tstamp represents fetch time, used for deduplication.
Boost is for scoring-opic and link. This is required in 2.x as well.
I don't have the code right now, but you can try removing digest and
segment. To me they both look legacy.
There is a wiki page on index structure which you can consult and/or add to
should you wish.
Thank you
Lewis
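
As a rough illustration of how digest, boost and tstamp could work together for deduplication, the Python sketch below groups documents by digest and keeps one per group, preferring the highest boost and then the most recent tstamp. The selection rule and the sample records are assumptions made for the example; this is not Nutch's deduplication code.

```python
from datetime import datetime


def parse_tstamp(value):
    """Parse a UTC timestamp such as 2013-02-14T20:34:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def deduplicate(docs):
    """Keep one document per digest: highest boost first, then newest tstamp."""
    best = {}
    for doc in docs:
        rank = (doc["boost"], parse_tstamp(doc["tstamp"]))
        kept = best.get(doc["digest"])
        if kept is None or rank > kept[0]:
            best[doc["digest"]] = (rank, doc)
    return [doc for _, doc in best.values()]


# Illustrative records: "a" and "b" share a digest, so only one survives.
docs = [
    {"id": "a", "digest": "d1", "boost": 1.0, "tstamp": "2013-02-14T20:34:00Z"},
    {"id": "b", "digest": "d1", "boost": 1.0, "tstamp": "2013-02-15T14:23:00Z"},
    {"id": "c", "digest": "d2", "boost": 0.5, "tstamp": "2013-02-15T16:21:00Z"},
]

kept_ids = sorted(doc["id"] for doc in deduplicate(docs))
print(kept_ids)  # ['b', 'c']
```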

On Thursday, February 14, 2013,  <al...@aim.com> wrote:
> Hello,
>
> I see that there are
>
>                 <field dest="segment" source="segment"/>
>                 <field dest="boost" source="boost"/>
>                 <field dest="digest" source="digest"/>
>                 <field dest="tstamp" source="tstamp"/>
>
> fields, in addition to the title, host and content ones, in Nutch 2.x's
> solrindex-mapping.xml. I thought tstamp might be needed for sorting documents.
> What about the other fields, segment, boost and digest? Can someone explain
> why these fields are included in solrindex-mapping.xml?
>
>
> Thanks.
> Alex.
>
>
>

-- 
*Lewis*