Posted to solr-user@lucene.apache.org by Manish Bafna <ma...@gmail.com> on 2012/04/05 09:53:17 UTC

counter field

>
> Hi,
> Is it possible to define a field as a "Counter Column" which can be
> auto-incremented?
>
> Thanks,
> Manish.
>

Re: counter field

Posted by Manish Bafna <ma...@gmail.com>.
Yes, before indexing we check whether the document is already in the
index, because along with the document we also have metadata that needs
to be appended.

We have a few multivalued metadata fields, which we update if the same
document is found again.
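
A minimal sketch of the check-then-merge step described above. It assumes a plain dict stands in for the index (a real setup would look the document up in Solr by its unique key), and the field name "source" and the helper add_or_merge are hypothetical, not from the thread:

```python
def add_or_merge(index, doc, key="id", multivalued=("source",)):
    """Add doc to index, or merge its multivalued fields into the stored copy.

    `index` is a dict keyed by the unique key; it only stands in for Solr here.
    """
    existing = index.get(doc[key])
    if existing is None:
        stored = dict(doc)
        for field in multivalued:
            # Copy the list so later merges do not modify the caller's data.
            stored[field] = list(doc.get(field, []))
        index[doc[key]] = stored
        return "added"
    for field in multivalued:
        for value in doc.get(field, []):
            if value not in existing[field]:
                existing[field].append(value)
    return "updated"
```

The lookup before every add is the per-document search that Walter warns will be slow.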


On Fri, Apr 6, 2012 at 10:17 AM, Walter Underwood <wu...@wunderwood.org> wrote:

> So you will need to do a search for each document before adding it to the
> index, in case it is already there. That will be slow.
>
> And where do you store the last-assigned number?
>
> And there are plenty of other problems, like reloading after a corrupted
> index (disk failure), or deleted documents which are re-added later, or
> duplicates, splitting content across shards (requires a global lock across
> all shards to index each document), ...
>
> Two recommendations:
>
> 1. Having two different unique IDs is likely to cause problems, so choose
> one.
>
> 2. If you must have two IDs, use one table in a lightweight relational
> database to store the relationships between the md5 value and the serial
> number.
>
> wunder
>
> --
> Walter Underwood
> wunder@wunderwood.org
>
>
>
>

Re: counter field

Posted by Walter Underwood <wu...@wunderwood.org>.
So you will need to do a search for each document before adding it to the index, in case it is already there. That will be slow.

And where do you store the last-assigned number?

And there are plenty of other problems, like reloading after a corrupted index (disk failure), or deleted documents which are re-added later, or duplicates, splitting content across shards (requires a global lock across all shards to index each document), ...

Two recommendations:

1. Having two different unique IDs is likely to cause problems, so choose one.

2. If you must have two IDs, use one table in a lightweight relational database to store the relationships between the md5 value and the serial number.
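
A minimal sketch of recommendation 2, assuming SQLite as the lightweight relational database. The table name id_map, the column names, and the helper functions are hypothetical, chosen only for illustration:

```python
import sqlite3


def open_id_map(path=":memory:"):
    """Open (or create) the table that maps each md5 value to a serial number."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS id_map ("
        " serial INTEGER PRIMARY KEY AUTOINCREMENT,"
        " md5 TEXT NOT NULL UNIQUE)"
    )
    return conn


def serial_for(conn, md5):
    """Return the serial number for md5, assigning a new one on first sight."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT serial FROM id_map WHERE md5 = ?", (md5,)
        ).fetchone()
        if row is not None:
            return row[0]  # known document: keep its old number
        cur = conn.execute("INSERT INTO id_map (md5) VALUES (?)", (md5,))
        return cur.lastrowid
```

Because the mapping lives outside the index, the numbers survive a full reindex, and the database answers the question of where the last-assigned number is stored.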

wunder

On Apr 5, 2012, at 9:37 PM, Manish Bafna wrote:

> Actually, no.
> If I am updating an existing document, I need to keep its old number.
>
> Maybe we can do it this way: if we pass a number in the field, that value
> is used; if we don't pass one, the field is auto-incremented.
> When we update, I will have the old number and will pass it in the field
> again.
> 

--
Walter Underwood
wunder@wunderwood.org




Re: counter field

Posted by Manish Bafna <ma...@gmail.com>.
Actually, no.
If I am updating an existing document, I need to keep its old number.

Maybe we can do it this way: if we pass a number in the field, that value
is used; if we don't pass one, the field is auto-incremented.
When we update, I will have the old number and will pass it in the field
again.
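
The behaviour proposed here can be sketched in a few lines. This is an illustration only, not Solr code: the class CounterAssigner and the field name "seq" are hypothetical, and the counter lives in memory in a single process, so it does not address where the last-assigned number is stored or how several nodes would share it:

```python
class CounterAssigner:
    """Assign a sequential number to documents that do not already carry one."""

    def __init__(self, field="seq", start=1):
        self.field = field
        self.next_value = start

    def process(self, doc):
        value = doc.get(self.field)
        if value is None:
            # No number passed in: hand out the next one.
            value = self.next_value
            doc[self.field] = value
        # Keep the counter ahead of any number that was passed in explicitly.
        self.next_value = max(self.next_value, value + 1)
        return doc
```

An update that passes its old number keeps it, while a new document gets the next free value.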

On Fri, Apr 6, 2012 at 9:59 AM, Walter Underwood <wu...@wunderwood.org> wrote:

> Why?
>
> When you reindex, is it OK if they all change?
>
> If you reindex one document, is it OK if it gets a new sequential number?
>
> wunder
>

Re: counter field

Posted by Walter Underwood <wu...@wunderwood.org>.
Why?

When you reindex, is it OK if they all change?

If you reindex one document, is it OK if it gets a new sequential number?

wunder

On Apr 5, 2012, at 9:23 PM, Manish Bafna wrote:

> We already have a unique key (we use an MD5 value).
> We need another ID (sequential numbers).
> 

Re: counter field

Posted by Manish Bafna <ma...@gmail.com>.
We already have a unique key (we use an MD5 value).
We need another ID (sequential numbers).

On Fri, Apr 6, 2012 at 9:47 AM, Chris Hostetter <ho...@fucit.org> wrote:

>
> : We need to have a document id available for every document (Per core).
>
> : We can pass docid as one of the parameter for fq, and it will return the
> : docid in the search result.
>
>
> So it sounds like you need a *unique* id, but nothing you described
> requires that it be a counter.
>
> Take a look at the UUIDField, or consider using the
> SignatureUpdateProcessor to generate a key based on a hash of all the
> field values.
>
> -Hoss
>

Re: counter field

Posted by Chris Hostetter <ho...@fucit.org>.
: We need to have a document id available for every document (Per core).

: We can pass docid as one of the parameter for fq, and it will return the
: docid in the search result.


So it sounds like you need a *unique* id, but nothing you described
requires that it be a counter.

Take a look at the UUIDField, or consider using the 
SignatureUpdateProcessor to generate a key based on a hash of all the 
field values.
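
To illustrate the two ideas in plain Python (this is not how Solr implements UUIDField or SignatureUpdateProcessor; the helper names random_id and signature_id are hypothetical): a random UUID gives a unique key with no counter, and a hash over the field values gives a key that is the same whenever the content is the same.

```python
import hashlib
import uuid


def random_id():
    """Return a random unique id, needing no shared counter."""
    return str(uuid.uuid4())


def signature_id(doc, fields=None):
    """Return an MD5 hex digest computed from the document's field values.

    Fields are hashed in sorted name order unless `fields` is given, so the
    result does not depend on the order of keys in `doc`.
    """
    names = sorted(doc) if fields is None else fields
    digest = hashlib.md5()
    for name in names:
        values = doc.get(name, [])
        if not isinstance(values, (list, tuple)):
            values = [values]
        for value in values:
            # Separate names and values with NUL bytes to avoid ambiguity.
            digest.update(name.encode("utf-8") + b"\x00")
            digest.update(str(value).encode("utf-8") + b"\x00")
    return digest.hexdigest()
```

Neither key needs coordination between nodes, which is the main thing a counter would require.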

-Hoss

Re: counter field

Posted by Manish Bafna <ma...@gmail.com>.
We need to have a document ID available for every document (per core).
There is a DocID in the Lucene index, but I did not find any API to expose
it through Solr.

Maybe we could alter Solr to optionally return the DocID (which is
unique). We could then pass the docid as one of the parameters for fq, and
it would return the docid in the search result.

On Thu, Apr 5, 2012 at 10:13 PM, Chris Hostetter <ho...@fucit.org> wrote:

>
> : > Is it possible to define a field as "Counter Column" which can be
> : > auto-incremented.
>
> A feature like this does not exist in Solr at the moment, but it would be
> possible to implement this fairly easily in an UpdateProcessor -- however
> it would only be functional in very limited situations (i.e. all docs must
> use the same update chain, single-node indexes only -- no distributed
> search / SolrCloud).
>
> A better question is: why?  What do you want to do with a field like this?
>
> https://people.apache.org/~hossman/#xyproblem
> XY Problem
>
> Your question appears to be an "XY Problem" ... that is: you are dealing
> with "X", you are assuming "Y" will help you, and you are asking about "Y"
> without giving more details about the "X" so that we can understand the
> full issue.  Perhaps the best solution doesn't involve "Y" at all?
> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>
>
>
>
> -Hoss
>

Re: counter field

Posted by Chris Hostetter <ho...@fucit.org>.
: > Is it possible to define a field as "Counter Column" which can be
: > auto-incremented.

A feature like this does not exist in Solr at the moment, but it would be
possible to implement this fairly easily in an UpdateProcessor -- however
it would only be functional in very limited situations (i.e. all docs must
use the same update chain, single-node indexes only -- no distributed
search / SolrCloud).

A better question is: why?  What do you want to do with a field like this?

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341




-Hoss

Re: counter field

Posted by Manish Bafna <ma...@gmail.com>.
Our data comes from the file system and from the web, so adding an
auto-increment field at the data source is not possible.

On Fri, Apr 6, 2012 at 6:50 PM, Shawn Heisey <so...@elyograg.org> wrote:

> On 4/5/2012 1:53 AM, Manish Bafna wrote:
>
>> Hi,
>> Is it possible to define a field as "Counter Column" which can be
>> auto-incremented.
>
> Manish,
>
> Where does your data come from?  Can you add the autoincrement field to
> the data source?
>
> My data comes from MySQL, where the primary key is an autoincrement field.
> MySQL is very good at autoincrement fields.
>
> Walter, we do have two unique ID values in our system, enforced by MySQL,
> and it hasn't caused us any problems yet.  One is the autoincrement field
> just mentioned and the other is another id that is specific to our
> application.  We use the autoincrement field to identify deleted documents
> and as a position indicator for the build program to add new documents to
> Solr.  The other unique field is Solr's uniqueKey.
>
> Thanks,
> Shawn
>
>

Re: counter field

Posted by Shawn Heisey <so...@elyograg.org>.
On 4/5/2012 1:53 AM, Manish Bafna wrote:
>> Hi,
>> Is it possible to define a field as "Counter Column" which can be
>> auto-incremented.

Manish,

Where does your data come from?  Can you add the autoincrement field to 
the data source?

My data comes from MySQL, where the primary key is an autoincrement
field.  MySQL is very good at autoincrement fields.

Walter, we do have two unique ID values in our system, enforced by 
MySQL, and it hasn't caused us any problems yet.  One is the 
autoincrement field just mentioned and the other is another id that is 
specific to our application.  We use the autoincrement field to identify 
deleted documents and as a position indicator for the build program to 
add new documents to Solr.  The other unique field is Solr's uniqueKey.
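
A rough sketch of the "position indicator" idea, under stated assumptions: SQLite stands in for MySQL, the table docs with columns id (autoincrement), app_id, and body is hypothetical, and send_to_solr is a placeholder callback rather than a real client call. The build program remembers the highest autoincrement value it has indexed and only fetches rows above it:

```python
import sqlite3


def fetch_new_rows(conn, last_seen_id, batch_size=1000):
    """Fetch the next batch of rows added after last_seen_id, in id order."""
    return conn.execute(
        "SELECT id, app_id, body FROM docs WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, batch_size),
    ).fetchall()


def index_new_documents(conn, last_seen_id, send_to_solr):
    """Send every row newer than last_seen_id and return the new position."""
    while True:
        rows = fetch_new_rows(conn, last_seen_id)
        if not rows:
            return last_seen_id
        for row_id, app_id, body in rows:
            # app_id plays the role of Solr's uniqueKey in this sketch.
            send_to_solr({"id": app_id, "body": body})
        last_seen_id = rows[-1][0]
```

The returned position would be saved by the build program and passed in on the next run.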

Thanks,
Shawn