Posted to solr-user@lucene.apache.org by "marks1900-post01@yahoo.com.au" <ma...@yahoo.com.au> on 2013/03/06 08:31:58 UTC

Solr 4.x auto-increment/sequence/counter functionality.

I am looking into how to add auto-increment/sequence/counter functionality to Solr 4.x. Specifically, I want a numeric field that records document insertion order and can be sorted on. This field would have to be unique and must never change once assigned.  Unfortunately, using an insertion "date" would produce numerous collisions.  Any feedback or ideas on an approach that would help me achieve this would be appreciated.

I am thinking that this could be achieved multiple ways:
* Via Remote Solr Document calls.  (A Solr Singleton for remote calls + Solr calls to get the current sequence value and then a call to increment the value )
* A Solr Plugin (extend RequestHandlerBase - http://..../sequence?q=name&size=1000 and return the next sequence/counter number ) 
* Using a standard RDBMS such as PostgreSQL.
* Some special Solr/Lucene functionality that I don't know about.
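
For illustration only, the "size=1000" idea in the plugin option above could be sketched in plain Java as a block allocator: a central component hands out contiguous blocks of numbers, and each client consumes its block locally. The class names SequenceBlockAllocator and BlockSequence are hypothetical, and the in-memory map stands in for whatever durable store (for example an RDBMS) a real deployment would need.

```java
import java.util.HashMap;
import java.util.Map;

/** Hands out contiguous, non-overlapping blocks of sequence numbers per name. */
final class SequenceBlockAllocator {
    // Hypothetical in-memory store; a real service would persist this durably.
    private final Map<String, Long> nextFree = new HashMap<>();

    /** Reserves size values for the named sequence and returns the first one. */
    synchronized long reserve(String name, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        long start = nextFree.getOrDefault(name, 1L);
        nextFree.put(name, start + size);
        return start;
    }
}

/** Client-side counter that asks the allocator for a new block only when needed. */
final class BlockSequence {
    private final SequenceBlockAllocator allocator;
    private final String name;
    private final int blockSize;
    private long next;   // next value to hand out
    private long limit;  // exclusive end of the current block

    BlockSequence(SequenceBlockAllocator allocator, String name, int blockSize) {
        this.allocator = allocator;
        this.name = name;
        this.blockSize = blockSize;
    }

    synchronized long nextValue() {
        if (next == limit) {
            // Block used up (or never fetched): reserve a fresh one.
            next = allocator.reserve(name, blockSize);
            limit = next + blockSize;
        }
        return next++;
    }
}
```

Values produced this way are unique across clients, but values from different clients interleave, so they only approximate insertion order, and any unused part of a block is lost if a client restarts.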

The closest information I could find is outlined here:
http://lucene.472066.n3.nabble.com/counter-field-td3886549.html


A bit more background:

I am using Solr as a NoSQL solution with great text search capabilities.  Currently, I am inserting beans using SolrJ, and each of these beans has an id which is composed of a bean type string (such as "CUSTOMER", "BOOK", "STORE") concatenated with a unique identifier string for that bean type (Customer - UUID.randomUUID().toString().toLowerCase(Locale.ENGLISH), Book - ISBN, Store - name).  For instance, "CUSTOMER-b245659b-825c-4357-aab0-6d592468889a", "BOOK-978-1782161325" or "STORE-TheUniquelyNamedStore".  Ideally, I am aiming to add a numeric field to these beans that represents insertion position and will then be used as a sorting field.

Re: Solr 4.x auto-increment/sequence/counter functionality.

Posted by Otis Gospodnetic <ot...@gmail.com>.
Consider this then:
http://engineering.twitter.com/2010/06/announcing-snowflake.html
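
As a rough sketch of the Snowflake idea (not Twitter's actual code), an ID can pack a millisecond timestamp, a worker number and a per-millisecond sequence into one long, so IDs from a single worker are unique and increase over time. The class name, the custom epoch and the exact bit widths below are assumptions chosen for illustration, and each node would need its own workerId assigned by some external means.

```java
/** Snowflake-style generator: timestamp bits, then worker bits, then sequence bits. */
final class SnowflakeStyleIdGenerator {
    // Arbitrary custom epoch (2013-01-01T00:00:00Z); an assumption for this sketch.
    private static final long EPOCH_MILLIS = 1356998400000L;
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_WORKER = (1L << WORKER_BITS) - 1;
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastMillis = -1L;
    private long sequence = 0L;

    SnowflakeStyleIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        // Never go back in time, even if the system clock does.
        long now = Math.max(System.currentTimeMillis(), lastMillis);
        if (now == lastMillis) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: wait for the next one.
                while ((now = System.currentTimeMillis()) <= lastMillis) {
                    // busy-wait
                }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        return ((now - EPOCH_MILLIS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

IDs from different workers are only ordered to within clock skew between machines, so this gives uniqueness plus rough time ordering rather than a strict global insertion order.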

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Mar 6, 2013 at 10:00 AM, mark12345 <ma...@yahoo.com.au> wrote:

> Appending a random value only reduces the chance of a collision (and I need
> to ensure continuous uniqueness) and could hurt how the field is later
> sorted.  I have not written a custom UpdateRequestProcessor before.  Is
> there a way to incorporate a Singleton that ensures one instance across a
> cluster?  SolrCloud?
>
> I guess the main thing is that I want the value to be kept unique
> across a cluster of Solr instances.  As far as I know, the only *free*
> uniqueness check in Solr is the "<uniqueKey>id</uniqueKey>" declaration
> in "schema.xml".  Are there other options that I should be considering?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-4-x-auto-increment-sequence-counter-functionality-tp4045125p4045239.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: Solr 4.x auto-increment/sequence/counter functionality.

Posted by Erick Erickson <er...@gmail.com>.
Don't use the internal Lucene doc ID. It _will_ change, and even the
relative order of existing docs can change. When cores are merged, the
Lucene doc IDs are renumbered. Segments are NOT merged in insertion order;
they're merged in a way that tries to avoid repeatedly rewriting large
segments.

So if you rely on any ordering based on insertion order by trying to use
internal Lucene doc ID, you'll be disappointed.
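
To make the renumbering concrete, here is a toy model of the delete-then-optimize case quoted in this thread (index 100 docs, delete number 50, optimize). It is not Lucene code: the class name is hypothetical, documents are plain strings, and IDs are treated as 1-based list positions to match the quoted example. It models only the compaction step, not the reordering that segment merges can cause.

```java
import java.util.ArrayList;
import java.util.List;

final class DocIdRenumberingDemo {

    /** Returns the surviving documents; a document's new ID is its position + 1. */
    static List<String> compact(List<String> docs, int deletedId) {
        List<String> survivors = new ArrayList<>(docs);
        survivors.remove(deletedId - 1); // IDs are 1-based in this toy model
        return survivors;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            docs.add("doc-" + i);
        }
        List<String> after = compact(docs, 50);
        // The document that used to have ID 51 now sits at ID 50.
        System.out.println("ID 50 now holds " + after.get(49)); // prints "ID 50 now holds doc-51"
        System.out.println("Highest ID is now " + after.size()); // prints "Highest ID is now 99"
    }
}
```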

I really think you'll have to generate something yourself that you can
count on.

Best
Erick


On Sun, Mar 10, 2013 at 9:50 AM, mark12345 <ma...@yahoo.com.au> wrote:

> A slightly different approach.
>
> * I noticed that I can sort by the internal Lucene _docid_.
>
> ->   http://wiki.apache.org/solr/CommonQueryParameters
>
> > You can sort by index id using sort=_docid_ asc or sort=_docid_ desc
>
> * I have also read the docid is represented by a sequential number.
>
> ->
>
> http://lucene.472066.n3.nabble.com/Get-DocID-after-Document-insert-td556278.html
>
> >  Your document IDs may change, and in fact *will* change if you delete a
> > document and then optimize. Say you index 100 docs, delete number 50 and
> > optimize. Documents that originally had IDs 51-100 will now have IDs 50-99
> > and your hierarchy will be messed up.
>
>
> So there is a slight chance that the _docid_ might represent document
> creation order.  Does anyone have knowledge and experience with the
> internals of the Lucene _docid_ field?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-4-x-auto-increment-sequence-counter-functionality-tp4045125p4046137.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: Solr 4.x auto-increment/sequence/counter functionality.

Posted by mark12345 <ma...@yahoo.com.au>.
A slightly different approach.

* I noticed that I can sort by the internal Lucene _docid_.

->   http://wiki.apache.org/solr/CommonQueryParameters

> You can sort by index id using sort=_docid_ asc or sort=_docid_ desc

* I have also read the docid is represented by a sequential number.

->  
http://lucene.472066.n3.nabble.com/Get-DocID-after-Document-insert-td556278.html

>  Your document IDs may change, and in fact *will* change if you delete a
> document and then optimize. Say you index 100 docs, delete number 50 and
> optimize. Documents that originally had IDs 51-100 will now have IDs 50-99
> and your hierarchy will be messed up. 


So there is a slight chance that the _docid_ might represent document
creation order.  Does anyone have knowledge and experience with the
internals of the Lucene _docid_ field?




Re: Solr 4.x auto-increment/sequence/counter functionality.

Posted by mark12345 <ma...@yahoo.com.au>.
I think I took the easiest option by creating an UpdateRequestProcessor
implementation (I was unsure of the performance implications and object
model of ScriptUpdateProcessor).  The
DocumentCreationDetailsProcessorFactory class below seems to achieve my aim
of letting me sort my Solr documents by creation order (to an extent; I
don't think it is exactly the commit order), though the
auto-increment/sequence/counter functionality is not continuous.

Solr Sort Parameter String:
sort=created_time_stamp_l asc, created_processing_sequence_number_l asc,
created_by_solr_thread_id_l asc, created_by_solr_core_name_s asc,
created_by_solr_shard_id_s asc
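
A multi-field sort is evaluated left to right, with each later field used only to break ties in the earlier ones. As a plain-Java illustration of what the sort string above asks for, the sketch below applies the same rule to the three numeric fields; the CreationDetails class is a hypothetical stand-in for a document, and the two string tie-breakers are left out for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for the three numeric creation fields of one document. */
final class CreationDetails {
    final long timeStamp;       // created_time_stamp_l
    final long sequenceNumber;  // created_processing_sequence_number_l
    final long threadId;        // created_by_solr_thread_id_l

    CreationDetails(long timeStamp, long sequenceNumber, long threadId) {
        this.timeStamp = timeStamp;
        this.sequenceNumber = sequenceNumber;
        this.threadId = threadId;
    }

    /** Compares field by field, left to right, as a multi-field sort does. */
    static final Comparator<CreationDetails> CREATION_ORDER =
            Comparator.comparingLong((CreationDetails d) -> d.timeStamp)
                    .thenComparingLong(d -> d.sequenceNumber)
                    .thenComparingLong(d -> d.threadId);

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<CreationDetails> sorted(List<CreationDetails> docs) {
        List<CreationDetails> copy = new ArrayList<>(docs);
        copy.sort(CREATION_ORDER);
        return copy;
    }
}
```

Two documents stamped in the same millisecond on the same node are separated by the sequence number; documents from different nodes with equal timestamps end up in an order that need not match their true arrival order.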


Any comments or feedback would be appreciated.

//----------------------------------------------------------------
// UpdateRequestProcessor implementation
//----------------------------------------------------------------
package com.test.solr.plugins;

import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.solr.cloud.CloudDescriptor;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.SolrCore;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class DocumentCreationDetailsProcessorFactory extends UpdateRequestProcessorFactory {

    // Static, so shared by every processor created from this class. It starts
    // again from zero when Solr restarts and is not shared between nodes.
    private static final AtomicLong processingSequenceNumber = new AtomicLong();

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
            UpdateRequestProcessor next) {
        return new DocumentCreationDetailsProcessor(req, next, processingSequenceNumber);
    }
}

class DocumentCreationDetailsProcessor extends UpdateRequestProcessor {

    private final SolrQueryRequest req;
    private final AtomicLong processingSequenceNumber;

    public DocumentCreationDetailsProcessor(SolrQueryRequest req, UpdateRequestProcessor next,
            AtomicLong processingSequenceNumber) {
        super(next);
        this.req = req;
        this.processingSequenceNumber = processingSequenceNumber;
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {

        SolrInputDocument doc = cmd.getSolrInputDocument();

        doc.addField("created_time_stamp_l", System.currentTimeMillis());
        doc.addField("created_processing_sequence_number_l",
                processingSequenceNumber.incrementAndGet());

        String solrCoreName = null;
        String solrShardId = null;

        SolrCore solrCore = (req != null) ? req.getCore() : null;
        CoreDescriptor coreDesc = (solrCore != null) ? solrCore.getCoreDescriptor() : null;
        if (coreDesc != null) {
            solrCoreName = solrCore.getName();
            // The cloud descriptor is null when Solr is not running in SolrCloud mode.
            CloudDescriptor cloudDesc = coreDesc.getCloudDescriptor();
            if (cloudDesc != null) {
                solrShardId = cloudDesc.getShardId();
            }
        }

        doc.addField("created_by_solr_thread_id_l", Thread.currentThread().getId());
        doc.addField("created_by_solr_core_name_s", solrCoreName);
        doc.addField("created_by_solr_shard_id_s", solrShardId);

        // Pass the command on to the next processor in the chain.
        super.processAdd(cmd);
    }
}
//----------------------------------------------------------------



//----------------------------------------------------------------
//  Added the below for a bit of context (http://wiki.apache.org/solr/SolrPlugins)
//----------------------------------------------------------------

mkdir /opt/solr/instances/test/collection1/lib
cp /home/user/download/test-solr-plugins-0.0.1.jar /opt/solr/instances/test/collection1/lib/
chown root:tomcat7 /opt/solr/instances/test/collection1/lib/*

vim /opt/solr/instances/test/collection1/conf/solrconfig.xml
<updateRequestProcessorChain name="mychain">
    <processor class="com.test.solr.plugins.DocumentCreationDetailsProcessorFactory" />
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>


vim /opt/solr/instances/test/collection1/conf/solrconfig.xml
<requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
        <str name="update.chain">mychain</str>
    </lst>
</requestHandler>




--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-x-auto-increment-sequence-counter-functionality-tp4045125p4045725.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr 4.x auto-increment/sequence/counter functionality.

Posted by Upayavira <uv...@odoko.co.uk>.
If you want to mess with UpdateRequestProcessors, try the
ScriptUpdateProcessor, with which you can write your update logic in
JavaScript. That would allow you to add your unique field. Use something
like timestamp+threadno+shardno and you'd have something unique
(assuming you can access those from JavaScript).

Upayavira

On Wed, Mar 6, 2013, at 03:42 PM, Timothy Potter wrote:
> This sounds like a job for Zookeeper (distributed coordination is what it
> does).
> 
> Take a look at:
> http://zookeeper-user.578899.n2.nabble.com/Sequence-Number-Generation-With-Zookeeper-td5378618.html
> 
> On Wed, Mar 6, 2013 at 10:00 AM, mark12345
> <ma...@yahoo.com.au> wrote:
> > Appending a random value only reduces the chance of a collision (and I need
> > to ensure continuous uniqueness) and could hurt how the field is later
> > sorted.  I have not written a custom UpdateRequestProcessor before.  Is there
> > a way to incorporate a Singleton that ensures one instance across a cluster?
> > SolrCloud?
> >
> > I guess the main thing is that I want the value to be kept unique
> > across a cluster of Solr instances.  As far as I know, the only *free*
> > uniqueness check in Solr is the "<uniqueKey>id</uniqueKey>" declaration
> > in "schema.xml".  Are there other options that I should be considering?

Re: Solr 4.x auto-increment/sequence/counter functionality.

Posted by Timothy Potter <th...@gmail.com>.
This sounds like a job for Zookeeper (distributed coordination is what it does).

Take a look at:
http://zookeeper-user.578899.n2.nabble.com/Sequence-Number-Generation-With-Zookeeper-td5378618.html

On Wed, Mar 6, 2013 at 10:00 AM, mark12345
<ma...@yahoo.com.au> wrote:
> Appending a random value only reduces the chance of a collision (and I need
> to ensure continuous uniqueness) and could hurt how the field is later
> sorted.  I have not written a custom UpdateRequestProcessor before.  Is there
> a way to incorporate a Singleton that ensures one instance across a cluster?
> SolrCloud?
>
> I guess the main thing is that I want the value to be kept unique
> across a cluster of Solr instances.  As far as I know, the only *free*
> uniqueness check in Solr is the "<uniqueKey>id</uniqueKey>" declaration
> in "schema.xml".  Are there other options that I should be considering?

Re: Solr 4.x auto-increment/sequence/counter functionality.

Posted by mark12345 <ma...@yahoo.com.au>.
Appending a random value only reduces the chance of a collision (and I need
to ensure continuous uniqueness) and could hurt how the field is later
sorted.  I have not written a custom UpdateRequestProcessor before.  Is there
a way to incorporate a Singleton that ensures one instance across a cluster?
SolrCloud?

I guess the main thing is that I want the value to be kept unique
across a cluster of Solr instances.  As far as I know, the only *free*
uniqueness check in Solr is the "<uniqueKey>id</uniqueKey>" declaration
in "schema.xml".  Are there other options that I should be considering?




Re: Solr 4.x auto-increment/sequence/counter functionality.

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

How about a custom UpdateRequestProcessor that uses milliseconds or even
nanoseconds and stores them in some field?  If that is enough resolution
and you still want to avoid collision, append a random letter/string/number
to it, a la <millis or nanos>_<extra stuff to make it unique>.
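
One possible reading of the <millis>_<extra stuff> suggestion, sketched in plain Java: instead of a random suffix, this version appends a per-JVM counter and a node name, and zero-pads the numbers so that plain string ordering matches numeric ordering. The class name TimestampKeys and the nodeId parameter are hypothetical, and swapping the random suffix for a counter is a substitution made here so that the key stays sortable.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicLong;

final class TimestampKeys {
    // Per-JVM counter; it starts again from zero after a restart.
    private static final AtomicLong COUNTER = new AtomicLong();

    /**
     * Builds "<millis>_<counter>_<nodeId>". Both numbers are zero-padded so that
     * comparing keys as strings gives the same order as comparing the numbers.
     */
    static String nextKey(long millis, String nodeId) {
        long n = COUNTER.incrementAndGet();
        return String.format(Locale.ROOT, "%013d_%019d_%s", millis, n, nodeId);
    }
}
```

A caller would typically pass System.currentTimeMillis() as millis. Uniqueness here relies on each node using a distinct nodeId and, after a restart, on the clock having moved forward.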

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Mar 6, 2013 at 2:31 AM, marks1900-post01@yahoo.com.au <
marks1900-post01@yahoo.com.au> wrote:

>
> I am looking into how to add auto-increment/sequence/counter functionality
> to Solr 4.x. Specifically, I want a numeric field that records document
> insertion order and can be sorted on. This field would have to be unique
> and must never change once assigned.  Unfortunately, using an insertion
> "date" would produce numerous collisions.  Any feedback or ideas on an
> approach that would help me achieve this would be appreciated.
>
> I am thinking that this could be achieved multiple ways:
> * Via Remote Solr Document calls.  (A Solr Singleton for remote calls +
> Solr calls to get the current sequence value and then a call to increment
> the value )
> * A Solr Plugin (extend RequestHandlerBase - http://..../sequence?q=name&size=1000 and
> return the next sequence/counter number )
> * Using a standard RDBMS such as PostgreSQL.
> * Some special Solr/Lucene functionality that I don't know about.
>
> The closest information I could find is outlined here:
> http://lucene.472066.n3.nabble.com/counter-field-td3886549.html
>
>
> A bit more background:
>
> I am using Solr as a NoSQL solution with great text search capabilities.
>  Currently, I am inserting beans using SolrJ and each of these beans has an
> id which is comprised of bean string type (Such as "CUSTOMER", "BOOK",
> "STORE" ) concatenated with a unique bean type identifier string ( Customer
> - UUID.randomUUID().toString().toLowerCase(Locale.ENGLISH), Book - ISBN,
> Store - name).  For instance,
> "CUSTOMER-b245659b-825c-4357-aab0-6d592468889a", "BOOK-978-1782161325" or
> "STORE-TheUniquelyNamedStore".  Ideally I am aiming to add a numeric field
> to these beans that represents insertion position, that will then be used
> as a sorting field.
>