Posted to solr-user@lucene.apache.org by "liwen .apabi (李文)" <l....@FOUNDER.COM.CN> on 2015/05/11 11:26:22 UTC

Re: Re: How to get the docs id after commit

You are right. In the newSearcher listener I record the last commit time and the current commit time, then query the range between them to get the most recently committed docs. Thanks.
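
A minimal sketch of that windowing logic, in Python for brevity. It only builds the query parameters; the listener wiring and the HTTP call are left out. The field name "timestamp" and the class name CommitWindow are assumptions for illustration, not something Solr provides.

```python
from datetime import datetime, timezone


def solr_date(dt):
    """Format an aware datetime as a UTC ISO 8601 string with milliseconds."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + "%03dZ" % (dt.microsecond // 1000)


class CommitWindow:
    """Hypothetical helper: remembers the previous commit time between calls."""

    def __init__(self, start, field="timestamp"):
        self.last = start      # time of the previous commit
        self.field = field     # assumed name of the timestamp field

    def next_params(self, commit_time):
        # Exclusive lower bound, inclusive upper bound: a doc stamped exactly
        # on a commit boundary is matched by one window only.
        fq = "%s:{%s TO %s]" % (
            self.field, solr_date(self.last), solr_date(commit_time))
        self.last = commit_time  # advance the high-water mark
        return {"q": "*:*", "fq": fq, "fl": "id"}
```

Each call to next_params covers one commit interval and then moves the window forward. This still depends on the timestamps being comparable across machines, so docs stamped by a skewed clock can fall outside the window.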

Best,
WenLi

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: May 11, 2015 9:47
To: solr-user@lucene.apache.org
Subject: Re: Re: How to get the docs id after commit

Not something really built into Solr. It's easy enough, at least
conceptually, to build in a "batch_id". The idea here would be that
every doc in a batch would carry an id unique to that batch (really, a
value you change after each commit). That pretty much requires, though, that
you control the indexing carefully (we're probably talking SolrJ
here). There's no good way that I know to get this info after an
autocommit for instance. I suppose you could use a
TimestampUpdateProcessorFactory and keep "high water marks" so a query
like q=timestamp:[last_timestamp_I_checked TO most_recent_timestamp]
would do it. Even that, though, has some issues in SolrCloud because
each server's time may be slightly off. You can get around this by
placing the TimestampUpdateProcessorFactory in _front_ of the
distributed update processor in your update chain, but then you'd
really require that all updates be sent to the _same_ machine, or that
the commit intervals were guaranteed to be outside the clock skew on
your machines.
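
The batch_id idea can be sketched roughly as follows, in Python rather than SolrJ for brevity. The field name batch_id, the helper names, and the rows value are assumptions; the sketch only prepares documents and query parameters and sends nothing to Solr.

```python
import uuid


def tag_batch(docs, batch_id=None):
    """Return (batch_id, copies of docs with a 'batch_id' field added).

    A fresh id is generated when none is given, so every batch indexed
    between two explicit commits can share one value.
    """
    batch_id = batch_id or uuid.uuid4().hex
    tagged = [dict(doc, batch_id=batch_id) for doc in docs]
    return batch_id, tagged


def batch_query_params(batch_id, rows=1000):
    """Query parameters for fetching the ids of every doc in one batch.

    rows is an assumed page size; larger batches would need paging.
    """
    return {"q": "batch_id:%s" % batch_id, "fl": "id", "rows": rows}
```

The indexing client would call tag_batch before sending documents, commit explicitly, and then hand the returned batch_id to whatever needs to look the documents up. This only works when the client controls the commits, which is the constraint described above.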

"Bottom line" is that you'd have to build it yourself; there's no OOB
functionality here. Even "all the docs that last committed" is
ambiguous. What about autocommits? Does "last committed" mean _just_
the ones between the last two autocommits? It seems like you really
want "all the docs committed since last time I asked". And for that,
you really need to control the mechanism yourself. Not only does Solr
not provide this OOB, I'm not even sure how it could be implemented
in a general case unless Solr became transactional.

Best,
Erick

On Sun, May 10, 2015 at 5:38 PM, liwen(李文).apabi <l....@founder.com.cn> wrote:
> Sorry. By "newest" I mean all the docs in the last commit. I need to get the ids of these docs to trigger another server to do something.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: May 10, 2015 23:22
> To: solr-user@lucene.apache.org
> Subject: Re: How to get the docs id after commit
>
> Not really. It's an ambiguous thing though, what's a "newest" document
> when a whole batch is committed at once? And in distributed mode, you
> can fire docs to any node in the cloud and they'll get to the right
> shard, but order is not guaranteed so "newest" is a fuzzy concept.
>
> I'd put a counter in my docs that I guaranteed was increasing and just
> q=*:*&rows=1&sort=timestamp desc. That should give you the most recent
> doc. Beware using a timestamp though if you're not absolutely sure
> that the clock times you use are comparable!
>
> Best,
> Erick
>
> On Sun, May 10, 2015 at 12:57 AM, liwen(李文).apabi <l....@founder.com.cn> wrote:
>> Hi, Solr Developers
>>
>>
>>
>>       I want to get the newest committed docs in the postCommit event, then notify the other server which data can be used, but I cannot find any way to get the newest docs after the commit. Is there any way to do this?
>>
>>
>>
>>          Thank you.
>>
>>          Wen Li
>>
>>
>>
>