Posted to solr-user@lucene.apache.org by phanichaitanya <pv...@gmail.com> on 2013/09/12 18:07:45 UTC

Get the commit time of a document in Solr

I'd like to know when a document is committed in Solr, as opposed to when it
was indexed.

For the indexed time, I can add a field such as: <field name="indexed_time"
type="date" default="NOW" indexed="true" stored="true" />.

Say I have 10 million docs indexed and I want to know the actual commit time
of each document, i.e. the time at which it became searchable. The problem is
just to find the time from which a document is searchable, which will be after
it is committed. (I don't want to do any soft commits.)

If there is a way to find this out, please let me know so that I can look into
it in more detail.



--
View this message in context: http://lucene.472066.n3.nabble.com/Get-the-commit-time-of-a-document-in-Solr-tp4089624.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Get the commit time of a document in Solr

Posted by Shawn Heisey <so...@elyograg.org>.
On 9/12/2013 11:04 AM, phanichaitanya wrote:
> So, now I want to know when that document becomes searchable, i.e. when it
> is committed. I have the following scenario:
> 
> 1) Indexing starts at, say, 9:00 AM. With the above additions to
> schema.xml, I'll know the indexed time of each document I send to Solr via
> the update handler: 9:01, 9:02 and so on. Let's say I send one document
> every second between 9:00 and 9:30 AM, which makes 30*60 = 1800 docs.
> 2) At 9:30 AM, I issue a hard commit, and I'm then able to search
> these 1800 documents, which is fine.
> 3) Now I want to know that these 1800 documents were searchable only from
> 9:30 AM onwards, not before, as I did not do a hard commit before 9:30 AM.
> 
> Is there a way to find that out in Solr, rather than having some application
> keep track of the documents it sends to Solr between any two commits? The
> reason I'm asking is that if there are, say, two parallel processes indexing
> to the same index and one process issues a commit, then whatever documents
> process two has indexed up to that point would also be committed, right?
> If I keep track of commit times in each process, they don't reflect the
> true commit times, as the two are intertwined.

From what I understand, if you use the default of NOW for a field in
your schema, then all documents indexed in that request will have the
timestamp of the time that indexing started.

Assuming what I understand is the way it actually works, if you want the
time to reflect anything even close to commit time, then you will need
to send very small batches and you will need to commit after every
batch.  If you are indexing very quickly, you'll probably want those
commits to be soft commits.
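
A minimal Python sketch of the "small batches, commit after every batch"
approach described above. The base URL, the batch size and the helper names
are placeholders chosen for illustration, not part of Solr; only the commit
and softCommit request parameters come from this thread.

```python
from itertools import islice
from urllib.parse import urlencode


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def update_url(base_url, soft=True):
    """Build an update URL that commits as part of the same request.

    `base_url` is a hypothetical core URL such as
    http://localhost:8983/solr/collection1.
    """
    params = {"commit": "true"}
    if soft:
        params["softCommit"] = "true"
    return base_url + "/update?" + urlencode(params)


# Each batch would be POSTed to update_url(...), so every document in the
# batch becomes searchable when that request returns.
for batch in batches(range(5), 2):
    print(len(batch), "docs ->", update_url("http://localhost:8983/solr/collection1"))
```

The smaller the batch, the closer a default-NOW timestamp should be to the
moment the batch becomes visible, at the cost of more frequent commits.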

You'll also want to have an autoCommit set up to do hard commits less
frequently with openSearcher=false, or you'll run into the problem
described at the link below.  There is a good autoCommit example there:

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

I've heard (but have not tested) that with the NOW default, large
imports with the dataimporthandler will all have the timestamp of when
the DIH request started, no matter what you do with autoCommit or
autoSoftCommit.

Thanks,
Shawn


Re: Get the commit time of a document in Solr

Posted by Raymond Wiker <rw...@gmail.com>.
On Sep 12, 2013, at 20:55 , phanichaitanya <pv...@gmail.com> wrote:
> Apologies again. Here is another try:
> 
> I want to make sure that documents that are indexed are committed within,
> say, an hour. I agree that passing commitWithin and similar parameters will
> ensure that, based on the time configuration we set. But I want to verify
> that the document really is committed within whatever time we set using
> commitWithin.
> 
> In other words, I'm asking for proof that Solr commits within that time if
> we add the commitWithin parameter to the configuration.
> 
> That covers the commitWithin option that you suggested.
> 
> Now, is there a way to explicitly get all the documents that are committed
> when a hard commit request is issued? This might not make sense, but it is
> the question we are pondering.
> 

If you have a timestamp field that defaults to NOW, you could run queries for a single document (q=*:* with rows=1), sorted by descending timestamp. If you're feeding constantly and run these queries regularly, you should be able to get some sort of feel for the latency in the system.
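
One way to act on this suggestion, sketched in Python. It assumes the
timestamp field is the indexed_time field from the first message, that Solr
returns dates as ISO 8601 UTC strings, and that the feeder and the prober use
synchronised clocks. The function names are made up for the example.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def newest_doc_params(field="indexed_time"):
    """Query string that asks for the single most recently stamped document."""
    return urlencode({
        "q": "*:*",
        "rows": 1,
        "fl": field,
        "sort": field + " desc",
        "wt": "json",
    })


def parse_solr_date(value):
    """Parse a UTC date string, with or without fractional seconds."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError("unrecognised date: " + value)


def visibility_lag_seconds(newest_timestamp, now):
    """Seconds between the newest searchable document's timestamp and `now`."""
    return (now - parse_solr_date(newest_timestamp)).total_seconds()


print(newest_doc_params())
```

While documents are being fed continuously, the lag of the newest searchable
document is a rough estimate of how far search visibility trails indexing.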

Re: Get the commit time of a document in Solr

Posted by phanichaitanya <pv...@gmail.com>.
Thanks Jack, Shawn and Raymond.

Shawn - I have to do it with every commit, so I guess there is no way to do
it apart from writing custom plugins for Solr.

I'll look into the pointers you suggested.

Regards,
Phani.



-----
Phani Chaitanya

Re: Get the commit time of a document in Solr

Posted by Shawn Heisey <so...@elyograg.org>.
On 9/12/2013 12:55 PM, phanichaitanya wrote:
> I want to make sure that documents that are indexed are committed within,
> say, an hour. I agree that passing commitWithin and similar parameters will
> ensure that, based on the time configuration we set. But I want to verify
> that the document really is committed within whatever time we set using
> commitWithin.
> 
> In other words, I'm asking for proof that Solr commits within that time if
> we add the commitWithin parameter to the configuration.
> 
> That covers the commitWithin option that you suggested.
> 
> Now, is there a way to explicitly get all the documents that are committed
> when a hard commit request is issued? This might not make sense, but it is
> the question we are pondering.

If these are ongoing requirements that you need to meet with every commit or
with a large subset of commits, then I don't think there is any way to
do it without writing custom plugins for Solr.

If you are just trying to prove to someone that Solr is doing what you
say it is, then you can do some simple testing:

Send an update request with as many documents as you want to test, and
include commit=true on the request.  If you are planning to use
commitWithin, also include softCommit=true, because commitWithin is a
soft commit.

Time how long it takes for the update request to complete.  That's
approximately how long it will take for a "real" update/commit to
happen.  There will be some extra time for the indexing itself, but
unless the document count is absolutely enormous, it shouldn't matter
too much.

If you want to test just the commit time, then (after making sure
nothing else is sending updates or commits) send the update without any
commit parameters, then send a commit request by itself and time how
long the commit request takes.
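
A rough Python sketch of the commit-only timing test described above. The
base URL is a placeholder and the helper names are illustrative; it assumes
the update handler accepts a bare commit=true request over HTTP GET, which is
worth confirming on your own installation.

```python
import time
from urllib.request import urlopen


def timed(action):
    """Run `action()` and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def send_commit(base_url):
    """Send a standalone hard commit and return the HTTP status code."""
    with urlopen(base_url + "/update?commit=true") as response:
        return response.status


# Example, with a running Solr and nothing else sending updates or commits:
# status, seconds = timed(lambda: send_commit("http://localhost:8983/solr/collection1"))
# print(status, round(seconds, 3))
```

Running this a few times after an uncommitted update gives a feel for how
long a hard commit takes on a given index.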

With enough RAM for proper OS disk caching, commits should be very fast
even on an index with 10 million documents.  Here is a wiki page that
has a small amount of discussion about slow commits:

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits

Thanks,
Shawn


Re: Get the commit time of a document in Solr

Posted by phanichaitanya <pv...@gmail.com>.
Thanks, Otis. I'll look into whether I can use it to solve my problem.




-----
Phani Chaitanya

Re: Get the commit time of a document in Solr

Posted by Otis Gospodnetic <ot...@gmail.com>.
Solr admin exposes time of last commit. You can use that.
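
A hedged Python sketch of reading that value programmatically. It assumes the
Luke request handler (for example /admin/luke?show=index&wt=json on a core)
reports the commit time as index.userData.commitTimeMSec in epoch
milliseconds; that response shape is an assumption to check against your Solr
version, and last_commit_time is a made-up helper name.

```python
import json
from datetime import datetime, timezone


def last_commit_time(luke_response_text):
    """Return the last commit time as a UTC datetime, or None if absent.

    Assumes the JSON layout {"index": {"userData": {"commitTimeMSec": "..."}}}.
    """
    index = json.loads(luke_response_text).get("index", {})
    millis = index.get("userData", {}).get("commitTimeMSec")
    if millis is None:
        return None
    return datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)


# Hand-written sample in the assumed shape, not real Solr output.
sample = '{"index": {"userData": {"commitTimeMSec": "1379003400000"}}}'
print(last_commit_time(sample))
```

Note that this gives the time of the most recent commit for the whole index,
not a per-document commit time.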

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Sep 12, 2013 3:22 PM, "phanichaitanya" <pv...@gmail.com> wrote:

> Apologies again. Here is another try:
>
> I want to make sure that documents that are indexed are committed within,
> say, an hour. I agree that passing commitWithin and similar parameters will
> ensure that, based on the time configuration we set. But I want to verify
> that the document really is committed within whatever time we set using
> commitWithin.
>
> In other words, I'm asking for proof that Solr commits within that time if
> we add the commitWithin parameter to the configuration.
>
> That covers the commitWithin option that you suggested.
>
> Now, is there a way to explicitly get all the documents that are committed
> when a hard commit request is issued? This might not make sense, but it is
> the question we are pondering.
>
>
>
> -----
> Phani Chaitanya
>

Re: Get the commit time of a document in Solr

Posted by phanichaitanya <pv...@gmail.com>.
Apologies again. Here is another try:

I want to make sure that documents that are indexed are committed within,
say, an hour. I agree that passing commitWithin and similar parameters will
ensure that, based on the time configuration we set. But I want to verify
that the document really is committed within whatever time we set using
commitWithin.

In other words, I'm asking for proof that Solr commits within that time if
we add the commitWithin parameter to the configuration.

That covers the commitWithin option that you suggested.

Now, is there a way to explicitly get all the documents that are committed
when a hard commit request is issued? This might not make sense, but it is
the question we are pondering.



-----
Phani Chaitanya

Re: Get the commit time of a document in Solr

Posted by Jack Krupansky <ja...@basetechnology.com>.
Sorry, but all you've done is reshuffle your previous statements without
telling us about the actual problem that you are trying to solve!

Repeating myself: You, the application developer, can send a hard commit any
time you want to ensure that documents are searchable. Maybe not every
millisecond, but, say, once a second with a soft commit and once a minute
for a hard commit, using "commit within" to minimize commits when multiple
processes are indexing data.

AFAICT, no application should ever have to care when a document is actually 
committed - and you have control with commit, anyway.

You, the application developer, can "tune" the commit interval to balance
searchability and overall efficiency. There shouldn't be any problem there, 
given the variety of commit methods that Solr supports, but you have to make 
the choices.

So, what's the problem you are trying to solve? You still haven't 
articulated it.

It sounds as if you are trying to solve a non-problem. But, we can't be sure 
since you haven't articulated what the actual problem (if any) really is.

-- Jack Krupansky

-----Original Message----- 
From: phanichaitanya
Sent: Thursday, September 12, 2013 1:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Get the commit time of a document in Solr

Hi Jack,

Sorry, I was not clear earlier. What I'm trying to achieve is this:

I want to know when a document is committed (hard commit). In my case, there
can be a long lapse (1 hour or more) between the time a document is indexed
and the time a commit is issued. I want to know exactly when a document is
committed.

In my previous example, all 1800 docs are committed at 9:30 AM, and I want to
know that time for those 1800 docs. For another batch it will be some other
time.

The use case is that I have more than one process sending update requests to
Solr, each of those processes has a separate commit step, and I want to know
the commit time of the documents that were committed when I issued a commit
request.

I hope I'm clear now - please let me know if I'm not.



-----
Phani Chaitanya


Re: Get the commit time of a document in Solr

Posted by phanichaitanya <pv...@gmail.com>.
Hi Jack,

Sorry, I was not clear earlier. What I'm trying to achieve is this:

I want to know when a document is committed (hard commit). In my case, there
can be a long lapse (1 hour or more) between the time a document is indexed
and the time a commit is issued. I want to know exactly when a document is
committed.

In my previous example, all 1800 docs are committed at 9:30 AM, and I want to
know that time for those 1800 docs. For another batch it will be some other
time.

The use case is that I have more than one process sending update requests to
Solr, each of those processes has a separate commit step, and I want to know
the commit time of the documents that were committed when I issued a commit
request.

I hope I'm clear now - please let me know if I'm not.
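
If the times of all hard commits are recorded somewhere, regardless of which
process issued them, the commit time of a document in the 9:30 AM example can
be approximated as the first commit at or after its indexed time. A hedged
Python sketch: commit_time_for and the commit-time list are hypothetical, all
timestamps are assumed to come from comparable clocks, and the indexed time is
only as accurate as the default NOW value.

```python
from bisect import bisect_left
from datetime import datetime


def commit_time_for(indexed_time, commit_times):
    """Return the first commit time >= indexed_time, or None if none yet.

    `commit_times` must be sorted in ascending order.
    """
    i = bisect_left(commit_times, indexed_time)
    return commit_times[i] if i < len(commit_times) else None


# Illustrative values matching the example: commits at 9:30 and 10:30.
commits = [datetime(2013, 9, 12, 9, 30), datetime(2013, 9, 12, 10, 30)]
print(commit_time_for(datetime(2013, 9, 12, 9, 1), commits))
print(commit_time_for(datetime(2013, 9, 12, 10, 45), commits))
```

Because a hard commit from one process also commits documents indexed by the
other, the list has to hold the commits of every process for this to work.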



-----
Phani Chaitanya

Re: Get the commit time of a document in Solr

Posted by Jack Krupansky <ja...@basetechnology.com>.
Slow down, back up, and now tell us what problem (if any!) you are really 
trying to solve. Don't leap to a proposed solution before you clearly state 
the problem to be solved.

First, why do you think there is any problem at all?

Or, what are you really trying to achieve?

-- Jack Krupansky

-----Original Message----- 
From: phanichaitanya
Sent: Thursday, September 12, 2013 1:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Get the commit time of a document in Solr

So, now I want to know when that document becomes searchable, i.e. when it
is committed. I have the following scenario:

1) Indexing starts at, say, 9:00 AM. With the above additions to
schema.xml, I'll know the indexed time of each document I send to Solr via
the update handler: 9:01, 9:02 and so on. Let's say I send one document
every second between 9:00 and 9:30 AM, which makes 30*60 = 1800 docs.
2) At 9:30 AM, I issue a hard commit, and I'm then able to search
these 1800 documents, which is fine.
3) Now I want to know that these 1800 documents were searchable only from
9:30 AM onwards, not before, as I did not do a hard commit before 9:30 AM.

Is there a way to find that out in Solr, rather than having some application
keep track of the documents it sends to Solr between any two commits? The
reason I'm asking is that if there are, say, two parallel processes indexing
to the same index and one process issues a commit, then whatever documents
process two has indexed up to that point would also be committed, right?
If I keep track of commit times in each process, they don't reflect the
true commit times, as the two are intertwined.



-----
Phani Chaitanya


Re: Get the commit time of a document in Solr

Posted by phanichaitanya <pv...@gmail.com>.
So, now I want to know when that document becomes searchable, i.e. when it
is committed. I have the following scenario:

1) Indexing starts at, say, 9:00 AM. With the above additions to
schema.xml, I'll know the indexed time of each document I send to Solr via
the update handler: 9:01, 9:02 and so on. Let's say I send one document
every second between 9:00 and 9:30 AM, which makes 30*60 = 1800 docs.
2) At 9:30 AM, I issue a hard commit, and I'm then able to search
these 1800 documents, which is fine.
3) Now I want to know that these 1800 documents were searchable only from
9:30 AM onwards, not before, as I did not do a hard commit before 9:30 AM.

Is there a way to find that out in Solr, rather than having some application
keep track of the documents it sends to Solr between any two commits? The
reason I'm asking is that if there are, say, two parallel processes indexing
to the same index and one process issues a commit, then whatever documents
process two has indexed up to that point would also be committed, right?
If I keep track of commit times in each process, they don't reflect the
true commit times, as the two are intertwined.



-----
Phani Chaitanya

Re: Get the commit time of a document in Solr

Posted by Jack Krupansky <ja...@basetechnology.com>.
Yes, the document will be searchable after it is committed.

Although you can also do auto commits and commitWithin which do not 
guarantee immediate visibility of index changes, you can do a hard commit 
any time you want to make a document searchable.

-- Jack Krupansky

-----Original Message----- 
From: phanichaitanya
Sent: Thursday, September 12, 2013 12:07 PM
To: solr-user@lucene.apache.org
Subject: Get the commit time of a document in Solr

I'd like to know when a document is committed in Solr, as opposed to when it
was indexed.

For the indexed time, I can add a field such as: <field name="indexed_time"
type="date" default="NOW" indexed="true" stored="true" />.

Say I have 10 million docs indexed and I want to know the actual commit time
of each document, i.e. the time at which it became searchable. The problem is
just to find the time from which a document is searchable, which will be after
it is committed. (I don't want to do any soft commits.)

If there is a way to find this out, please let me know so that I can look into
it in more detail.


