Posted to solr-user@lucene.apache.org by Renee Sun <re...@mcafee.com> on 2015/09/03 21:52:02 UTC

any easy way to find out when a core's index physical file has been last updated?

I need to figure out when the last index activity on a core occurred.

I can't use the [corename]/index timestamp, because it only reflects file
deletion or addition, not file updates.

I am curious whether there is any Solr core admin RESTful API sort of thing I can
use to get the last-modified timestamp of the physical index ...

Thanks
Renee



--
View this message in context: http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Erick Erickson <er...@gmail.com>.
I'm pretty sure soft commits didn't come along until 4.0...

Best,
Erick

On Thu, Sep 3, 2015 at 4:24 PM, Erik Hatcher <er...@gmail.com> wrote:
> /admin/luke can give you a lastModified time stamp.  The Solr admin UI makes a request to display this on the core overview screen, making a request like this:  http://localhost:8983/solr/<core>/admin/luke?wt=json&show=index&numTerms=0
>
> and the index section of the response has this:
> lastModified: "2015-09-03T15:17:22.708Z"
> Does that help?
>
> —
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com
>
>
>
>
>> On Sep 3, 2015, at 3:52 PM, Renee Sun <re...@mcafee.com> wrote:
>>
>> I need to figure out when the last index activity on a core occurred.
>>
>> I can't use the [corename]/index timestamp, because it only reflects file
>> deletion or addition, not file updates.
>>
>> I am curious whether there is any Solr core admin RESTful API sort of thing I can
>> use to get the last-modified timestamp of the physical index ...
>>
>> Thanks
>> Renee
>>
>>
>>
>

Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Erik Hatcher <er...@gmail.com>.
/admin/luke can give you a lastModified timestamp. The Solr admin UI uses it to populate the core overview screen, with a request like this: http://localhost:8983/solr/<core>/admin/luke?wt=json&show=index&numTerms=0

and the index section of the response has this:  
lastModified: "2015-09-03T15:17:22.708Z"
Does that help?

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com
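
A minimal Python sketch of the Luke request described above, in case it helps to script the check. The function names are made up for illustration, and the timestamp format is assumed to match the sample value shown (UTC with fractional seconds); a response without fractional seconds would need a different format string.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone


def luke_url(base_url: str, core: str) -> str:
    """Build the Luke request URL quoted above for one core."""
    query = urllib.parse.urlencode({"wt": "json", "show": "index", "numTerms": 0})
    return f"{base_url.rstrip('/')}/{urllib.parse.quote(core)}/admin/luke?{query}"


def parse_last_modified(luke_response: dict) -> datetime:
    """Extract index.lastModified from a parsed Luke response as a UTC datetime."""
    raw = luke_response["index"]["lastModified"]
    # Assumes the "2015-09-03T15:17:22.708Z" shape shown in the thread.
    return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def fetch_last_modified(base_url: str, core: str, timeout: float = 5.0) -> datetime:
    """Query a running Solr instance; requires a live server."""
    with urllib.request.urlopen(luke_url(base_url, core), timeout=timeout) as resp:
        return parse_last_modified(json.load(resp))
```

Calling fetch_last_modified("http://localhost:8983/solr", "<core>") with a real core name would return the timestamp as a datetime that can be compared against the time of the last accepted commit.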




> On Sep 3, 2015, at 3:52 PM, Renee Sun <re...@mcafee.com> wrote:
> 
> I need to figure out when the last index activity on a core occurred.
>
> I can't use the [corename]/index timestamp, because it only reflects file
> deletion or addition, not file updates.
>
> I am curious whether there is any Solr core admin RESTful API sort of thing I can
> use to get the last-modified timestamp of the physical index ...
> 
> Thanks
> Renee
> 
> 
> 


Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Renee Sun <re...@mcafee.com>.
Thank you! I will look into that.

I also came across autoSoftCommit, which seems useful... We are still
using Solr 3.5; I hope autoSoftCommit is included in Solr 3.5...




Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Put the IgnoreCommit on the default handler to stop clients from
forcing the commit:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/IgnoreCommitOptimizeUpdateProcessorFactory.html

Then have a separate normal handler and send your real commits through
that if you need an emergency option.

Regards,
   Alex.

----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 3 September 2015 at 18:51, Renee Sun <re...@mcafee.com> wrote:
> Walter, thanks!
>
> I will do some tests using autocommit. I guess if there is a requirement for
> the console UI to make documents searchable within 10 minutes, we will need to
> use autocommit with maxTime instead of maxDocs.
>
> I wonder: if we need to do a 'force commit' while there are pending updates,
> will autocommit get in the way when its maxTime has not yet elapsed?
>
> thanks
> Renee
>
>
>

Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Renee Sun <re...@mcafee.com>.
Walter, thanks! 

I will do some tests using autocommit. I guess if there is a requirement for
the console UI to make documents searchable within 10 minutes, we will need to
use autocommit with maxTime instead of maxDocs.

I wonder: if we need to do a 'force commit' while there are pending updates,
will autocommit get in the way when its maxTime has not yet elapsed?

thanks
Renee




Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Walter Underwood <wu...@wunderwood.org>.
Instead of writing new code, you could configure an autocommit interval in Solr. That already does what you want: no more than one commit per interval, and no commits if there were no adds or deletes.

Then the clients would never need to commit.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Sep 3, 2015, at 3:20 PM, Renee Sun <re...@mcafee.com> wrote:

> This makes sense now. Thanks!
>
> The reason I got onto this idea is:
>
> In our system we have a large customer base and lots of cores; each customer
> may have multiple cores.
>
> There are also a lot of processes running in our system processing the data
> for these customers, and once in a while they ask a central webapp that we
> wrote to commit on a core.
>
> I deploy this central webapp with Solr in the same Tomcat container. Its task
> is mainly to act as a wrapper around the local cores: monitoring core size,
> merging cores if needed, etc. It also controls the commit requests it receives
> from time to time and tries to space the commits out. When multiple processes
> ask for commits on the same core, my webapp guarantees that only one commit
> per x-minute interval gets executed and drops the other commit requests.
>
> Now I have just discovered that some of the processes send in a large number
> of commit requests on many cores that never had any changes in the last
> interval. This is due to a bug in those other processes, but the programmers
> there are behind on fixing the issue. This gave me the idea of verifying the
> incoming commit requests by checking the physical index files to see whether
> any updates really occurred in the last interval.
>
> I was searching for a Solr core admin RESTful API to get some metadata
> about the core, such as a 'last modified timestamp', but did not have any
> luck.
>
> I thought I could use the 'index' folder timestamp to get an accurate
> last-modified time, but with what you just explained, that would not be the
> case. I will have to traverse the files in the folder and find the most
> recently modified file.
> 
> any input will be appreciated. Thanks a lot!
> Renee
> 
> 
> 


Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Renee Sun <re...@mcafee.com>.
Thanks a lot for the details, Shawn. They are very helpful!







Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/4/2015 10:14 AM, Renee Sun wrote:
> I will start using autocommit, confident that it will greatly reduce
> the (many) false commit requests from processes in our system.
>
> Regarding the Solr version, it is actually a big problem that we have to
> resolve sooner or later.
>
> When we upgraded to Solr 3.5 about 2 years ago, to avoid re-indexing our large
> data set, we used:
>
> <luceneMatchVersion>LUCENE_29</luceneMatchVersion>
>
> which seems to work fine except for a lot of warnings like this in catalina.out:
>
> WARNING: StopFilterFactory is using deprecated LUCENE_29 emulation. You
> should at some point declare and reindex to at least 3.0, because 2.x
> emulation is deprecated and will be removed in 4.0

Setting luceneMatchVersion to an older version, contrary to most
people's expectations, does NOT change the index format.  It basically
turns on a compatibility mode for analysis components like
StopFilterFactory so that the created terms work like the older version,
if the code has a check for that older version that produces different
behavior.  Basically you use LMV to disable analysis bugfixes that don't
work for you.  In your case, any index segments created since your
upgrade are Lucene 3.5 format, not Lucene 2.9.

> We have built an infrastructure that scales well using Solr. Is it good
> practice to upgrade to Solr 4.x without using SolrCloud, if that is possible
> at all?

Almost all of my Solr servers (running 4.x, we have not yet upgraded to
5.x) are NOT running in cloud mode.  Although it would make some aspects
of maintaining my index easier, I would lose some of the functionality
if I upgraded to a fully replicated SolrCloud setup.

Thanks,
Shawn


Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Renee Sun <re...@mcafee.com>.
Shawn, thanks so much, and this user forum is so helpful!

I will start using autocommit, confident that it will greatly reduce
the (many) false commit requests from processes in our system.

Regarding the Solr version, it is actually a big problem that we have to
resolve sooner or later.

When we upgraded to Solr 3.5 about 2 years ago, to avoid re-indexing our large
data set, we used:

<luceneMatchVersion>LUCENE_29</luceneMatchVersion>

which seems to work fine except for a lot of warnings like this in catalina.out:

WARNING: StopFilterFactory is using deprecated LUCENE_29 emulation. You
should at some point declare and reindex to at least 3.0, because 2.x
emulation is deprecated and will be removed in 4.0

We have built an infrastructure that scales well using Solr. Is it good
practice to upgrade to Solr 4.x without using SolrCloud, if that is possible
at all?

thanks!
Renee 





Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/3/2015 10:00 PM, Renee Sun wrote:
> Unfortunately we are still using Solr 3.5 with Lucene 2.9.3 :-( If we
> upgrade to Solr 4.x, it will require upgrading Lucene away from 2.x.x,
> which will require re-indexing all our data. By current measurements, it
> might take about 8-9 for the data we have to be re-indexed, a big
> concern.

Solr 3.5 should include Lucene 3.5, and I expect it might have some
serious problems running if you tried to replace the Lucene jars in the
.war file with 2.x versions.  What evidence do you have that it's
running Lucene 2.x?  Solr 4.x can use a 3.x index.

I believe that Solr 1.5 (which was never released) included a 2.9
version of Lucene.  Solr 1.4 certainly did.

> so to understand autocommit better, since it says:
> 
> <maxTime>300000</maxTime>
> 
> I want to know
> 
> 1) If I have a batch of 2000 documents being added to the index, it may
> take 3 minutes to index all 2000 documents. Will the autocommit
> defined above kick off a commit 5 minutes after the first of the 2000
> documents is indexed?
> 
> 2) Will autocommit NOT commit if there has been no update in the last 5
> minutes?
> 
> 3) Does maxTime count document deletions, or does it only care
> about added documents?  In other words, should I use
> maxPendingDeletes for document deletion?


I cannot find any complete information about what maxPendingDeletes
does, except that it can affect memory usage.  I believe that this
setting is no longer there in 4.x, which is probably why I can't find
anything.

The way I understand autocommit relative to your questions:  With a
maxTime of five minutes, if it takes three minutes to index your
documents, the commit should happen two minutes after the last document
is indexed.  *Any* change to the index will start the autoCommit timer,
including deletes.

If the index hasn't been updated, autoCommit will not do anything.
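
The timing described above can be modelled in a few lines of Python. This is only a sketch of the behaviour as I have described it (first pending change starts the timer, the commit fires maxTime later, nothing fires on an unchanged index); it is not taken from Solr's code, and the function name is made up.

```python
from typing import List, Optional


def autocommit_times(change_times_ms: List[int], max_time_ms: int) -> List[int]:
    """Return the times (ms) at which a maxTime autocommit would fire.

    Assumed model: the first uncommitted change starts a timer, the commit
    fires max_time_ms later and covers every change made up to that moment,
    and no commit happens while there are no pending changes.
    """
    commits: List[int] = []
    deadline: Optional[int] = None  # no timer while the index is unchanged
    for t in sorted(change_times_ms):
        if deadline is not None and t > deadline:
            commits.append(deadline)  # timer expired before this change
            deadline = None
        if deadline is None:
            deadline = t + max_time_ms  # adds and deletes both start the timer
    if deadline is not None:
        commits.append(deadline)
    return commits
```

With maxTime at 300000 and 2000 changes spread over the first three minutes, this model yields a single commit at the five-minute mark, which is two minutes after the last document.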

Thanks,
Shawn

Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Renee Sun <re...@mcafee.com>.
Unfortunately we are still using Solr 3.5 with Lucene 2.9.3 :-( If we upgrade
to Solr 4.x, it will require upgrading Lucene away from 2.x.x, which will
require re-indexing all our data. By current measurements, it might take about
8-9 for the data we have to be re-indexed, a big concern.

so to understand autocommit better, since it says:

      <maxTime>300000</maxTime> 

I want to know 

1) If I have a batch of 2000 documents being added to the index, it may take
3 minutes to index all 2000 documents. Will the autocommit defined above kick
off a commit 5 minutes after the first of the 2000 documents is indexed?

2) Will autocommit NOT commit if there has been no update in the last 5 minutes?

3) Does maxTime count document deletions, or does it only care about
added documents?  In other words, should I use maxPendingDeletes for
document deletion?

thanks
Renee





Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/3/2015 4:20 PM, Renee Sun wrote:
> Now I have just discovered that some of the processes send in a large number
> of commit requests on many cores that never had any changes in the last interval.

If you are using a Solr version released in the last two years (at least
version 4.4), then a commit on an index that hasn't actually changed
will basically be a null operation.

https://issues.apache.org/jira/browse/SOLR-4965

Thanks,
Shawn


Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Renee Sun <re...@mcafee.com>.
This makes sense now. Thanks!

The reason I got onto this idea is:

In our system we have a large customer base and lots of cores; each customer
may have multiple cores.

There are also a lot of processes running in our system processing the data
for these customers, and once in a while they ask a central webapp that we
wrote to commit on a core.

I deploy this central webapp with Solr in the same Tomcat container. Its task
is mainly to act as a wrapper around the local cores: monitoring core size,
merging cores if needed, etc. It also controls the commit requests it receives
from time to time and tries to space the commits out. When multiple processes
ask for commits on the same core, my webapp guarantees that only one commit
per x-minute interval gets executed and drops the other commit requests.
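
To make the spacing rule concrete, here is a rough Python sketch of that kind of per-core gate. It is not the actual webapp code; the class name, method name, and injectable clock are invented for illustration.

```python
import time
from typing import Callable, Dict


class CommitGate:
    """Allow at most one commit per core within `interval_s` seconds."""

    def __init__(self, interval_s: float, clock: Callable[[], float] = time.monotonic):
        self.interval_s = interval_s
        self.clock = clock  # injectable so the gate can be tested without sleeping
        self._last_commit: Dict[str, float] = {}

    def should_commit(self, core: str) -> bool:
        """Return True and record the commit if the core's interval has elapsed."""
        now = self.clock()
        last = self._last_commit.get(core)
        if last is not None and now - last < self.interval_s:
            return False  # drop: a commit already ran in this interval
        self._last_commit[core] = now
        return True
```

A caller would forward the commit to Solr only when should_commit(core) returns True and silently drop the request otherwise.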

Now I have just discovered that some of the processes send in a large number
of commit requests on many cores that never had any changes in the last
interval. This is due to a bug in those other processes, but the programmers
there are behind on fixing the issue. This gave me the idea of verifying the
incoming commit requests by checking the physical index files to see whether
any updates really occurred in the last interval.

I was searching for a Solr core admin RESTful API to get some metadata
about the core, such as a 'last modified timestamp', but did not have any
luck.

I thought I could use the 'index' folder timestamp to get an accurate
last-modified time, but with what you just explained, that would not be the
case. I will have to traverse the files in the folder and find the most
recently modified file.
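
The traversal I have in mind would look roughly like the Python sketch below. The function name is hypothetical, and it only looks at regular files directly inside the index folder. Note that a file still being written can carry a newer timestamp without being part of the committed index yet, so the result is an upper bound rather than an exact commit time.

```python
import os
from typing import Optional


def newest_file_mtime(index_dir: str) -> Optional[float]:
    """Return the latest modification time (epoch seconds) of any regular file
    directly inside index_dir, or None if the directory holds no files."""
    newest = None
    with os.scandir(index_dir) as entries:
        for entry in entries:
            if entry.is_file():
                mtime = entry.stat().st_mtime
                if newest is None or mtime > newest:
                    newest = mtime
    return newest
```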

any input will be appreciated. Thanks a lot!
Renee




Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Upayavira <uv...@odoko.co.uk>.
What matters here is the time between hard commits. If you do infrequent
hard commits, then it is possible that files will be written to over
that time. Those files are part-complete segment files, and are not yet
referred to by the segments file, and thus are not really yet a part of
the index. A commit will seal off those files, reference them from the
segments file and from that point on they will not be touched. So you
are right - a file may be added to before a commit, but I bet that even
there - once the commit happens, those files are renamed to put them
into the right place.

Again, my question is why do you want to know this timestamp? There is
probably an easier way to achieve what you are trying to do.

Upayavira

On Thu, Sep 3, 2015, at 10:39 PM, Toke Eskildsen wrote:
> Renee Sun <re...@mcafee.com> wrote:
> > [core]/index is a folder holding index files.
> 
> Agree so far.
> 
> > But index files in that folder are not just being deleted or added, they are
> > also being updated.
> 
> So you say. That contradicts my understanding, as well as the first 10
> hits in Google for "lucene segment files immutable". The one file that is
> updated is "segments.gen", which is tiny and keeps track of which
> segments make up the overall index.
> 
> > On a Linux file system, the folder's timestamp will only be updated if
> > files in it are added or deleted, NOT updated.  So if I check the index
> > folder, it will not accurately reflect the last time the index files
> > were updated.
> 
> Just watch index/segments.gen. That is precise as it tracks when the
> logical index was last updated, whereas segment files currently being
> written have later timestamps but are not part of the index yet.
> 
> - Toke Eskildsen

Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
Renee Sun <re...@mcafee.com> wrote:
> [core]/index is a folder holding index files.

Agree so far.

> But index files in that folder are not just being deleted or added, they are
> also being updated.

So you say. That contradicts my understanding, as well as the first 10 hits in Google for "lucene segment files immutable". The one file that is updated is "segments.gen", which is tiny and keeps track of which segments make up the overall index.

> On a Linux file system, the folder's timestamp will only be updated if
> files in it are added or deleted, NOT updated.  So if I check the index
> folder, it will not accurately reflect the last time the index files
> were updated.

Just watch index/segments.gen. That is precise as it tracks when the logical index was last updated, whereas segment files currently being written have later timestamps but are not part of the index yet.

- Toke Eskildsen
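
A small Python sketch of watching that file, for what it is worth. The function name is invented, and the fallback to the newest segments_N file (for setups where segments.gen is not present) is an assumption of this sketch, not something stated in the thread.

```python
import os
from datetime import datetime, timezone
from typing import Optional


def last_commit_time(index_dir: str) -> Optional[datetime]:
    """Return the mtime of index/segments.gen as a UTC datetime.

    Falls back to the newest segments_N file when segments.gen is absent
    (an assumption of this sketch). Returns None if neither exists.
    """
    candidates = [os.path.join(index_dir, "segments.gen")]
    if not os.path.isfile(candidates[0]):
        candidates = [
            os.path.join(index_dir, name)
            for name in os.listdir(index_dir)
            if name.startswith("segments_")
        ]
    mtimes = [os.path.getmtime(p) for p in candidates if os.path.isfile(p)]
    if not mtimes:
        return None
    return datetime.fromtimestamp(max(mtimes), tz=timezone.utc)
```

Other segment files are deliberately ignored, so a segment that is still being written does not move the reported time.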

Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Renee Sun <re...@mcafee.com>.
[core]/index is a folder holding index files.

But index files in that folder are not just being deleted or added, they are
also being updated.

On a Linux file system, the folder's timestamp will only be updated if
files in it are added or deleted, NOT updated.  So if I check the index
folder, it will not accurately reflect the last time the index files
were updated.
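
The folder-timestamp behaviour can be checked with a short Python experiment. This is a sketch that assumes a POSIX file system such as the Linux one mentioned above; the function name and file names are made up, and it says nothing about whether Lucene actually rewrites files in place.

```python
import os
import tempfile


def directory_mtime_demo() -> dict:
    """Report which operations change a directory's mtime (POSIX semantics assumed)."""
    marker = 1_000_000_000  # arbitrary old timestamp used as a baseline
    with tempfile.TemporaryDirectory() as index_dir:
        path = os.path.join(index_dir, "existing_file")
        with open(path, "w") as f:
            f.write("first write\n")

        os.utime(index_dir, (marker, marker))  # reset the folder timestamp
        with open(path, "a") as f:  # modify an existing file in place
            f.write("appended\n")
        changed_by_update = os.stat(index_dir).st_mtime != marker

        os.utime(index_dir, (marker, marker))
        with open(os.path.join(index_dir, "new_file"), "w") as f:  # add a file
            f.write("x")
        changed_by_add = os.stat(index_dir).st_mtime != marker

        os.utime(index_dir, (marker, marker))
        os.remove(path)  # delete a file
        changed_by_delete = os.stat(index_dir).st_mtime != marker

    return {"update": changed_by_update, "add": changed_by_add, "delete": changed_by_delete}
```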




Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
Renee Sun <re...@mcafee.com> wrote:
> But I did a test with heavy indexing ongoing, and observed the index file
> in [core]/index with the latest updated timestamp keep growing for about 7
> minutes...

That is not a file, but the folder that holds the immutable segment files. What you observe is segments being written, which updates the folder timestamp.

- Toke Eskildsen

Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Renee Sun <re...@mcafee.com>.
Hmm... at the beginning I also assumed segment index files would only be
deleted or added, not modified.

But I did a test with heavy indexing ongoing, and observed the index file
in [core]/index with the latest updated timestamp keep growing for about 7
minutes... I am not sure whether the new writes caused a merge; the file being
updated is pretty big, so it could be merging... but that does mean
index files can be modified.

thanks
Renee




Re: any easy way to find out when a core's index physical file has been last updated?

Posted by Upayavira <uv...@odoko.co.uk>.
In a Lucene index, files are never updated, only ever added or deleted.

You may well be able to use the ReplicationHandler to answer that
question for you, as it can tell stuff about an index for the purpose of
replicating it - I'm not sure what precisely it tells.
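
As a rough illustration of the ReplicationHandler idea, the Python sketch below polls the handler's indexversion command and compares the result between polls. It assumes the handler is registered at /replication for the core, that replication is configured so the command reports real values, and that the JSON response carries indexversion and generation fields; all function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional, Tuple

IndexState = Tuple[int, int]  # (indexversion, generation)


def replication_url(base_url: str, core: str) -> str:
    """URL for the indexversion command (handler path is an assumption)."""
    return f"{base_url.rstrip('/')}/{core}/replication?command=indexversion&wt=json"


def parse_index_state(response: dict) -> IndexState:
    """Pull the version and generation out of a parsed response."""
    return int(response["indexversion"]), int(response["generation"])


def index_changed(previous: Optional[IndexState], current: IndexState) -> bool:
    """True when a previously recorded state exists and differs from the current one."""
    return previous is not None and previous != current


def fetch_index_state(base_url: str, core: str, timeout: float = 5.0) -> IndexState:
    """Query a running Solr instance; requires a live server."""
    with urllib.request.urlopen(replication_url(base_url, core), timeout=timeout) as resp:
        return parse_index_state(json.load(resp))
```

This tells you whether the commit point moved between two polls, not the wall-clock time of the last change.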

Why do you need to know this?

Upayavira

On Thu, Sep 3, 2015, at 08:52 PM, Renee Sun wrote:
> I need to figure out when the last index activity on a core occurred.
> 
> I can't use the [corename]/index timestamp, because it only reflects file
> deletion or addition, not file updates.
> 
> I am curious whether there is any Solr core admin RESTful API sort of thing I can
> use to get the last-modified timestamp of the physical index ...
> 
> Thanks
> Renee
> 
> 
> 