Posted to solr-user@lucene.apache.org by Otis Gospodnetic <ot...@gmail.com> on 2013/11/20 16:39:39 UTC

Multiple data/index.YYYYMMDD.... dirs == bug?

Hi,

When full index replication is happening via SnapPuller, a temporary
"timestamped" index dir is created.

Questions:
1) Under normal circumstances could more than 1 timestamped index
directory ever be present?
2) Should there always be a .../data/index directory present?

I'm asking because I see the following situation on one SolrCloud node:

$ du -ms /home/solr/data/*
1188367    /home/solr/data/index.20131118152402344
709050    /home/solr/data/index.20131119210950598
1    /home/solr/data/index.properties
1    /home/solr/data/replication.properties
3053    /home/solr/data/tlog

Note:
1) there are 2 timestamped directories
2) there is no data/index directory
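
As a side note, the suffixes on those two directories look like
yyyyMMddHHmmssSSS timestamps. That format is an assumption inferred from the
names in the listing above, and parse_index_timestamp is a made-up helper
name. A minimal Python sketch for decoding a suffix, so that directories can
be compared by creation time:

```python
import re
from datetime import datetime, timedelta

# Assumption: names look like "index." + 14 digits (yyyyMMddHHmmss) + 3 digits (millis).
_NAME = re.compile(r"index\.(\d{14})(\d{3})")

def parse_index_timestamp(dir_name):
    """Decode the timestamp suffix of a directory name such as index.20131118152402344."""
    m = _NAME.fullmatch(dir_name)
    if m is None:
        raise ValueError(f"not a timestamped index dir: {dir_name!r}")
    seconds_part, millis_part = m.groups()
    base = datetime.strptime(seconds_part, "%Y%m%d%H%M%S")
    # The result is a naive datetime; the time zone is not encoded in the name.
    return base + timedelta(milliseconds=int(millis_part))
```

Under that assumption, index.20131118152402344 decodes to 2013-11-18
15:24:02.344, which is earlier than index.20131119210950598.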

According to SnapPuller, the timestamped index dir is a temporary dir
and should be removed after replication..... unless maybe some error
case is not being handled correctly and timestamped index dirs are
"leaking".

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

RE: Multiple data/index.YYYYMMDD.... dirs == bug?

Posted by Markus Jelsma <ma...@openindex.io>.

 
 
-----Original message-----
> From:Otis Gospodnetic <ot...@gmail.com>
> Sent: Wednesday 20th November 2013 16:40
> To: solr-user@lucene.apache.org
> Subject: Multiple data/index.YYYYMMDD.... dirs == bug?
> 
> Hi,
> 
> When full index replication is happening via SnapPuller, a temporary
> "timestamped" index dir is created.
> 
> Questions:
> 1) Under normal circumstances could more than 1 timestamped index
> directory ever be present?

No, except during replication.

> 2) Should there always be a .../data/index directory present?

No, the active index directory can also be named index.<TIME>. index.properties points to whichever directory is in use.
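
A rough Python sketch of that lookup, in case it helps. It assumes
index.properties is a Java-style properties file with an "index" key that
names the directory, and that plain "index" is used when the file is absent.
Both are assumptions rather than something confirmed in this thread, and
active_index_dir is a made-up helper name:

```python
import os

def active_index_dir(data_dir):
    """Return the index directory that index.properties is assumed to point to."""
    props_path = os.path.join(data_dir, "index.properties")
    name = "index"  # assumed default when there is no index.properties
    if os.path.isfile(props_path):
        # Java properties files are traditionally ISO-8859-1 encoded.
        with open(props_path, encoding="latin-1") as f:
            for line in f:
                line = line.strip()
                # Skip blank lines and properties-style comments.
                if not line or line.startswith(("#", "!")):
                    continue
                key, sep, value = line.partition("=")
                if sep and key.strip() == "index":
                    name = value.strip()
    return os.path.join(data_dir, name)
```

For example, active_index_dir("/home/solr/data") would return whichever
index.<TIME> path the file names, or /home/solr/data/index if the file is
missing.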

> 
> I'm asking because I see the following situation on one SolrCloud node:
> 
> $ du -ms /home/solr/data/*
> 1188367    /home/solr/data/index.20131118152402344
> 709050    /home/solr/data/index.20131119210950598
> 1    /home/solr/data/index.properties
> 1    /home/solr/data/replication.properties
> 3053    /home/solr/data/tlog
> 
> Note:
> 1) there are 2 timestamped directories
> 2) there is no data/index directory

This is not good, but you can safely remove every index directory that is not referenced in index.properties. Usually that means keeping only the newest.
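
A minimal Python sketch of that check, for illustration only. It lists
candidates and deletes nothing. stale_index_dirs is a made-up helper name,
and the pattern "index." followed by digits is an assumption based on the
directory names in the listing above. Pass in the directory name that
index.properties references:

```python
import os
import re

# Assumption: replication directories are named "index." followed by digits.
TIMESTAMPED = re.compile(r"^index\.\d+$")

def stale_index_dirs(data_dir, active_name):
    """List timestamped index dirs under data_dir other than active_name."""
    stale = []
    for entry in sorted(os.listdir(data_dir)):
        full = os.path.join(data_dir, entry)
        # Only directories count; index.properties and tlog are ignored.
        if os.path.isdir(full) and TIMESTAMPED.match(entry) and entry != active_name:
            stale.append(full)
    return stale
```

Review the returned paths by hand, ideally with Solr stopped or at least not
replicating, before removing anything.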

> 
> According to SnapPuller, the timestamped index dir is a temporary dir
> and should be removed after replication..... unless maybe some error
> case is not being handled correctly and timestamped index dirs are
> "leaking".

It can happen when Solr dies. The leftover directories are not removed on startup.


Re: Multiple data/index.YYYYMMDD.... dirs == bug?

Posted by Mark Miller <ma...@gmail.com>.
There might be a JIRA issue out there about replication not cleaning up after all failures - e.g. on startup or something - kind of rings a bell… if so, it will be addressed eventually.

Otherwise, you might have two for a bit, just because multiple searchers can be around at once for a while or something - but that should not last long.

- Mark


Re: Multiple data/index.YYYYMMDD.... dirs == bug?

Posted by Daniel Collins <da...@gmail.com>.
In our experience (with SolrCloud), if you trigger a full replication (e.g.
for a new replica), you get the "timestamp" directory, and it is never renamed
back to just "index".  Since index.properties gives you the name of the real
directory, we had never considered that a problem/bug.  Why bother with the
rename afterwards?  It just seems unnecessary.

So to answer your questions:

1) Not in normal circumstances, but if replication crashes or stops, it
might leave an extra directory behind.
2) No, as long as there is an index.properties file.

Not official answers, but that's our experience.


On 20 November 2013 15:55, michael.boom <my...@yahoo.com> wrote:

> I encountered this problem often when i restarted a solr instance before
> replication was finished more than once.
> I would then have multiple timestamped directories and the index directory.
> However, the index.properties points to the active index directory.
>
> The moment when the replication succeeded the temp dir is renamed "index"
> and the index.properties is gone.
>
> On the situation when the index is missing, not sure about that. Maybe this
> happens when the replica is too old and an old-school replication is done.
>
>
>
> -----
> Thanks,
> Michael
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Multiple-data-index-YYYYMMDD-dirs-bug-tp4102163p4102168.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: Multiple data/index.YYYYMMDD.... dirs == bug?

Posted by "michael.boom" <my...@yahoo.com>.
I often ran into this problem when I restarted a Solr instance more than once
before replication had finished.
I would then have multiple timestamped directories and the index directory. 
However, the index.properties points to the active index directory.

Once the replication succeeds, the temp dir is renamed "index"
and the index.properties is gone.

As for the situation where the index directory is missing, I'm not sure. Maybe
this happens when the replica is too old and an old-school replication is done.



-----
Thanks,
Michael
--
View this message in context: http://lucene.472066.n3.nabble.com/Multiple-data-index-YYYYMMDD-dirs-bug-tp4102163p4102168.html
Sent from the Solr - User mailing list archive at Nabble.com.