Posted to solr-user@lucene.apache.org by Mathieu Menard <Ma...@realdolmen.com> on 2019/02/08 13:19:06 UTC

Solr Index Size after reindex

Hello,

I would like your opinion on something we have observed on our two Alfresco installations (Production and Staging), specifically the size of the Solr indexes in these two environments.

We regularly rsync from Production to Staging: we copy the Alfresco DB and the entire contentstore, and then we reindex all the Alfresco content.

We have noticed that Production has 19 GB of indexes, while Staging has "only" 11 GB. We have difficulty understanding this difference, because we assumed that index optimization is the same for a full reindex as for normal use of Solr.

I have compared the configuration of the two Solr instances and I don't see any differences. Could you help me better understand this phenomenon?

Here is some information about our two environments. If you need more details, I will provide them as soon as possible:



                   PRODUCTION       STAGING
Alfresco version   5.1.1.4          5.1.1.4
Solr version       [inline image]   [inline image]
Java version       [inline image]   [inline image]
Linux machine      See Staging_caracteristics.txt file in attachment (listed for both)


Please let me know if you need any other information; I will send it to you promptly.

Kind Regards

Matthieu


Re: Solr Index Size after reindex

Posted by David Hastings <ha...@gmail.com>.
The other thing I would be curious about: in your reindexing process, do
you clear out the entire index beforehand? If so, perhaps there is content
missing/moved.

On Thu, Feb 14, 2019 at 11:07 AM Erick Erickson <er...@gmail.com>
wrote:

> Basically, this is not possible ;). Therefore there's something I
> don't understand....

Re: Solr Index Size after reindex

Posted by Erick Erickson <er...@gmail.com>.
Basically, this is not possible ;). Therefore there's something I
don't understand....

There's nothing anywhere except what's in the index. By that I mean that _if_
you copy an index (the data directory and children) from one place to another,
that's all there is. No information about what's in the index is stored anywhere
else. So there are a couple of possibilities I see:

1> Your rsync isn't doing what you think. By that I mean that "somehow" it isn't
copying segments (perhaps with the same name, although the size and time
checks would make it extremely unlikely to skip one). What happens if
you _delete_ the data index on your target system first?

2> I'm not entirely sure what happens if there are multiple
"segments_n" files in the index. That file "points" to all the current
segments. From a strictly theoretical standpoint, my _guess_ is that
Lucene chooses the one with the highest "_n" value. So if you have
multiple ones of those, it would be interesting to know.

3> Has Solr been restarted (or at least the core reloaded) on the target?
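
To check possibility 2> quickly, here is a minimal Python sketch that lists the "segments_N" files in an index directory and reports the one with the highest generation. It assumes the suffix after "segments_" is the generation written in base 36, which is how Lucene names its commit files as far as I know; the function name and the directory you pass in are only illustrative.

```python
from pathlib import Path


def current_segments_file(index_dir):
    """Return the name of the segments_N file with the highest generation.

    Returns None when the directory holds no segments_N file at all.
    """
    candidates = []
    for path in Path(index_dir).glob("segments_*"):
        suffix = path.name[len("segments_"):]
        try:
            # Assumption: the generation suffix is encoded in base 36.
            generation = int(suffix, 36)
        except ValueError:
            continue  # not a commit-point file, skip it
        candidates.append((generation, path.name))
    return max(candidates)[1] if candidates else None
```

If more than one segments_N file shows up on the target but not on the source, that alone would be worth reporting back.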

So here's the experiment I'd run:
1> shut down the Solr running on the target
2> delete the data dir.
3> restart Solr and verify that you have zero docs. This will recreate
the data dir and verify that that Solr instance is pointing where you
think it is as a sanity check.
4> stop Solr again on the target.
5> do a hard commit on the source.
6> get a long listing ("ls -l") of your source index. This should be a
lot of files like _0.tim, _0.fdt, ..., _1.tim, _1.fdt, ... etc.
7> do your rsync. You should _not_ be indexing to the source at this time.
8> start Solr on the target.
9> check the target again. Assuming that you have _not_ been adding
any documents to the source system during the rsync, I'd be stunned if
there were any differences.
10> If there are incorrect counts or other anomalies:
10.1> double-check your rsync. Is it really getting the files from your source?
10.2> compare the long listing of your source index taken in step 6 with
the target. Are all files identical size-wise? Are there any files on
the target that are not on the source, or vice versa? If there are
differences, that would explain your issues and would point to your
rsync process being messed up.
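
The comparison in 10.2> can be sketched in Python instead of eyeballing two "ls -l" outputs. This is only an illustration under assumptions: the helper names are made up, and it compares file names and sizes only, not checksums. Run list_sizes on each machine (for example, dump the result to a file) and feed both dictionaries to compare_listings.

```python
import os


def list_sizes(index_dir):
    """Map each regular file name in index_dir to its size in bytes."""
    return {
        entry.name: entry.stat().st_size
        for entry in os.scandir(index_dir)
        if entry.is_file()
    }


def compare_listings(source, target):
    """Return (only_in_source, only_in_target, size_mismatches)."""
    only_in_source = sorted(source.keys() - target.keys())
    only_in_target = sorted(target.keys() - source.keys())
    size_mismatches = sorted(
        name
        for name in source.keys() & target.keys()
        if source[name] != target[name]
    )
    return only_in_source, only_in_target, size_mismatches
```

Three empty lists mean the two directories match by name and size; anything else points at the copy step.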

If the index directories are identical on the source and target and
you _still_ see differences then there's an alternate reality that we
occupy ;).

And the Alfresco folks would probably be the ones to contact.

Best,
Erick



On Wed, Feb 13, 2019 at 11:28 PM Mathieu Menard
<Ma...@realdolmen.com> wrote:
>
> Hello Andrea,
>
> I'm really sorry for the delay in answering, but I needed more information before replying.
>
> Yes, 5.365.213 is the numDocs we got just after the sync, and yes, 4.537.651 is the numDocs we got on the staging server after the reindexing. The colleague who performed the rsync confirms that it completed entirely.
>
> I don't see any incomplete transactions, which normally means that the indexing is complete. That's why I don't understand the difference.
>
> Kind Regards
>
> Matthieu

RE: Solr Index Size after reindex

Posted by Mathieu Menard <Ma...@realdolmen.com>.
Hello Andrea,

I'm really sorry for the delay in answering, but I needed more information before replying.

Yes, 5.365.213 is the numDocs we got just after the sync, and yes, 4.537.651 is the numDocs we got on the staging server after the reindexing. The colleague who performed the rsync confirms that it completed entirely.

I don't see any incomplete transactions, which normally means that the indexing is complete. That's why I don't understand the difference.

Kind Regards

Matthieu

----Original Message-----
From: Andrea Gazzarini [mailto:a.gazzarini@sease.io] 
Sent: Saturday, 9 February 2019 16:56
To: solr-user@lucene.apache.org
Subject: Re: Solr Index Size after reindex

Yes, those numbers are different and that should explain the different size. I think you should be able to find some information in the Alfresco or Solr log. There must be a reason for the missing content.
For example, are those numbers coming from two comparable snapshots? In other words, I imagine that at a given moment X you rsync-ed the two servers:

  * 5.365.213 is the numDocs you got just after the sync, isn't it?
  * 4.537.651 is the numDocs you got in the staging server after the
    reindexing, isn't it? Are you sure the whole reindexing has completed?

maxDoc is the number of documents you have in the index, including the deleted docs not yet cleared by a merge. In the console you should also see the "Deleted docs" count, which should be equal to (maxDoc - numDocs).

Ciao

Andrea


Re: Solr Index Size after reindex

Posted by Andrea Gazzarini <a....@sease.io>.
Yes, those numbers are different and that should explain the different
size. I think you should be able to find some information in the
Alfresco or Solr log. There must be a reason for the missing content.
For example, are those numbers coming from two comparable snapshots? In
other words, I imagine that at a given moment X you rsync-ed the two servers:

  * 5.365.213 is the numDocs you got just after the sync, isn't it?
  * 4.537.651 is the numDocs you got in the staging server after the
    reindexing, isn't it? Are you sure the whole reindexing has completed?

maxDoc is the number of documents you have in the index, including the
deleted docs not yet cleared by a merge. In the console you should also
see the "Deleted docs" count, which should be equal to (maxDoc - numDocs).
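
As a small worked example of that relation, the Python snippet below applies (maxDoc - numDocs) to the figures reported in this thread and also prints the numDocs gap between the two environments. The dictionary layout and function name are only for illustration.

```python
# Counts as reported in this thread for the two environments.
counts = {
    "production": {"numDocs": 5_365_213, "maxDoc": 5_845_469},
    "staging": {"numDocs": 4_537_651, "maxDoc": 5_129_556},
}


def deleted_docs(stats):
    """Deleted-but-not-yet-merged documents: maxDoc minus numDocs."""
    return stats["maxDoc"] - stats["numDocs"]


for env, stats in counts.items():
    print(env, "deleted docs:", deleted_docs(stats))

# Difference in live documents between the two environments.
gap = counts["production"]["numDocs"] - counts["staging"]["numDocs"]
print("numDocs gap:", gap)
```

This gives 480256 deleted docs for production, 591905 for staging, and a numDocs gap of 827562.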

Ciao

Andrea

On 08/02/2019 15:53, Mathieu Menard wrote:
>
> Hi Andrea,
>
> I’ve checked this information and here is the result:
>
>
>            PRODUCTION   STAGING
> numDocs    5.365.213    4.537.651
> maxDoc     5.845.469    5.129.556
>
> It seems that there are over 800.000 more docs in PRODUCTION, which
> would explain the larger index size. But there is one thing I don't
> understand: we copied the DB and the contentstore, so shouldn't
> numDocs be the same for the two environments?
>
> Could you also explain the meaning of the maxDoc value, please?
>
> Thanks
>
> Matthieu

RE: Solr Index Size after reindex

Posted by Mathieu Menard <Ma...@realdolmen.com>.
Hi Andrea,

I've checked this information and here is the result:



           PRODUCTION   STAGING
numDocs    5.365.213    4.537.651
maxDoc     5.845.469    5.129.556


It seems that there are over 800.000 more docs in PRODUCTION, which would explain the larger index size. But there is one thing I don't understand: we copied the DB and the contentstore, so shouldn't numDocs be the same for the two environments?

Could you also explain the meaning of the maxDoc value, please?

Thanks

Matthieu


From: Andrea Gazzarini [mailto:a.gazzarini@sease.io]
Sent: Friday, 8 February 2019 14:54
To: solr-user@lucene.apache.org
Subject: Re: Solr Index Size after reindex

Hi Mathieu,
What about the doc counts in the two environments? Do they have the same numbers (numDocs / maxDoc)? Any meaningful messages (errors or not) in the log files?

Andrea



Re: Solr Index Size after reindex

Posted by Andrea Gazzarini <a....@sease.io>.
Hi Mathieu,
What about the doc counts in the two environments? Do they have the same
numbers (numDocs / maxDoc)? Any meaningful messages (errors or not) in
the log files?
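
If the admin console is not handy, the same two numbers can usually be read from Solr's Luke request handler. The Python sketch below is only an illustration: the base URL and core name you pass in are assumptions about your setup, access to Solr may be restricted in an Alfresco installation, and the helper names are made up. It assumes the JSON response carries an "index" section with "numDocs" and "maxDoc".

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def luke_url(base_url, core):
    """Build the Luke handler URL for one core (numTerms=0 keeps it light)."""
    query = urlencode({"numTerms": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/admin/luke?{query}"


def parse_index_counts(payload):
    """Extract numDocs and maxDoc from a Luke handler JSON response."""
    index = json.loads(payload)["index"]
    return {"numDocs": index["numDocs"], "maxDoc": index["maxDoc"]}


def fetch_index_counts(base_url, core):
    """Query a running Solr instance and return its numDocs and maxDoc."""
    with urlopen(luke_url(base_url, core)) as response:
        return parse_index_counts(response.read().decode("utf-8"))
```

Calling fetch_index_counts once per environment, with whatever base URL and core name apply there, gives two dictionaries that can be compared directly.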

Andrea
