Posted to solr-user@lucene.apache.org by Massimiliano Randazzo <ma...@gmail.com> on 2020/02/25 18:28:47 UTC

Optimize solr 8.4.1

Good morning,

I recently upgraded from version 6.4 to version 8.4.1. I access Solr through
Java applications that I wrote myself, in which I have updated the library to
solr-solrj-8.4.1.jar.

I am indexing, in production, the OCR text of a newspaper of about 550,000
pages, for which I have estimated at least 1,000,000,000 words, and I am
experiencing slowness. I wanted to know if you could advise me on changes
to the configuration.

The server I'm using has 12 cores and 64 GB of RAM. The only changes I made
to the configuration are in the solr.in.sh file:
SOLR_HEAP="20480m"
SOLR_JAVA_MEM="-Xms20480m -Xmx20480m"
GC_LOG_OPTS="-verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps \
  -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime"
The Java version I use is:
java version "1.8.0_51"
Java(TM) SE Runtime Environment (build 1.8.0_51-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.51-b03, mixed mode)

Also, comparing the Solr web interfaces, I noticed a difference in the
"Overview" page. In Solr 6.4 it showed the Optimized and Current indicators
and allowed me to launch an optimize if necessary; in version 8.4.1
Optimized is no longer present. I assume this activity is now done with the
commit or through some operation in the background. If so, is it still
necessary to run the Optimize command from my application when I have
finished indexing? I noticed that the optimize operation requires
considerable time and resources, especially on large databases.
Thank you for your attention.

Re: Optimize solr 8.4.1

Posted by Erick Erickson <er...@gmail.com>.
As long as you have an HTTP connection, you can use the replication API's
fetchindex command to, well, fetch an index. That copies the index but does
not shard it, though. I guess you could fetch into a single-shard collection
and then use SPLITSHARD.
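
A rough sketch of what those two requests could look like, in plain Java with
no SolrJ dependency. It only builds the URLs; sending them is left out. The
host names, the core name "newspaper_shard1_replica_n1" and the collection
name "newspaper" are made up for illustration, and the "masterUrl" parameter
name is what I would expect on 8.x, so check it against the reference guide
for your version before relying on it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds the two HTTP requests discussed above; every host, core and collection name is hypothetical. */
final class IndexMigrationUrls {

    /** URL-encodes a query parameter value (written to compile on Java 8). */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Replication handler request asking targetCore to pull the index from sourceReplicationUrl. */
    static String fetchIndexUrl(String targetSolrBase, String targetCore, String sourceReplicationUrl) {
        return targetSolrBase + "/" + targetCore
                + "/replication?command=fetchindex&masterUrl=" + encode(sourceReplicationUrl);
    }

    /** Collections API request that splits one shard of a collection into two. */
    static String splitShardUrl(String solrBase, String collection, String shard) {
        return solrBase + "/admin/collections?action=SPLITSHARD"
                + "&collection=" + encode(collection)
                + "&shard=" + encode(shard);
    }

    public static void main(String[] args) {
        System.out.println(fetchIndexUrl("http://cloud-node:8983/solr", "newspaper_shard1_replica_n1",
                "http://old-host:8983/solr/newspaper/replication"));
        System.out.println(splitShardUrl("http://cloud-node:8983/solr", "newspaper", "shard1"));
    }
}
```

The order matters: fetch into the single-shard collection first, check the
document count, and only then issue the split.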

All that said, you'll have to reindex sometime if you ever change your
schema, so it'd be good to get that process set up...

On Wed, Feb 26, 2020, 08:52 Dario Rigolin <da...@comperio.it> wrote:

> Hi Massimiliano, the only way to reindex is to resend all documents to the
> indexer of the Cloud instance.
> At the moment Solr cannot reindex on its own when the schema is changed, nor
> can it "send" indexed data from a non-cloud instance to a SolrCloud.
>
> For example, we keep a stored-only field in Solr holding the original
> document, and we use that data as the source for a new reindex.
>
> Regards.

Re: Optimize solr 8.4.1

Posted by Dario Rigolin <da...@comperio.it>.
Hi Massimiliano, the only way to reindex is to resend all documents to the
indexer of the Cloud instance.
At the moment Solr cannot reindex on its own when the schema is changed, nor
can it "send" indexed data from a non-cloud instance to a SolrCloud.

For example, we keep a stored-only field in Solr holding the original
document, and we use that data as the source for a new reindex.
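
One possible way to read those stored documents back out is to page through
the old index with cursorMark and resend each page to the new collection. The
sketch below only builds the query URL for one page, in plain Java. The field
names "id" and "original_doc" and the host are placeholders, not our real
schema, and it assumes "id" is the uniqueKey, since cursorMark needs the
uniqueKey in the sort.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds the query URL for one page of a cursorMark export; field names and host are hypothetical. */
final class ReindexExportUrls {

    /** URL-encodes a query parameter value (written to compile on Java 8). */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /**
     * Select request returning only the id and the stored source field,
     * sorted on the id so that cursorMark paging is allowed.
     */
    static String exportPageUrl(String coreUrl, String idField, String sourceField,
                                int rows, String cursorMark) {
        return coreUrl + "/select?q=" + encode("*:*")
                + "&fl=" + encode(idField + "," + sourceField)
                + "&sort=" + encode(idField + " asc")
                + "&rows=" + rows
                + "&cursorMark=" + encode(cursorMark);
    }

    public static void main(String[] args) {
        // "*" is the cursorMark value that starts a new export.
        System.out.println(exportPageUrl("http://old-host:8983/solr/newspaper",
                "id", "original_doc", 500, "*"));
    }
}
```

Each response carries a nextCursorMark value; pass it to the next call and
stop when Solr returns the same value you sent.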

Regards.


On Wed, Feb 26, 2020 at 14:37, Massimiliano Randazzo <
massimiliano.randazzo@gmail.com> wrote:

> Hi Paras,
>
> thank you for your answer. If you don't mind, I have a couple of
> questions.
>
> I am experiencing very long indexing times. I have 8 servers currently
> working against 1 instance of Solr. I thought of moving to a cloud of 4 Solr
> servers with 3 ZooKeeper servers to distribute the load, but I was wondering
> whether I would have to start the indexing over, or whether there is a tool
> to load the index of a standalone Solr into a SolrCloud and redistribute it?
>
> Currently, in the "managed-schema" file, the indexed fields are configured
> with type="text_it", to which "lang/stopwords_it.txt" is assigned. I have
> been asked to remove the stopwords. If I modify the "managed-schema" file to
> remove the stopwords file, is it possible to re-index the database from the
> documents already present, without having to reload all the material?
>
> Thank you
> Massimiliano Randazzo


-- 

Dario Rigolin
Comperio srl - CTO
Mobile: +39 347 7232652 - Office: +39 0425 471482
Skype: dario.rigolin

Re: Optimize solr 8.4.1

Posted by Massimiliano Randazzo <ma...@gmail.com>.
Hi Paras,

thank you for your answer. If you don't mind, I have a couple of
questions.

I am experiencing very long indexing times. I have 8 servers currently
working against 1 instance of Solr. I thought of moving to a cloud of 4 Solr
servers with 3 ZooKeeper servers to distribute the load, but I was wondering
whether I would have to start the indexing over, or whether there is a tool
to load the index of a standalone Solr into a SolrCloud and redistribute it?

Currently, in the "managed-schema" file, the indexed fields are configured
with type="text_it", to which "lang/stopwords_it.txt" is assigned. I have
been asked to remove the stopwords. If I modify the "managed-schema" file to
remove the stopwords file, is it possible to re-index the database from the
documents already present, without having to reload all the material?

Thank you
Massimiliano Randazzo

On Wed, Feb 26, 2020 at 13:26, Paras Lehana <
paras.lehana@indiamart.com> wrote:

> Hi Massimiliano,
>
> > Is it still necessary to run the Optimize command from my application when
> > I have finished indexing?
>
> I guess you can stop worrying about optimizations and let Solr handle that
> implicitly. There's nothing so bad about having more segments.


-- 
Massimiliano Randazzo

Analista Programmatore,
Sistemista Senior
Mobile +39 335 6488039
email: massimiliano.randazzo@gmail.com
pec: massimiliano.randazzo@pec.net

Re: Optimize solr 8.4.1

Posted by Paras Lehana <pa...@indiamart.com>.
Hi Massimiliano,

On Wed, 26 Feb 2020 at 16:02, Massimiliano Randazzo <
massimiliano.randazzo@gmail.com> wrote:

> Is it still necessary to run the Optimize command from my application when
> I have finished indexing?

I guess you can stop worrying about optimizations and let Solr handle that
implicitly. There's nothing so bad about having more segments.
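
To make the difference concrete, here is a small sketch in plain Java that
builds the two update-handler requests: the plain commit you would send at
the end of indexing, and the explicit optimize you can probably skip. The core
URL is a placeholder; "commit", "optimize" and "maxSegments" are the usual
update handler parameters, but treat this as an illustration rather than a
recommendation for your setup.

```java
/** Builds update-handler request URLs; the core URL used in main is hypothetical. */
final class UpdateRequestUrls {

    /** A plain hard commit: makes indexed documents visible and leaves segment merging to Solr. */
    static String commitUrl(String coreUrl) {
        return coreUrl + "/update?commit=true";
    }

    /** An explicit optimize (forced merge) down to at most maxSegments segments. */
    static String optimizeUrl(String coreUrl, int maxSegments) {
        if (maxSegments < 1) {
            throw new IllegalArgumentException("maxSegments must be at least 1");
        }
        return coreUrl + "/update?optimize=true&maxSegments=" + maxSegments;
    }

    public static void main(String[] args) {
        String core = "http://localhost:8983/solr/newspaper";
        System.out.println(commitUrl(core));      // what the indexer sends when it is done
        System.out.println(optimizeUrl(core, 1)); // the expensive request, only if really needed
    }
}
```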



-- 
Regards,

Paras Lehana [65871]
Development Engineer, Auto-Suggest,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn: 1196
