Posted to solr-user@lucene.apache.org by Daniel Carrasco <d....@i2tic.com> on 2018/10/23 01:57:27 UTC

Slow import from MsSQL and down cluster during process

Hello,

I have a Solr cluster made of 7 machines on AWS instances. The Solr
version is 7.2.1 (b2b6438b37073bee1fca40374e85bf91aa457c0b), all nodes
run in NRT mode, and there is one replica per node (7 replicas). One
node is used for imports, and the rest just serve data.

My problem is that for about two weeks I've been having trouble with a
MsSQL import on my Solr cluster: when the process becomes slow or takes
too long, the entire cluster goes down.

I'm confused, because the main reason to have a cluster is HA, and every
time the import node "fails" (it's not really failing, just taking more
time to finish), the entire cluster fails and I have to stop the webpage
until the nodes are green again.

I don't know whether I have to change something in the configuration to
allow the cluster to keep working even when the import freezes or the
import node dies, but it's very annoying to wake up at 3 AM to fix the
cluster.

Is there any way to avoid this? Maybe keeping the import node as NRT and
converting the rest to TLOG?

I'm a bit of a noob in Solr, so I don't know if I have to send anything
else to help find the problem. The cluster was created by setting up a
Zookeeper cluster, connecting the Solr nodes to it, importing the
collections, and adding replicas manually to every collection.
I've also upgraded that cluster from Solr 6 to Solr 7.1 and later to
Solr 7.2.1.

Thanks and greetings!
-- 
_________________________________________

      Daniel Carrasco Marín
      Ingeniería para la Innovación i2TIC, S.L.
      Tlf:  +34 911 12 32 84 Ext: 223
      www.i2tic.com
_________________________________________

Re: Slow import from MsSQL and down cluster during process

Posted by Deepak Goel <de...@gmail.com>.
Please check whether a deadlock is happening by taking thread and heap dumps.


Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please consider stopping the cruelty by becoming a Vegan"

+91 73500 12833
deicool@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

Make In India : http://www.makeinindia.com/home



Re: Slow import from MsSQL and down cluster during process

Posted by Daniel Carrasco <d....@i2tic.com>.
Thanks for everything, I'll try it later ;)

Greetings!!



Re: Slow import from MsSQL and down cluster during process

Posted by Walter Underwood <wu...@wunderwood.org>.
We handle request rates at a few thousand requests/minute with an 8 GB heap. 95th percentile response time is 200 ms. Median (cached) is 4 ms.

An oversized heap will hurt your query performance because everything stops for the huge GC.

RAM is still a thousand times faster than SSD, so you want a lot of RAM available for file system buffers managed by the OS.

I recommend trying an 8 GB heap with the latest version of Java 8 and the G1 collector. 

We have this in our solr.in.sh:

SOLR_HEAP=8g
# Use G1 GC  -- wunder 2017-01-23
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Slow import from MsSQL and down cluster during process

Posted by Daniel Carrasco <d....@i2tic.com>.
Hello,

I've set that heap size because Solr receives a lot of queries every
second and I want to cache as much as possible. Also, I'm not sure about
the number of documents in the collection, but the webpage has a lot of
products.

Saying the index data is stored in RAM was just a figure of speech; the
data is stored on SSD disks with XFS (faster than EXT4).

I'll take a look at the links tomorrow at work.

Thanks!!
Greetings!!



Re: Slow import from MsSQL and down cluster during process

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/23/2018 7:15 AM, Daniel Carrasco wrote:
> Hello,
>
> Thanks for your response.
>
> We've already thought about that and doubled the instances. Just now for
> every Solr instance we've 60GB of RAM (40GB configured on Solr), and a 16
> Cores CPU. The entire Data can be stored on RAM and will not fill the RAM
> (of course talking about raw data, not procesed data).

Why are you making the heap so large?  I've set up servers that can 
handle hundreds of millions of Solr documents in a much smaller heap.  A 
40GB heap would be something you might do if you're handling billions of 
documents on one server.

When you say the entire data can be stored in RAM ... are you counting 
that 40GB you gave to Solr?  Because you can't count that -- that's for 
Solr, NOT the index data.
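To make the arithmetic concrete, a quick sketch using the numbers from earlier in the thread (illustrative, of course):

```shell
# RAM left for the OS page cache = instance RAM - JVM heap (in GB)
echo $((60 - 40))   # 40 GB heap on a 60 GB box leaves 20 GB for caching index files
echo $((60 - 8))    # an 8 GB heap leaves 52 GB
```

So the smaller heap more than doubles what the OS can use to cache the index.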

The heap size should never be dictated by the amount of memory in the 
server.  It should be made as large as it needs to be for the job, and 
no larger.

https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

> About the usage, I've checked the RAM and CPU usage and are not fully used.

What exactly are you looking at?  I've had people swear that they can't 
see a problem with their systems when Solr is REALLY struggling to keep 
up with what it has been asked to do.

Further down on the page I linked above is a section about asking for 
help.  If you can provide the screenshot it mentions there, that would 
be helpful.  Here's a direct link to that section:

https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue

Thanks,
Shawn


Re: Slow import from MsSQL and down cluster during process

Posted by Daniel Carrasco <d....@i2tic.com>.
Hello,

Thanks for your response.

We've already thought about that and doubled the instances. Right now
every Solr instance has 60GB of RAM (40GB configured for Solr) and a
16-core CPU. The entire data set could be stored in RAM without filling
it (of course talking about raw data, not processed data).

About the usage: I've checked the RAM and CPU usage, and they are not fully used.

Greetings!




Re: Slow import from MsSQL and down cluster during process

Posted by Chris Ulicny <cu...@iq.media>.
Dan,

Do you have any idea on the resource usage for the hosts when Solr starts
to become unresponsive? It could be that you need more resources or better
AWS instances for the hosts.

We had what sounds like a similar scenario when attempting to move one of
our SolrCloud instances to a cloud computing platform. During periods of
heavy indexing, segment merging, and searching, the cluster would become
unresponsive because Solr was waiting on numerous I/O operations that
were being throttled. Solr can be very I/O intensive, especially when you
can't cache the entire index in memory.

Thanks,
Chris



Re: Slow import from MsSQL and down cluster during process

Posted by Daniel Carrasco <d....@i2tic.com>.
Hi,
On Tue, Oct 23, 2018 at 10:18 AM, Charlie Hull (<ch...@flax.co.uk>)
wrote:

> How exactly are you importing from MsSQL to Solr? Are you using the Data
> Import Handler (DIH) and if so, how?


Yeah, we're using the import handler with the JDBC connector:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://......." user="..." password="..."/>
  <document>
    <entity name="products_baja_real" transformer="RegexTransformer"
            query="A_Long_Query">
      <field column="id" name="id"/>
      ... A lot of fields configuration ...
    </entity>
    ... some entities similar to above ...
  </document>
</dataConfig>
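One knob worth checking when a JDBC import slows down (an assumption on my part; these properties are not in the config above): the SQL Server driver can buffer whole result sets in memory by default, while the `responseBuffering=adaptive` and `selectMethod=cursor` connection properties, together with DIH's `batchSize` (which sets the JDBC fetch size), let it stream rows instead. A hypothetical sketch:

```xml
<!-- Hypothetical streaming sketch; property values are illustrative assumptions -->
<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://.......;responseBuffering=adaptive;selectMethod=cursor"
            batchSize="500"
            user="..." password="..."/>
```

A query that stalls on the database side then ties up far less memory in the importing node while it waits.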



> What evidence do you have that  this is slow or takes too long?
>

Well, the process normally takes less than 20 minutes (usually around
15) and doesn't affect the cluster at all. I have a monit check that
notifies me when the process takes more than 25 minutes, and just a bit
after that alert the entire collection goes into recovery mode and we're
unable to keep serving the requests made by the webpage. We have to stop
all the requests until the collection is OK again. The rest of the time
the cluster works perfectly without downtime, but lately the problem has
been happening more often (I had to recover the cluster twice in less
than an hour tonight, and it only didn't fail again because we stopped
the import cron).
That's the mild version of the problem, because sometimes the entire
cluster becomes unstable and it affects other collections. Sometimes even
the leader node fails and we're unable to release that leadership (even
by shutting down the leader server or running the FORCELEADER API
command), which makes it hard to recover the cluster. If we're lucky, the
cluster recovers itself even with a recovering leader (taking very long,
of course), but sometimes we have no luck and we have to reboot all the
machines to force a full recovery.
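A side note, not from this thread but a common pattern when SolrCloud collections fall into recovery during heavy indexing: if hard commits are infrequent, transaction logs grow large, and a replica that lags even briefly has to do a full index recovery instead of a quick peer sync. A hedged solrconfig.xml sketch, with illustrative values:

```xml
<!-- Sketch only; the maxTime values are illustrative assumptions -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flushes segments and truncates the transaction log;
       openSearcher=false keeps it cheap (no new searcher is opened) -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: controls when newly imported documents become searchable -->
  <autoSoftCommit>
    <maxTime>120000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

Keeping the hard-commit interval bounded is one of the standard mitigations when replicas go into full recovery after a long import.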




Thanks, and greetings!!


Re: Slow import from MsSQL and down cluster during process

Posted by Charlie Hull <ch...@flax.co.uk>.

How exactly are you importing from MsSQL to Solr? Are you using the Data
Import Handler (DIH) and if so, how? What evidence do you have that this
is slow or takes too long?

Charlie


-- 
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk