Posted to solr-user@lucene.apache.org by Annette Newton <an...@servicetick.com> on 2013/05/01 11:39:43 UTC

Delete from Solr Cloud 4.0 index..

We have a 4 shard - 2 replica Solr Cloud setup, each shard with about 26GB of
index, and a total of 24,000,000 documents.  We issued a rather large delete
yesterday morning to reduce that size by about half.  This resulted in the
loss of all shards while the delete was taking place, and even when it had
apparently finished, we continued to lose shards as soon as we started
writing again.

We have also issued much smaller deletes and lost shards, but before they have
always come back ok.  This time we couldn't keep them online, and we ended up
rebuilding our cloud setup and switching over to it.

Is there a better process for deleting documents?  Is this expected
behaviour?

Thanks very much.

-- 

Annette Newton

Database Administrator

ServiceTick Ltd



T:+44(0)1603 618326



Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ

www.servicetick.com

*www.sessioncam.com*

-- 
*This message is confidential and is intended to be read solely by the 
addressee. The contents should not be disclosed to any other person or 
copies taken unless authorised to do so. If you are not the intended 
recipient, please notify the sender and permanently delete this message. As 
Internet communications are not secure ServiceTick accepts neither legal 
responsibility for the contents of this message nor responsibility for any 
change made to this message after it was forwarded by the original author.*

Re: Delete from Solr Cloud 4.0 index..

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/3/2013 3:22 AM, Annette Newton wrote:
> One question Shawn - did you ever get any costings around Zing? Did you
> trial it?

I never did do a trial.  I asked them for a cost; they didn't have an
immediate answer and wanted to do a phone call to get a lot of information
about my setup.  The price apparently has a lot of variance based on the
specific environment, so I didn't pursue it, figuring that the cost
would be higher than my superiors are willing to pay.

The only information I could find about the cost of Zing was a very
recent Register article that had this to say:

"Azul is similarly cagey about what a supported version of the Zing JVM
costs, and only says that Zing costs around what a supported version of
an Oracle, IBM, or Red Hat JVM will run enterprises and that it has an
annual subscription model for Zing pricing. You can't easily get pricing
for Oracle, IBM, or Red Hat JVMs, of course, so the comparison is
accurate but perfectly useless."

http://www.theregister.co.uk/2013/04/08/azul_systems_zing_lmax_exchange/

Thanks,
Shawn


Re: Delete from Solr Cloud 4.0 index..

Posted by Annette Newton <an...@servicetick.com>.
One question Shawn - did you ever get any costings around Zing? Did you
trial it?

Thanks.


On 3 May 2013 10:03, Annette Newton <an...@servicetick.com> wrote:

> Thanks Shawn.
>
> I have played around with Soft Commits before and didn't seem to have any
> improvement, but with the current load testing I am doing I will give it
> another go.
>
> I have researched docValues and came across the fact that it would
> increase the index size.  With the upgrade to 4.2.1 the index size has
> reduced by approx 33% which is pleasing and I don't really want to lose
> that saving.
>
> We do use the facet.enum method - which works really well, but I will
> verify that we are using that in every instance, we have numerous
> developers working on the product and maybe one or two have slipped
> through.
>
> Right from the first I upped the zkClientTimeout to 30 as I wanted to give
> extra time for any network blips that we experience on AWS.  We only seem
> to drop communication on a full garbage collection though.
>
> I am coming to the conclusion that we need to have more shards to cope
> with the writes, so I will play around with adding more shards and see how
> I go.
>
> I appreciate you having a look over our setup and the advice.
>
> Thanks again.
>
> Netty.
>
>
> On 2 May 2013 23:17, Shawn Heisey <so...@elyograg.org> wrote:
>
>> On 5/2/2013 4:24 AM, Annette Newton wrote:
>> > Hi Shawn,
>> >
>> > Thanks so much for your response.  We basically are very write intensive
>> > and write throughput is pretty essential to our product.  Reads are
>> > sporadic and actually is functioning really well.
>> >
>> > We write on average (at the moment) 8-12 batches of 35 documents per
>> > minute.  But we really will be looking to write more in the future, so
>> need
>> > to work out scaling of solr and how to cope with more volume.
>> >
>> > Schema (I have changed the names) :
>> >
>> > http://pastebin.com/x1ry7ieW
>> >
>> > Config:
>> >
>> > http://pastebin.com/pqjTCa7L
>>
>> This is very clean.  There's probably more you could remove/comment, but
>> generally speaking I couldn't find any glaring issues.  In particular,
>> you have disabled autowarming, which is a major contributor to commit
>> speed problems.
>>
>> The first thing I think I'd try is increasing zkClientTimeout to 30 or
>> 60 seconds.  You can use the startup commandline or solr.xml, I would
>> probably use the latter.  Here's a solr.xml fragment that uses a system
>> property or a 15 second default:
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
>> <solr persistent="true" sharedLib="lib">
>>   <cores adminPath="/admin/cores"
>> zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}"
>> hostContext="solr">
>>
>> General thoughts, these changes might not help this particular issue:
>> You've got autoCommit with openSearcher=true.  This is a hard commit.
>> If it were me, I would set that up with openSearcher=false and either do
>> explicit soft commits from my application or set up autoSoftCommit with
>> a shorter timeframe than autoCommit.
>>
>> This might simply be a scaling issue, where you'll need to spread the
>> load wider than four shards.  I know that there are financial
>> considerations with that, and they might not be small, so let's leave
>> that alone for now.
>>
>> The memory problems might be a symptom/cause of the scaling issue I just
>> mentioned.  You said you're using facets, which can be a real memory hog
>> even with only a few of them.  Have you tried facet.method=enum to see
>> how it performs?  You'd need to switch to it exclusively, never go with
>> the default of fc.  You could put that in the defaults or invariants
>> section of your request handler(s).
>>
>> Another way to reduce memory usage for facets is to use disk-based
>> docValues on version 4.2 or later for the facet fields, but this will
>> increase your index size, and your index is already quite large.
>> Depending on your index contents, the increase may be small or large.
>>
>> Something to just mention: It looks like your solrconfig.xml has
>> hard-coded absolute paths for dataDir and updateLog.  This is fine if
>> you'll only ever have one core/collection on each server, but it'll be a
>> disaster if you have multiples.  I could be wrong about how these get
>> interpreted in SolrCloud -- they might actually be relative despite
>> starting with a slash.
>>
>> Thanks,
>> Shawn
>>
>>
>
>




Re: Delete from Solr Cloud 4.0 index..

Posted by Erick Erickson <er...@gmail.com>.
bq: Will docValues help with memory usage?

I'm still a bit fuzzy on all the ramifications of DocValues, but I
somewhat doubt they'll result in index size savings; they _really_
help with loading the values for a field, but the end result is still
the values in memory....

People who know what they're talking about, _please_ correct this if
I'm off base.

Sure, stored field compression will help with disk space, no question.
I was mostly cautioning against extrapolating from disk size to memory
requirements without taking this into account.


Best
Erick

On Tue, May 7, 2013 at 6:46 AM, Annette Newton
<an...@servicetick.com> wrote:
> Hi Erick,
>
> Thanks for the tip.
>
> Will docValues help with memory usage?  It seemed a bit complicated to set
> up..
>
> The index size saving was nice because that means that potentially I could
> use smaller provisioned IOP volumes which cost less...
>
> Thanks.
>
>
> On 3 May 2013 18:27, Erick Erickson <er...@gmail.com> wrote:
>
>> Anette:
>>
>> Be a little careful with the index size savings, they really don't
>> mean much for _searching_. The sotred field compression
>> significantly reduces the size on disk, but only for the stored
>> data which is only accessed when returning the top N docs. In
>> terms of how many docs you can fit on your hardware, it's pretty
>> irrelevant.
>>
>> The *.fdt and *.fdx files in your index directory contain the stored
>> data, so when looking at the effects of various options (including
>> compression), you can pretty much ignore these files.
>>
>> FWIW,
>> Erick
>>
>> On Fri, May 3, 2013 at 2:03 AM, Annette Newton
>> <an...@servicetick.com> wrote:
>> > Thanks Shawn.
>> >
>> > I have played around with Soft Commits before and didn't seem to have any
>> > improvement, but with the current load testing I am doing I will give it
>> > another go.
>> >
>> > I have researched docValues and came across the fact that it would
>> increase
>> > the index size.  With the upgrade to 4.2.1 the index size has reduced by
>> > approx 33% which is pleasing and I don't really want to lose that saving.
>> >
>> > We do use the facet.enum method - which works really well, but I will
>> > verify that we are using that in every instance, we have numerous
>> > developers working on the product and maybe one or two have slipped
>> > through.
>> >
>> > Right from the first I upped the zkClientTimeout to 30 as I wanted to
>> give
>> > extra time for any network blips that we experience on AWS.  We only seem
>> > to drop communication on a full garbage collection though.
>> >
>> > I am coming to the conclusion that we need to have more shards to cope
>> with
>> > the writes, so I will play around with adding more shards and see how I
>> go.
>> >
>> >
>> > I appreciate you having a look over our setup and the advice.
>> >
>> > Thanks again.
>> >
>> > Netty.
>> >
>> >
>> > On 2 May 2013 23:17, Shawn Heisey <so...@elyograg.org> wrote:
>> >
>> >> On 5/2/2013 4:24 AM, Annette Newton wrote:
>> >> > Hi Shawn,
>> >> >
>> >> > Thanks so much for your response.  We basically are very write
>> intensive
>> >> > and write throughput is pretty essential to our product.  Reads are
>> >> > sporadic and actually is functioning really well.
>> >> >
>> >> > We write on average (at the moment) 8-12 batches of 35 documents per
>> >> > minute.  But we really will be looking to write more in the future, so
>> >> need
>> >> > to work out scaling of solr and how to cope with more volume.
>> >> >
>> >> > Schema (I have changed the names) :
>> >> >
>> >> > http://pastebin.com/x1ry7ieW
>> >> >
>> >> > Config:
>> >> >
>> >> > http://pastebin.com/pqjTCa7L
>> >>
>> >> This is very clean.  There's probably more you could remove/comment, but
>> >> generally speaking I couldn't find any glaring issues.  In particular,
>> >> you have disabled autowarming, which is a major contributor to commit
>> >> speed problems.
>> >>
>> >> The first thing I think I'd try is increasing zkClientTimeout to 30 or
>> >> 60 seconds.  You can use the startup commandline or solr.xml, I would
>> >> probably use the latter.  Here's a solr.xml fragment that uses a system
>> >> property or a 15 second default:
>> >>
>> >> <?xml version="1.0" encoding="UTF-8" ?>
>> >> <solr persistent="true" sharedLib="lib">
>> >>   <cores adminPath="/admin/cores"
>> >> zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}"
>> >> hostContext="solr">
>> >>
>> >> General thoughts, these changes might not help this particular issue:
>> >> You've got autoCommit with openSearcher=true.  This is a hard commit.
>> >> If it were me, I would set that up with openSearcher=false and either do
>> >> explicit soft commits from my application or set up autoSoftCommit with
>> >> a shorter timeframe than autoCommit.
>> >>
>> >> This might simply be a scaling issue, where you'll need to spread the
>> >> load wider than four shards.  I know that there are financial
>> >> considerations with that, and they might not be small, so let's leave
>> >> that alone for now.
>> >>
>> >> The memory problems might be a symptom/cause of the scaling issue I just
>> >> mentioned.  You said you're using facets, which can be a real memory hog
>> >> even with only a few of them.  Have you tried facet.method=enum to see
>> >> how it performs?  You'd need to switch to it exclusively, never go with
>> >> the default of fc.  You could put that in the defaults or invariants
>> >> section of your request handler(s).
>> >>
>> >> Another way to reduce memory usage for facets is to use disk-based
>> >> docValues on version 4.2 or later for the facet fields, but this will
>> >> increase your index size, and your index is already quite large.
>> >> Depending on your index contents, the increase may be small or large.
>> >>
>> >> Something to just mention: It looks like your solrconfig.xml has
>> >> hard-coded absolute paths for dataDir and updateLog.  This is fine if
>> >> you'll only ever have one core/collection on each server, but it'll be a
>> >> disaster if you have multiples.  I could be wrong about how these get
>> >> interpreted in SolrCloud -- they might actually be relative despite
>> >> starting with a slash.
>> >>
>> >> Thanks,
>> >> Shawn
>> >>
>> >>
>> >
>> >
>>
>
>
>

Re: Delete from Solr Cloud 4.0 index..

Posted by Annette Newton <an...@servicetick.com>.
Hi Erick,

Thanks for the tip.

Will docValues help with memory usage?  It seemed a bit complicated to set
up..

The index size saving was nice because it means that potentially I could
use smaller Provisioned IOPS volumes, which cost less...

Thanks.


On 3 May 2013 18:27, Erick Erickson <er...@gmail.com> wrote:

> Anette:
>
> Be a little careful with the index size savings, they really don't
> mean much for _searching_. The sotred field compression
> significantly reduces the size on disk, but only for the stored
> data which is only accessed when returning the top N docs. In
> terms of how many docs you can fit on your hardware, it's pretty
> irrelevant.
>
> The *.fdt and *.fdx files in your index directory contain the stored
> data, so when looking at the effects of various options (including
> compression), you can pretty much ignore these files.
>
> FWIW,
> Erick
>
> On Fri, May 3, 2013 at 2:03 AM, Annette Newton
> <an...@servicetick.com> wrote:
> > Thanks Shawn.
> >
> > I have played around with Soft Commits before and didn't seem to have any
> > improvement, but with the current load testing I am doing I will give it
> > another go.
> >
> > I have researched docValues and came across the fact that it would
> increase
> > the index size.  With the upgrade to 4.2.1 the index size has reduced by
> > approx 33% which is pleasing and I don't really want to lose that saving.
> >
> > We do use the facet.enum method - which works really well, but I will
> > verify that we are using that in every instance, we have numerous
> > developers working on the product and maybe one or two have slipped
> > through.
> >
> > Right from the first I upped the zkClientTimeout to 30 as I wanted to
> give
> > extra time for any network blips that we experience on AWS.  We only seem
> > to drop communication on a full garbage collection though.
> >
> > I am coming to the conclusion that we need to have more shards to cope
> with
> > the writes, so I will play around with adding more shards and see how I
> go.
> >
> >
> > I appreciate you having a look over our setup and the advice.
> >
> > Thanks again.
> >
> > Netty.
> >
> >
> > On 2 May 2013 23:17, Shawn Heisey <so...@elyograg.org> wrote:
> >
> >> On 5/2/2013 4:24 AM, Annette Newton wrote:
> >> > Hi Shawn,
> >> >
> >> > Thanks so much for your response.  We basically are very write
> intensive
> >> > and write throughput is pretty essential to our product.  Reads are
> >> > sporadic and actually is functioning really well.
> >> >
> >> > We write on average (at the moment) 8-12 batches of 35 documents per
> >> > minute.  But we really will be looking to write more in the future, so
> >> need
> >> > to work out scaling of solr and how to cope with more volume.
> >> >
> >> > Schema (I have changed the names) :
> >> >
> >> > http://pastebin.com/x1ry7ieW
> >> >
> >> > Config:
> >> >
> >> > http://pastebin.com/pqjTCa7L
> >>
> >> This is very clean.  There's probably more you could remove/comment, but
> >> generally speaking I couldn't find any glaring issues.  In particular,
> >> you have disabled autowarming, which is a major contributor to commit
> >> speed problems.
> >>
> >> The first thing I think I'd try is increasing zkClientTimeout to 30 or
> >> 60 seconds.  You can use the startup commandline or solr.xml, I would
> >> probably use the latter.  Here's a solr.xml fragment that uses a system
> >> property or a 15 second default:
> >>
> >> <?xml version="1.0" encoding="UTF-8" ?>
> >> <solr persistent="true" sharedLib="lib">
> >>   <cores adminPath="/admin/cores"
> >> zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}"
> >> hostContext="solr">
> >>
> >> General thoughts, these changes might not help this particular issue:
> >> You've got autoCommit with openSearcher=true.  This is a hard commit.
> >> If it were me, I would set that up with openSearcher=false and either do
> >> explicit soft commits from my application or set up autoSoftCommit with
> >> a shorter timeframe than autoCommit.
> >>
> >> This might simply be a scaling issue, where you'll need to spread the
> >> load wider than four shards.  I know that there are financial
> >> considerations with that, and they might not be small, so let's leave
> >> that alone for now.
> >>
> >> The memory problems might be a symptom/cause of the scaling issue I just
> >> mentioned.  You said you're using facets, which can be a real memory hog
> >> even with only a few of them.  Have you tried facet.method=enum to see
> >> how it performs?  You'd need to switch to it exclusively, never go with
> >> the default of fc.  You could put that in the defaults or invariants
> >> section of your request handler(s).
> >>
> >> Another way to reduce memory usage for facets is to use disk-based
> >> docValues on version 4.2 or later for the facet fields, but this will
> >> increase your index size, and your index is already quite large.
> >> Depending on your index contents, the increase may be small or large.
> >>
> >> Something to just mention: It looks like your solrconfig.xml has
> >> hard-coded absolute paths for dataDir and updateLog.  This is fine if
> >> you'll only ever have one core/collection on each server, but it'll be a
> >> disaster if you have multiples.  I could be wrong about how these get
> >> interpreted in SolrCloud -- they might actually be relative despite
> >> starting with a slash.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
> >
> >
>




Re: Delete from Solr Cloud 4.0 index..

Posted by Erick Erickson <er...@gmail.com>.
Anette:

Be a little careful with the index size savings, they really don't
mean much for _searching_. The stored field compression
significantly reduces the size on disk, but only for the stored
data which is only accessed when returning the top N docs. In
terms of how many docs you can fit on your hardware, it's pretty
irrelevant.

The *.fdt and *.fdx files in your index directory contain the stored
data, so when looking at the effects of various options (including
compression), you can pretty much ignore these files.

FWIW,
Erick

On Fri, May 3, 2013 at 2:03 AM, Annette Newton
<an...@servicetick.com> wrote:
> Thanks Shawn.
>
> I have played around with Soft Commits before and didn't seem to have any
> improvement, but with the current load testing I am doing I will give it
> another go.
>
> I have researched docValues and came across the fact that it would increase
> the index size.  With the upgrade to 4.2.1 the index size has reduced by
> approx 33% which is pleasing and I don't really want to lose that saving.
>
> We do use the facet.enum method - which works really well, but I will
> verify that we are using that in every instance, we have numerous
> developers working on the product and maybe one or two have slipped
> through.
>
> Right from the first I upped the zkClientTimeout to 30 as I wanted to give
> extra time for any network blips that we experience on AWS.  We only seem
> to drop communication on a full garbage collection though.
>
> I am coming to the conclusion that we need to have more shards to cope with
> the writes, so I will play around with adding more shards and see how I go.
>
>
> I appreciate you having a look over our setup and the advice.
>
> Thanks again.
>
> Netty.
>
>
> On 2 May 2013 23:17, Shawn Heisey <so...@elyograg.org> wrote:
>
>> On 5/2/2013 4:24 AM, Annette Newton wrote:
>> > Hi Shawn,
>> >
>> > Thanks so much for your response.  We basically are very write intensive
>> > and write throughput is pretty essential to our product.  Reads are
>> > sporadic and actually is functioning really well.
>> >
>> > We write on average (at the moment) 8-12 batches of 35 documents per
>> > minute.  But we really will be looking to write more in the future, so
>> need
>> > to work out scaling of solr and how to cope with more volume.
>> >
>> > Schema (I have changed the names) :
>> >
>> > http://pastebin.com/x1ry7ieW
>> >
>> > Config:
>> >
>> > http://pastebin.com/pqjTCa7L
>>
>> This is very clean.  There's probably more you could remove/comment, but
>> generally speaking I couldn't find any glaring issues.  In particular,
>> you have disabled autowarming, which is a major contributor to commit
>> speed problems.
>>
>> The first thing I think I'd try is increasing zkClientTimeout to 30 or
>> 60 seconds.  You can use the startup commandline or solr.xml, I would
>> probably use the latter.  Here's a solr.xml fragment that uses a system
>> property or a 15 second default:
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
>> <solr persistent="true" sharedLib="lib">
>>   <cores adminPath="/admin/cores"
>> zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}"
>> hostContext="solr">
>>
>> General thoughts, these changes might not help this particular issue:
>> You've got autoCommit with openSearcher=true.  This is a hard commit.
>> If it were me, I would set that up with openSearcher=false and either do
>> explicit soft commits from my application or set up autoSoftCommit with
>> a shorter timeframe than autoCommit.
>>
>> This might simply be a scaling issue, where you'll need to spread the
>> load wider than four shards.  I know that there are financial
>> considerations with that, and they might not be small, so let's leave
>> that alone for now.
>>
>> The memory problems might be a symptom/cause of the scaling issue I just
>> mentioned.  You said you're using facets, which can be a real memory hog
>> even with only a few of them.  Have you tried facet.method=enum to see
>> how it performs?  You'd need to switch to it exclusively, never go with
>> the default of fc.  You could put that in the defaults or invariants
>> section of your request handler(s).
>>
>> Another way to reduce memory usage for facets is to use disk-based
>> docValues on version 4.2 or later for the facet fields, but this will
>> increase your index size, and your index is already quite large.
>> Depending on your index contents, the increase may be small or large.
>>
>> Something to just mention: It looks like your solrconfig.xml has
>> hard-coded absolute paths for dataDir and updateLog.  This is fine if
>> you'll only ever have one core/collection on each server, but it'll be a
>> disaster if you have multiples.  I could be wrong about how these get
>> interpreted in SolrCloud -- they might actually be relative despite
>> starting with a slash.
>>
>> Thanks,
>> Shawn
>>
>>
>
>

Re: Delete from Solr Cloud 4.0 index..

Posted by Annette Newton <an...@servicetick.com>.
Thanks Shawn.

I have played around with Soft Commits before and didn't seem to get any
improvement, but with the current load testing I am doing I will give it
another go.

I have researched docValues and came across the fact that it would increase
the index size.  With the upgrade to 4.2.1 the index size has reduced by
approx. 33%, which is pleasing, and I don't really want to lose that saving.

We do use the facet.method=enum approach - which works really well - but I
will verify that we are using it in every instance; we have numerous
developers working on the product and maybe one or two queries have slipped
through.

Right from the first I upped zkClientTimeout to 30 seconds, as I wanted to
give extra time for any network blips that we experience on AWS.  We only
seem to drop communication on a full garbage collection though.

I am coming to the conclusion that we need to have more shards to cope with
the writes, so I will play around with adding more shards and see how I go.
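
As far as I know, on 4.2.1 that means standing up a new collection with a
higher shard count and re-indexing into it (there's no shard splitting at
this version), so the experiment will be along these lines via the
Collections API - the collection name, shard count and replication factor
below are just illustrative:

http://localhost:8983/solr/admin/collections?action=CREATE&name=collection2&numShards=8&replicationFactor=2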


I appreciate you having a look over our setup and the advice.

Thanks again.

Netty.


On 2 May 2013 23:17, Shawn Heisey <so...@elyograg.org> wrote:

> On 5/2/2013 4:24 AM, Annette Newton wrote:
> > Hi Shawn,
> >
> > Thanks so much for your response.  We basically are very write intensive
> > and write throughput is pretty essential to our product.  Reads are
> > sporadic and actually is functioning really well.
> >
> > We write on average (at the moment) 8-12 batches of 35 documents per
> > minute.  But we really will be looking to write more in the future, so
> need
> > to work out scaling of solr and how to cope with more volume.
> >
> > Schema (I have changed the names) :
> >
> > http://pastebin.com/x1ry7ieW
> >
> > Config:
> >
> > http://pastebin.com/pqjTCa7L
>
> This is very clean.  There's probably more you could remove/comment, but
> generally speaking I couldn't find any glaring issues.  In particular,
> you have disabled autowarming, which is a major contributor to commit
> speed problems.
>
> The first thing I think I'd try is increasing zkClientTimeout to 30 or
> 60 seconds.  You can use the startup commandline or solr.xml, I would
> probably use the latter.  Here's a solr.xml fragment that uses a system
> property or a 15 second default:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <solr persistent="true" sharedLib="lib">
>   <cores adminPath="/admin/cores"
> zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}"
> hostContext="solr">
>
> General thoughts, these changes might not help this particular issue:
> You've got autoCommit with openSearcher=true.  This is a hard commit.
> If it were me, I would set that up with openSearcher=false and either do
> explicit soft commits from my application or set up autoSoftCommit with
> a shorter timeframe than autoCommit.
>
> This might simply be a scaling issue, where you'll need to spread the
> load wider than four shards.  I know that there are financial
> considerations with that, and they might not be small, so let's leave
> that alone for now.
>
> The memory problems might be a symptom/cause of the scaling issue I just
> mentioned.  You said you're using facets, which can be a real memory hog
> even with only a few of them.  Have you tried facet.method=enum to see
> how it performs?  You'd need to switch to it exclusively, never go with
> the default of fc.  You could put that in the defaults or invariants
> section of your request handler(s).
>
> Another way to reduce memory usage for facets is to use disk-based
> docValues on version 4.2 or later for the facet fields, but this will
> increase your index size, and your index is already quite large.
> Depending on your index contents, the increase may be small or large.
>
> Something to just mention: It looks like your solrconfig.xml has
> hard-coded absolute paths for dataDir and updateLog.  This is fine if
> you'll only ever have one core/collection on each server, but it'll be a
> disaster if you have multiples.  I could be wrong about how these get
> interpreted in SolrCloud -- they might actually be relative despite
> starting with a slash.
>
> Thanks,
> Shawn
>
>



Re: Delete from Solr Cloud 4.0 index..

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/2/2013 4:24 AM, Annette Newton wrote:
> Hi Shawn,
> 
> Thanks so much for your response.  We basically are very write intensive
> and write throughput is pretty essential to our product.  Reads are
> sporadic and actually is functioning really well.
> 
> We write on average (at the moment) 8-12 batches of 35 documents per
> minute.  But we really will be looking to write more in the future, so need
> to work out scaling of solr and how to cope with more volume.
> 
> Schema (I have changed the names) :
> 
> http://pastebin.com/x1ry7ieW
> 
> Config:
> 
> http://pastebin.com/pqjTCa7L

This is very clean.  There's probably more you could remove/comment, but
generally speaking I couldn't find any glaring issues.  In particular, you
have disabled autowarming, which is good, since autowarming is a major
contributor to commit speed problems.

The first thing I think I'd try is increasing zkClientTimeout to 30 or
60 seconds.  You can use the startup commandline or solr.xml, I would
probably use the latter.  Here's a solr.xml fragment that uses a system
property or a 15 second default:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores"
         zkClientTimeout="${zkClientTimeout:15000}"
         hostPort="${jetty.port:}"
         hostContext="solr">
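
If you go the commandline route instead, the ${zkClientTimeout:15000}
substitution above just reads a system property, so passing it at startup
should work too - roughly like this, alongside whatever zkHost and memory
options you already pass (30000 here is only an example value):

java -DzkClientTimeout=30000 -jar start.jar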

General thoughts, these changes might not help this particular issue:
You've got autoCommit with openSearcher=true.  This is a hard commit.
If it were me, I would set that up with openSearcher=false and either do
explicit soft commits from my application or set up autoSoftCommit with
a shorter timeframe than autoCommit.
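
In solrconfig.xml that combination would look roughly like this, inside
<updateHandler> - the intervals are placeholders to tune for your load:

<autoCommit>
  <!-- hard commit: flushes the index to disk but does not open a searcher -->
  <maxTime>300000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <!-- soft commit: cheaper, makes recently added documents visible to queries -->
  <maxTime>60000</maxTime>
</autoSoftCommit>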

This might simply be a scaling issue, where you'll need to spread the
load wider than four shards.  I know that there are financial
considerations with that, and they might not be small, so let's leave
that alone for now.

The memory problems might be a symptom/cause of the scaling issue I just
mentioned.  You said you're using facets, which can be a real memory hog
even with only a few of them.  Have you tried facet.method=enum to see
how it performs?  You'd need to switch to it exclusively, never go with
the default of fc.  You could put that in the defaults or invariants
section of your request handler(s).
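
For example, in the request handler definition in solrconfig.xml - the
handler name here is just a stand-in for whatever handler you already use:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="invariants">
    <!-- force enum faceting so the fc method's FieldCache entries are never built -->
    <str name="facet.method">enum</str>
  </lst>
</requestHandler>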

Another way to reduce memory usage for facets is to use disk-based
docValues on version 4.2 or later for the facet fields, but this will
increase your index size, and your index is already quite large.
Depending on your index contents, the increase may be small or large.
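
For reference, it's mostly a schema.xml change, something like the sketch
below - the field and type names are made up, and the docValuesFormat="Disk"
part is the bit I'd double-check against the 4.2 docs before relying on it:

<fieldType name="string_dv" class="solr.StrField" docValuesFormat="Disk"/>
<field name="CustomerRef" type="string_dv" indexed="true" stored="true" docValues="true"/>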

Something to just mention: It looks like your solrconfig.xml has
hard-coded absolute paths for dataDir and updateLog.  This is fine if
you'll only ever have one core/collection on each server, but it'll be a
disaster if you have multiples.  I could be wrong about how these get
interpreted in SolrCloud -- they might actually be relative despite
starting with a slash.
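
The stock example solrconfig.xml sidesteps this by using per-core properties
instead of absolute paths, roughly like this:

<dataDir>${solr.data.dir:}</dataDir>

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>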

Thanks,
Shawn


Re: Delete from Solr Cloud 4.0 index..

Posted by Annette Newton <an...@servicetick.com>.
Hi Shawn,

Thanks so much for your response.  We are basically very write intensive,
and write throughput is pretty essential to our product.  Reads are
sporadic, and read performance is actually functioning really well.

We write on average (at the moment) 8-12 batches of 35 documents per
minute.  But we really will be looking to write more in the future, so we
need to work out scaling of Solr and how to cope with more volume.

Schema (I have changed the names) :

http://pastebin.com/x1ry7ieW

Config:

http://pastebin.com/pqjTCa7L

As you can see we haven't played around much with caches and such.  I am
now load testing on 4.2.1 and will be re-indexing our data, so now is really
the time to make any tweaks we can to get the throughput we want.

We query mostly on the latest documents added and use faceting to populate
drop-downs with distinct values; the selected value then gets added to the
basic query of:

rows=20&df=text&fl=Id,EP,ExP,PC,UTCTime,CIp,Br,OS,LU&start=0&q=UTCTime:[2013-04-25T23:00:00Z+TO+2013-05-02T22:00:00Z]+AND+H:(https\:\/\/.com)&sort=UTCTime+desc


So we will add further fields onto the above; typically users add only 1 or
2 further restrictions.

Facet queries will be the same as the above; we always restrict by the date
and the customer reference.
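
A typical facet request along those lines might look like this - the
customer-reference field (CRef) and the facet fields are only illustrative
stand-ins for the renamed schema fields:

rows=0&q=UTCTime:[2013-04-25T23:00:00Z+TO+2013-05-02T22:00:00Z]+AND+CRef:12345&facet=true&facet.method=enum&facet.field=Br&facet.field=OS&facet.limit=100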

Hope this is enough information to be going on with.  Again thanks for your
help.

Netty.







On 1 May 2013 17:31, Shawn Heisey <so...@elyograg.org> wrote:

> On 5/1/2013 8:42 AM, Annette Newton wrote:
>
>> It was a single delete with a date range query.  We have 8 machines each
>> with 35GB memory, 10GB is allocated to the JVM.  Garbage collection has
>> always been a problem for us with the heap not clearing on Full garbage
>> collection.  I don't know what is being held in memory and refuses to be
>> collected.
>>
>> I have seen your java heap configuration on previous posts and it's very
>> like ours except that we are not currently using LargePages (I don't know
>> how much difference that has made to your memory usage).
>>
>> We have tried various configurations around Java including the G1
>> collector
>> (which was awful) but all settings seem to leave the old generation at
>> least 50% full, so it quickly fills up again.
>>
>> -Xms10240M -Xmx10240M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>> -XX:+CMSParallelRemarkEnabled -XX:NewRatio=2 -XX:+CMSScavengeBeforeRemark
>> -XX:CMSWaitDuration=5000  -XX:+CMSClassUnloadingEnabled
> -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly
>>
>> If I could only figure out what keeps the heap to the current level I feel
>> we would be in a better place with solr.
>>
>
> With a single delete request, it was probably the commit that was very
> slow and caused the problem, not the delete itself.  This has been my
> experience with my large indexes.
>
> My attempts with the G1 collector were similarly awful.  The idea seems
> sound on paper, but Oracle needs to do some work in making it better for
> large heaps.  Because my GC tuning was not very disciplined, I do not know
> how much impact UseLargePages is having.
>
> Your overall RAM allocation should be good.  If these machines aren't
> being used for other software, then you have 24-25GB of memory available
> for caching your index, which should be very good with 26GB of index for
> that machine.
>
> Looking over your message history, I see that you're using Amazon EC2.
> Solr performs much better on bare metal, although the EC2 instance you're
> using is probably very good.
>
> SolrCloud is optimized for machines that are on the same Ethernet LAN.
> Communication between EC2 VMs (especially if they are not located in nearby
> data centers) will have some latency and a potential for dropped packets.
>  I'm going to proceed with the idea that EC2 and virtualization are not the
> problems here.
>
> I'm not really surprised to hear that with an index of your size that so
> much of a 10GB heap is retained.  There may be things that could reduce
> your memory usage, so could you share your solrconfig.xml and schema.xml
> with a paste site that does XML highlighting (pastie.org being a good
> example), and give us an idea of how often you update and commit?  Feel
> free to search/replace sensitive information, as long that work is
> consistent and you don't entirely remove it.  Armed with that information,
> we can have a discussion about your needs and how to achieve them.
>
> Do you know how long cache autowarming is taking?  The cache statistics
> should tell you how long it took on the last commit.
>
> Some examples of typical real-world queries would be helpful too. Examples
> should be relatively complex for your setup, but not worst-case.  An
> example query for my setup that meets this requirement would probably be
> 4-10KB in size ... some of them are 20KB!
>
> Not really related - a question about one of your old messages that never
> seemed to get resolved:  Are you still seeing a lot of CLOSE_WAIT
> connections in your TCP table?  A later message from you mentioned 4.2.1,
> so I'm wondering specifically about that version.
>
> Thanks,
> Shawn
>
>



Re: Delete from Solr Cloud 4.0 index..

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/1/2013 8:42 AM, Annette Newton wrote:
> It was a single delete with a date range query.  We have 8 machines each
> with 35GB memory, 10GB is allocated to the JVM.  Garbage collection has
> always been a problem for us with the heap not clearing on Full garbage
> collection.  I don't know what is being held in memory and refuses to be
> collected.
>
> I have seen your java heap configuration on previous posts and it's very
> like ours except that we are not currently using LargePages (I don't know
> how much difference that has made to your memory usage).
>
> We have tried various configurations around Java including the G1 collector
> (which was awful) but all settings seem to leave the old generation at
> least 50% full, so it quickly fills up again.
>
> -Xms10240M -Xmx10240M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> -XX:+CMSParallelRemarkEnabled -XX:NewRatio=2 -XX:+CMSScavengeBeforeRemark
> -XX:CMSWaitDuration=5000  -XX:+CMSClassUnloadingEnabled
> -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly
>
> If I could only figure out what keeps the heap to the current level I feel
> we would be in a better place with solr.

With a single delete request, it was probably the commit that was very 
slow and caused the problem, not the delete itself.  This has been my 
experience with my large indexes.

My attempts with the G1 collector were similarly awful.  The idea seems 
sound on paper, but Oracle needs to do some work in making it better for 
large heaps.  Because my GC tuning was not very disciplined, I do not 
know how much impact UseLargePages is having.

Your overall RAM allocation should be good.  If these machines aren't 
being used for other software, then you have 24-25GB of memory available 
for caching your index, which should be very good with 26GB of index for 
that machine.

Looking over your message history, I see that you're using Amazon EC2. 
Solr performs much better on bare metal, although the EC2 instance 
you're using is probably very good.

SolrCloud is optimized for machines that are on the same Ethernet LAN. 
Communication between EC2 VMs (especially if they are not located in 
nearby data centers) will have some latency and a potential for dropped 
packets.  I'm going to proceed with the idea that EC2 and virtualization 
are not the problems here.

I'm not really surprised to hear that with an index of your size that so 
much of a 10GB heap is retained.  There may be things that could reduce 
your memory usage, so could you share your solrconfig.xml and schema.xml 
with a paste site that does XML highlighting (pastie.org being a good 
example), and give us an idea of how often you update and commit?  Feel 
free to search/replace sensitive information, as long as that work is 
consistent and you don't entirely remove it.  Armed with that 
information, we can have a discussion about your needs and how to 
achieve them.

Do you know how long cache autowarming is taking?  The cache statistics 
should tell you how long it took on the last commit.
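
For context, autowarming is controlled per cache in solrconfig.xml; a cache
that warms on commit looks roughly like this, with autowarmCount controlling
how many old entries get re-executed against the new searcher:

<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="64"/>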

Some examples of typical real-world queries would be helpful too. 
Examples should be relatively complex for your setup, but not 
worst-case.  An example query for my setup that meets this requirement 
would probably be 4-10KB in size ... some of them are 20KB!

Not really related - a question about one of your old messages that 
never seemed to get resolved:  Are you still seeing a lot of CLOSE_WAIT 
connections in your TCP table?  A later message from you mentioned 
4.2.1, so I'm wondering specifically about that version.

Thanks,
Shawn


Re: Delete from Solr Cloud 4.0 index..

Posted by Annette Newton <an...@servicetick.com>.
Hi Shawn

Thanks for the reply.

It was a single delete with a date range query.  We have 8 machines, each
with 35GB memory, of which 10GB is allocated to the JVM.  Garbage collection
has always been a problem for us, with the heap not clearing on a full
garbage collection.  I don't know what is being held in memory and refusing
to be collected.

I have seen your java heap configuration on previous posts and it's very
like ours except that we are not currently using LargePages (I don't know
how much difference that has made to your memory usage).

We have tried various configurations around Java including the G1 collector
(which was awful) but all settings seem to leave the old generation at
least 50% full, so it quickly fills up again.

-Xms10240M -Xmx10240M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled -XX:NewRatio=2 -XX:+CMSScavengeBeforeRemark
-XX:CMSWaitDuration=5000  -XX:+CMSClassUnloadingEnabled
-XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly

If I could only figure out what keeps the heap at its current level, I feel
we would be in a better place with Solr.

Thanks.



On 1 May 2013 14:40, Shawn Heisey <so...@elyograg.org> wrote:

> On 5/1/2013 3:39 AM, Annette Newton wrote:
> > We have a 4 shard - 2 replica solr cloud setup, each with about 26GB of
> > index.  A total of 24,000,000.  We issued a rather large delete yesterday
> > morning to reduce that size by about half, this resulted in the loss of
> all
> > shards while the delete was taking place, but when it had apparently
> > finished as soon as we started writing again we continued to lose shards.
> >
> > We have also issued much smaller deletes and lost shards but before they
> > have always come back ok.  This time we couldn't keep them online.  We
> > ended up rebuilding out cloud setup and switching over to it.
> >
> > Is there a better process for deleting documents?  Is this expected
> > behaviour?
>
> How was the delete composed?  Was it a single request with a simple
> query, or was a it a huge list of IDs or a huge query?  Was it millions
> of individual delete queries?  All of those should be fine, but the last
> option is the hardest on Solr, especially if you are doing a lot of
> commits at the same time.  You might need to increase the zkTimeout
> value on your startup commandline or in solr.xml.
>
> How many machines do your eight SolrCloud replicas live on? How much RAM
> to they have? How much of that memory is allocated to the Java heap?
>
> Assuming that your SolrCloud is living on eight separate machines that
> each have a 26GB index, I hope that you have 16 to 32 GB of RAM on each
> of those machines, and that a large chunk of that RAM is not allocated
> to Java or any other program.  If you don't, then it will be very
> difficult to get good performance out of Solr, especially for index
> commits.  If you have multiple 26GB shards per machine, you'll need even
> more free memory.  The free memory is used to cache your index files.
>
> Another possible problem here is Java garbage collection pauses.  If you
> have a large max heap and don't have a tuned GC configuration, then the
> only way to fix this is to reduce your heap and/or to tune Java's
> garbage collection.
>
> Thanks,
> Shawn
>
>



Re: Delete from Solr Cloud 4.0 index..

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/1/2013 3:39 AM, Annette Newton wrote:
> We have a 4 shard - 2 replica solr cloud setup, each with about 26GB of
> index.  A total of 24,000,000.  We issued a rather large delete yesterday
> morning to reduce that size by about half, this resulted in the loss of all
> shards while the delete was taking place, but when it had apparently
> finished as soon as we started writing again we continued to lose shards.
> 
> We have also issued much smaller deletes and lost shards but before they
> have always come back ok.  This time we couldn't keep them online.  We
> ended up rebuilding out cloud setup and switching over to it.
> 
> Is there a better process for deleting documents?  Is this expected
> behaviour?

How was the delete composed?  Was it a single request with a simple
query, or was it a huge list of IDs or a huge query?  Was it millions
of individual delete queries?  All of those should be fine, but the last
option is the hardest on Solr, especially if you are doing a lot of
commits at the same time.  You might need to increase the zkClientTimeout
value on your startup commandline or in solr.xml.
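
For reference, a single delete-by-query is just one update message posted to
the /update handler, along these lines (field name and range here are only an
example):

<delete><query>timestamp:[* TO NOW-30DAYS]</query></delete>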

How many machines do your eight SolrCloud replicas live on? How much RAM
do they have? How much of that memory is allocated to the Java heap?

Assuming that your SolrCloud is living on eight separate machines that
each have a 26GB index, I hope that you have 16 to 32 GB of RAM on each
of those machines, and that a large chunk of that RAM is not allocated
to Java or any other program.  If you don't, then it will be very
difficult to get good performance out of Solr, especially for index
commits.  If you have multiple 26GB shards per machine, you'll need even
more free memory.  The free memory is used to cache your index files.

Another possible problem here is Java garbage collection pauses.  If you
have a large max heap and don't have a tuned GC configuration, then the
only way to fix this is to reduce your heap and/or to tune Java's
garbage collection.

Thanks,
Shawn