Posted to user@zookeeper.apache.org by Karol Dudzinski <ka...@gmail.com> on 2015/02/22 11:44:26 UTC

What goes in the snapshot?

Hi all,

I was under the impression that the snapshot contained essentially an on-disk copy of all the data.  However, one of our clusters has a snapshot which is over 1GB while the mntr four letter word reports an approximate data size in the hundreds of KB and a node count in the low thousands.  So what else goes into the snapshot and how can I slim it down?

Thanks,
Karol
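
[Editor's sketch] The comparison described above (snapshot size on disk versus what mntr reports) can be scripted. The following is a minimal sketch, not an official tool: the function names, the default host/port and the sample numbers are assumptions made for illustration. It relies only on mntr replying with tab-separated key/value lines, including zk_approximate_data_size and zk_znode_count.

```python
import socket


def query_mntr(host="localhost", port=2181, timeout=5.0):
    """Send the `mntr` four letter word and return the raw text reply.

    host/port are placeholders; point them at one of your own servers.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Turn the tab-separated `key<TAB>value` lines into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip anything that is not a key/value line
            stats[key.strip()] = value.strip()
    return stats


def bloat_ratio(snapshot_bytes, stats):
    """Snapshot size on disk divided by the data size mntr reports."""
    data_size = int(stats["zk_approximate_data_size"])
    return snapshot_bytes / data_size if data_size else float("inf")
```

A ratio in the thousands, as in the situation described above, suggests the snapshot holds something other than znode data. Typical use would be bloat_ratio(os.path.getsize(path_to_snapshot), parse_mntr(query_mntr())).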



Re: What goes in the snapshot?

Posted by Flavio Junqueira <fp...@yahoo.com.INVALID>.
Hi Karol,
I'll have a look. We are a bit busy with the next release for the 3.5 branch (RC should go out this week), so don't worry if we don't say anything about them in the next few days. If we take too long, feel free to ping me on this list again.
-Flavio 


     On Monday, March 16, 2015 4:23 PM, "Dudzinski, Karol" <Ka...@gs.com> wrote:
   
 

 Hi Flavio,

I've created a JIRA for this (https://issues.apache.org/jira/browse/ZOOKEEPER-2141) and will shortly upload a patch to demonstrate the approach I was considering.

While I was at it, I submitted a few other JIRAs for some issues we've hit.  I'm happy to submit patches for all of them but would appreciate some comments from the committers about the approaches, or even the validity of what I'm suggesting.

The other JIRAs are:
https://issues.apache.org/jira/browse/ZOOKEEPER-2142
https://issues.apache.org/jira/browse/ZOOKEEPER-2143
https://issues.apache.org/jira/browse/ZOOKEEPER-2144

Thanks,
Karol



 
  

RE: What goes in the snapshot?

Posted by "Dudzinski, Karol" <Ka...@gs.com>.
Hi Flavio,

I've created a JIRA for this (https://issues.apache.org/jira/browse/ZOOKEEPER-2141) and will shortly upload a patch to demonstrate the approach I was considering.

While I was at it, I submitted a few other JIRAs for some issues we've hit.  I'm happy to submit patches for all of them but would appreciate some comments from the committers about the approaches, or even the validity of what I'm suggesting.

The other JIRAs are:
https://issues.apache.org/jira/browse/ZOOKEEPER-2142
https://issues.apache.org/jira/browse/ZOOKEEPER-2143
https://issues.apache.org/jira/browse/ZOOKEEPER-2144

Thanks,
Karol

The Goldman Sachs Group, Inc. All rights reserved.
See http://www.gs.com/disclaimer/global_email for important risk disclosures, conflicts of interest and other terms and conditions relating to this e-mail and your reliance on information contained in it.  This message may contain confidential or privileged information.  If you are not the intended recipient, please advise us immediately and delete this message.  See http://www.gs.com/disclaimer/email for further information on confidentiality and the risks of non-secure electronic communication.  If you cannot access these links, please notify us by reply message and we will send the contents to you.

-----Original Message-----
From: Flavio Junqueira [mailto:fpjunqueira@yahoo.com.INVALID] 
Sent: 26 February 2015 22:53
To: user@zookeeper.apache.org
Cc: adam@milne-smith.co.uk
Subject: Re: What goes in the snapshot?

Hi Karol,

The use of reference counters might be a good way around it. To make it backward compatible, I think we can optionally use the counters if the third map is present in the snapshot. Would it work?

I also think it would be good to create a jira for this so that we can track this discussion and propose patches.

-Flavio



Re: What goes in the snapshot?

Posted by Flavio Junqueira <fp...@yahoo.com.INVALID>.
Hi Karol,

The use of reference counters might be a good way around it. To make it backward compatible, I think we can optionally use the counters if the third map is present in the snapshot. Would it work?

I also think it would be good to create a jira for this so that we can track this discussion and propose patches.

-Flavio

> On 26 Feb 2015, at 13:13, Karol Dudzinski <ka...@gmail.com> wrote:
> 
> Hi Flavio,
> 
> We've done some more analysis using the snapshot formatter and a heap dump and have found the source of the snapshot bloat.
> 
> What is taking the majority of the space is the longKeyMap from DataTree.  In the heap dump, aclKeyMap has as many entries (which is to be expected given how the maps are used) and takes up an equally large amount of space, though at least aclKeyMap isn't serialised to the snapshot.
> 
> We use a custom authentication provider, but because the AuthenticationProvider.matches method does not provide the path being operated on, we end up sticking the path in the ACL id.  Some of our apps generate a lot of paths for one-time use, and consequently we end up with lots of unique ACLs.
> 
> The two ACL maps in DataTree seem to be an optimisation so that repeated usage of ACLs does not result in the full list being stored multiple times.  However, entries are never removed from these two maps, so if ACLs are unique, the maps (and the snapshot) grow forever.
> 
> We're quite keen on fixing this as it's causing us lots of issues and we're happy to provide a patch but will need your opinion on the various options:
> - create a third map which would be a reference count for the ACLs which can be updated as needed when creating, deleting or setting ACL.  When the reference count is 0, remove the entry from all the maps
> - use weak references in some shape or form, though this is made harder by the fact that the ACL optimisation essentially needs a bidirectional index (hence the two maps).  We've given this one lots of thought but it would really require something like a ConcurrentWeakBiHashMap, which just sounds wrong and over-engineered :)
> 
> The other fix that could be made is to pass the path being operated on to the AuthenticationProvider.  However, doing that in a backwards compatible fashion is not trivial and even though it would fix my problem (by allowing me to remove the path from the ACL id) it wouldn't fix the general problem with this optimisation.
> 
> Looking forward to hearing your thoughts on this.
> 
> Thanks,
> Karol
> 


Re: What goes in the snapshot?

Posted by Karol Dudzinski <ka...@gmail.com>.
Hi Flavio,

We've done some more analysis using the snapshot formatter and a heap dump and have found the source of the snapshot bloat.

What is taking the majority of the space is the longKeyMap from DataTree.  In the heap dump, aclKeyMap has as many entries (which is to be expected given how the maps are used) and takes up an equally large amount of space, though at least aclKeyMap isn't serialised to the snapshot.

We use a custom authentication provider, but because the AuthenticationProvider.matches method does not provide the path being operated on, we end up sticking the path in the ACL id.  Some of our apps generate a lot of paths for one-time use, and consequently we end up with lots of unique ACLs.

The two ACL maps in DataTree seem to be an optimisation so that repeated usage of ACLs does not result in the full list being stored multiple times.  However, entries are never removed from these two maps, so if ACLs are unique, the maps (and the snapshot) grow forever.

We're quite keen on fixing this as it's causing us lots of issues and we're happy to provide a patch but will need your opinion on the various options:
- create a third map which would be a reference count for the ACLs which can be updated as needed when creating, deleting or setting ACL.  When the reference count is 0, remove the entry from all the maps
- use weak references in some shape or form, though this is made harder by the fact that the ACL optimisation essentially needs a bidirectional index (hence the two maps).  We've given this one lots of thought but it would really require something like a ConcurrentWeakBiHashMap, which just sounds wrong and over-engineered :)
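
[Editor's sketch] To make the first option concrete, here is a minimal sketch of a reference-counted, bidirectional ACL cache. It is not ZooKeeper code and not the patch attached to the JIRA: only the names longKeyMap and aclKeyMap come from this thread, while the class, its methods and the tuple representation of an ACL are hypothetical. It is written in Python for brevity and ignores thread safety, which a real DataTree change would have to handle.

```python
class RefCountedAclCache:
    """Interns ACL lists as integer ids and drops entries nobody references."""

    def __init__(self):
        self._long_to_acl = {}  # plays the role of longKeyMap (id -> ACL list)
        self._acl_to_long = {}  # plays the role of aclKeyMap (ACL list -> id)
        self._ref_counts = {}   # the proposed third map (id -> reference count)
        self._next_id = 0

    def acquire(self, acl):
        """Called when a node starts using `acl` (create or setACL)."""
        key = tuple(acl)  # lists are not hashable, tuples are
        acl_id = self._acl_to_long.get(key)
        if acl_id is None:
            self._next_id += 1
            acl_id = self._next_id
            self._acl_to_long[key] = acl_id
            self._long_to_acl[acl_id] = key
            self._ref_counts[acl_id] = 0
        self._ref_counts[acl_id] += 1
        return acl_id

    def release(self, acl_id):
        """Called when a node stops using the ACL (delete or setACL)."""
        remaining = self._ref_counts[acl_id] - 1
        if remaining > 0:
            self._ref_counts[acl_id] = remaining
            return
        # Last reference gone: remove the entry from all three maps.
        key = self._long_to_acl.pop(acl_id)
        del self._acl_to_long[key]
        del self._ref_counts[acl_id]

    def lookup(self, acl_id):
        return list(self._long_to_acl[acl_id])

    def __len__(self):
        return len(self._long_to_acl)
```

In this sketch a create maps to acquire, a delete maps to release, and a setACL acquires the new ACL before releasing the old one, so a unique ACL on a short-lived path disappears from the maps (and from the next snapshot) once the path is deleted.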

The other fix that could be made is to pass the path being operated on to the AuthenticationProvider.  However, doing that in a backwards compatible fashion is not trivial and even though it would fix my problem (by allowing me to remove the path from the ACL id) it wouldn't fix the general problem with this optimisation.

Looking forward to hearing your thoughts on this.

Thanks,
Karol

> On 22 Feb 2015, at 14:55, Flavio Junqueira <fp...@yahoo.com.INVALID> wrote:
> 
> Hi Karol,
> 
> It's odd that you have such large snapshots and so little data in the data tree. Are you creating lots of sessions? Right now I can't think of a good reason, so I suggest you use the snapshot formatter to inspect the snapshot.
> 
> -Flavio
> 

Re: What goes in the snapshot?

Posted by Flavio Junqueira <fp...@yahoo.com.INVALID>.
Hi Karol,

It's odd that you have such large snapshots and so little data in the data tree. Are you creating lots of sessions? Right now I can't think of a good reason, so I suggest you use the snapshot formatter to inspect the snapshot.

-Flavio

> On 22 Feb 2015, at 14:23, Karol Dudzinski <ka...@gmail.com> wrote:
> 
> Hi Flavio,
> 
> Yes, one of our clients had a bug which caused it to go into a tight create/delete loop with zero net effect (i.e. it was deleting what it had just created). After stopping the client, the snapshot never reduced in size, so are the deletes in there permanently?
> 
> Thanks,
> Karol
> 
> 


Re: What goes in the snapshot?

Posted by Karol Dudzinski <ka...@gmail.com>.
Hi Flavio,

Yes, one of our clients had a bug which caused it to go into a tight create/delete loop with zero net effect (i.e. it was deleting what it had just created). After stopping the client, the snapshot never reduced in size, so are the deletes in there permanently?

Thanks,
Karol


> On 22 Feb 2015, at 14:05, Flavio Junqueira <fp...@yahoo.com.INVALID> wrote:
> 
> Hi there,
> 
> Perhaps a lot of data has been deleted? In any case, you may want to use the SnapshotFormatter to check what is in the large snapshot.
> 
> -Flavio
> 

Re: What goes in the snapshot?

Posted by Flavio Junqueira <fp...@yahoo.com.INVALID>.
Hi there,

Perhaps a lot of data has been deleted? In any case, you may want to use the SnapshotFormatter to check what is in the large snapshot.

-Flavio

> On 22 Feb 2015, at 10:44, Karol Dudzinski <ka...@gmail.com> wrote:
> 
> Hi all,
> 
> I was under the impression that the snapshot contained essentially an on-disk copy of all the data.  However, one of our clusters has a snapshot which is over 1GB while the mntr four letter word reports an approximate data size in the hundreds of KB and a node count in the low thousands.  So what else goes into the snapshot and how can I slim it down?
> 
> Thanks,
> Karol
> 
>