Posted to user@geode.apache.org by Eugene Strokin <eu...@strokin.info> on 2016/04/20 01:26:01 UTC

Use case question

Hello, I'm seriously considering using Geode as the core of a distributed file
cache system, but I have a few questions.
First, this is what needs to be done: a scalable file store with an LRU
eviction policy that utilizes the disk space as much as possible. The idea is
to have around 50 small Droplets from DigitalOcean, each of which provides 512Kb
RAM and 20Gb storage. The client should call the cluster and get a byte
array by a key. If needed, the cluster should be expanded. The origin of
the byte arrays is files from AWS S3.
It looks like everything could be done using Geode, but:
- It looks like compaction requires a lot of free hard drive space. All
I can allow is about 1Gb. Would this work in my case? How could it be done?
- Would objects be evicted automatically from overflow storage using an
LRU policy?

Thanks in advance for your answers, ideas, suggestions.
Eugene

Re: Use case question

Posted by Dan Smith <ds...@pivotal.io>.
> In other words, would I have only one call to the AWS S3 origin even if
many requests came in for the object at the same time?

Yes, you'll only have a single call to S3. Geode keeps track of in-progress
loads, and another request for the same key will wait for the in-progress
load to complete.

-Dan
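Geode's in-progress load tracking is internal, but the idea can be sketched in plain Java: keep one future per key so that concurrent callers share a single origin call (a toy illustration, not Geode code; `origin` stands in for the S3 fetch):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Toy illustration of "single-flight" loading: concurrent requests for the
 * same key share one in-progress load instead of each hitting the origin.
 */
public class SingleFlightLoader<K, V> {
    private final Map<K, CompletableFuture<V>> inProgress = new ConcurrentHashMap<>();
    private final Function<K, V> origin; // e.g. a fetch from S3 (assumption)

    public SingleFlightLoader(Function<K, V> origin) {
        this.origin = origin;
    }

    public V get(K key) {
        // computeIfAbsent guarantees only one future (one origin call) is
        // created per key, no matter how many callers arrive concurrently.
        CompletableFuture<V> f = inProgress.computeIfAbsent(
                key, k -> CompletableFuture.supplyAsync(() -> origin.apply(k)));
        try {
            return f.join(); // all callers wait on the same load
        } finally {
            inProgress.remove(key, f); // allow a fresh load next time
        }
    }
}
```

In Geode the equivalent bookkeeping happens inside the cache, so a CacheLoader author does not need to write this coordination themselves.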


>>>
>>> On Wed, Apr 20, 2016 at 6:40 AM, Eugene Strokin <eu...@strokin.info>
>>> wrote:
>>>
>>>> Udo, thanks for the link. But my concern was not about the memory but
>>>> disk space. My cache could grow infinitely, so I need some mechanism to
>>>> evict the objects from overflow space as well, not just from memory.
>>>> I couldn't find any pointers that Geode could do this out of the box,
>>>> or even a way to implement it myself.
>>>> If you do know something about this, please let me know. It looks like
>>>> Geode could do everything I need but this one thing.
>>>>
>>>> Thanks,
>>>> Eugene
>>>>
>>>> On Tue, Apr 19, 2016 at 9:47 PM, Udo Kohlmeyer <uk...@pivotal.io>
>>>> wrote:
>>>>
>>>>> Hi there Eugene,
>>>>>
>>>>> Please look at
>>>>> http://geode.docs.pivotal.io/docs/reference/topics/cache_xml.html#lru-memory-size
>>>>> .
>>>>> When configuring this eviction policy, you should be able to specify
>>>>> the amount of memory that this region holds in memory before it overflows
>>>>> the value.
>>>>>
>>>>> I am at this stage uncertain whether this policy only takes the size of
>>>>> the value into account, or whether it is inclusive of the key as well.
>>>>> If so, this setting might cause the region to keep fewer and fewer values
>>>>> in-memory as the number of entries in the region increases.
>>>>>
>>>>> --Udo
>>>>>
>>>>> On 20/04/2016 11:39 am, Eugene Strokin wrote:
>>>>>
>>>>> Dan, thanks for the response. Yes, you're right, 512Mb of course. My
>>>>> mistake.
>>>>> The idea is to use as much disk space as possible. I understand the
>>>>> downside of using a high compaction threshold. I'll play with that and
>>>>> see how bad it could be.
>>>>> But what about eviction? Would Geode remove objects from the overflow
>>>>> automatically once it reaches a certain size?
>>>>> Ideally, I'd like to set Geode to start kicking LRU objects out
>>>>> once the free disk space reaches 1Gb. Is it possible? If so, please
>>>>> point me in the right direction.
>>>>>
>>>>> Thanks again,
>>>>> Eugene
>>>>>
>>>>>
>>>>> On Tue, Apr 19, 2016 at 8:25 PM, Dan Smith <ds...@pivotal.io> wrote:
>>>>>
>>>>>> I'm guessing you mean 512MB of RAM, not KB? Otherwise, you are
>>>>>> definitely going to have problems :)
>>>>>>
>>>>>> Regarding conserving disk space - I think only allowing 1 GB of free
>>>>>> space is probably going to run into issues. I think you would be better off
>>>>>> having fewer droplets with more space if that's possible. And leaving only
>>>>>> 5% disk space for compaction and as a buffer to avoid running out of disk
>>>>>> is probably not enough.
>>>>>>
>>>>>> By default, Geode will compact oplogs when they get to be 50%
>>>>>> garbage, which means needing maybe 2X the amount of actual disk space. You
>>>>>> can configure the compaction-threshold to something like 95%, but that
>>>>>> means Geode will be doing a lot of extra work cleaning up garbage on disk.
>>>>>> Regardless, you'll probably want to tune the max-oplog-size down to
>>>>>> something much smaller than 1GB.
>>>>>>
>>>>>> -Dan

Re: Use case question

Posted by Eugene Strokin <eu...@strokin.info>.
Got it, thanks a lot. I guess my setup will be as follows:
- OVERFLOW_TO_DISK
- Small oplog size (about 500Mb)
- compaction-threshold = 95% (I'll experiment with the number)
- Try to add a listener on the low-disk-space warning (if possible; if not,
just check the disk size periodically), find the LRU objects, and delete them
synchronously (concurrency-checks = false) to keep free disk space at ~2Gb.

This way I'll be able to use as much disk space as possible. I'm worried
about the performance, but we'll see how it goes.
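In cache.xml terms, that setup might look roughly like this (a sketch only: the disk-store name and directory are placeholders, and max-oplog-size is in megabytes):

```xml
<cache>
  <!-- Sketch: 500MB oplogs, aggressive compaction; dir path is a placeholder -->
  <disk-store name="fileStore" max-oplog-size="500"
              compaction-threshold="95" auto-compact="true">
    <disk-dirs>
      <disk-dir>/var/geode/fileStore</disk-dir>
    </disk-dirs>
  </disk-store>
  <region name="files">
    <region-attributes data-policy="partition" disk-store-name="fileStore"
                       concurrency-checks-enabled="false">
      <eviction-attributes>
        <!-- Evict by heap LRU, overflowing values to the disk store -->
        <lru-heap-percentage action="overflow-to-disk"/>
      </eviction-attributes>
    </region-attributes>
  </region>
</cache>
```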

Another quick question: I've created a CacheLoader which gets files from
AWS S3 and provides them as byte[]. If the Geode cluster received several
requests for the same object from different nodes, would the CacheLoader
hold the responses, download the file from S3 once, and distribute it to
all clients? In other words, would I have only one call to the AWS S3
origin even if many requests came in for the object at the same time?

Thanks a lot,
Eugene


Re: Use case question

Posted by Dan Smith <ds...@pivotal.io>.
> My cache could grow infinitely, so I need some mechanism to evict the
objects from overflow space as well, not just from memory.

Unfortunately, I don't think there is a built-in way to evict entries when
your disk space is starting to get low. For your eviction action, you
basically have a choice of whether to evict an entry from memory
(OVERFLOW_TO_DISK) or completely destroy the entry (LOCAL_DESTROY).

I suppose you could have your own thread that watches the disk space and
starts issuing destroys if it gets low.

-Dan
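A minimal sketch of such a watcher thread in plain Java (the threshold and the low-space action are assumptions; on a real member the action would pick least-recently-used keys and call Region.destroy on them):

```java
import java.io.File;
import java.util.function.Consumer;

/**
 * Periodically checks free disk space and invokes a callback while space is
 * below a threshold. A sketch only: in Geode the callback would select LRU
 * keys and call Region.destroy to reclaim their space on the next compaction.
 */
public class DiskSpaceWatcher implements Runnable {
    private final File diskDir;
    private final long minFreeBytes;
    private final long checkIntervalMillis;
    private final Consumer<File> lowSpaceAction;

    public DiskSpaceWatcher(File diskDir, long minFreeBytes,
                            long checkIntervalMillis, Consumer<File> lowSpaceAction) {
        this.diskDir = diskDir;
        this.minFreeBytes = minFreeBytes;
        this.checkIntervalMillis = checkIntervalMillis;
        this.lowSpaceAction = lowSpaceAction;
    }

    /** True when usable space in the watched directory is below the threshold. */
    public boolean lowOnSpace() {
        return diskDir.getUsableSpace() < minFreeBytes;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                if (lowOnSpace()) {
                    lowSpaceAction.accept(diskDir); // e.g. destroy LRU entries
                }
                Thread.sleep(checkIntervalMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop cleanly when interrupted
        }
    }
}
```

Run it as a daemon thread on each member that hosts the disk store.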


Re: Use case question

Posted by Darrel Schneider <ds...@pivotal.io>.
Something to keep in mind is that when you have an LRU whose eviction
action is overflow-to-disk, each eviction does not do a delete. After
an overflow to disk, the region entry and its key are still in the JVM
consuming memory; only the entry value is overflowed to disk.

When you say "I need some mechanism to evict the objects from overflow
space as well", are you saying that you no longer want that object in your
region at all? The way to do that is to do an entry delete operation on the
region. That will mark the value that overflowed to disk as deleted,
and the entry and key will be removed from memory. (Actually, if
concurrency-checks=true on your region, the delete operation does not
immediately remove the entry and key. Instead it changes the value of the
entry to a special value we call a TOMBSTONE. Eventually a background
process will remove the entry and key of these tombstones.)



Re: Use case question

Posted by Anthony Baker <ab...@pivotal.io>.
And it’s worth pointing out that Geode’s persistent store is an append-only store, optimized for sequential read/write operations.  This approach favors write speed at the cost of additional space on disk.

Anthony


Re: Use case question

Posted by Michael Stolz <ms...@pivotal.io>.
What will happen is that the entries will be deleted from memory and the
delete operation will be put into the oplog on disk. The oplog will get
rolled when it reaches 50% garbage. Rolling the oplog really means creating
a new oplog file, copying the live entries from the old oplog file to the
new one, and in the process vacuuming out the deleted entries. So in
effect, it is doing what you want, but in batches, at the point of rolling
the oplog files.
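The rolling step can be illustrated with a toy sketch (not Geode's actual implementation): puts and deletes are appended to a log, and rolling rebuilds the log keeping only the latest live record per key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy illustration of oplog rolling: operations are appended to a log;
 * rolling copies only the latest live value per key into a fresh log,
 * dropping overwritten and deleted records (the "garbage").
 */
public class ToyOplog {
    // A record is a key plus a value; a null value marks a delete (tombstone).
    record Op(String key, byte[] value) {}

    private List<Op> log = new ArrayList<>();

    public void put(String key, byte[] value) { log.add(new Op(key, value)); }

    public void delete(String key) { log.add(new Op(key, null)); }

    /** Number of records currently in the log, garbage included. */
    public int size() { return log.size(); }

    /** "Roll" the log: rebuild it with exactly one record per live key. */
    public void roll() {
        Map<String, byte[]> latest = new LinkedHashMap<>();
        for (Op op : log) {
            if (op.value() == null) latest.remove(op.key()); // tombstone wins
            else latest.put(op.key(), op.value());           // newest put wins
        }
        List<Op> fresh = new ArrayList<>();
        latest.forEach((k, v) -> fresh.add(new Op(k, v)));
        log = fresh;
    }
}
```

Geode does this file-to-file on disk, but the space accounting is the same: deleted and overwritten records occupy space until the next roll.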

--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: 631-835-4771

>>> geode will be doing a lot of extra work clean up garbage on disk.
>>> Regardless, you'll probably want to tune down the max-oplog-size to
>>> something much smaller than 1GB.
>>>
>>> -Dan
>>>
>>> On Tue, Apr 19, 2016 at 4:26 PM, Eugene Strokin <eu...@strokin.info> wrote:
>>>
>>>> Hello, I'm seriously consider to use Geode as a core for distributed
>>>> file cache system. But I have a few questions.
>>>> But first, this is what needs to be done: Scalable file system with LRU
>>>> eviction policy utilizing the disc space as much as possible. The idea is
>>>> to have around 50 small Droplets from DigitalOcean, which provides 512Kb
>>>> RAM and 20Gb Storage. The client should call the cluster and get a byte
>>>> array by a key. If needed, the cluster should be expanded. The origin of
>>>> the byte arrays are files from AWS S3.
>>>> Looks like everything could be done using Geode, but:
>>>> - it looks like the compaction requires a lot of free hard drive space.
>>>> All I can allow is about 1Gb. Would this work in my case? How could it be
>>>> done.
>>>> - Is the objects would be evicted automatically from overflow storage
>>>> using LRU policy?
>>>>
>>>> Thanks in advance for your answers, ideas, suggestions.
>>>> Eugene
>>>>
>>>
>>>
>>
>>
>

Re: Use case question

Posted by Eugene Strokin <eu...@strokin.info>.
Udo, thanks for the link. But my concern was not about the memory but disk
space. My cache could grow infinitely, so I need some mechanism to evict
the objects from overflow space as well, not just from memory.
I couldn't find any pointers suggesting that Geode can do this out of the box,
or a way to implement it myself.
If you do know something about this, please let me know. It looks like Geode
can do everything I need except this one thing.

Thanks,
Eugene

On Tue, Apr 19, 2016 at 9:47 PM, Udo Kohlmeyer <uk...@pivotal.io>
wrote:

> Hi there Eugene,
>
> Please look at
> http://geode.docs.pivotal.io/docs/reference/topics/cache_xml.html#lru-memory-size
> .
> When configuring this eviction policy, you should be able to specify the
> amount of memory that this region holds in memory before it overflows the
> value.
>
> I am at this stage uncertain if this policy only takes the size of the
> value into account, or if this value would be inclusive of the key as well.
> If so, this setting might cause the region to keep fewer and fewer values
> in-memory, as the number of entries in the region increase.
>
> --Udo
>
> On 20/04/2016 11:39 am, Eugene Strokin wrote:
>
> Dan, thanks for the response. Yes you right, 512 Mb of course. My mistake.
> The idea is to use as much disk space as possible. I understand the
> downside of using high compaction threshold. I'll play with that, and see
> how bad it could be.
> But what about eviction? Would Geode remove objects from the overflow
> automatically once it would reach a certain size?
> Ideally, I'd like to set the Geode to start kicking LRU objects out once
> the free disk space would reach 1Gb. Is it possible? If so, please point me
> to the right direction.
>
> Thanks again,
> Eugene
>
>
> On Tue, Apr 19, 2016 at 8:25 PM, Dan Smith <ds...@pivotal.io> wrote:
>
>> I'm guessing you mean 512MB of RAM, not KB? Otherwise, you are definitely
>> going to have problems :)
>>
>> Regarding conserving disk space - I think only allowing for 1 GB free
>> space is probably going to run into issues. I think you would be better off
>> having fewer droplets with more space if that's possible. And only leaving
>> 5% disk space for compaction and as a buffer to avoid running out of disk
>> is probably not enough.
>>
>> By default, geode will compact oplogs when they get to be 50% garbage,
>> which means needing maybe 2X the amount of actual disk space. You can
>> configure the compaction-threshold to something like 95%, but that means
>> geode will be doing a lot of extra work clean up garbage on disk.
>> Regardless, you'll probably want to tune down the max-oplog-size to
>> something much smaller than 1GB.
>>
>> -Dan
>>
>> On Tue, Apr 19, 2016 at 4:26 PM, Eugene Strokin <eu...@strokin.info> wrote:
>>
>>> Hello, I'm seriously consider to use Geode as a core for distributed
>>> file cache system. But I have a few questions.
>>> But first, this is what needs to be done: Scalable file system with LRU
>>> eviction policy utilizing the disc space as much as possible. The idea is
>>> to have around 50 small Droplets from DigitalOcean, which provides 512Kb
>>> RAM and 20Gb Storage. The client should call the cluster and get a byte
>>> array by a key. If needed, the cluster should be expanded. The origin of
>>> the byte arrays are files from AWS S3.
>>> Looks like everything could be done using Geode, but:
>>> - it looks like the compaction requires a lot of free hard drive space.
>>> All I can allow is about 1Gb. Would this work in my case? How could it be
>>> done.
>>> - Is the objects would be evicted automatically from overflow storage
>>> using LRU policy?
>>>
>>> Thanks in advance for your answers, ideas, suggestions.
>>> Eugene
>>>
>>
>>
>
>

Re: Use case question

Posted by Udo Kohlmeyer <uk...@pivotal.io>.
Hi there Eugene,

Please look at 
http://geode.docs.pivotal.io/docs/reference/topics/cache_xml.html#lru-memory-size.
When configuring this eviction policy, you should be able to specify the 
amount of memory that this region holds in memory before it overflows 
the value.

I am at this stage uncertain whether this policy only takes the size of the 
value into account, or whether it is inclusive of the key as well. If so, 
this setting might cause the region to keep fewer and fewer values 
in-memory as the number of entries in the region increases.

--Udo
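
As a concrete illustration of that setting, an overflow region might be
declared along these lines in cache.xml (the region and disk-store names are
hypothetical, and 64 MB is an arbitrary heap limit chosen for a 512 MB
droplet):

```xml
<!-- Hypothetical region: keep at most ~64 MB of values on the heap,
     overflowing least-recently-used entries to the named disk store. -->
<region name="file-cache">
  <region-attributes data-policy="partition"
                     disk-store-name="overflow-store">
    <eviction-attributes>
      <lru-memory-size maximum="64" action="overflow-to-disk"/>
    </eviction-attributes>
  </region-attributes>
</region>
```

Note this controls what stays in memory; entries overflowed to disk remain on
disk until they are destroyed or expire.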

On 20/04/2016 11:39 am, Eugene Strokin wrote:
> Dan, thanks for the response. Yes you right, 512 Mb of course. My 
> mistake.
> The idea is to use as much disk space as possible. I understand the 
> downside of using high compaction threshold. I'll play with that, and 
> see how bad it could be.
> But what about eviction? Would Geode remove objects from the overflow 
> automatically once it would reach a certain size?
> Ideally, I'd like to set the Geode to start kicking LRU objects out 
> once the free disk space would reach 1Gb. Is it possible? If so, 
> please point me to the right direction.
>
> Thanks again,
> Eugene
>
> On Tue, Apr 19, 2016 at 8:25 PM, Dan Smith <ds...@pivotal.io> wrote:
>
>     I'm guessing you mean 512MB of RAM, not KB? Otherwise, you are
>     definitely going to have problems :)
>
>     Regarding conserving disk space - I think only allowing for 1 GB
>     free space is probably going to run into issues. I think you would
>     be better off having fewer droplets with more space if that's
>     possible. And only leaving 5% disk space for compaction and as a
>     buffer to avoid running out of disk is probably not enough.
>
>     By default, geode will compact oplogs when they get to be 50%
>     garbage, which means needing maybe 2X the amount of actual disk
>     space. You can configure the compaction-threshold to something
>     like 95%, but that means geode will be doing a lot of extra work
>     clean up garbage on disk. Regardless, you'll probably want to tune
>     down the max-oplog-size to something much smaller than 1GB.
>
>     -Dan
>
>     On Tue, Apr 19, 2016 at 4:26 PM, Eugene Strokin <eu...@strokin.info> wrote:
>
>         Hello, I'm seriously consider to use Geode as a core for
>         distributed file cache system. But I have a few questions.
>         But first, this is what needs to be done: Scalable file system
>         with LRU eviction policy utilizing the disc space as much as
>         possible. The idea is to have around 50 small Droplets from
>         DigitalOcean, which provides 512Kb RAM and 20Gb Storage. The
>         client should call the cluster and get a byte array by a key.
>         If needed, the cluster should be expanded. The origin of the
>         byte arrays are files from AWS S3.
>         Looks like everything could be done using Geode, but:
>         - it looks like the compaction requires a lot of free hard
>         drive space. All I can allow is about 1Gb. Would this work in
>         my case? How could it be done.
>         - Is the objects would be evicted automatically from overflow
>         storage using LRU policy?
>
>         Thanks in advance for your answers, ideas, suggestions.
>         Eugene
>
>
>


Re: Use case question

Posted by Eugene Strokin <eu...@strokin.info>.
Dan, thanks for the response. Yes, you're right, 512 MB of course. My mistake.
The idea is to use as much disk space as possible. I understand the
downside of using a high compaction threshold. I'll play with that and see
how bad it could be.
But what about eviction? Would Geode remove objects from the overflow
automatically once it reaches a certain size?
Ideally, I'd like Geode to start kicking LRU objects out once the free disk
space drops to 1 GB. Is that possible? If so, please point me in the right
direction.

Thanks again,
Eugene


On Tue, Apr 19, 2016 at 8:25 PM, Dan Smith <ds...@pivotal.io> wrote:

> I'm guessing you mean 512MB of RAM, not KB? Otherwise, you are definitely
> going to have problems :)
>
> Regarding conserving disk space - I think only allowing for 1 GB free
> space is probably going to run into issues. I think you would be better off
> having fewer droplets with more space if that's possible. And only leaving
> 5% disk space for compaction and as a buffer to avoid running out of disk
> is probably not enough.
>
> By default, geode will compact oplogs when they get to be 50% garbage,
> which means needing maybe 2X the amount of actual disk space. You can
> configure the compaction-threshold to something like 95%, but that means
> geode will be doing a lot of extra work clean up garbage on disk.
> Regardless, you'll probably want to tune down the max-oplog-size to
> something much smaller than 1GB.
>
> -Dan
>
> On Tue, Apr 19, 2016 at 4:26 PM, Eugene Strokin <eu...@strokin.info>
> wrote:
>
>> Hello, I'm seriously consider to use Geode as a core for distributed file
>> cache system. But I have a few questions.
>> But first, this is what needs to be done: Scalable file system with LRU
>> eviction policy utilizing the disc space as much as possible. The idea is
>> to have around 50 small Droplets from DigitalOcean, which provides 512Kb
>> RAM and 20Gb Storage. The client should call the cluster and get a byte
>> array by a key. If needed, the cluster should be expanded. The origin of
>> the byte arrays are files from AWS S3.
>> Looks like everything could be done using Geode, but:
>> - it looks like the compaction requires a lot of free hard drive space.
>> All I can allow is about 1Gb. Would this work in my case? How could it be
>> done.
>> - Is the objects would be evicted automatically from overflow storage
>> using LRU policy?
>>
>> Thanks in advance for your answers, ideas, suggestions.
>> Eugene
>>
>
>

Re: Use case question

Posted by Dan Smith <ds...@pivotal.io>.
I'm guessing you mean 512MB of RAM, not KB? Otherwise, you are definitely
going to have problems :)

Regarding conserving disk space - I think only allowing for 1 GB free space
is probably going to run into issues. I think you would be better off
having fewer droplets with more space if that's possible. And only leaving
5% disk space for compaction and as a buffer to avoid running out of disk
is probably not enough.

By default, Geode will compact oplogs when they get to be 50% garbage,
which means needing maybe 2X the amount of actual disk space. You can
configure the compaction-threshold to something like 95%, but that means
Geode will be doing a lot of extra work cleaning up garbage on disk.
Regardless, you'll probably want to tune the max-oplog-size down to
something much smaller than 1 GB.

-Dan
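
The same tuning can also be applied from gfsh when the disk store is created;
a hedged sketch (the store name and directory are made up, values in MB and
percent respectively):

```
gfsh> create disk-store --name=overflow-store --dir=/data/geode/overflow \
      --max-oplog-size=100 --compaction-threshold=95
```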

On Tue, Apr 19, 2016 at 4:26 PM, Eugene Strokin <eu...@strokin.info> wrote:

> Hello, I'm seriously consider to use Geode as a core for distributed file
> cache system. But I have a few questions.
> But first, this is what needs to be done: Scalable file system with LRU
> eviction policy utilizing the disc space as much as possible. The idea is
> to have around 50 small Droplets from DigitalOcean, which provides 512Kb
> RAM and 20Gb Storage. The client should call the cluster and get a byte
> array by a key. If needed, the cluster should be expanded. The origin of
> the byte arrays are files from AWS S3.
> Looks like everything could be done using Geode, but:
> - it looks like the compaction requires a lot of free hard drive space.
> All I can allow is about 1Gb. Would this work in my case? How could it be
> done.
> - Is the objects would be evicted automatically from overflow storage
> using LRU policy?
>
> Thanks in advance for your answers, ideas, suggestions.
> Eugene
>