Posted to dev@couchdb.apache.org by Filipe David Manana <fd...@apache.org> on 2011/08/16 03:29:16 UTC

Bringing automatic compaction into trunk

Developers, users,

It's been a while now since I opened a Jira ticket for it (
https://issues.apache.org/jira/browse/COUCHDB-1153 ).
I won't describe it in detail here, since that is already done in the Jira ticket.

Unless there are objections, I would like to get this moving soon.

Thanks


-- 
Filipe David Manana,
fdmanana@gmail.com, fdmanana@apache.org

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

Re: Bringing automatic compaction into trunk

Posted by Paul Davis <pa...@gmail.com>.

Did a quick review. Posted to the ticket.


Re: Bringing automatic compaction into trunk

Posted by Jan Lehnardt <ja...@apache.org>.

On Aug 16, 2011, at 4:00 PM, Robert Newson wrote:

> Ok, let's see Paul's code concerns addressed first; it needs that
> cleanup before it can hit trunk.
> 
> I'd still prefer to see an event-driven rather than polling approach,
> e.g., hook into update_notifier and build a queue of databases that are
> actively being written to (and therefore growing). A much lazier
> background thing could compact databases that are inactive.

Yup, my argument set all of that aside, to be sorted out as an
"implementation detail". Back to JIRA.

Cheers
Jan
-- 

> 
> B.
> 
> On 16 August 2011 14:48, Jan Lehnardt <ja...@apache.org> wrote:
>> 
>> On Aug 16, 2011, at 3:44 PM, Robert Newson wrote:
>> 
>>> All good points Jan, thanks.
>>> 
>>> Having large numbers of databases is one thing, but I'm focused on the
>>> impact on ongoing operations with this running in the background. What
>>> does it do to the user's experience to have all dbs scanned
>>> periodically, etc?
>>> 
>>> The reason I suggest doing it after the move, and in its own app, is
>>> to reduce the work needed to not use this code in some circumstances
>>> (Cloudant hosting, for example). Yes, it's a separate module and
>>> disabled by default, but putting it in its own application will make
>>> the separation much more explicit and preclude unintended
>>> entanglements with core over time.
>> 
>> I think this is a valid concern, but I don't think it outweighs the
>> disadvantage. I'm happy to spend time to make sure this is properly
>> modular after srcmv.
>> 
>> Cheers
>> Jan
>> --
>> 
>> 
>>> 
>>> B.
>>> 
>>> On 16 August 2011 14:31, Jan Lehnardt <ja...@apache.org> wrote:
>>>> 
>>>> On Aug 16, 2011, at 2:59 PM, Robert Newson wrote:
>>>> 
>>>>> I'm -1 on the approach (as I understand it) taken by the scheduler as
>>>>> it will be problematic in precisely the circumstance when you'd most
>>>>> want auto compaction (large numbers of databases and views).
>>>> 
>>>> As Filipe mentions in the ticket, this was tested with large numbers of
>>>> databases.
>>>> 
>>>> In addition, your "most want" assumption doesn't hold for the average
>>>> user, I'd wager (no numbers, alas). I'd say it's a basic user-experience
>>>> plus when software doesn't waste a system resource without
>>>> cleaning up after itself. But nobody is even suggesting enabling this by
>>>> default. We have plenty of other features that need proper documentation
>>>> to be used correctly and that we are improving over time to make them
>>>> more obvious by removing common errors or odd behaviour.
>>>> 
>>>>> To this point "Just curious, would it make a big difference to commit
>>>>> the patch before srcmv and migrate it with the rest of the code base
>>>>> rather than letting it rot in JIRA and leave it all to Filipe to keep
>>>>> it updated." -- I'm -∞ on any suggestion that code should be put in
>>>>> trunk to stop it from rotting. Code should land when it's ready. I
>>>>> hope we're all agreed on that and that this paragraph was redundant.
>>>> 
>>>> I was suggesting that the patch is ready enough for trunk and that
>>>> the level of readiness should not be "solves all possible cases". Especially
>>>> for something that is disabled by default. If we take this to the extreme,
>>>> we'd never add any new features.
>>>> 
>>>> I'm not suggesting "it compiles for me, let's throw it into trunk".
>>>> 
>>>>> After srcmv, and then some work to OTP-ify each of the resultant
>>>>> subdirs, we should add this as a separate application. We might also
>>>>> mark it as beta in the first release to gather feedback from the
>>>>> community.
>>>> 
>>>> I don't see how that is any different from adding it before srcmv and
>>>> avoiding leaving the front-porting effort to a single person.
>>>> 
>>>> Ideally we'd already have srcmv done, but we don't and I don't want
>>>> to hold off progress for an architecture change.
>>>> 
>>>>> I'll be accused of 'stop energy' within nanoseconds of this post so I
>>>>> should end by saying I'm +1 on couchdb gaining the ability to
>>>>> automatically compact its databases and views in principle.
>>>> 
>>>> :)
>>>> 
>>>> Cheers
>>>> Jan
>>>> --
>>>> 
>>>> 
>>>>> 
>>>>> B.
>>>>> 
>>>>> On 16 August 2011 13:19, Jan Lehnardt <ja...@apache.org> wrote:
>>>>>> Good points Robert,
>>>>>> 
>>>>>> I replied inline and then hijacked the thread for a more general discussion, sorry about that  :)
>>>>>> 
>>>>>> On Aug 16, 2011, at 2:08 PM, Robert Dionne wrote:
>>>>>> 
>>>>>>> Filipe,
>>>>>>> 
>>>>>>>  This is neat, I can definitely see the utility of the approach. I do share the concerns expressed in other comments with respect to the use of the config file for per db compaction specs and the use of a compact_loop that waits on config change messages when the ets table is empty. I don't think it fully takes into account the use case of large numbers of small dbs and/or some very large dbs interspersed with a lot of mid-size dbs.
>>>>>> 
>>>>>> As I said in the ticket, per-db config is desirable, but I think it is outside the scope of the ticket.
>>>>>> 
>>>>>>>  Anyway, I like it a lot, though I've only read the code for half an hour or so. I also agree with others that the code base is reaching a point of being a bit crufty, and with the git migration etc. it might be a good time to take a breath and commit to making some of the OTP-compliant changes and design changes we've talked about.
>>>>>> 
>>>>>> Just curious, would it make a big difference to commit the patch before srcmv and migrate it with the rest of the code base rather than letting it rot in JIRA and leave it all to Filipe to keep it updated.
>>>>>> 
>>>>>> I also fear that a srcmv'd release is still out a bit and I'd really like to see this one (and a few others) go into 1.2 (as per my previous mail to this list in another thread). While it isn't the absolute perfect solution in all cases, it is disabled by default and manual compaction strategies work as they did before. In the meantime, we can refine the rest of the system to make it more fully fledged and maybe even enable it by default a few versions down when we are all comfortable with it. I'm not very comfortable keeping good patches in JIRA and not trunk until they solve every little edge case. We haven't worked like this in the past and I don't think it is worth doing.
>>>>>> 
>>>>>> Cheers
>>>>>> Jan
>>>>>> --
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> Regards,
>>>>>>> 
>>>>>>> Bob
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Aug 15, 2011, at 9:29 PM, Filipe David Manana wrote:
>>>>>>> 
>>>>>>>> Developers, users,
>>>>>>>> 
>>>>>>>> It's been a while now since I opened a Jira ticket for it (
>>>>>>>> https://issues.apache.org/jira/browse/COUCHDB-1153 ).
>>>>>>>> I won't describe it here with detail since it's already done in the Jira ticket.
>>>>>>>> 
>>>>>>>> Unless there are objections, I would like to get this moving soon.
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Filipe David Manana,
>>>>>>>> fdmanana@gmail.com, fdmanana@apache.org
>>>>>>>> 
>>>>>>>> "Reasonable men adapt themselves to the world.
>>>>>>>> Unreasonable men adapt the world to themselves.
>>>>>>>> That's why all progress depends on unreasonable men."
>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>>

Re: Bringing automatic compaction into trunk

Posted by Damien Katz <da...@apache.org>.

Filipe is addressing Paul's concerns. As far as scanning vs. an evented architecture, I'd prefer to see Filipe's working code in place, and later replaced with a better alternative. We need to push the project forward; we value useful, correct code first. It's easier to improve on it once it's in place.

Also, I have no objections to a more modular architecture; I very much welcome it. But that work can happen concurrently with pushing forward the code and adding features the user community cares about.

-Damien



Re: Bringing automatic compaction into trunk

Posted by Robert Newson <rn...@apache.org>.

Ok, let's see Paul's code concerns addressed first; it needs that
cleanup before it can hit trunk.

I'd still prefer to see an event-driven rather than polling approach,
e.g., hook into update_notifier and build a queue of databases that are
actively being written to (and therefore growing). A much lazier
background thing could compact databases that are inactive.

B.
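
For illustration, the event-driven idea described above could be sketched
roughly as follows. This is a hypothetical Python sketch, not code from the
COUCHDB-1153 patch or from CouchDB itself: the class name
CompactionScheduler, its methods, and the idle_after threshold are all
assumptions, and on_update stands in for whatever callback an
update_notifier hook would provide.

```python
import time
from collections import deque


class CompactionScheduler:
    """Hypothetical sketch: pick compaction candidates from write events
    instead of periodically scanning every database."""

    def __init__(self, idle_after=3600.0):
        self.active = deque()      # databases written to since last considered
        self.queued = set()        # names currently in the queue (no duplicates)
        self.last_write = {}       # database name -> time of most recent write
        self.idle_after = idle_after

    def on_update(self, db_name, now=None):
        """Record a write; assumed to be called once per update notification."""
        now = time.time() if now is None else now
        self.last_write[db_name] = now
        if db_name not in self.queued:
            self.queued.add(db_name)
            self.active.append(db_name)

    def next_active(self):
        """Return the next actively written database, or None if none is queued."""
        if not self.active:
            return None
        db_name = self.active.popleft()
        self.queued.discard(db_name)
        return db_name

    def idle_databases(self, now=None):
        """Databases with no writes for idle_after seconds (the lazier pass)."""
        now = time.time() if now is None else now
        return sorted(
            name for name, ts in self.last_write.items()
            if now - ts >= self.idle_after
        )
```

A compactor process would drain next_active() for growing databases and
only occasionally walk idle_databases(). The sketch only knows about
databases that have been written to at least once since it started, and
it leaves out the fragmentation checks that decide whether a candidate is
actually worth compacting.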


Re: Bringing automatic compaction into trunk

Posted by Jan Lehnardt <ja...@apache.org>.

On Aug 16, 2011, at 3:44 PM, Robert Newson wrote:

> All good points Jan, thanks.
> 
> Having large numbers of databases is one thing, but I'm focused on the
> impact on ongoing operations with this running in the background. What
> does it do to the user's experience to have all dbs scanned
> periodically, etc?
> 
> The reason I suggest doing it after the move, and in its own app, is
> to reduce the work needed to not use this code in some circumstances
> (Cloudant hosting, for example). Yes, it's a separate module and
> disabled by default, but putting it in its own application will make
> the separation much more explicit and preclude unintended
> entanglements with core over time.

I think this is a valid concern, but I don't think it outweighs the
disadvantage. I'm happy to spend time to make sure this is properly
modular after srcmv.

Cheers
Jan
-- 



Re: Bringing automatic compaction into trunk

Posted by Robert Newson <rn...@apache.org>.

All good points Jan, thanks.

Having large numbers of databases is one thing, but I'm focused on the
impact on ongoing operations with this running in the background. What
does it do to the user's experience to have all dbs scanned
periodically, etc?

The reason I suggest doing it after the move, and in its own app, is
to reduce the work needed to not use this code in some circumstances
(Cloudant hosting, for example). Yes, it's a separate module and
disabled by default, but putting it in its own application will make
the separation much more explicit and preclude unintended
entanglements with core over time.

B.

On 16 August 2011 14:31, Jan Lehnardt <ja...@apache.org> wrote:
>
> On Aug 16, 2011, at 2:59 PM, Robert Newson wrote:
>
>> I'm -1 on the approach (as I understand it) taken by the scheduler as
>> it will be problematic in precisely the circumstance when you'd most
>> want auto compaction (large numbers of databases and views).
>
> As Filipe mentions in the ticket, this was tested with large numbers of
> databases.
>
> In addition, your "most want" assumption doesn't hold for the average
> user, I'd wager (no numbers, alas). I'd say it's a basic user-experience
> plus when software doesn't waste a system resource without
> cleaning up after itself. But nobody is even suggesting enabling this by
> default. We have plenty of other features that need proper documentation
> to be used correctly and that we are improving over time to make them
> more obvious by removing common errors or odd behaviour.
>
>> To this point "Just curious, would it make a big difference to commit
>> the patch before srcmv and migrate it with the rest of the code base
>> rather than letting it rot in JIRA and leave it all to Filipe to keep
>> it updated." -- I'm -∞ on any suggestion that code should be put in
>> trunk to stop it from rotting. Code should land when it's ready. I
>> hope we're all agreed on that and that this paragraph was redundant.
>
> I was suggesting that the patch is ready enough for trunk and that
> the level of readiness should not be "solves all possible cases". Especially
> for something that is disabled by default. If we take this to the extreme,
> we'd never add any new features.
>
> I'm not suggesting "it compiles for me, let's throw it into trunk".
>
>> After srcmv, and then some work to OTP-ify each of the resultant
>> subdirs, we should add this as a separate application. We might also
>> mark it as beta in the first release to gather feedback from the
>> community.
>
> I don't see how that is any different from adding it before srcmv and
> avoiding leaving the front-porting effort to a single person.
>
> Ideally we'd already have srcmv done, but we don't and I don't want
> to hold off progress for an architecture change.
>
>> I'll be accused of 'stop energy' within nanoseconds of this post so I
>> should end by saying I'm +1 on couchdb gaining the ability to
>> automatically compact its databases and views in principle.
>
> :)
>
> Cheers
> Jan
> --

Re: Bringing automatic compaction into trunk

Posted by Jan Lehnardt <ja...@apache.org>.

On Aug 16, 2011, at 2:59 PM, Robert Newson wrote:

> I'm -1 on the approach (as I understand it) taken by the scheduler as
> it will be problematic in precisely the circumstance when you'd most
> want auto compaction (large numbers of databases and views).

As Filipe mentions in the ticket, this was tested with large numbers of
databases.

In addition, your "most want" assumption doesn't hold for the average
user, I'd wager (no numbers, alas). I'd say it's a basic user-experience
plus that software doesn't start wasting a system resource without
cleaning up after itself. But nobody is even suggesting enabling this by
default. We have plenty of other features that need proper documentation
to be used correctly and that we are improving over time to make them
more obvious by removing common errors or odd behaviour.

> To this point "Just curious, would it make a big difference to commit
> the patch before srcmv and migrate it with the rest of the code base
> rather than letting it rot in JIRA and leave it all to Filipe to keep
> it updated." -- I'm -∞ on any suggestion that code should be put in
> trunk to stop it from rotting. Code should land when it's ready. I
> hope we're all agreed on that and that this paragraph was redundant.

I was suggesting that the patch is ready enough for trunk and that
the level of readiness should not be "solves all possible cases". Especially
for something that is disabled by default. If we take this to the extreme,
we'd never add any new features.

I'm not suggesting "it compiles for me, let's throw it into trunk".

> After srcmv, and then some work to OTP-ify each of the resultant
> subdirs, we should add this as a separate application. We might also
> mark it as beta in the first release to gather feedback from the
> community.

I don't see how that is any different from adding it before srcmv and
avoiding leaving the front-porting effort to a single person.

Ideally we'd already have srcmv done, but we don't and I don't want
to hold off progress for an architecture change.

> I'll be accused of 'stop energy' within nanoseconds of this post so I
> should end by saying I'm +1 on couchdb gaining the ability to
> automatically compact its databases and views in principle.

:)

Cheers
Jan
-- 


Re: Bringing automatic compaction into trunk

Posted by Robert Newson <rn...@apache.org>.

I'm -1 on the approach (as I understand it) taken by the scheduler as
it will be problematic in precisely the circumstance when you'd most
want auto compaction (large numbers of databases and views).

To this point "Just curious, would it make a big difference to commit
the patch before srcmv and migrate it with the rest of the code base
rather than letting it rot in JIRA and leave it all to Filipe to keep
it updated." -- I'm -∞ on any suggestion that code should be put in
trunk to stop it from rotting. Code should land when it's ready. I
hope we're all agreed on that and that this paragraph was redundant.

After srcmv, and then some work to OTP-ify each of the resultant
subdirs, we should add this as a separate application. We might also
mark it as beta in the first release to gather feedback from the
community.

I'll be accused of 'stop energy' within nanoseconds of this post so I
should end by saying I'm +1 on couchdb gaining the ability to
automatically compact its databases and views in principle.

B.

On 16 August 2011 13:19, Jan Lehnardt <ja...@apache.org> wrote:
> Good points Robert,
>
> I replied inline and then hijacked the thread for a more general discussion, sorry about that  :)
>
> On Aug 16, 2011, at 2:08 PM, Robert Dionne wrote:
>
>> Filipe,
>>
>>  This is neat, I can definitely see the utility of the approach. I do share the concerns expressed in other comments with respect to the use of the config file for per db compaction specs and the use of a compact_loop that waits on config change messages when the ets table is empty. I don't think it fully takes into account the use case of large numbers of small dbs and/or some very large dbs interspersed with a lot of mid-size dbs.
>
> As I said in the ticket, per-db config is desirable, but I think it is outside the scope of the ticket.
>
>>  Anyway I like it a lot, though I've only read the code for half an hour or so. I also agree with others that the code base is reaching a point of being a bit crufty and it might be a good time, with the git migration etc., to take a breath and commit to making some of these OTP-compliant changes and design changes we've talked about.
>
> Just curious, would it make a big difference to commit the patch before srcmv and migrate it with the rest of the code base rather than letting it rot in JIRA and leave it all to Filipe to keep it updated.
>
> I also fear that a srcmv'd release is still out a bit and I'd really like to see this one (and a few others) go into 1.2 (as per my previous mail to this list in another thread). While it isn't the absolute perfect solution in all cases, it is disabled by default and manual compaction strategies work as they did before. In the meantime, we can refine the rest of the system to make it more fully fledged and maybe even enable it by default a few versions down when we are all comfortable with it. I'm not very comfortable keeping good patches in JIRA and not trunk until they solve every little edge case. We haven't worked like this in the past and I don't think it is worth doing.
>
> Cheers
> Jan
> --

Re: Bringing automatic compaction into trunk

Posted by Jan Lehnardt <ja...@apache.org>.

Good points Robert,

I replied inline and then hijacked the thread for a more general discussion, sorry about that  :)

On Aug 16, 2011, at 2:08 PM, Robert Dionne wrote:

> Filipe,
> 
>  This is neat, I can definitely see the utility of the approach. I do share the concerns expressed in other comments with respect to the use of the config file for per db compaction specs and the use of a compact_loop that waits on config change messages when the ets table is empty. I don't think it fully takes into account the use case of large numbers of small dbs and/or some very large dbs interspersed with a lot of mid-size dbs.

As I said in the ticket, per-db config is desirable, but I think it is outside the scope of the ticket.

>  Anyway I like it a lot, though I've only read the code for half an hour or so. I also agree with others that the code base is reaching a point of being a bit crufty and it might be a good time, with the git migration etc., to take a breath and commit to making some of these OTP-compliant changes and design changes we've talked about.

Just curious, would it make a big difference to commit the patch before srcmv and migrate it with the rest of the code base rather than letting it rot in JIRA and leave it all to Filipe to keep it updated.

I also fear that a srcmv'd release is still out a bit and I'd really like to see this one (and a few others) go into 1.2 (as per my previous mail to this list in another thread). While it isn't the absolute perfect solution in all cases, it is disabled by default and manual compaction strategies work as they did before. In the meantime, we can refine the rest of the system to make it more fully fledged and maybe even enable it by default a few versions down when we are all comfortable with it. I'm not very comfortable keeping good patches in JIRA and not trunk until they solve every little edge case. We haven't worked like this in the past and I don't think it is worth doing.

Cheers
Jan
-- 

Re: Bringing automatic compaction into trunk

Posted by Robert Dionne <di...@dionne-associates.com>.

Filipe,

  This is neat, I can definitely see the utility of the approach. I do share the concerns expressed in other comments with respect to the use of the config file for per db compaction specs and the use of a compact_loop that waits on config change messages when the ets table is empty. I don't think it fully takes into account the use case of large numbers of small dbs and/or some very large dbs interspersed with a lot of mid-size dbs.

  Anyway I like it a lot, though I've only read the code for half an hour or so. I also agree with others that the code base is reaching a point of being a bit crufty and it might be a good time, with the git migration etc., to take a breath and commit to making some of these OTP-compliant changes and design changes we've talked about.

Regards,

Bob

On Aug 15, 2011, at 9:29 PM, Filipe David Manana wrote:

> Developers, users,
> 
> It's been a while now since I opened a Jira ticket for it (
> https://issues.apache.org/jira/browse/COUCHDB-1153 ).
> I won't describe it here with detail since it's already done in the Jira ticket.
> 
> Unless there are objections, I would like to get this moving soon.
> 
> Thanks
> 
> 
> -- 
> Filipe David Manana,
> fdmanana@gmail.com, fdmanana@apache.org
> 
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."