Posted to pr@cassandra.apache.org by iksaif <gi...@git.apache.org> on 2017/08/30 12:54:50 UTC

[GitHub] cassandra pull request #147: [wip] CASSANDRA-10496

GitHub user iksaif opened a pull request:

    https://github.com/apache/cassandra/pull/147

    [wip] CASSANDRA-10496

    Done:
    - `--split-output` kind of works when running `nodetool compact`
    - Values with timestamps outside of the first window should be isolated
      and merged back into the correct sstable (which may be quite costly for now,
      as it involves rewriting huge sstables for single values)
    
    TODO:
    - Unit tests
    - Fix remaining TODOs
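A minimal sketch of the isolation step described above, assuming fixed-size windows and microsecond timestamps (Cassandra's usual write-timestamp resolution). `WindowSplitSketch`, `windowStartMicros` and `splitByWindow` are hypothetical names, not part of the patch. Treating "the first window" as the window of a caller-supplied reference timestamp is also an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical helper (not Cassandra code): separates values whose timestamp
 * falls outside a given time window. Timestamps are assumed to be microseconds.
 */
final class WindowSplitSketch {
    private WindowSplitSketch() {}

    /** Inclusive lower bound of the fixed-size window containing timestampMicros. */
    static long windowStartMicros(long timestampMicros, long windowSizeMicros) {
        // floorDiv (unlike '/') rounds toward negative infinity, so negative
        // timestamps still map to the window that actually contains them.
        return Math.floorDiv(timestampMicros, windowSizeMicros) * windowSizeMicros;
    }

    /**
     * Partitions timestamps into those sharing referenceMicros' window (key true)
     * and those outside it (key false): the values that would have to be merged
     * back into the sstable for their own window.
     */
    static Map<Boolean, List<Long>> splitByWindow(List<Long> timestampsMicros,
                                                  long referenceMicros,
                                                  long windowSizeMicros) {
        long target = windowStartMicros(referenceMicros, windowSizeMicros);
        return timestampsMicros.stream()
                .collect(Collectors.partitioningBy(
                        ts -> windowStartMicros(ts, windowSizeMicros) == target));
    }
}
```

`Collectors.partitioningBy` always returns both the `true` and `false` keys, so callers can read the out-of-window list without a null check even when it is empty.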

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iksaif/cassandra cassandra-10496-trunk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cassandra/pull/147.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #147
    
----
commit 785182fa9f977c65c201cea135ffc8076170276d
Author: Corentin Chary <c....@criteo.com>
Date:   2017-04-28T09:49:56Z

    CASSANDRA-10496
    

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] cassandra issue #147: [wip] CASSANDRA-10496

Posted by iksaif <gi...@git.apache.org>.
Github user iksaif commented on the issue:

    https://github.com/apache/cassandra/pull/147
  
    I likely won't have time to finish, and `unsafe_aggressive_sstable_expiration` is good enough for our use case now.


[GitHub] cassandra pull request #147: [wip] CASSANDRA-10496

Posted by iksaif <gi...@git.apache.org>.
Github user iksaif closed the pull request at:

    https://github.com/apache/cassandra/pull/147


[GitHub] cassandra issue #147: [wip] CASSANDRA-10496

Posted by eliaslevy <gi...@git.apache.org>.
Github user eliaslevy commented on the issue:

    https://github.com/apache/cassandra/pull/147
  
    I am curious about the status of this work.


[GitHub] cassandra issue #147: [wip] CASSANDRA-10496

Posted by iksaif <gi...@git.apache.org>.
Github user iksaif commented on the issue:

    https://github.com/apache/cassandra/pull/147
  
    * `switchCompactionLocation(..)`: got it, will update the code.
    * `Marcus' idea was to only create two sstables per bucket`: OK, I missed that. I'll make it work.
    * `sstables generated before this patch`: I wanted to think about the upgrade strategy only once the other questions have been answered. If this gets shipped before 4.0 is released, this could be a non-issue.


[GitHub] cassandra issue #147: [wip] CASSANDRA-10496

Posted by michaelsembwever <gi...@git.apache.org>.
Github user michaelsembwever commented on the issue:

    https://github.com/apache/cassandra/pull/147
  
    @iksaif,
     continuing the conversation from [CASSANDRA-10496](https://issues.apache.org/jira/browse/CASSANDRA-10496).
    
    > What do you mean by "changing locations isn't supported" ?
    
    `switchCompactionLocation(..)` is a public method and can be called from elsewhere when rows now need to be written to a new sstable in a new location. That's why [here](https://github.com/thelastpickle/cassandra/commit/a34a72391bb2847b3bd6ed93b4306199ddf3a991#diff-19359d40a9c932efdebc62a067ed4390R40) I paired writers by their location, so that if the location changes, the writer can change too.
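A minimal sketch of pairing writers with locations, assuming a writer can be created on demand for each location. `WritersByLocation`, `switchLocation` and `currentWriter` are hypothetical names; they only mirror the idea behind `switchCompactionLocation(..)`, not its actual signature or behaviour.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch (not Cassandra's API): one writer per data location.
 * When the compaction location changes, later rows go to a writer bound to
 * the new location; switching back reuses the earlier writer.
 */
final class WritersByLocation<L, W> {
    private final Map<L, W> writers = new HashMap<>();
    private final Function<L, W> writerFactory;
    private L currentLocation;

    WritersByLocation(Function<L, W> writerFactory) {
        this.writerFactory = writerFactory;
    }

    /** Plays the role of switchCompactionLocation(..): later rows target this location. */
    void switchLocation(L location) {
        currentLocation = location;
    }

    /** Returns the writer paired with the current location, creating it on first use. */
    W currentWriter() {
        if (currentLocation == null) {
            throw new IllegalStateException("switchLocation must be called first");
        }
        return writers.computeIfAbsent(currentLocation, writerFactory);
    }

    /** Number of distinct locations that have been given a writer. */
    int writerCount() {
        return writers.size();
    }
}
```

Keying writers by location means a location switch in the middle of a compaction never sends rows to a writer opened against the old directory.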
    
    > Currently it will create up to "minThreshold" sstables
    
    I'm not too sure I get that. The code you referenced is about which bucket is up for compaction, not how many writers to use during a bucket's compaction.
    Marcus' idea was to only create two sstables per bucket: one that contains all the rows that belong in the bucket, and another for old data that has been streamed in late. Therefore `SplittingTimeWindowCompactionWriter.writersByBounds` should be a fixed array of size 2. The splitting approach described in `SplittingTimeWindowCompactionWriter`'s class apidoc (splitting in half, then in half again, down to 50MB) is quite different from the original idea.
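A minimal sketch of the two-sstables-per-bucket layout, with writers modelled as in-memory lists of row timestamps. The sketch assumes microsecond timestamps. Because buckets are selected by max timestamp, it also assumes no row is newer than the bucket's window. `TwoWayBucketSplit`, `IN_WINDOW` and `OLD_DATA` are hypothetical names, not the patch's classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch (not the patch's SplittingTimeWindowCompactionWriter):
 * exactly two outputs per bucket, one for rows inside the bucket's window and
 * one for older rows that arrived late (e.g. via streaming).
 */
final class TwoWayBucketSplit {
    static final int IN_WINDOW = 0;
    static final int OLD_DATA = 1;

    private final long windowStartMicros; // inclusive
    private final long windowEndMicros;   // exclusive
    // Fixed-size list standing in for "a fixed array of size 2" of writers.
    private final List<List<Long>> writers =
            Arrays.<List<Long>>asList(new ArrayList<Long>(), new ArrayList<Long>());

    TwoWayBucketSplit(long windowStartMicros, long windowSizeMicros) {
        this.windowStartMicros = windowStartMicros;
        this.windowEndMicros = windowStartMicros + windowSizeMicros;
    }

    /** Routes a row (represented by its timestamp) to one of the two writers. */
    void append(long timestampMicros) {
        if (timestampMicros >= windowEndMicros) {
            throw new IllegalArgumentException("row is newer than the bucket's window: " + timestampMicros);
        }
        int target = timestampMicros >= windowStartMicros ? IN_WINDOW : OLD_DATA;
        writers.get(target).add(timestampMicros);
    }

    /** Read-only view of the rows one writer received. */
    List<Long> rows(int writer) {
        return Collections.unmodifiableList(writers.get(writer));
    }
}
```

Compared with recursively halving the output down to a size threshold, this keeps the number of output sstables per bucket constant. The trade-off is that all late data for a bucket lands in a single sstable, whatever its size.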
    
    > `getBuckets()` currently uses maxTimestamp, which isn't available (currently) in the compaction task
    
    @krummas ?
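For context on the `getBuckets()` question: a minimal sketch of grouping sstables into windows by their max timestamp. It assumes each sstable's max timestamp is known up front; the concern quoted above is that the compaction task does not currently have this value. `BucketByMaxTimestamp` and `SSTableInfo` are hypothetical stand-ins, not Cassandra types, and timestamps are assumed to be in microseconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical sketch (not Cassandra's getBuckets()): groups sstables into
 * fixed-size time windows keyed by the window containing each sstable's
 * max timestamp.
 */
final class BucketByMaxTimestamp {
    private BucketByMaxTimestamp() {}

    /** Minimal stand-in for the sstable metadata this grouping needs. */
    static final class SSTableInfo {
        final String name;
        final long maxTimestampMicros;

        SSTableInfo(String name, long maxTimestampMicros) {
            this.name = name;
            this.maxTimestampMicros = maxTimestampMicros;
        }
    }

    /** Window start mapped to the names of sstables whose max timestamp falls in it, oldest window first. */
    static TreeMap<Long, List<String>> buckets(List<SSTableInfo> sstables, long windowSizeMicros) {
        TreeMap<Long, List<String>> result = new TreeMap<>();
        for (SSTableInfo s : sstables) {
            long windowStart = Math.floorDiv(s.maxTimestampMicros, windowSizeMicros) * windowSizeMicros;
            result.computeIfAbsent(windowStart, k -> new ArrayList<>()).add(s.name);
        }
        return result;
    }
}
```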
    
    > Are you talking about sstables generated before this patch ?
    
    Yes.

