Posted to dev@cassandra.apache.org by Jonathan Ellis <jb...@gmail.com> on 2009/04/02 19:51:42 UTC

Roadmap

Someone asked on IRC if there is a roadmap for Cassandra. This is a
good discussion to have. :)

Personally my priority list looks like this:

High priority:
1. range queries [which require the partitioner changes we've been discussing]
2. make Cassandra not allow itself to run out of memory during
sustained inserts
3. fix distributed remove issues
4. support Unicode keys

Medium priority:
5. pre-emptive repair (what the Dynamo paper calls anti-entropy)
6. load balancing

(1) is substantially done but will probably need some tweaking during
code review. After that, the client API will probably need some fleshing
out: right now you just get a list of keys back, which is not very
efficient if you also want the columns for each of those keys.

(2) has workarounds like binarymemtable but I'd really like to get the
main insert path able to handle large insert volume without falling
over. My co-worker is just starting to look into this. I'm hoping
there will be some straightforward improvements to make here.

I outlined an approach to (3) that I think will work here:
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/200903.mbox/%3Ce06563880903301519h922840ds72ef6f9a8d95e07b@mail.gmail.com%3E

I'm waiting for Avinash's feedback but as outlined it is not much code.

(4) is a Thrift issue, not Cassandra per se (see
https://issues.apache.org/jira/browse/THRIFT-395), but it is on my
plate so I thought I'd throw that out there.

I have not started (5) or (6). There are some stubs for load
balancing in the code, which is why I said in another thread that the
Facebook developers have probably thought more about this.

I know Avinash is currently finishing up multiget support. Hopefully
he will chime in about what his and Prashant's plans are next.

-Jonathan

Re: Roadmap

Posted by Jonathan Ellis <jb...@gmail.com>.

Range queries aren't going to block us. (The code is already written;
I just need to rebase it, and I'm waiting on #65 for that.)

But in principle I agree.

-Jonathan

On Apr 16, 2009, at 1:42 AM, Per Mellqvist <pe...@mellqvist.name> wrote:

> Great to see a target for a release!
>
> Personally I think the momentum of the project would benefit more from
> having a release to refer to, than any (other) new feature or
> improvement. I understand range queries are a priority for you
> Jonathan. I still wonder if it would not be better to limit 0.3 to
> only bug fixes (priority major or above)?
>
> // Per

Re: Roadmap

Posted by Johan Oskarsson <jo...@oskarsson.nu>.

I don't think it's a problem to squeeze in some new features as well,
but I feel we should set a feature freeze date for 0.3 so that we know
when to stop. Perhaps a couple of months from now?

After that date, trunk would be branched into 0.3 and all non-blocking
issues would be moved from 0.3 to 0.4 in Jira. Then we'd fix the
remaining blocking bugs and roll a release candidate.

/Johan

Per Mellqvist wrote:
> Great to see a target for a release!
> 
> Personally I think the momentum of the project would benefit more from
> having a release to refer to, than any (other) new feature or
> improvement. I understand range queries are a priority for you
> Jonathan. I still wonder if it would not be better to limit 0.3 to
> only bug fixes (priority major or above)?
> 
> // Per

Re: Roadmap

Posted by Per Mellqvist <pe...@mellqvist.name>.

Great to see a target for a release!

Personally I think the momentum of the project would benefit more from
having a release to refer to, than any (other) new feature or
improvement. I understand range queries are a priority for you
Jonathan. I still wonder if it would not be better to limit 0.3 to
only bug fixes (priority major or above)?

// Per

On Thu, Apr 16, 2009 at 12:02 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> I went all Enterprise on our jira and assigned issues to version "0.3"
> that I'd like to get done in the relatively near future for our first
> official release.
>
> The list of issues is here:
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310865&fixfor=12313861
>
> Note that many issues are marked Patch Available which means we just
> need to complete the review process for those.
>
> If you want to grab one of the unassigned ones that would be awesome.
> If you want to grab one of the ones I assigned to myself, that's
> awesome too, but give me a heads up first so I don't duplicate your
> effort. :)
>
> Also, if there are other issues that you think should be on the 0.3 list,
> feel free to add them. (Correctness issues especially.) But IMO we
> should not let scope creep too much for our first Apache release.
>
> -Jonathan

Re: Roadmap

Posted by Jonathan Ellis <jb...@gmail.com>.

I went all Enterprise on our jira and assigned issues to version "0.3"
that I'd like to get done in the relatively near future for our first
official release.

The list of issues is here:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310865&fixfor=12313861

Note that many issues are marked Patch Available which means we just
need to complete the review process for those.

If you want to grab one of the unassigned ones that would be awesome.
If you want to grab one of the ones I assigned to myself, that's
awesome too, but give me a heads up first so I don't duplicate your
effort. :)

Also, if there are other issues that you think should be on the 0.3 list,
feel free to add them. (Correctness issues especially.) But IMO we
should not let scope creep too much for our first Apache release.

-Jonathan
