You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@phoenix.apache.org by lars hofhansl <la...@apache.org> on 2017/09/13 17:22:41 UTC

Fw: Phoenix code quality

Hi all Phoenix developers,
here's a thread that I had started on the private PMC list, and we agreed to have this as a public discussion.

I'd like to solicit feedback on the 6 steps/recommendations below and about we can ingrain those into the development process.
Comments, concerns, are - as always - welcome!
-- Lars  
----- Forwarded Message -----
 From: lars hofhansl <la...@apache.org>
 To: Private <pr...@phoenix.apache.org> 
 Sent: Tuesday, September 5, 2017 9:59 PM
 Subject: Phoenix code quality

Hi all,
I realize this might be a difficult topic, and let me prefix this by saying that this is my opinion only.
Phoenix is coming to a point where big organizations are relying on it.
At Salesforce we do billions of Phoenix queries per day... And we had a bunch of recent production issues - only in part caused by Phoenix.

If there was a patch here and there that lacks quality, tests, comments, or proper documentation, then it's the fault of the person who created the patch.
If, however, this happens with some frequency, then it a problem that should involve PMC and committers who review and commit the patches in question.
I'd like to suggest the following:
1. Comments in the code should be considered when judging a patch for its merit. No need to go overboard, but there should be enough comments so that someone new the code can get an idea about what this code is doing.
2. Eyeball each patch for how it would scale. Will it all work on 1000 machines? With 1bn rows? With 1000 indexes? etc, etc.If it's not obvious, ask the creator of the patch. Agree on what the scaling goals should be.(For anything that works only for a few million rows or on a dozen machines, nobody in their right mind would accept the complexity of running Phoenix - and HBase, HDFS, ZK, etc - folks would and should simply use Postgres.)
3. Check how a patch will behave under failure. Machines failures are common. Regions may not reachable for a bit, etc. Are there good timeouts? Everything should gracefully continue to work.

4. Double check that tests check for corner conditions.
5. Err on the side of stability, rather than committing a patch as beta. If it's in the code, people _will_ use it.
6. Are all config options properly explained and make sense? It's better to err on the side of fewer config options.

7. Probably more stuff...

Again. Just MHO. Many of these things are already done. But I still thought might be good to have a quick discussion around this.

Comments?
Thanks.
-- Lars

Re: Phoenix code quality

Posted by Andrew Purtell <ap...@apache.org>.

Also, in my experience curating the Hadoop ecosystem (a process that starts
with Apache $FOO release x.y.z, then applies individual upstream commits
one patch at a time), Phoenix code is more tightly coupled than any other
project I have encountered, out of: ZooKeeper, Hadoop, HBase, Pig, Hive,
Spark. Most of the time I can cherry pick a single upstream commit onto a
release successfully. Sometimes I need to apply one or two prerequisite
changes first. Those changes tend to apply successfully. This is due in no
small part to how those projects structure their code. By successfully I
mean with only minor fuzz or trivial rejects like import ordering. However,
not so with Phoenix. More often than not the rejects are nontrivial,
especially the integration tests. A change to Phoenix often refactors an
integration test rather than adds or subtracts compilation or test units.
Where cherry picks from other projects typically bring over new tests which
pass, cherry picks from Phoenix usually are a pain to merge and then
numerous tests fail.

I suspect this tendency toward coupling will impact how successful (or not)
a branch merge process might be. Certainly developers working on the
feature branch will suffer more when rebasing on the main branch than in
other projects, like HBase, where this approach has been successful.

I think Kevin Liew's earlier below comment also touched on this concern.


*> Parts of the codebase can be quite intimidating due to the amount of
state that needs to be tracked.*




On Mon, Sep 25, 2017 at 10:10 AM, Andrew Purtell <ap...@apache.org>
wrote:

> A model that has worked well for HBase is large feature development on a
> separate branch, followed by a process to do a branch merge back to the
> main project. The big upside is feature developers have near total freedom.
> The big downside is the merge can take a while to review/approve and
> rebasing the code to land the feature is a lot of last minute work and the
> merge can temporarily destabilize both the feature and the main project. On
> balance, the freedom to do independent feature development without
> impacting the main project makes it worth it, IMHO.
>
>
>
> On Mon, Sep 25, 2017 at 9:30 AM, Jan Fernando <jf...@salesforce.com>
> wrote:
>
>> Lars,
>>
>> I think these are really awesome guidelines. As Phoenix reaches a new
>> phase
>> of maturity and operational complexity (which is really exciting and
>> important for the project IMHO), I think these things are becoming even
>> more important.
>>
>> Re #5, I agree we need to err on the side of stability. I agree if
>> features
>> are there in main and documented people will use them. However, right now
>> it's hard for users of the Phoenix to discern which features are mature
>> versus which features may still need hardening at scale. I think it might
>> help to actually come up with a more standardized process for developing
>> "beta" or new high impact features. Perhaps we can follow what other
>> projects like HBase do. For example: Should big changes be in their own
>> branch? In some cases we have done things like this in Phoenix (e.g.
>> Calcite) and in others we have not (e.g. transaction support).  I think
>> consistency would be really helpful.
>>
>> So, what should the guidelines be on when to create a new branch and  when
>> to merge into the main branch? Is this a good model? I think getting input
>> from HBase committers on this thread on what has worked and what hasn't
>> would be great so we don't reinvent the wheel.
>>
>> I think something like this could help ensure that  is main is stable and
>> always ready for prime time and make it easier for developers to discern
>> which are "beta" features that they can use at their discretion.
>>
>> Thanks,
>> --Jan
>>
>> On Sat, Sep 23, 2017 at 8:58 AM, Nick Dimiduk <nd...@gmail.com> wrote:
>>
>> > Lars,
>> >
>> > This is a great list of guidelines. We should publish it on the
>> > contributing [0] section of the public site.
>> >
>> > -n
>> >
>> > [0]: http://phoenix.apache.org/contributing.html
>> >
>> > On Fri, Sep 22, 2017 at 4:12 PM lars hofhansl <la...@apache.org> wrote:
>> >
>> > > Any comments?Is this simply not a concern?
>> > > -- Lars
>> > >       From: lars hofhansl <la...@apache.org>
>> > >  To: Dev <de...@phoenix.apache.org>
>> > >  Sent: Wednesday, September 13, 2017 10:22 AM
>> > >  Subject: Fw: Phoenix code quality
>> > >
>> > > Hi all Phoenix developers,
>> > > here's a thread that I had started on the private PMC list, and we
>> agreed
>> > > to have this as a public discussion.
>> > >
>> > >
>> > > I'd like to solicit feedback on the 6 steps/recommendations below and
>> > > about we can ingrain those into the development process.
>> > > Comments, concerns, are - as always - welcome!
>> > > -- Lars
>> > > ----- Forwarded Message -----
>> > >  From: lars hofhansl <la...@apache.org>
>> > >  To: Private <pr...@phoenix.apache.org>
>> > >  Sent: Tuesday, September 5, 2017 9:59 PM
>> > >  Subject: Phoenix code quality
>> > >
>> > > Hi all,
>> > > I realize this might be a difficult topic, and let me prefix this by
>> > > saying that this is my opinion only.
>> > > Phoenix is coming to a point where big organizations are relying on
>> it.
>> > > At Salesforce we do billions of Phoenix queries per day... And we had
>> a
>> > > bunch of recent production issues - only in part caused by Phoenix.
>> > >
>> > > If there was a patch here and there that lacks quality, tests,
>> comments,
>> > > or proper documentation, then it's the fault of the person who created
>> > the
>> > > patch.
>> > > If, however, this happens with some frequency, then it a problem that
>> > > should involve PMC and committers who review and commit the patches in
>> > > question.
>> > > I'd like to suggest the following:
>> > > 1. Comments in the code should be considered when judging a patch for
>> its
>> > > merit. No need to go overboard, but there should be enough comments so
>> > that
>> > > someone new the code can get an idea about what this code is doing.
>> > > 2. Eyeball each patch for how it would scale. Will it all work on 1000
>> > > machines? With 1bn rows? With 1000 indexes? etc, etc.If it's not
>> obvious,
>> > > ask the creator of the patch. Agree on what the scaling goals should
>> > > be.(For anything that works only for a few million rows or on a dozen
>> > > machines, nobody in their right mind would accept the complexity of
>> > running
>> > > Phoenix - and HBase, HDFS, ZK, etc - folks would and should simply use
>> > > Postgres.)
>> > > 3. Check how a patch will behave under failure. Machines failures are
>> > > common. Regions may not reachable for a bit, etc. Are there good
>> > timeouts?
>> > > Everything should gracefully continue to work.
>> > >
>> > > 4. Double check that tests check for corner conditions.
>> > > 5. Err on the side of stability, rather than committing a patch as
>> beta.
>> > > If it's in the code, people _will_ use it.
>> > > 6. Are all config options properly explained and make sense? It's
>> better
>> > > to err on the side of fewer config options.
>> > >
>> > > 7. Probably more stuff...
>> > >
>> > > Again. Just MHO. Many of these things are already done. But I still
>> > > thought might be good to have a quick discussion around this.
>> > >
>> > > Comments?
>> > > Thanks.
>> > > -- Lars
>> > >
>> > >
>> > >
>> > >
>> > >
>> >
>>
>
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>    - A23, Crosstalk
>



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk

Re: Phoenix code quality

Posted by Andrew Purtell <ap...@apache.org>.

A model that has worked well for HBase is large feature development on a
separate branch, followed by a process to do a branch merge back to the
main project. The big upside is feature developers have near total freedom.
The big downside is the merge can take a while to review/approve and
rebasing the code to land the feature is a lot of last minute work and the
merge can temporarily destabilize both the feature and the main project. On
balance, the freedom to do independent feature development without
impacting the main project makes it worth it, IMHO.



On Mon, Sep 25, 2017 at 9:30 AM, Jan Fernando <jf...@salesforce.com>
wrote:

> Lars,
>
> I think these are really awesome guidelines. As Phoenix reaches a new phase
> of maturity and operational complexity (which is really exciting and
> important for the project IMHO), I think these things are becoming even
> more important.
>
> Re #5, I agree we need to err on the side of stability. I agree if features
> are there in main and documented people will use them. However, right now
> it's hard for users of the Phoenix to discern which features are mature
> versus which features may still need hardening at scale. I think it might
> help to actually come up with a more standardized process for developing
> "beta" or new high impact features. Perhaps we can follow what other
> projects like HBase do. For example: Should big changes be in their own
> branch? In some cases we have done things like this in Phoenix (e.g.
> Calcite) and in others we have not (e.g. transaction support).  I think
> consistency would be really helpful.
>
> So, what should the guidelines be on when to create a new branch and  when
> to merge into the main branch? Is this a good model? I think getting input
> from HBase committers on this thread on what has worked and what hasn't
> would be great so we don't reinvent the wheel.
>
> I think something like this could help ensure that  is main is stable and
> always ready for prime time and make it easier for developers to discern
> which are "beta" features that they can use at their discretion.
>
> Thanks,
> --Jan
>
> On Sat, Sep 23, 2017 at 8:58 AM, Nick Dimiduk <nd...@gmail.com> wrote:
>
> > Lars,
> >
> > This is a great list of guidelines. We should publish it on the
> > contributing [0] section of the public site.
> >
> > -n
> >
> > [0]: http://phoenix.apache.org/contributing.html
> >
> > On Fri, Sep 22, 2017 at 4:12 PM lars hofhansl <la...@apache.org> wrote:
> >
> > > Any comments?Is this simply not a concern?
> > > -- Lars
> > >       From: lars hofhansl <la...@apache.org>
> > >  To: Dev <de...@phoenix.apache.org>
> > >  Sent: Wednesday, September 13, 2017 10:22 AM
> > >  Subject: Fw: Phoenix code quality
> > >
> > > Hi all Phoenix developers,
> > > here's a thread that I had started on the private PMC list, and we
> agreed
> > > to have this as a public discussion.
> > >
> > >
> > > I'd like to solicit feedback on the 6 steps/recommendations below and
> > > about we can ingrain those into the development process.
> > > Comments, concerns, are - as always - welcome!
> > > -- Lars
> > > ----- Forwarded Message -----
> > >  From: lars hofhansl <la...@apache.org>
> > >  To: Private <pr...@phoenix.apache.org>
> > >  Sent: Tuesday, September 5, 2017 9:59 PM
> > >  Subject: Phoenix code quality
> > >
> > > Hi all,
> > > I realize this might be a difficult topic, and let me prefix this by
> > > saying that this is my opinion only.
> > > Phoenix is coming to a point where big organizations are relying on it.
> > > At Salesforce we do billions of Phoenix queries per day... And we had a
> > > bunch of recent production issues - only in part caused by Phoenix.
> > >
> > > If there was a patch here and there that lacks quality, tests,
> comments,
> > > or proper documentation, then it's the fault of the person who created
> > the
> > > patch.
> > > If, however, this happens with some frequency, then it a problem that
> > > should involve PMC and committers who review and commit the patches in
> > > question.
> > > I'd like to suggest the following:
> > > 1. Comments in the code should be considered when judging a patch for
> its
> > > merit. No need to go overboard, but there should be enough comments so
> > that
> > > someone new the code can get an idea about what this code is doing.
> > > 2. Eyeball each patch for how it would scale. Will it all work on 1000
> > > machines? With 1bn rows? With 1000 indexes? etc, etc.If it's not
> obvious,
> > > ask the creator of the patch. Agree on what the scaling goals should
> > > be.(For anything that works only for a few million rows or on a dozen
> > > machines, nobody in their right mind would accept the complexity of
> > running
> > > Phoenix - and HBase, HDFS, ZK, etc - folks would and should simply use
> > > Postgres.)
> > > 3. Check how a patch will behave under failure. Machines failures are
> > > common. Regions may not reachable for a bit, etc. Are there good
> > timeouts?
> > > Everything should gracefully continue to work.
> > >
> > > 4. Double check that tests check for corner conditions.
> > > 5. Err on the side of stability, rather than committing a patch as
> beta.
> > > If it's in the code, people _will_ use it.
> > > 6. Are all config options properly explained and make sense? It's
> better
> > > to err on the side of fewer config options.
> > >
> > > 7. Probably more stuff...
> > >
> > > Again. Just MHO. Many of these things are already done. But I still
> > > thought might be good to have a quick discussion around this.
> > >
> > > Comments?
> > > Thanks.
> > > -- Lars
> > >
> > >
> > >
> > >
> > >
> >
>



-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk

Re: Phoenix code quality

Posted by Jan Fernando <jf...@salesforce.com>.

Lars,

I think these are really awesome guidelines. As Phoenix reaches a new phase
of maturity and operational complexity (which is really exciting and
important for the project IMHO), I think these things are becoming even
more important.

Re #5, I agree we need to err on the side of stability. I agree if features
are there in main and documented people will use them. However, right now
it's hard for users of the Phoenix to discern which features are mature
versus which features may still need hardening at scale. I think it might
help to actually come up with a more standardized process for developing
"beta" or new high impact features. Perhaps we can follow what other
projects like HBase do. For example: Should big changes be in their own
branch? In some cases we have done things like this in Phoenix (e.g.
Calcite) and in others we have not (e.g. transaction support).  I think
consistency would be really helpful.

So, what should the guidelines be on when to create a new branch and  when
to merge into the main branch? Is this a good model? I think getting input
from HBase committers on this thread on what has worked and what hasn't
would be great so we don't reinvent the wheel.

I think something like this could help ensure that  is main is stable and
always ready for prime time and make it easier for developers to discern
which are "beta" features that they can use at their discretion.

Thanks,
--Jan

On Sat, Sep 23, 2017 at 8:58 AM, Nick Dimiduk <nd...@gmail.com> wrote:

> Lars,
>
> This is a great list of guidelines. We should publish it on the
> contributing [0] section of the public site.
>
> -n
>
> [0]: http://phoenix.apache.org/contributing.html
>
> On Fri, Sep 22, 2017 at 4:12 PM lars hofhansl <la...@apache.org> wrote:
>
> > Any comments?Is this simply not a concern?
> > -- Lars
> >       From: lars hofhansl <la...@apache.org>
> >  To: Dev <de...@phoenix.apache.org>
> >  Sent: Wednesday, September 13, 2017 10:22 AM
> >  Subject: Fw: Phoenix code quality
> >
> > Hi all Phoenix developers,
> > here's a thread that I had started on the private PMC list, and we agreed
> > to have this as a public discussion.
> >
> >
> > I'd like to solicit feedback on the 6 steps/recommendations below and
> > about we can ingrain those into the development process.
> > Comments, concerns, are - as always - welcome!
> > -- Lars
> > ----- Forwarded Message -----
> >  From: lars hofhansl <la...@apache.org>
> >  To: Private <pr...@phoenix.apache.org>
> >  Sent: Tuesday, September 5, 2017 9:59 PM
> >  Subject: Phoenix code quality
> >
> > Hi all,
> > I realize this might be a difficult topic, and let me prefix this by
> > saying that this is my opinion only.
> > Phoenix is coming to a point where big organizations are relying on it.
> > At Salesforce we do billions of Phoenix queries per day... And we had a
> > bunch of recent production issues - only in part caused by Phoenix.
> >
> > If there was a patch here and there that lacks quality, tests, comments,
> > or proper documentation, then it's the fault of the person who created
> the
> > patch.
> > If, however, this happens with some frequency, then it a problem that
> > should involve PMC and committers who review and commit the patches in
> > question.
> > I'd like to suggest the following:
> > 1. Comments in the code should be considered when judging a patch for its
> > merit. No need to go overboard, but there should be enough comments so
> that
> > someone new the code can get an idea about what this code is doing.
> > 2. Eyeball each patch for how it would scale. Will it all work on 1000
> > machines? With 1bn rows? With 1000 indexes? etc, etc.If it's not obvious,
> > ask the creator of the patch. Agree on what the scaling goals should
> > be.(For anything that works only for a few million rows or on a dozen
> > machines, nobody in their right mind would accept the complexity of
> running
> > Phoenix - and HBase, HDFS, ZK, etc - folks would and should simply use
> > Postgres.)
> > 3. Check how a patch will behave under failure. Machines failures are
> > common. Regions may not reachable for a bit, etc. Are there good
> timeouts?
> > Everything should gracefully continue to work.
> >
> > 4. Double check that tests check for corner conditions.
> > 5. Err on the side of stability, rather than committing a patch as beta.
> > If it's in the code, people _will_ use it.
> > 6. Are all config options properly explained and make sense? It's better
> > to err on the side of fewer config options.
> >
> > 7. Probably more stuff...
> >
> > Again. Just MHO. Many of these things are already done. But I still
> > thought might be good to have a quick discussion around this.
> >
> > Comments?
> > Thanks.
> > -- Lars
> >
> >
> >
> >
> >
>

Re: Phoenix code quality

Posted by Nick Dimiduk <nd...@gmail.com>.

Lars,

This is a great list of guidelines. We should publish it on the
contributing [0] section of the public site.

-n

[0]: http://phoenix.apache.org/contributing.html

On Fri, Sep 22, 2017 at 4:12 PM lars hofhansl <la...@apache.org> wrote:

> Any comments?Is this simply not a concern?
> -- Lars
>       From: lars hofhansl <la...@apache.org>
>  To: Dev <de...@phoenix.apache.org>
>  Sent: Wednesday, September 13, 2017 10:22 AM
>  Subject: Fw: Phoenix code quality
>
> Hi all Phoenix developers,
> here's a thread that I had started on the private PMC list, and we agreed
> to have this as a public discussion.
>
>
> I'd like to solicit feedback on the 6 steps/recommendations below and
> about we can ingrain those into the development process.
> Comments, concerns, are - as always - welcome!
> -- Lars
> ----- Forwarded Message -----
>  From: lars hofhansl <la...@apache.org>
>  To: Private <pr...@phoenix.apache.org>
>  Sent: Tuesday, September 5, 2017 9:59 PM
>  Subject: Phoenix code quality
>
> Hi all,
> I realize this might be a difficult topic, and let me prefix this by
> saying that this is my opinion only.
> Phoenix is coming to a point where big organizations are relying on it.
> At Salesforce we do billions of Phoenix queries per day... And we had a
> bunch of recent production issues - only in part caused by Phoenix.
>
> If there was a patch here and there that lacks quality, tests, comments,
> or proper documentation, then it's the fault of the person who created the
> patch.
> If, however, this happens with some frequency, then it a problem that
> should involve PMC and committers who review and commit the patches in
> question.
> I'd like to suggest the following:
> 1. Comments in the code should be considered when judging a patch for its
> merit. No need to go overboard, but there should be enough comments so that
> someone new the code can get an idea about what this code is doing.
> 2. Eyeball each patch for how it would scale. Will it all work on 1000
> machines? With 1bn rows? With 1000 indexes? etc, etc.If it's not obvious,
> ask the creator of the patch. Agree on what the scaling goals should
> be.(For anything that works only for a few million rows or on a dozen
> machines, nobody in their right mind would accept the complexity of running
> Phoenix - and HBase, HDFS, ZK, etc - folks would and should simply use
> Postgres.)
> 3. Check how a patch will behave under failure. Machines failures are
> common. Regions may not reachable for a bit, etc. Are there good timeouts?
> Everything should gracefully continue to work.
>
> 4. Double check that tests check for corner conditions.
> 5. Err on the side of stability, rather than committing a patch as beta.
> If it's in the code, people _will_ use it.
> 6. Are all config options properly explained and make sense? It's better
> to err on the side of fewer config options.
>
> 7. Probably more stuff...
>
> Again. Just MHO. Many of these things are already done. But I still
> thought might be good to have a quick discussion around this.
>
> Comments?
> Thanks.
> -- Lars
>
>
>
>
>

Re: Phoenix code quality

Posted by Cody Marcel <cm...@salesforce.com>.

@Kevin Liew, that's a good idea. If you see places where this can be done,
file a JIRA for it as this is great way to make incremental improvements.
Well defined small things like this are also make for simple things for
noobs to pick up and contribute.

I like following the boy  scout rule. Leave the code better than you found.
Following this not only has the advantage of cleaning up code in general,
but it's targeting the improvements on the code paths that people use. It's
not time waste on random cleanup.

http://programmer.97things.oreilly.com/wiki/index.php/The_Boy_Scout_Rule




On Fri, Sep 22, 2017 at 6:25 PM, Kevin Liew <kl...@apache.org> wrote:

> Parts of the codebase can be quite intimidating due to the amount of state
> that needs to be tracked. In future patches, there could be an attempt to
> take cues from functional programming styles and decompose larger functions
> into "pure functions". This would make the project more accessible to new
> developers and make it easier to add test coverage through unit testing.
>
> Kevin
>
> On Fri, Sep 22, 2017 at 4:12 PM lars hofhansl <la...@apache.org> wrote:
>
> > Any comments?Is this simply not a concern?
> > -- Lars
> >       From: lars hofhansl <la...@apache.org>
> >  To: Dev <de...@phoenix.apache.org>
> >  Sent: Wednesday, September 13, 2017 10:22 AM
> >  Subject: Fw: Phoenix code quality
> >
> > Hi all Phoenix developers,
> > here's a thread that I had started on the private PMC list, and we agreed
> > to have this as a public discussion.
> >
> >
> > I'd like to solicit feedback on the 6 steps/recommendations below and
> > about we can ingrain those into the development process.
> > Comments, concerns, are - as always - welcome!
> > -- Lars
> > ----- Forwarded Message -----
> >  From: lars hofhansl <la...@apache.org>
> >  To: Private <pr...@phoenix.apache.org>
> >  Sent: Tuesday, September 5, 2017 9:59 PM
> >  Subject: Phoenix code quality
> >
> > Hi all,
> > I realize this might be a difficult topic, and let me prefix this by
> > saying that this is my opinion only.
> > Phoenix is coming to a point where big organizations are relying on it.
> > At Salesforce we do billions of Phoenix queries per day... And we had a
> > bunch of recent production issues - only in part caused by Phoenix.
> >
> > If there was a patch here and there that lacks quality, tests, comments,
> > or proper documentation, then it's the fault of the person who created
> the
> > patch.
> > If, however, this happens with some frequency, then it a problem that
> > should involve PMC and committers who review and commit the patches in
> > question.
> > I'd like to suggest the following:
> > 1. Comments in the code should be considered when judging a patch for its
> > merit. No need to go overboard, but there should be enough comments so
> that
> > someone new the code can get an idea about what this code is doing.
> > 2. Eyeball each patch for how it would scale. Will it all work on 1000
> > machines? With 1bn rows? With 1000 indexes? etc, etc.If it's not obvious,
> > ask the creator of the patch. Agree on what the scaling goals should
> > be.(For anything that works only for a few million rows or on a dozen
> > machines, nobody in their right mind would accept the complexity of
> running
> > Phoenix - and HBase, HDFS, ZK, etc - folks would and should simply use
> > Postgres.)
> > 3. Check how a patch will behave under failure. Machines failures are
> > common. Regions may not reachable for a bit, etc. Are there good
> timeouts?
> > Everything should gracefully continue to work.
> >
> > 4. Double check that tests check for corner conditions.
> > 5. Err on the side of stability, rather than committing a patch as beta.
> > If it's in the code, people _will_ use it.
> > 6. Are all config options properly explained and make sense? It's better
> > to err on the side of fewer config options.
> >
> > 7. Probably more stuff...
> >
> > Again. Just MHO. Many of these things are already done. But I still
> > thought might be good to have a quick discussion around this.
> >
> > Comments?
> > Thanks.
> > -- Lars
> >
> >
> >
> >
> >
>

Re: Phoenix code quality

Posted by Kevin Liew <kl...@apache.org>.

Parts of the codebase can be quite intimidating due to the amount of state
that needs to be tracked. In future patches, there could be an attempt to
take cues from functional programming styles and decompose larger functions
into "pure functions". This would make the project more accessible to new
developers and make it easier to add test coverage through unit testing.

Kevin

On Fri, Sep 22, 2017 at 4:12 PM lars hofhansl <la...@apache.org> wrote:

> Any comments?Is this simply not a concern?
> -- Lars
>       From: lars hofhansl <la...@apache.org>
>  To: Dev <de...@phoenix.apache.org>
>  Sent: Wednesday, September 13, 2017 10:22 AM
>  Subject: Fw: Phoenix code quality
>
> Hi all Phoenix developers,
> here's a thread that I had started on the private PMC list, and we agreed
> to have this as a public discussion.
>
>
> I'd like to solicit feedback on the 6 steps/recommendations below and
> about we can ingrain those into the development process.
> Comments, concerns, are - as always - welcome!
> -- Lars
> ----- Forwarded Message -----
>  From: lars hofhansl <la...@apache.org>
>  To: Private <pr...@phoenix.apache.org>
>  Sent: Tuesday, September 5, 2017 9:59 PM
>  Subject: Phoenix code quality
>
> Hi all,
> I realize this might be a difficult topic, and let me prefix this by
> saying that this is my opinion only.
> Phoenix is coming to a point where big organizations are relying on it.
> At Salesforce we do billions of Phoenix queries per day... And we had a
> bunch of recent production issues - only in part caused by Phoenix.
>
> If there was a patch here and there that lacks quality, tests, comments,
> or proper documentation, then it's the fault of the person who created the
> patch.
> If, however, this happens with some frequency, then it a problem that
> should involve PMC and committers who review and commit the patches in
> question.
> I'd like to suggest the following:
> 1. Comments in the code should be considered when judging a patch for its
> merit. No need to go overboard, but there should be enough comments so that
> someone new the code can get an idea about what this code is doing.
> 2. Eyeball each patch for how it would scale. Will it all work on 1000
> machines? With 1bn rows? With 1000 indexes? etc, etc.If it's not obvious,
> ask the creator of the patch. Agree on what the scaling goals should
> be.(For anything that works only for a few million rows or on a dozen
> machines, nobody in their right mind would accept the complexity of running
> Phoenix - and HBase, HDFS, ZK, etc - folks would and should simply use
> Postgres.)
> 3. Check how a patch will behave under failure. Machines failures are
> common. Regions may not reachable for a bit, etc. Are there good timeouts?
> Everything should gracefully continue to work.
>
> 4. Double check that tests check for corner conditions.
> 5. Err on the side of stability, rather than committing a patch as beta.
> If it's in the code, people _will_ use it.
> 6. Are all config options properly explained and make sense? It's better
> to err on the side of fewer config options.
>
> 7. Probably more stuff...
>
> Again. Just MHO. Many of these things are already done. But I still
> thought might be good to have a quick discussion around this.
>
> Comments?
> Thanks.
> -- Lars
>
>
>
>
>

Re: Phoenix code quality

Posted by lars hofhansl <la...@apache.org>.

Any comments?Is this simply not a concern?
-- Lars
      From: lars hofhansl <la...@apache.org>
 To: Dev <de...@phoenix.apache.org> 
 Sent: Wednesday, September 13, 2017 10:22 AM
 Subject: Fw: Phoenix code quality

Hi all Phoenix developers,
here's a thread that I had started on the private PMC list, and we agreed to have this as a public discussion.

I'd like to solicit feedback on the 6 steps/recommendations below and about we can ingrain those into the development process.
Comments, concerns, are - as always - welcome!
-- Lars  
----- Forwarded Message -----
 From: lars hofhansl <la...@apache.org>
 To: Private <pr...@phoenix.apache.org> 
 Sent: Tuesday, September 5, 2017 9:59 PM
 Subject: Phoenix code quality

Hi all,
I realize this might be a difficult topic, and let me prefix this by saying that this is my opinion only.
Phoenix is coming to a point where big organizations are relying on it.
At Salesforce we do billions of Phoenix queries per day... And we had a bunch of recent production issues - only in part caused by Phoenix.

If there was a patch here and there that lacks quality, tests, comments, or proper documentation, then it's the fault of the person who created the patch.
If, however, this happens with some frequency, then it a problem that should involve PMC and committers who review and commit the patches in question.
I'd like to suggest the following:
1. Comments in the code should be considered when judging a patch for its merit. No need to go overboard, but there should be enough comments so that someone new the code can get an idea about what this code is doing.
2. Eyeball each patch for how it would scale. Will it all work on 1000 machines? With 1bn rows? With 1000 indexes? etc, etc.If it's not obvious, ask the creator of the patch. Agree on what the scaling goals should be.(For anything that works only for a few million rows or on a dozen machines, nobody in their right mind would accept the complexity of running Phoenix - and HBase, HDFS, ZK, etc - folks would and should simply use Postgres.)
3. Check how a patch will behave under failure. Machines failures are common. Regions may not reachable for a bit, etc. Are there good timeouts? Everything should gracefully continue to work.

4. Double check that tests check for corner conditions.
5. Err on the side of stability, rather than committing a patch as beta. If it's in the code, people _will_ use it.
6. Are all config options properly explained and make sense? It's better to err on the side of fewer config options.

7. Probably more stuff...

Again. Just MHO. Many of these things are already done. But I still thought might be good to have a quick discussion around this.

Comments?
Thanks.
-- Lars