Posted to dev@hudi.apache.org by wombatu-kun o_0 <wo...@gmail.com> on 2024/02/24 11:37:00 UTC

the best approach to contribute lots of safe non-breaking micro-fixes to clean up the whole code base

Hi!
I want to participate more actively in Hudi development, and I need your advice on how to start.
At the moment I'm not familiar enough with Hudi to fix complex bugs, develop outstanding new features, or make global optimizations by myself. I take only simple tickets from Jira, or create my own and solve them.
While solving these easy tasks I come across lots of small code smells (such as typos, string concatenation in logging, useless variables/fields/arguments/exceptions, missing annotations, raw type usage, etc.) that I would like to fix immediately. But the community does not welcome mixing the implementation of the target Jira task with such refactoring in a single PR.
I would also like to understand the code and Hudi functionality better, so that I can make more serious contributions in the future.
And while figuring out the Hudi codebase, I want not only to gain a better understanding for myself but also to do something useful for the Hudi project.
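
To make the kinds of smells listed above concrete, here is a minimal sketch of a few of them next to their fixes. The class and method names are hypothetical and are not taken from the Hudi code base. It uses java.util.logging only so that the sketch stays self-contained; with an SLF4J-style logger the equivalent fix would be `{}` placeholders instead of `{0}`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical class for illustration only; not actual Hudi code.
public class MicroFixExample {
  private static final Logger LOG = Logger.getLogger(MicroFixExample.class.getName());

  // Before: raw type, useless variable, string concatenation in logging.
  @SuppressWarnings({"rawtypes", "unchecked"})
  public static List collectBefore(String[] names) {
    List result = new ArrayList();   // raw type usage
    int unused = names.length;       // useless variable, never read
    for (String name : names) {
      result.add(name.trim());
    }
    // The message string is built even when FINE logging is disabled.
    LOG.fine("Collected " + result.size() + " names");
    return result;
  }

  // After: same behavior, with the smells removed.
  public static List<String> collectAfter(String[] names) {
    List<String> result = new ArrayList<>();   // parameterized type
    for (String name : names) {
      result.add(name.trim());
    }
    // The message is only formatted if FINE logging is enabled.
    LOG.log(Level.FINE, "Collected {0} names", result.size());
    return result;
  }

  public static void main(String[] args) {
    String[] input = {" a ", "b"};
    System.out.println(collectAfter(input)); // prints [a, b]
  }
}
```

Each fix leaves the returned value unchanged, which is what I mean by "safe non-breaking".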

So, my intention is to get to know Hudi as a whole, starting with micro-refactoring. My plan is:
1. Start simple: attentively review the whole code base and methodically make lots of trivial cosmetic micro-fixes that make the code cleaner (examples of such improvements are listed above);
2. During step 1, note for myself the places in the code (methods, classes, families) that need more complex refactoring or should be optimized;
3. Do the refactoring/optimizations from step 2, creating a separate Jira task and PR for each case (or a MINOR PR if there are not many changes);
4. .......
5. PROFIT!!!!!!

Regarding step 1, I have some questions for you:
1. Do we need such a cleanup?
2. If yes, what is the best approach to contributing lots of safe, non-breaking micro-fixes to clean up the code? I mean how to divide such changes into Jira tasks and PRs.
3. What is an acceptable number of files changed by a single MINOR PR?
If I make one PR per module, then even for a medium-sized module the diff will be too big, and the reviewer won't like it at all (and will never approve it). If I split the changes further into multiple PRs, there will be too many trivial PRs that put extra load on CI.

Patiently waiting for your advice.

Re: the best approach to contribute lots of safe non-breaking micro-fixes to clean up the whole code base

Posted by Vinoth Chandar <vi...@apache.org>.
Thanks for starting this.

I think refactoring without clear end goals could cause a bunch of
thrashing. There are some active code restructuring efforts (see the
storage abstraction, file group reader, etc.). Someone can speak up if they
need some help there.

But that said, there are plenty of places to clean up. We have some of this
filed away under the "code-quality" component. Could you file some concrete
JIRAs there that we can then review/help prioritize?

For starters, removing build warnings can be a good, concrete starting
point.

Thanks
Vinoth
