You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hudi.apache.org by vino yang <ya...@gmail.com> on 2020/01/21 08:22:20 UTC

[DISCUSS] Unify Hudi code cleanup and improvement

Hi folks,

Currently, the code quality of some Hudi module is not very well. As many
developers have seen, the Intellij IDEA has shown many intellisense about
cleanup and improvement. The community does not object to doing the cleanup
and improvement work and the work has been started via some direct "minor"
PRs by some volunteers. The current way is unorganized and hard to manage.
For tracking this work, I prefer to manage this work with the Jira issue.
We can create an umbrella issue. Then, split the work into several subtasks.

Since those "bad smell" lays anywhere in the whole project. It's difficult
to give a standard to split the subtasks. For example, some files have a
lot while some modules have few. So I suggest the standard would depend on
the volume of the changes. Before working, any subtask should find a
committer as a mentor who would judge and approve the scope is suitable.

What do you think?

Any comments and suggestions would be appreciated.

Best,
Vino

Re: [DISCUSS] Unify Hudi code cleanup and improvement

Posted by vino yang <ya...@gmail.com>.
Hi Vinoth,

Yes. In a sense, we need to define the boundary of "MINOR".

For this discussion, I want to raise two core issues:


   - File and manage the code cleanup(especially, based on the tips and
   warnings come from Intellij IDEA);
   - The contributors who want to do these kinds of code cleanup should
   find a mentor or responsible person before working;

I don't think this type of work is suitable for directly opening the
"MINOR" type of PR. Maybe we need to clearly state what kind of PR can be
defined as the "MINOR" type. In fact, this problem is difficult to quantify
(although, as you said, for example, changing the number of lines, etc.).
From a qualitative point of view, it may be no functionality changes. And
it is very easy that the reviewer can intuitively approve this change
without waiting for the results of Travis, such as typo-type PR and so on.

In short, I am big +1 to agree with the clear definition of the scope of
the “MINOR” type of PR. Maybe we can list suggestions in the contribution
guidelines for the contributors? When a reviewer is reviewing, he needs to
evaluate whether a PR is included in the "MINOR" scope? If it cannot be
included, you need to go to Jira to register an issue.

WDYT?

Best,
Vino

Vinoth Chandar <vi...@apache.org> 于2020年1月27日周一 上午3:25写道:

> Hi Vino,
>
> You raise a valid point on what "MINOR" PR should be. All JIRAs start out
> in "NEW" state and committers have to "Accept" the issue already (to force
> early conversations like this).
>
> May be we should draw some bounds on it like, "cannot be more than 50
> lines", "No functionality changes" .. etc? WDYT?  This seems to be the core
> of the issue.
>
> On Thu, Jan 23, 2020 at 4:17 PM vino yang <ya...@gmail.com> wrote:
>
> > Hi Vinoth,
> >
> > Thank you for your thoughts, I agree that focusing on some higher
> priority
> > work is more valuable.
> >
> > This discussion is to sort out and manage the work that the community is
> > already doing. There are currently some PRs working on this type of work,
> > such as PR[1][2][3][4]. The community has not given guidance on these
> > tasks. I think it's not very appropriate to open a "MINOR" PR directly.
> So,
> > I want to hear from the community and how to manage them more
> effectively.
> > The discussion does not encourage to give a higher priority to such work.
> >
> > We haven't stopped this kind of work, so we should provide effective
> > guidance and organization so that it doesn't look disorganized. WYDT?
> >
> > Best,
> > Vino
> >
> > [1]: https://github.com/apache/incubator-hudi/pull/1237
> > [2]: https://github.com/apache/incubator-hudi/pull/1139
> > [3]: https://github.com/apache/incubator-hudi/pull/1137
> > [4]: https://github.com/apache/incubator-hudi/pull/1136
> >
> > Vinoth Chandar <vi...@apache.org> 于2020年1月23日周四 下午1:20写道:
> >
> > > Hi,
> > >
> > > Thanks everyone for sharing your views!
> > >
> > > Some of this conversation is starting to feel like boiling the ocean. I
> > > believe in refactoring with purpose and discussing class-by-class or
> > > module-by-module does not make sense to me. Can we first list down what
> > we
> > > want to achieve? So far, I have only heard fixing IDE/IntelliJ
> warnings.
> > > Also instead of focussing on new work, how about looking at the pending
> > > JIRAs under "Testing" "Code Cleanup" components first and see if those
> > are
> > > worth tackling.
> > >
> > > We went down this path for code formatting and today we still have
> > > inconsistencies. Looking back, I feel we should have clearly defined
> end
> > > goals for the cleanups and we can then rank them based on ROI.
> > >
> > > Thanks
> > > Vinoth
> > >
> > > On Wed, Jan 22, 2020 at 7:05 PM vino yang <ya...@gmail.com>
> wrote:
> > >
> > > > Hi Shiyan and Bhavani:
> > > >
> > > > Thanks for sharing your thoughts.
> > > >
> > > > As I originally stated. The advantage of using modules as a unit to
> > split
> > > > work is that the decomposition is clear, but the disadvantage is that
> > the
> > > > volume of changes may be huge, which brings huge risks (considering
> > that
> > > > Hudi's test coverage is still not very high) and the workload of
> > review.
> > > > The advantage of splitting by class is that the volume of changes is
> > > small
> > > > and the review is more convenient, but the disadvantages are too many
> > > tasks
> > > > and high maintenance costs.
> > > >
> > > >
> > > > *In addition, we need to define the boundaries of the "code cleanup"
> I
> > > > expressed in this topic: it is limited to the smart tips shown by
> > > Intellij
> > > > IDEA. If the boundaries are too wide, then this discussion will lose
> > > > control.*
> > > > I agree with Bhavani that we don't take it as the actual goal. But we
> > are
> > > > not opposed to the community to help improve the quality of the code
> > > > (basically, these tips given by the IDE are more reasonable).
> > > >
> > > >
> > > > So, I still give my thoughts: We manage this work with Jira. Before
> we
> > > > start working, we need to find a committer as a mentor. The mentor
> must
> > > > decide whether the scale of the subtasks is reasonable and whether
> > > > additional unit tests need to be added to verify the changes. And the
> > > > mentor should be responsible for merged changes.
> > > >
> > > > What do you think?
> > > >
> > > > Best,
> > > > Vino
> > > >
> > > > Bhavani Sudha <bh...@gmail.com> 于2020年1月22日周三 下午2:22写道:
> > > >
> > > > > Hi @vinoyang thanks for bringing this to discussion. I feel it
> would
> > be
> > > > > less disruptive to clean up code as part of individual classes
> being
> > > > > touched for a specific goal rather than code cleanup being the
> actual
> > > > goal.
> > > > > This would narrow the touch point and ensure test coverage (both
> unit
> > > and
> > > > > integration tests)  catches any accidental/unintentional changes.
> > Also
> > > it
> > > > > would give chance to change any documentation quoting/referencing
> > that
> > > > > code. Wanted to share my personal opinion.
> > > > >
> > > > > Thanks,
> > > > > Sudha
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jan 21, 2020 at 11:36 AM Shiyan Xu <
> > > xu.shiyan.raymond@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > The clean-up work can actually be split by modules.
> > > > > >
> > > > > > Though it is generally a good practice to follow, my concern is
> the
> > > > > > clean-up is likely to cause conflicts with some on-going changes.
> > If
> > > I
> > > > > may
> > > > > > suggest, the dedicated clean-up tasks should avoid
> > > > > > - modules that are undergoing multiple feature changes/PRs
> > > > > > - modules that are planned to have major refactoring due to
> design
> > > > > changes
> > > > > > (since clean-up can be done altogether during refactoring)
> > > > > >
> > > > > > On Tue, Jan 21, 2020 at 4:17 AM Vinoth Chandar <
> vinoth@apache.org>
> > > > > wrote:
> > > > > >
> > > > > > > Not sure if I fully agree with sweeping statements being made.
> > But,
> > > > +1
> > > > > > for
> > > > > > > structuring this work via Jiras and having some committer
> > “accept”
> > > > the
> > > > > > > issue first.  Some of these tend to be subjective and we do
> need
> > to
> > > > > make
> > > > > > > different tradeoffs.
> > > > > > >
> > > > > > > On Tue, Jan 21, 2020 at 1:28 AM vino yang <
> yanghua1127@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Pratyaksh,
> > > > > > > >
> > > > > > > > Thanks for your thought.
> > > > > > > >
> > > > > > > > Let's listen to others' comments. If there is no objection,
> we
> > > will
> > > > > > > follow
> > > > > > > > this way.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Vino
> > > > > > > >
> > > > > > > >
> > > > > > > > Pratyaksh Sharma <pr...@gmail.com> 于2020年1月21日周二
> > 下午4:56写道:
> > > > > > > >
> > > > > > > > > Hi Vino,
> > > > > > > > >
> > > > > > > > > Big +1 for this initiative. I have done this code cleanup
> for
> > > > test
> > > > > > > > classes
> > > > > > > > > in the past and strongly feel there is a need to do the
> same
> > at
> > > > > other
> > > > > > > > > places as well. I would definitely like to volunteer for
> > this.
> > > > > > > > >
> > > > > > > > > On Tue, Jan 21, 2020 at 1:52 PM vino yang <
> > > yanghua1127@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi folks,
> > > > > > > > > >
> > > > > > > > > > Currently, the code quality of some Hudi module is not
> very
> > > > well.
> > > > > > As
> > > > > > > > many
> > > > > > > > > > developers have seen, the Intellij IDEA has shown many
> > > > > intellisense
> > > > > > > > about
> > > > > > > > > > cleanup and improvement. The community does not object to
> > > doing
> > > > > the
> > > > > > > > > cleanup
> > > > > > > > > > and improvement work and the work has been started via
> some
> > > > > direct
> > > > > > > > > "minor"
> > > > > > > > > > PRs by some volunteers. The current way is unorganized
> and
> > > hard
> > > > > to
> > > > > > > > > manage.
> > > > > > > > > > For tracking this work, I prefer to manage this work with
> > the
> > > > > Jira
> > > > > > > > issue.
> > > > > > > > > > We can create an umbrella issue. Then, split the work
> into
> > > > > several
> > > > > > > > > > subtasks.
> > > > > > > > > >
> > > > > > > > > > Since those "bad smell" lays anywhere in the whole
> project.
> > > > It's
> > > > > > > > > difficult
> > > > > > > > > > to give a standard to split the subtasks. For example,
> some
> > > > files
> > > > > > > have
> > > > > > > > a
> > > > > > > > > > lot while some modules have few. So I suggest the
> standard
> > > > would
> > > > > > > depend
> > > > > > > > > on
> > > > > > > > > > the volume of the changes. Before working, any subtask
> > should
> > > > > find
> > > > > > a
> > > > > > > > > > committer as a mentor who would judge and approve the
> scope
> > > is
> > > > > > > > suitable.
> > > > > > > > > >
> > > > > > > > > > What do you think?
> > > > > > > > > >
> > > > > > > > > > Any comments and suggestions would be appreciated.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Vino
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Unify Hudi code cleanup and improvement

Posted by vino yang <ya...@gmail.com>.
Hi all,

The PR has been submitted.[1]

Welcome to help to review it.

[1]: https://github.com/apache/incubator-hudi/pull/1311

vino yang <ya...@gmail.com> 于2020年2月6日周四 上午10:14写道:

> Hi all,
>
> Have filed a Jira issue: https://issues.apache.org/jira/browse/HUDI-602
>
> Best,
> Vino
>
> vino yang <ya...@gmail.com> 于2020年2月4日周二 下午9:39写道:
>
>> Hi Vinoth,
>>
>> Yes, I'd like to give some suggestions about the "MINOR" PR.
>> Will file a Jira issue to track this work.
>>
>> Best,
>> Vino
>>
>> Vinoth Chandar <vi...@apache.org> 于2020年2月4日周二 上午8:16写道:
>>
>>> +1 to vinoyang's suggestions.
>>>
>>> @Vino Yang <vi...@apache.org>  , do you want to formalize this and
>>> update our contributing page?
>>>
>>> On Thu, Jan 30, 2020 at 3:09 AM hmatu <hm...@foxmail.com> wrote:
>>>
>>> > Hi,
>>> >
>>> >
>>> >
>>> > I think these "MINOR" issues are important, a good&nbsp;project
>>> requires
>>> > not only functions, but also good coding style and  habits.
>>> >
>>> >
>>> > Best
>>> > Hmatu
>>> >
>>> >
>>> >
>>> >
>>> > ------------------&nbsp;Original&nbsp;------------------
>>> > From:&nbsp;"Vinoth Chandar"<vinoth@apache.org&gt;;
>>> > Date:&nbsp;Mon, Jan 27, 2020 03:25 AM
>>> > To:&nbsp;"dev"<dev@hudi.apache.org&gt;;
>>> >
>>> > Subject:&nbsp;Re: [DISCUSS] Unify Hudi code cleanup and improvement
>>> >
>>> >
>>> >
>>> > Hi Vino,
>>> >
>>> > You raise a valid point on what "MINOR" PR should be. All JIRAs start
>>> out
>>> > in "NEW" state and committers have to "Accept" the issue already (to
>>> force
>>> > early conversations like this).
>>> >
>>> > May be we should draw some bounds on it like, "cannot be more than 50
>>> > lines", "No functionality changes" .. etc? WDYT?&nbsp; This seems to be
>>> > the core
>>> > of the issue.
>>> >
>>> > On Thu, Jan 23, 2020 at 4:17 PM vino yang <yanghua1127@gmail.com&gt;
>>> > wrote:
>>> >
>>> > &gt; Hi Vinoth,
>>> > &gt;
>>> > &gt; Thank you for your thoughts, I agree that focusing on some higher
>>> > priority
>>> > &gt; work is more valuable.
>>> > &gt;
>>> > &gt; This discussion is to sort out and manage the work that the
>>> community
>>> > is
>>> > &gt; already doing. There are currently some PRs working on this type
>>> of
>>> > work,
>>> > &gt; such as PR[1][2][3][4]. The community has not given guidance on
>>> these
>>> > &gt; tasks. I think it's not very appropriate to open a "MINOR" PR
>>> > directly. So,
>>> > &gt; I want to hear from the community and how to manage them more
>>> > effectively.
>>> > &gt; The discussion does not encourage to give a higher priority to
>>> such
>>> > work.
>>> > &gt;
>>> > &gt; We haven't stopped this kind of work, so we should provide
>>> effective
>>> > &gt; guidance and organization so that it doesn't look disorganized.
>>> WYDT?
>>> > &gt;
>>> > &gt; Best,
>>> > &gt; Vino
>>> > &gt;
>>> > &gt; [1]: https://github.com/apache/incubator-hudi/pull/1237
>>> > &gt; [2]: https://github.com/apache/incubator-hudi/pull/1139
>>> > &gt; [3]: https://github.com/apache/incubator-hudi/pull/1137
>>> > &gt; [4]: https://github.com/apache/incubator-hudi/pull/1136
>>> > &gt;
>>> > &gt; Vinoth Chandar <vinoth@apache.org&gt; 于2020年1月23日周四 下午1:20写道:
>>> > &gt;
>>> > &gt; &gt; Hi,
>>> > &gt; &gt;
>>> > &gt; &gt; Thanks everyone for sharing your views!
>>> > &gt; &gt;
>>> > &gt; &gt; Some of this conversation is starting to feel like boiling
>>> the
>>> > ocean. I
>>> > &gt; &gt; believe in refactoring with purpose and discussing
>>> > class-by-class or
>>> > &gt; &gt; module-by-module does not make sense to me. Can we first list
>>> > down what
>>> > &gt; we
>>> > &gt; &gt; want to achieve? So far, I have only heard fixing
>>> IDE/IntelliJ
>>> > warnings.
>>> > &gt; &gt; Also instead of focussing on new work, how about looking at
>>> the
>>> > pending
>>> > &gt; &gt; JIRAs under "Testing" "Code Cleanup" components first and
>>> see if
>>> > those
>>> > &gt; are
>>> > &gt; &gt; worth tackling.
>>> > &gt; &gt;
>>> > &gt; &gt; We went down this path for code formatting and today we still
>>> > have
>>> > &gt; &gt; inconsistencies. Looking back, I feel we should have clearly
>>> > defined end
>>> > &gt; &gt; goals for the cleanups and we can then rank them based on
>>> ROI.
>>> > &gt; &gt;
>>> > &gt; &gt; Thanks
>>> > &gt; &gt; Vinoth
>>> > &gt; &gt;
>>> > &gt; &gt; On Wed, Jan 22, 2020 at 7:05 PM vino yang <
>>> yanghua1127@gmail.com&gt;
>>> > wrote:
>>> > &gt; &gt;
>>> > &gt; &gt; &gt; Hi Shiyan and Bhavani:
>>> > &gt; &gt; &gt;
>>> > &gt; &gt; &gt; Thanks for sharing your thoughts.
>>> > &gt; &gt; &gt;
>>> > &gt; &gt; &gt; As I originally stated. The advantage of using modules
>>> as a
>>> > unit to
>>> > &gt; split
>>> > &gt; &gt; &gt; work is that the decomposition is clear, but the
>>> > disadvantage is that
>>> > &gt; the
>>> > &gt; &gt; &gt; volume of changes may be huge, which brings huge risks
>>> > (considering
>>> > &gt; that
>>> > &gt; &gt; &gt; Hudi's test coverage is still not very high) and the
>>> > workload of
>>> > &gt; review.
>>> > &gt; &gt; &gt; The advantage of splitting by class is that the volume
>>> of
>>> > changes is
>>> > &gt; &gt; small
>>> > &gt; &gt; &gt; and the review is more convenient, but the disadvantages
>>> > are too many
>>> > &gt; &gt; tasks
>>> > &gt; &gt; &gt; and high maintenance costs.
>>> > &gt; &gt; &gt;
>>> > &gt; &gt; &gt;
>>> > &gt; &gt; &gt; *In addition, we need to define the boundaries of the
>>> "code
>>> > cleanup" I
>>> > &gt; &gt; &gt; expressed in this topic: it is limited to the smart tips
>>> > shown by
>>> > &gt; &gt; Intellij
>>> > &gt; &gt; &gt; IDEA. If the boundaries are too wide, then this
>>> discussion
>>> > will lose
>>> > &gt; &gt; &gt; control.*
>>> > &gt; &gt; &gt; I agree with Bhavani that we don't take it as the actual
>>> > goal. But we
>>> > &gt; are
>>> > &gt; &gt; &gt; not opposed to the community to help improve the
>>> quality of
>>> > the code
>>> > &gt; &gt; &gt; (basically, these tips given by the IDE are more
>>> > reasonable).
>>> > &gt; &gt; &gt;
>>> > &gt; &gt; &gt;
>>> > &gt; &gt; &gt; So, I still give my thoughts: We manage this work with
>>> > Jira. Before we
>>> > &gt; &gt; &gt; start working, we need to find a committer as a mentor.
>>> The
>>> > mentor must
>>> > &gt; &gt; &gt; decide whether the scale of the subtasks is reasonable
>>> and
>>> > whether
>>> > &gt; &gt; &gt; additional unit tests need to be added to verify the
>>> > changes. And the
>>> > &gt; &gt; &gt; mentor should be responsible for merged changes.
>>> > &gt; &gt; &gt;
>>> > &gt; &gt; &gt; What do you think?
>>> > &gt; &gt; &gt;
>>> > &gt; &gt; &gt; Best,
>>> > &gt; &gt; &gt; Vino
>>> > &gt; &gt; &gt;
>>> > &gt; &gt; &gt; Bhavani Sudha <bhavanisudhas@gmail.com&gt;
>>> 于2020年1月22日周三
>>> > 下午2:22写道:
>>> > &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; Hi @vinoyang thanks for bringing this to
>>> discussion. I
>>> > feel it would
>>> > &gt; be
>>> > &gt; &gt; &gt; &gt; less disruptive to clean up code as part of
>>> individual
>>> > classes being
>>> > &gt; &gt; &gt; &gt; touched for a specific goal rather than code
>>> cleanup
>>> > being the actual
>>> > &gt; &gt; &gt; goal.
>>> > &gt; &gt; &gt; &gt; This would narrow the touch point and ensure test
>>> > coverage (both unit
>>> > &gt; &gt; and
>>> > &gt; &gt; &gt; &gt; integration tests)&nbsp; catches any
>>> > accidental/unintentional changes.
>>> > &gt; Also
>>> > &gt; &gt; it
>>> > &gt; &gt; &gt; &gt; would give chance to change any documentation
>>> > quoting/referencing
>>> > &gt; that
>>> > &gt; &gt; &gt; &gt; code. Wanted to share my personal opinion.
>>> > &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; Thanks,
>>> > &gt; &gt; &gt; &gt; Sudha
>>> > &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 11:36 AM Shiyan Xu <
>>> > &gt; &gt; xu.shiyan.raymond@gmail.com&gt;
>>> > &gt; &gt; &gt; &gt; wrote:
>>> > &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; The clean-up work can actually be split by
>>> > modules.
>>> > &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; Though it is generally a good practice to
>>> follow,
>>> > my concern is the
>>> > &gt; &gt; &gt; &gt; &gt; clean-up is likely to cause conflicts with
>>> some
>>> > on-going changes.
>>> > &gt; If
>>> > &gt; &gt; I
>>> > &gt; &gt; &gt; &gt; may
>>> > &gt; &gt; &gt; &gt; &gt; suggest, the dedicated clean-up tasks should
>>> avoid
>>> > &gt; &gt; &gt; &gt; &gt; - modules that are undergoing multiple feature
>>> > changes/PRs
>>> > &gt; &gt; &gt; &gt; &gt; - modules that are planned to have major
>>> > refactoring due to design
>>> > &gt; &gt; &gt; &gt; changes
>>> > &gt; &gt; &gt; &gt; &gt; (since clean-up can be done altogether during
>>> > refactoring)
>>> > &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 4:17 AM Vinoth
>>> Chandar <
>>> > vinoth@apache.org&gt;
>>> > &gt; &gt; &gt; &gt; wrote:
>>> > &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; Not sure if I fully agree with sweeping
>>> > statements being made.
>>> > &gt; But,
>>> > &gt; &gt; &gt; +1
>>> > &gt; &gt; &gt; &gt; &gt; for
>>> > &gt; &gt; &gt; &gt; &gt; &gt; structuring this work via Jiras and
>>> having
>>> > some committer
>>> > &gt; “accept”
>>> > &gt; &gt; &gt; the
>>> > &gt; &gt; &gt; &gt; &gt; &gt; issue first.&nbsp; Some of these tend to
>>> be
>>> > subjective and we do need
>>> > &gt; to
>>> > &gt; &gt; &gt; &gt; make
>>> > &gt; &gt; &gt; &gt; &gt; &gt; different tradeoffs.
>>> > &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 1:28 AM vino
>>> yang <
>>> > yanghua1127@gmail.com
>>> > &gt; &gt;
>>> > &gt; &gt; &gt; &gt; wrote:
>>> > &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Hi Pratyaksh,
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Thanks for your thought.
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Let's listen to others' comments. If
>>> > there is no objection, we
>>> > &gt; &gt; will
>>> > &gt; &gt; &gt; &gt; &gt; &gt; follow
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; this way.
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Best,
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Vino
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Pratyaksh Sharma <
>>> pratyaksh13@gmail.com&gt;
>>> > 于2020年1月21日周二
>>> > &gt; 下午4:56写道:
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Hi Vino,
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Big +1 for this initiative. I
>>> have
>>> > done this code cleanup for
>>> > &gt; &gt; &gt; test
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; classes
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; in the past and strongly feel
>>> > there is a need to do the same
>>> > &gt; at
>>> > &gt; &gt; &gt; &gt; other
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; places as well. I would
>>> definitely
>>> > like to volunteer for
>>> > &gt; this.
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 1:52 PM
>>> > vino yang <
>>> > &gt; &gt; yanghua1127@gmail.com
>>> > &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; wrote:
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Hi folks,
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Currently, the code
>>> quality
>>> > of some Hudi module is not very
>>> > &gt; &gt; &gt; well.
>>> > &gt; &gt; &gt; &gt; &gt; As
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; many
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; developers have seen, the
>>> > Intellij IDEA has shown many
>>> > &gt; &gt; &gt; &gt; intellisense
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; about
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; cleanup and improvement.
>>> The
>>> > community does not object to
>>> > &gt; &gt; doing
>>> > &gt; &gt; &gt; &gt; the
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; cleanup
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; and improvement work and
>>> the
>>> > work has been started via some
>>> > &gt; &gt; &gt; &gt; direct
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; "minor"
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; PRs by some volunteers.
>>> The
>>> > current way is unorganized and
>>> > &gt; &gt; hard
>>> > &gt; &gt; &gt; &gt; to
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; manage.
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; For tracking this work, I
>>> > prefer to manage this work with
>>> > &gt; the
>>> > &gt; &gt; &gt; &gt; Jira
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; issue.
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; We can create an umbrella
>>> > issue. Then, split the work into
>>> > &gt; &gt; &gt; &gt; several
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; subtasks.
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Since those "bad smell"
>>> lays
>>> > anywhere in the whole project.
>>> > &gt; &gt; &gt; It's
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; difficult
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; to give a standard to
>>> split
>>> > the subtasks. For example, some
>>> > &gt; &gt; &gt; files
>>> > &gt; &gt; &gt; &gt; &gt; &gt; have
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; a
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; lot while some modules
>>> have
>>> > few. So I suggest the standard
>>> > &gt; &gt; &gt; would
>>> > &gt; &gt; &gt; &gt; &gt; &gt; depend
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; on
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; the volume of the changes.
>>> > Before working, any subtask
>>> > &gt; should
>>> > &gt; &gt; &gt; &gt; find
>>> > &gt; &gt; &gt; &gt; &gt; a
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; committer as a mentor who
>>> > would judge and approve the scope
>>> > &gt; &gt; is
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; suitable.
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; What do you think?
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Any comments and
>>> suggestions
>>> > would be appreciated.
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Best,
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Vino
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt; &gt;
>>> > &gt; &gt; &gt;
>>> > &gt; &gt;
>>> > &gt;
>>>
>>

Re: [DISCUSS] Unify Hudi code cleanup and improvement

Posted by vino yang <ya...@gmail.com>.
Hi all,

Have filed a Jira issue: https://issues.apache.org/jira/browse/HUDI-602

Best,
Vino

vino yang <ya...@gmail.com> 于2020年2月4日周二 下午9:39写道:

> Hi Vinoth,
>
> Yes, I'd like to give some suggestions about the "MINOR" PR.
> Will file a Jira issue to track this work.
>
> Best,
> Vino
>
> Vinoth Chandar <vi...@apache.org> 于2020年2月4日周二 上午8:16写道:
>
>> +1 to vinoyang's suggestions.
>>
>> @Vino Yang <vi...@apache.org>  , do you want to formalize this and
>> update our contributing page?
>>
>> On Thu, Jan 30, 2020 at 3:09 AM hmatu <hm...@foxmail.com> wrote:
>>
>> > Hi,
>> >
>> >
>> >
>> > I think these "MINOR" issues are important, a good&nbsp;project requires
>> > not only functions, but also good coding style and  habits.
>> >
>> >
>> > Best
>> > Hmatu
>> >
>> >
>> >
>> >
>> > ------------------&nbsp;Original&nbsp;------------------
>> > From:&nbsp;"Vinoth Chandar"<vinoth@apache.org&gt;;
>> > Date:&nbsp;Mon, Jan 27, 2020 03:25 AM
>> > To:&nbsp;"dev"<dev@hudi.apache.org&gt;;
>> >
>> > Subject:&nbsp;Re: [DISCUSS] Unify Hudi code cleanup and improvement
>> >
>> >
>> >
>> > Hi Vino,
>> >
>> > You raise a valid point on what "MINOR" PR should be. All JIRAs start
>> out
>> > in "NEW" state and committers have to "Accept" the issue already (to
>> force
>> > early conversations like this).
>> >
>> > May be we should draw some bounds on it like, "cannot be more than 50
>> > lines", "No functionality changes" .. etc? WDYT?&nbsp; This seems to be
>> > the core
>> > of the issue.
>> >
>> > On Thu, Jan 23, 2020 at 4:17 PM vino yang <yanghua1127@gmail.com&gt;
>> > wrote:
>> >
>> > &gt; Hi Vinoth,
>> > &gt;
>> > &gt; Thank you for your thoughts, I agree that focusing on some higher
>> > priority
>> > &gt; work is more valuable.
>> > &gt;
>> > &gt; This discussion is to sort out and manage the work that the
>> community
>> > is
>> > &gt; already doing. There are currently some PRs working on this type of
>> > work,
>> > &gt; such as PR[1][2][3][4]. The community has not given guidance on
>> these
>> > &gt; tasks. I think it's not very appropriate to open a "MINOR" PR
>> > directly. So,
>> > &gt; I want to hear from the community and how to manage them more
>> > effectively.
>> > &gt; The discussion does not encourage to give a higher priority to such
>> > work.
>> > &gt;
>> > &gt; We haven't stopped this kind of work, so we should provide
>> effective
>> > &gt; guidance and organization so that it doesn't look disorganized.
>> WYDT?
>> > &gt;
>> > &gt; Best,
>> > &gt; Vino
>> > &gt;
>> > &gt; [1]: https://github.com/apache/incubator-hudi/pull/1237
>> > &gt; [2]: https://github.com/apache/incubator-hudi/pull/1139
>> > &gt; [3]: https://github.com/apache/incubator-hudi/pull/1137
>> > &gt; [4]: https://github.com/apache/incubator-hudi/pull/1136
>> > &gt;
>> > &gt; Vinoth Chandar <vinoth@apache.org&gt; 于2020年1月23日周四 下午1:20写道:
>> > &gt;
>> > &gt; &gt; Hi,
>> > &gt; &gt;
>> > &gt; &gt; Thanks everyone for sharing your views!
>> > &gt; &gt;
>> > &gt; &gt; Some of this conversation is starting to feel like boiling the
>> > ocean. I
>> > &gt; &gt; believe in refactoring with purpose and discussing
>> > class-by-class or
>> > &gt; &gt; module-by-module does not make sense to me. Can we first list
>> > down what
>> > &gt; we
>> > &gt; &gt; want to achieve? So far, I have only heard fixing IDE/IntelliJ
>> > warnings.
>> > &gt; &gt; Also instead of focussing on new work, how about looking at
>> the
>> > pending
>> > &gt; &gt; JIRAs under "Testing" "Code Cleanup" components first and see
>> if
>> > those
>> > &gt; are
>> > &gt; &gt; worth tackling.
>> > &gt; &gt;
>> > &gt; &gt; We went down this path for code formatting and today we still
>> > have
>> > &gt; &gt; inconsistencies. Looking back, I feel we should have clearly
>> > defined end
>> > &gt; &gt; goals for the cleanups and we can then rank them based on ROI.
>> > &gt; &gt;
>> > &gt; &gt; Thanks
>> > &gt; &gt; Vinoth
>> > &gt; &gt;
>> > &gt; &gt; On Wed, Jan 22, 2020 at 7:05 PM vino yang <
>> yanghua1127@gmail.com&gt;
>> > wrote:
>> > &gt; &gt;
>> > &gt; &gt; &gt; Hi Shiyan and Bhavani:
>> > &gt; &gt; &gt;
>> > &gt; &gt; &gt; Thanks for sharing your thoughts.
>> > &gt; &gt; &gt;
>> > &gt; &gt; &gt; As I originally stated. The advantage of using modules
>> as a
>> > unit to
>> > &gt; split
>> > &gt; &gt; &gt; work is that the decomposition is clear, but the
>> > disadvantage is that
>> > &gt; the
>> > &gt; &gt; &gt; volume of changes may be huge, which brings huge risks
>> > (considering
>> > &gt; that
>> > &gt; &gt; &gt; Hudi's test coverage is still not very high) and the
>> > workload of
>> > &gt; review.
>> > &gt; &gt; &gt; The advantage of splitting by class is that the volume of
>> > changes is
>> > &gt; &gt; small
>> > &gt; &gt; &gt; and the review is more convenient, but the disadvantages
>> > are too many
>> > &gt; &gt; tasks
>> > &gt; &gt; &gt; and high maintenance costs.
>> > &gt; &gt; &gt;
>> > &gt; &gt; &gt;
>> > &gt; &gt; &gt; *In addition, we need to define the boundaries of the
>> "code
>> > cleanup" I
>> > &gt; &gt; &gt; expressed in this topic: it is limited to the smart tips
>> > shown by
>> > &gt; &gt; Intellij
>> > &gt; &gt; &gt; IDEA. If the boundaries are too wide, then this
>> discussion
>> > will lose
>> > &gt; &gt; &gt; control.*
>> > &gt; &gt; &gt; I agree with Bhavani that we don't take it as the actual
>> > goal. But we
>> > &gt; are
>> > &gt; &gt; &gt; not opposed to the community to help improve the quality
>> of
>> > the code
>> > &gt; &gt; &gt; (basically, these tips given by the IDE are more
>> > reasonable).
>> > &gt; &gt; &gt;
>> > &gt; &gt; &gt;
>> > &gt; &gt; &gt; So, I still give my thoughts: We manage this work with
>> > Jira. Before we
>> > &gt; &gt; &gt; start working, we need to find a committer as a mentor.
>> The
>> > mentor must
>> > &gt; &gt; &gt; decide whether the scale of the subtasks is reasonable
>> and
>> > whether
>> > &gt; &gt; &gt; additional unit tests need to be added to verify the
>> > changes. And the
>> > &gt; &gt; &gt; mentor should be responsible for merged changes.
>> > &gt; &gt; &gt;
>> > &gt; &gt; &gt; What do you think?
>> > &gt; &gt; &gt;
>> > &gt; &gt; &gt; Best,
>> > &gt; &gt; &gt; Vino
>> > &gt; &gt; &gt;
>> > &gt; &gt; &gt; Bhavani Sudha <bhavanisudhas@gmail.com&gt; 于2020年1月22日周三
>> > 下午2:22写道:
>> > &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; Hi @vinoyang thanks for bringing this to
>> discussion. I
>> > feel it would
>> > &gt; be
>> > &gt; &gt; &gt; &gt; less disruptive to clean up code as part of
>> individual
>> > classes being
>> > &gt; &gt; &gt; &gt; touched for a specific goal rather than code cleanup
>> > being the actual
>> > &gt; &gt; &gt; goal.
>> > &gt; &gt; &gt; &gt; This would narrow the touch point and ensure test
>> > coverage (both unit
>> > &gt; &gt; and
>> > &gt; &gt; &gt; &gt; integration tests)&nbsp; catches any
>> > accidental/unintentional changes.
>> > &gt; Also
>> > &gt; &gt; it
>> > &gt; &gt; &gt; &gt; would give chance to change any documentation
>> > quoting/referencing
>> > &gt; that
>> > &gt; &gt; &gt; &gt; code. Wanted to share my personal opinion.
>> > &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; Thanks,
>> > &gt; &gt; &gt; &gt; Sudha
>> > &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 11:36 AM Shiyan Xu <
>> > &gt; &gt; xu.shiyan.raymond@gmail.com&gt;
>> > &gt; &gt; &gt; &gt; wrote:
>> > &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; The clean-up work can actually be split by
>> > modules.
>> > &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; Though it is generally a good practice to
>> follow,
>> > my concern is the
>> > &gt; &gt; &gt; &gt; &gt; clean-up is likely to cause conflicts with some
>> > on-going changes.
>> > &gt; If
>> > &gt; &gt; I
>> > &gt; &gt; &gt; &gt; may
>> > &gt; &gt; &gt; &gt; &gt; suggest, the dedicated clean-up tasks should
>> avoid
>> > &gt; &gt; &gt; &gt; &gt; - modules that are undergoing multiple feature
>> > changes/PRs
>> > &gt; &gt; &gt; &gt; &gt; - modules that are planned to have major
>> > refactoring due to design
>> > &gt; &gt; &gt; &gt; changes
>> > &gt; &gt; &gt; &gt; &gt; (since clean-up can be done altogether during
>> > refactoring)
>> > &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 4:17 AM Vinoth Chandar
>> <
>> > vinoth@apache.org&gt;
>> > &gt; &gt; &gt; &gt; wrote:
>> > &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; Not sure if I fully agree with sweeping
>> > statements being made.
>> > &gt; But,
>> > &gt; &gt; &gt; +1
>> > &gt; &gt; &gt; &gt; &gt; for
>> > &gt; &gt; &gt; &gt; &gt; &gt; structuring this work via Jiras and having
>> > some committer
>> > &gt; “accept”
>> > &gt; &gt; &gt; the
>> > &gt; &gt; &gt; &gt; &gt; &gt; issue first.&nbsp; Some of these tend to
>> be
>> > subjective and we do need
>> > &gt; to
>> > &gt; &gt; &gt; &gt; make
>> > &gt; &gt; &gt; &gt; &gt; &gt; different tradeoffs.
>> > &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 1:28 AM vino yang
>> <
>> > yanghua1127@gmail.com
>> > &gt; &gt;
>> > &gt; &gt; &gt; &gt; wrote:
>> > &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Hi Pratyaksh,
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Thanks for your thought.
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Let's listen to others' comments. If
>> > there is no objection, we
>> > &gt; &gt; will
>> > &gt; &gt; &gt; &gt; &gt; &gt; follow
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; this way.
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Best,
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Vino
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Pratyaksh Sharma <
>> pratyaksh13@gmail.com&gt;
>> > 于2020年1月21日周二
>> > &gt; 下午4:56写道:
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Hi Vino,
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Big +1 for this initiative. I
>> have
>> > done this code cleanup for
>> > &gt; &gt; &gt; test
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; classes
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; in the past and strongly feel
>> > there is a need to do the same
>> > &gt; at
>> > &gt; &gt; &gt; &gt; other
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; places as well. I would
>> definitely
>> > like to volunteer for
>> > &gt; this.
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 1:52 PM
>> > vino yang <
>> > &gt; &gt; yanghua1127@gmail.com
>> > &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; wrote:
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Hi folks,
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Currently, the code quality
>> > of some Hudi module is not very
>> > &gt; &gt; &gt; well.
>> > &gt; &gt; &gt; &gt; &gt; As
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; many
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; developers have seen, the
>> > Intellij IDEA has shown many
>> > &gt; &gt; &gt; &gt; intellisense
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; about
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; cleanup and improvement.
>> The
>> > community does not object to
>> > &gt; &gt; doing
>> > &gt; &gt; &gt; &gt; the
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; cleanup
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; and improvement work and
>> the
>> > work has been started via some
>> > &gt; &gt; &gt; &gt; direct
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; "minor"
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; PRs by some volunteers. The
>> > current way is unorganized and
>> > &gt; &gt; hard
>> > &gt; &gt; &gt; &gt; to
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; manage.
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; For tracking this work, I
>> > prefer to manage this work with
>> > &gt; the
>> > &gt; &gt; &gt; &gt; Jira
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; issue.
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; We can create an umbrella
>> > issue. Then, split the work into
>> > &gt; &gt; &gt; &gt; several
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; subtasks.
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Since those "bad smell"
>> lays
>> > anywhere in the whole project.
>> > &gt; &gt; &gt; It's
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; difficult
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; to give a standard to split
>> > the subtasks. For example, some
>> > &gt; &gt; &gt; files
>> > &gt; &gt; &gt; &gt; &gt; &gt; have
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; a
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; lot while some modules have
>> > few. So I suggest the standard
>> > &gt; &gt; &gt; would
>> > &gt; &gt; &gt; &gt; &gt; &gt; depend
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; on
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; the volume of the changes.
>> > Before working, any subtask
>> > &gt; should
>> > &gt; &gt; &gt; &gt; find
>> > &gt; &gt; &gt; &gt; &gt; a
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; committer as a mentor who
>> > would judge and approve the scope
>> > &gt; &gt; is
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; suitable.
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; What do you think?
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Any comments and
>> suggestions
>> > would be appreciated.
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Best,
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Vino
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt; &gt;
>> > &gt; &gt; &gt;
>> > &gt; &gt;
>> > &gt;
>>
>

Re: [DISCUSS] Unify Hudi code cleanup and improvement

Posted by vino yang <ya...@gmail.com>.
Hi Vinoth,

Yes, I'd like to give some suggestions about the "MINOR" PR.
Will file a Jira issue to track this work.

Best,
Vino

Vinoth Chandar <vi...@apache.org> 于2020年2月4日周二 上午8:16写道:

> +1 to vinoyang's suggestions.
>
> @Vino Yang <vi...@apache.org>  , do you want to formalize this and
> update our contributing page?
>
> On Thu, Jan 30, 2020 at 3:09 AM hmatu <hm...@foxmail.com> wrote:
>
> > Hi,
> >
> >
> >
> > I think these "MINOR" issues are important, a good&nbsp;project requires
> > not only functions, but also good coding style and  habits.
> >
> >
> > Best
> > Hmatu
> >
> >
> >
> >
> > ------------------&nbsp;Original&nbsp;------------------
> > From:&nbsp;"Vinoth Chandar"<vinoth@apache.org&gt;;
> > Date:&nbsp;Mon, Jan 27, 2020 03:25 AM
> > To:&nbsp;"dev"<dev@hudi.apache.org&gt;;
> >
> > Subject:&nbsp;Re: [DISCUSS] Unify Hudi code cleanup and improvement
> >
> >
> >
> > Hi Vino,
> >
> > You raise a valid point on what "MINOR" PR should be. All JIRAs start out
> > in "NEW" state and committers have to "Accept" the issue already (to
> force
> > early conversations like this).
> >
> > May be we should draw some bounds on it like, "cannot be more than 50
> > lines", "No functionality changes" .. etc? WDYT?&nbsp; This seems to be
> > the core
> > of the issue.
> >
> > On Thu, Jan 23, 2020 at 4:17 PM vino yang <yanghua1127@gmail.com&gt;
> > wrote:
> >
> > &gt; Hi Vinoth,
> > &gt;
> > &gt; Thank you for your thoughts, I agree that focusing on some higher
> > priority
> > &gt; work is more valuable.
> > &gt;
> > &gt; This discussion is to sort out and manage the work that the
> community
> > is
> > &gt; already doing. There are currently some PRs working on this type of
> > work,
> > &gt; such as PR[1][2][3][4]. The community has not given guidance on
> these
> > &gt; tasks. I think it's not very appropriate to open a "MINOR" PR
> > directly. So,
> > &gt; I want to hear from the community and how to manage them more
> > effectively.
> > &gt; The discussion does not encourage to give a higher priority to such
> > work.
> > &gt;
> > &gt; We haven't stopped this kind of work, so we should provide effective
> > &gt; guidance and organization so that it doesn't look disorganized.
> WYDT?
> > &gt;
> > &gt; Best,
> > &gt; Vino
> > &gt;
> > &gt; [1]: https://github.com/apache/incubator-hudi/pull/1237
> > &gt; [2]: https://github.com/apache/incubator-hudi/pull/1139
> > &gt; [3]: https://github.com/apache/incubator-hudi/pull/1137
> > &gt; [4]: https://github.com/apache/incubator-hudi/pull/1136
> > &gt;
> > &gt; Vinoth Chandar <vinoth@apache.org&gt; 于2020年1月23日周四 下午1:20写道:
> > &gt;
> > &gt; &gt; Hi,
> > &gt; &gt;
> > &gt; &gt; Thanks everyone for sharing your views!
> > &gt; &gt;
> > &gt; &gt; Some of this conversation is starting to feel like boiling the
> > ocean. I
> > &gt; &gt; believe in refactoring with purpose and discussing
> > class-by-class or
> > &gt; &gt; module-by-module does not make sense to me. Can we first list
> > down what
> > &gt; we
> > &gt; &gt; want to achieve? So far, I have only heard fixing IDE/IntelliJ
> > warnings.
> > &gt; &gt; Also instead of focussing on new work, how about looking at the
> > pending
> > &gt; &gt; JIRAs under "Testing" "Code Cleanup" components first and see
> if
> > those
> > &gt; are
> > &gt; &gt; worth tackling.
> > &gt; &gt;
> > &gt; &gt; We went down this path for code formatting and today we still
> > have
> > &gt; &gt; inconsistencies. Looking back, I feel we should have clearly
> > defined end
> > &gt; &gt; goals for the cleanups and we can then rank them based on ROI.
> > &gt; &gt;
> > &gt; &gt; Thanks
> > &gt; &gt; Vinoth
> > &gt; &gt;
> > &gt; &gt; On Wed, Jan 22, 2020 at 7:05 PM vino yang <
> yanghua1127@gmail.com&gt;
> > wrote:
> > &gt; &gt;
> > &gt; &gt; &gt; Hi Shiyan and Bhavani:
> > &gt; &gt; &gt;
> > &gt; &gt; &gt; Thanks for sharing your thoughts.
> > &gt; &gt; &gt;
> > &gt; &gt; &gt; As I originally stated. The advantage of using modules as
> a
> > unit to
> > &gt; split
> > &gt; &gt; &gt; work is that the decomposition is clear, but the
> > disadvantage is that
> > &gt; the
> > &gt; &gt; &gt; volume of changes may be huge, which brings huge risks
> > (considering
> > &gt; that
> > &gt; &gt; &gt; Hudi's test coverage is still not very high) and the
> > workload of
> > &gt; review.
> > &gt; &gt; &gt; The advantage of splitting by class is that the volume of
> > changes is
> > &gt; &gt; small
> > &gt; &gt; &gt; and the review is more convenient, but the disadvantages
> > are too many
> > &gt; &gt; tasks
> > &gt; &gt; &gt; and high maintenance costs.
> > &gt; &gt; &gt;
> > &gt; &gt; &gt;
> > &gt; &gt; &gt; *In addition, we need to define the boundaries of the
> "code
> > cleanup" I
> > &gt; &gt; &gt; expressed in this topic: it is limited to the smart tips
> > shown by
> > &gt; &gt; Intellij
> > &gt; &gt; &gt; IDEA. If the boundaries are too wide, then this discussion
> > will lose
> > &gt; &gt; &gt; control.*
> > &gt; &gt; &gt; I agree with Bhavani that we don't take it as the actual
> > goal. But we
> > &gt; are
> > &gt; &gt; &gt; not opposed to the community to help improve the quality
> of
> > the code
> > &gt; &gt; &gt; (basically, these tips given by the IDE are more
> > reasonable).
> > &gt; &gt; &gt;
> > &gt; &gt; &gt;
> > &gt; &gt; &gt; So, I still give my thoughts: We manage this work with
> > Jira. Before we
> > &gt; &gt; &gt; start working, we need to find a committer as a mentor.
> The
> > mentor must
> > &gt; &gt; &gt; decide whether the scale of the subtasks is reasonable and
> > whether
> > &gt; &gt; &gt; additional unit tests need to be added to verify the
> > changes. And the
> > &gt; &gt; &gt; mentor should be responsible for merged changes.
> > &gt; &gt; &gt;
> > &gt; &gt; &gt; What do you think?
> > &gt; &gt; &gt;
> > &gt; &gt; &gt; Best,
> > &gt; &gt; &gt; Vino
> > &gt; &gt; &gt;
> > &gt; &gt; &gt; Bhavani Sudha <bhavanisudhas@gmail.com&gt; 于2020年1月22日周三
> > 下午2:22写道:
> > &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; Hi @vinoyang thanks for bringing this to discussion.
> I
> > feel it would
> > &gt; be
> > &gt; &gt; &gt; &gt; less disruptive to clean up code as part of
> individual
> > classes being
> > &gt; &gt; &gt; &gt; touched for a specific goal rather than code cleanup
> > being the actual
> > &gt; &gt; &gt; goal.
> > &gt; &gt; &gt; &gt; This would narrow the touch point and ensure test
> > coverage (both unit
> > &gt; &gt; and
> > &gt; &gt; &gt; &gt; integration tests)&nbsp; catches any
> > accidental/unintentional changes.
> > &gt; Also
> > &gt; &gt; it
> > &gt; &gt; &gt; &gt; would give chance to change any documentation
> > quoting/referencing
> > &gt; that
> > &gt; &gt; &gt; &gt; code. Wanted to share my personal opinion.
> > &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; Thanks,
> > &gt; &gt; &gt; &gt; Sudha
> > &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 11:36 AM Shiyan Xu <
> > &gt; &gt; xu.shiyan.raymond@gmail.com&gt;
> > &gt; &gt; &gt; &gt; wrote:
> > &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; The clean-up work can actually be split by
> > modules.
> > &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; Though it is generally a good practice to
> follow,
> > my concern is the
> > &gt; &gt; &gt; &gt; &gt; clean-up is likely to cause conflicts with some
> > on-going changes.
> > &gt; If
> > &gt; &gt; I
> > &gt; &gt; &gt; &gt; may
> > &gt; &gt; &gt; &gt; &gt; suggest, the dedicated clean-up tasks should
> avoid
> > &gt; &gt; &gt; &gt; &gt; - modules that are undergoing multiple feature
> > changes/PRs
> > &gt; &gt; &gt; &gt; &gt; - modules that are planned to have major
> > refactoring due to design
> > &gt; &gt; &gt; &gt; changes
> > &gt; &gt; &gt; &gt; &gt; (since clean-up can be done altogether during
> > refactoring)
> > &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 4:17 AM Vinoth Chandar <
> > vinoth@apache.org&gt;
> > &gt; &gt; &gt; &gt; wrote:
> > &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; Not sure if I fully agree with sweeping
> > statements being made.
> > &gt; But,
> > &gt; &gt; &gt; +1
> > &gt; &gt; &gt; &gt; &gt; for
> > &gt; &gt; &gt; &gt; &gt; &gt; structuring this work via Jiras and having
> > some committer
> > &gt; “accept”
> > &gt; &gt; &gt; the
> > &gt; &gt; &gt; &gt; &gt; &gt; issue first.&nbsp; Some of these tend to be
> > subjective and we do need
> > &gt; to
> > &gt; &gt; &gt; &gt; make
> > &gt; &gt; &gt; &gt; &gt; &gt; different tradeoffs.
> > &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 1:28 AM vino yang <
> > yanghua1127@gmail.com
> > &gt; &gt;
> > &gt; &gt; &gt; &gt; wrote:
> > &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Hi Pratyaksh,
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Thanks for your thought.
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Let's listen to others' comments. If
> > there is no objection, we
> > &gt; &gt; will
> > &gt; &gt; &gt; &gt; &gt; &gt; follow
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; this way.
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Best,
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Vino
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; Pratyaksh Sharma <
> pratyaksh13@gmail.com&gt;
> > 于2020年1月21日周二
> > &gt; 下午4:56写道:
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Hi Vino,
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Big +1 for this initiative. I
> have
> > done this code cleanup for
> > &gt; &gt; &gt; test
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; classes
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; in the past and strongly feel
> > there is a need to do the same
> > &gt; at
> > &gt; &gt; &gt; &gt; other
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; places as well. I would
> definitely
> > like to volunteer for
> > &gt; this.
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 1:52 PM
> > vino yang <
> > &gt; &gt; yanghua1127@gmail.com
> > &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; wrote:
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Hi folks,
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Currently, the code quality
> > of some Hudi module is not very
> > &gt; &gt; &gt; well.
> > &gt; &gt; &gt; &gt; &gt; As
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; many
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; developers have seen, the
> > Intellij IDEA has shown many
> > &gt; &gt; &gt; &gt; intellisense
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; about
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; cleanup and improvement. The
> > community does not object to
> > &gt; &gt; doing
> > &gt; &gt; &gt; &gt; the
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; cleanup
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; and improvement work and the
> > work has been started via some
> > &gt; &gt; &gt; &gt; direct
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; "minor"
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; PRs by some volunteers. The
> > current way is unorganized and
> > &gt; &gt; hard
> > &gt; &gt; &gt; &gt; to
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; manage.
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; For tracking this work, I
> > prefer to manage this work with
> > &gt; the
> > &gt; &gt; &gt; &gt; Jira
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; issue.
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; We can create an umbrella
> > issue. Then, split the work into
> > &gt; &gt; &gt; &gt; several
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; subtasks.
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Since those "bad smell" lays
> > anywhere in the whole project.
> > &gt; &gt; &gt; It's
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; difficult
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; to give a standard to split
> > the subtasks. For example, some
> > &gt; &gt; &gt; files
> > &gt; &gt; &gt; &gt; &gt; &gt; have
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; a
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; lot while some modules have
> > few. So I suggest the standard
> > &gt; &gt; &gt; would
> > &gt; &gt; &gt; &gt; &gt; &gt; depend
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; on
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; the volume of the changes.
> > Before working, any subtask
> > &gt; should
> > &gt; &gt; &gt; &gt; find
> > &gt; &gt; &gt; &gt; &gt; a
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; committer as a mentor who
> > would judge and approve the scope
> > &gt; &gt; is
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; suitable.
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; What do you think?
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Any comments and suggestions
> > would be appreciated.
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Best,
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Vino
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt; &gt;
> > &gt; &gt; &gt;
> > &gt; &gt;
> > &gt;
>

Re: [DISCUSS] Unify Hudi code cleanup and improvement

Posted by Vinoth Chandar <vi...@apache.org>.
+1 to vinoyang's suggestions.

@Vino Yang <vi...@apache.org>  , do you want to formalize this and
update our contributing page?

On Thu, Jan 30, 2020 at 3:09 AM hmatu <hm...@foxmail.com> wrote:

> Hi,
>
>
>
> I think these "MINOR" issues are important, a good&nbsp;project requires
> not only functions, but also good coding style and  habits.
>
>
> Best
> Hmatu
>
>
>
>
> ------------------&nbsp;Original&nbsp;------------------
> From:&nbsp;"Vinoth Chandar"<vinoth@apache.org&gt;;
> Date:&nbsp;Mon, Jan 27, 2020 03:25 AM
> To:&nbsp;"dev"<dev@hudi.apache.org&gt;;
>
> Subject:&nbsp;Re: [DISCUSS] Unify Hudi code cleanup and improvement
>
>
>
> Hi Vino,
>
> You raise a valid point on what "MINOR" PR should be. All JIRAs start out
> in "NEW" state and committers have to "Accept" the issue already (to force
> early conversations like this).
>
> May be we should draw some bounds on it like, "cannot be more than 50
> lines", "No functionality changes" .. etc? WDYT?&nbsp; This seems to be
> the core
> of the issue.
>
> On Thu, Jan 23, 2020 at 4:17 PM vino yang <yanghua1127@gmail.com&gt;
> wrote:
>
> &gt; Hi Vinoth,
> &gt;
> &gt; Thank you for your thoughts, I agree that focusing on some higher
> priority
> &gt; work is more valuable.
> &gt;
> &gt; This discussion is to sort out and manage the work that the community
> is
> &gt; already doing. There are currently some PRs working on this type of
> work,
> &gt; such as PR[1][2][3][4]. The community has not given guidance on these
> &gt; tasks. I think it's not very appropriate to open a "MINOR" PR
> directly. So,
> &gt; I want to hear from the community and how to manage them more
> effectively.
> &gt; The discussion does not encourage to give a higher priority to such
> work.
> &gt;
> &gt; We haven't stopped this kind of work, so we should provide effective
> &gt; guidance and organization so that it doesn't look disorganized. WYDT?
> &gt;
> &gt; Best,
> &gt; Vino
> &gt;
> &gt; [1]: https://github.com/apache/incubator-hudi/pull/1237
> &gt; [2]: https://github.com/apache/incubator-hudi/pull/1139
> &gt; [3]: https://github.com/apache/incubator-hudi/pull/1137
> &gt; [4]: https://github.com/apache/incubator-hudi/pull/1136
> &gt;
> &gt; Vinoth Chandar <vinoth@apache.org&gt; 于2020年1月23日周四 下午1:20写道:
> &gt;
> &gt; &gt; Hi,
> &gt; &gt;
> &gt; &gt; Thanks everyone for sharing your views!
> &gt; &gt;
> &gt; &gt; Some of this conversation is starting to feel like boiling the
> ocean. I
> &gt; &gt; believe in refactoring with purpose and discussing
> class-by-class or
> &gt; &gt; module-by-module does not make sense to me. Can we first list
> down what
> &gt; we
> &gt; &gt; want to achieve? So far, I have only heard fixing IDE/IntelliJ
> warnings.
> &gt; &gt; Also instead of focussing on new work, how about looking at the
> pending
> &gt; &gt; JIRAs under "Testing" "Code Cleanup" components first and see if
> those
> &gt; are
> &gt; &gt; worth tackling.
> &gt; &gt;
> &gt; &gt; We went down this path for code formatting and today we still
> have
> &gt; &gt; inconsistencies. Looking back, I feel we should have clearly
> defined end
> &gt; &gt; goals for the cleanups and we can then rank them based on ROI.
> &gt; &gt;
> &gt; &gt; Thanks
> &gt; &gt; Vinoth
> &gt; &gt;
> &gt; &gt; On Wed, Jan 22, 2020 at 7:05 PM vino yang <yanghua1127@gmail.com&gt;
> wrote:
> &gt; &gt;
> &gt; &gt; &gt; Hi Shiyan and Bhavani:
> &gt; &gt; &gt;
> &gt; &gt; &gt; Thanks for sharing your thoughts.
> &gt; &gt; &gt;
> &gt; &gt; &gt; As I originally stated. The advantage of using modules as a
> unit to
> &gt; split
> &gt; &gt; &gt; work is that the decomposition is clear, but the
> disadvantage is that
> &gt; the
> &gt; &gt; &gt; volume of changes may be huge, which brings huge risks
> (considering
> &gt; that
> &gt; &gt; &gt; Hudi's test coverage is still not very high) and the
> workload of
> &gt; review.
> &gt; &gt; &gt; The advantage of splitting by class is that the volume of
> changes is
> &gt; &gt; small
> &gt; &gt; &gt; and the review is more convenient, but the disadvantages
> are too many
> &gt; &gt; tasks
> &gt; &gt; &gt; and high maintenance costs.
> &gt; &gt; &gt;
> &gt; &gt; &gt;
> &gt; &gt; &gt; *In addition, we need to define the boundaries of the "code
> cleanup" I
> &gt; &gt; &gt; expressed in this topic: it is limited to the smart tips
> shown by
> &gt; &gt; Intellij
> &gt; &gt; &gt; IDEA. If the boundaries are too wide, then this discussion
> will lose
> &gt; &gt; &gt; control.*
> &gt; &gt; &gt; I agree with Bhavani that we don't take it as the actual
> goal. But we
> &gt; are
> &gt; &gt; &gt; not opposed to the community to help improve the quality of
> the code
> &gt; &gt; &gt; (basically, these tips given by the IDE are more
> reasonable).
> &gt; &gt; &gt;
> &gt; &gt; &gt;
> &gt; &gt; &gt; So, I still give my thoughts: We manage this work with
> Jira. Before we
> &gt; &gt; &gt; start working, we need to find a committer as a mentor. The
> mentor must
> &gt; &gt; &gt; decide whether the scale of the subtasks is reasonable and
> whether
> &gt; &gt; &gt; additional unit tests need to be added to verify the
> changes. And the
> &gt; &gt; &gt; mentor should be responsible for merged changes.
> &gt; &gt; &gt;
> &gt; &gt; &gt; What do you think?
> &gt; &gt; &gt;
> &gt; &gt; &gt; Best,
> &gt; &gt; &gt; Vino
> &gt; &gt; &gt;
> &gt; &gt; &gt; Bhavani Sudha <bhavanisudhas@gmail.com&gt; 于2020年1月22日周三
> 下午2:22写道:
> &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; Hi @vinoyang thanks for bringing this to discussion. I
> feel it would
> &gt; be
> &gt; &gt; &gt; &gt; less disruptive to clean up code as part of individual
> classes being
> &gt; &gt; &gt; &gt; touched for a specific goal rather than code cleanup
> being the actual
> &gt; &gt; &gt; goal.
> &gt; &gt; &gt; &gt; This would narrow the touch point and ensure test
> coverage (both unit
> &gt; &gt; and
> &gt; &gt; &gt; &gt; integration tests)&nbsp; catches any
> accidental/unintentional changes.
> &gt; Also
> &gt; &gt; it
> &gt; &gt; &gt; &gt; would give chance to change any documentation
> quoting/referencing
> &gt; that
> &gt; &gt; &gt; &gt; code. Wanted to share my personal opinion.
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; Thanks,
> &gt; &gt; &gt; &gt; Sudha
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 11:36 AM Shiyan Xu <
> &gt; &gt; xu.shiyan.raymond@gmail.com&gt;
> &gt; &gt; &gt; &gt; wrote:
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; The clean-up work can actually be split by
> modules.
> &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; Though it is generally a good practice to follow,
> my concern is the
> &gt; &gt; &gt; &gt; &gt; clean-up is likely to cause conflicts with some
> on-going changes.
> &gt; If
> &gt; &gt; I
> &gt; &gt; &gt; &gt; may
> &gt; &gt; &gt; &gt; &gt; suggest, the dedicated clean-up tasks should avoid
> &gt; &gt; &gt; &gt; &gt; - modules that are undergoing multiple feature
> changes/PRs
> &gt; &gt; &gt; &gt; &gt; - modules that are planned to have major
> refactoring due to design
> &gt; &gt; &gt; &gt; changes
> &gt; &gt; &gt; &gt; &gt; (since clean-up can be done altogether during
> refactoring)
> &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 4:17 AM Vinoth Chandar <
> vinoth@apache.org&gt;
> &gt; &gt; &gt; &gt; wrote:
> &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; Not sure if I fully agree with sweeping
> statements being made.
> &gt; But,
> &gt; &gt; &gt; +1
> &gt; &gt; &gt; &gt; &gt; for
> &gt; &gt; &gt; &gt; &gt; &gt; structuring this work via Jiras and having
> some committer
> &gt; “accept”
> &gt; &gt; &gt; the
> &gt; &gt; &gt; &gt; &gt; &gt; issue first.&nbsp; Some of these tend to be
> subjective and we do need
> &gt; to
> &gt; &gt; &gt; &gt; make
> &gt; &gt; &gt; &gt; &gt; &gt; different tradeoffs.
> &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 1:28 AM vino yang <
> yanghua1127@gmail.com
> &gt; &gt;
> &gt; &gt; &gt; &gt; wrote:
> &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; Hi Pratyaksh,
> &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; Thanks for your thought.
> &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; Let's listen to others' comments. If
> there is no objection, we
> &gt; &gt; will
> &gt; &gt; &gt; &gt; &gt; &gt; follow
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; this way.
> &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; Best,
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; Vino
> &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; Pratyaksh Sharma <pratyaksh13@gmail.com&gt;
> 于2020年1月21日周二
> &gt; 下午4:56写道:
> &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Hi Vino,
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Big +1 for this initiative. I have
> done this code cleanup for
> &gt; &gt; &gt; test
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; classes
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; in the past and strongly feel
> there is a need to do the same
> &gt; at
> &gt; &gt; &gt; &gt; other
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; places as well. I would definitely
> like to volunteer for
> &gt; this.
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 1:52 PM
> vino yang <
> &gt; &gt; yanghua1127@gmail.com
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; wrote:
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Hi folks,
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Currently, the code quality
> of some Hudi module is not very
> &gt; &gt; &gt; well.
> &gt; &gt; &gt; &gt; &gt; As
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; many
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; developers have seen, the
> Intellij IDEA has shown many
> &gt; &gt; &gt; &gt; intellisense
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; about
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; cleanup and improvement. The
> community does not object to
> &gt; &gt; doing
> &gt; &gt; &gt; &gt; the
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; cleanup
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; and improvement work and the
> work has been started via some
> &gt; &gt; &gt; &gt; direct
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; "minor"
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; PRs by some volunteers. The
> current way is unorganized and
> &gt; &gt; hard
> &gt; &gt; &gt; &gt; to
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; manage.
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; For tracking this work, I
> prefer to manage this work with
> &gt; the
> &gt; &gt; &gt; &gt; Jira
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; issue.
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; We can create an umbrella
> issue. Then, split the work into
> &gt; &gt; &gt; &gt; several
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; subtasks.
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Since those "bad smell" lays
> anywhere in the whole project.
> &gt; &gt; &gt; It's
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; difficult
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; to give a standard to split
> the subtasks. For example, some
> &gt; &gt; &gt; files
> &gt; &gt; &gt; &gt; &gt; &gt; have
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; a
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; lot while some modules have
> few. So I suggest the standard
> &gt; &gt; &gt; would
> &gt; &gt; &gt; &gt; &gt; &gt; depend
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; on
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; the volume of the changes.
> Before working, any subtask
> &gt; should
> &gt; &gt; &gt; &gt; find
> &gt; &gt; &gt; &gt; &gt; a
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; committer as a mentor who
> would judge and approve the scope
> &gt; &gt; is
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; suitable.
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; What do you think?
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Any comments and suggestions
> would be appreciated.
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Best,
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Vino
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt;
> &gt; &gt;
> &gt;

Re: [DISCUSS] Unify Hudi code cleanup and improvement

Posted by hmatu <hm...@foxmail.com>.
Hi,



I think these "MINOR" issues are important, a good&nbsp;project requires not only functions, but also good coding style and  habits.


Best
Hmatu




------------------&nbsp;Original&nbsp;------------------
From:&nbsp;"Vinoth Chandar"<vinoth@apache.org&gt;;
Date:&nbsp;Mon, Jan 27, 2020 03:25 AM
To:&nbsp;"dev"<dev@hudi.apache.org&gt;;

Subject:&nbsp;Re: [DISCUSS] Unify Hudi code cleanup and improvement



Hi Vino,

You raise a valid point on what "MINOR" PR should be. All JIRAs start out
in "NEW" state and committers have to "Accept" the issue already (to force
early conversations like this).

May be we should draw some bounds on it like, "cannot be more than 50
lines", "No functionality changes" .. etc? WDYT?&nbsp; This seems to be the core
of the issue.

On Thu, Jan 23, 2020 at 4:17 PM vino yang <yanghua1127@gmail.com&gt; wrote:

&gt; Hi Vinoth,
&gt;
&gt; Thank you for your thoughts, I agree that focusing on some higher priority
&gt; work is more valuable.
&gt;
&gt; This discussion is to sort out and manage the work that the community is
&gt; already doing. There are currently some PRs working on this type of work,
&gt; such as PR[1][2][3][4]. The community has not given guidance on these
&gt; tasks. I think it's not very appropriate to open a "MINOR" PR directly. So,
&gt; I want to hear from the community and how to manage them more effectively.
&gt; The discussion does not encourage to give a higher priority to such work.
&gt;
&gt; We haven't stopped this kind of work, so we should provide effective
&gt; guidance and organization so that it doesn't look disorganized. WYDT?
&gt;
&gt; Best,
&gt; Vino
&gt;
&gt; [1]: https://github.com/apache/incubator-hudi/pull/1237
&gt; [2]: https://github.com/apache/incubator-hudi/pull/1139
&gt; [3]: https://github.com/apache/incubator-hudi/pull/1137
&gt; [4]: https://github.com/apache/incubator-hudi/pull/1136
&gt;
&gt; Vinoth Chandar <vinoth@apache.org&gt; 于2020年1月23日周四 下午1:20写道:
&gt;
&gt; &gt; Hi,
&gt; &gt;
&gt; &gt; Thanks everyone for sharing your views!
&gt; &gt;
&gt; &gt; Some of this conversation is starting to feel like boiling the ocean. I
&gt; &gt; believe in refactoring with purpose and discussing class-by-class or
&gt; &gt; module-by-module does not make sense to me. Can we first list down what
&gt; we
&gt; &gt; want to achieve? So far, I have only heard fixing IDE/IntelliJ warnings.
&gt; &gt; Also instead of focussing on new work, how about looking at the pending
&gt; &gt; JIRAs under "Testing" "Code Cleanup" components first and see if those
&gt; are
&gt; &gt; worth tackling.
&gt; &gt;
&gt; &gt; We went down this path for code formatting and today we still have
&gt; &gt; inconsistencies. Looking back, I feel we should have clearly defined end
&gt; &gt; goals for the cleanups and we can then rank them based on ROI.
&gt; &gt;
&gt; &gt; Thanks
&gt; &gt; Vinoth
&gt; &gt;
&gt; &gt; On Wed, Jan 22, 2020 at 7:05 PM vino yang <yanghua1127@gmail.com&gt; wrote:
&gt; &gt;
&gt; &gt; &gt; Hi Shiyan and Bhavani:
&gt; &gt; &gt;
&gt; &gt; &gt; Thanks for sharing your thoughts.
&gt; &gt; &gt;
&gt; &gt; &gt; As I originally stated. The advantage of using modules as a unit to
&gt; split
&gt; &gt; &gt; work is that the decomposition is clear, but the disadvantage is that
&gt; the
&gt; &gt; &gt; volume of changes may be huge, which brings huge risks (considering
&gt; that
&gt; &gt; &gt; Hudi's test coverage is still not very high) and the workload of
&gt; review.
&gt; &gt; &gt; The advantage of splitting by class is that the volume of changes is
&gt; &gt; small
&gt; &gt; &gt; and the review is more convenient, but the disadvantages are too many
&gt; &gt; tasks
&gt; &gt; &gt; and high maintenance costs.
&gt; &gt; &gt;
&gt; &gt; &gt;
&gt; &gt; &gt; *In addition, we need to define the boundaries of the "code cleanup" I
&gt; &gt; &gt; expressed in this topic: it is limited to the smart tips shown by
&gt; &gt; Intellij
&gt; &gt; &gt; IDEA. If the boundaries are too wide, then this discussion will lose
&gt; &gt; &gt; control.*
&gt; &gt; &gt; I agree with Bhavani that we don't take it as the actual goal. But we
&gt; are
&gt; &gt; &gt; not opposed to the community to help improve the quality of the code
&gt; &gt; &gt; (basically, these tips given by the IDE are more reasonable).
&gt; &gt; &gt;
&gt; &gt; &gt;
&gt; &gt; &gt; So, I still give my thoughts: We manage this work with Jira. Before we
&gt; &gt; &gt; start working, we need to find a committer as a mentor. The mentor must
&gt; &gt; &gt; decide whether the scale of the subtasks is reasonable and whether
&gt; &gt; &gt; additional unit tests need to be added to verify the changes. And the
&gt; &gt; &gt; mentor should be responsible for merged changes.
&gt; &gt; &gt;
&gt; &gt; &gt; What do you think?
&gt; &gt; &gt;
&gt; &gt; &gt; Best,
&gt; &gt; &gt; Vino
&gt; &gt; &gt;
&gt; &gt; &gt; Bhavani Sudha <bhavanisudhas@gmail.com&gt; 于2020年1月22日周三 下午2:22写道:
&gt; &gt; &gt;
&gt; &gt; &gt; &gt; Hi @vinoyang thanks for bringing this to discussion. I feel it would
&gt; be
&gt; &gt; &gt; &gt; less disruptive to clean up code as part of individual classes being
&gt; &gt; &gt; &gt; touched for a specific goal rather than code cleanup being the actual
&gt; &gt; &gt; goal.
&gt; &gt; &gt; &gt; This would narrow the touch point and ensure test coverage (both unit
&gt; &gt; and
&gt; &gt; &gt; &gt; integration tests)&nbsp; catches any accidental/unintentional changes.
&gt; Also
&gt; &gt; it
&gt; &gt; &gt; &gt; would give chance to change any documentation quoting/referencing
&gt; that
&gt; &gt; &gt; &gt; code. Wanted to share my personal opinion.
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; Thanks,
&gt; &gt; &gt; &gt; Sudha
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 11:36 AM Shiyan Xu <
&gt; &gt; xu.shiyan.raymond@gmail.com&gt;
&gt; &gt; &gt; &gt; wrote:
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; The clean-up work can actually be split by modules.
&gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; Though it is generally a good practice to follow, my concern is the
&gt; &gt; &gt; &gt; &gt; clean-up is likely to cause conflicts with some on-going changes.
&gt; If
&gt; &gt; I
&gt; &gt; &gt; &gt; may
&gt; &gt; &gt; &gt; &gt; suggest, the dedicated clean-up tasks should avoid
&gt; &gt; &gt; &gt; &gt; - modules that are undergoing multiple feature changes/PRs
&gt; &gt; &gt; &gt; &gt; - modules that are planned to have major refactoring due to design
&gt; &gt; &gt; &gt; changes
&gt; &gt; &gt; &gt; &gt; (since clean-up can be done altogether during refactoring)
&gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 4:17 AM Vinoth Chandar <vinoth@apache.org&gt;
&gt; &gt; &gt; &gt; wrote:
&gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; Not sure if I fully agree with sweeping statements being made.
&gt; But,
&gt; &gt; &gt; +1
&gt; &gt; &gt; &gt; &gt; for
&gt; &gt; &gt; &gt; &gt; &gt; structuring this work via Jiras and having some committer
&gt; “accept”
&gt; &gt; &gt; the
&gt; &gt; &gt; &gt; &gt; &gt; issue first.&nbsp; Some of these tend to be subjective and we do need
&gt; to
&gt; &gt; &gt; &gt; make
&gt; &gt; &gt; &gt; &gt; &gt; different tradeoffs.
&gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 1:28 AM vino yang <yanghua1127@gmail.com
&gt; &gt;
&gt; &gt; &gt; &gt; wrote:
&gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt; Hi Pratyaksh,
&gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt; Thanks for your thought.
&gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt; Let's listen to others' comments. If there is no objection, we
&gt; &gt; will
&gt; &gt; &gt; &gt; &gt; &gt; follow
&gt; &gt; &gt; &gt; &gt; &gt; &gt; this way.
&gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt; Best,
&gt; &gt; &gt; &gt; &gt; &gt; &gt; Vino
&gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt; Pratyaksh Sharma <pratyaksh13@gmail.com&gt; 于2020年1月21日周二
&gt; 下午4:56写道:
&gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Hi Vino,
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Big +1 for this initiative. I have done this code cleanup for
&gt; &gt; &gt; test
&gt; &gt; &gt; &gt; &gt; &gt; &gt; classes
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; in the past and strongly feel there is a need to do the same
&gt; at
&gt; &gt; &gt; &gt; other
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; places as well. I would definitely like to volunteer for
&gt; this.
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; On Tue, Jan 21, 2020 at 1:52 PM vino yang <
&gt; &gt; yanghua1127@gmail.com
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; wrote:
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Hi folks,
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Currently, the code quality of some Hudi module is not very
&gt; &gt; &gt; well.
&gt; &gt; &gt; &gt; &gt; As
&gt; &gt; &gt; &gt; &gt; &gt; &gt; many
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; developers have seen, the Intellij IDEA has shown many
&gt; &gt; &gt; &gt; intellisense
&gt; &gt; &gt; &gt; &gt; &gt; &gt; about
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; cleanup and improvement. The community does not object to
&gt; &gt; doing
&gt; &gt; &gt; &gt; the
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; cleanup
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; and improvement work and the work has been started via some
&gt; &gt; &gt; &gt; direct
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; "minor"
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; PRs by some volunteers. The current way is unorganized and
&gt; &gt; hard
&gt; &gt; &gt; &gt; to
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; manage.
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; For tracking this work, I prefer to manage this work with
&gt; the
&gt; &gt; &gt; &gt; Jira
&gt; &gt; &gt; &gt; &gt; &gt; &gt; issue.
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; We can create an umbrella issue. Then, split the work into
&gt; &gt; &gt; &gt; several
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; subtasks.
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Since those "bad smell" lays anywhere in the whole project.
&gt; &gt; &gt; It's
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; difficult
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; to give a standard to split the subtasks. For example, some
&gt; &gt; &gt; files
&gt; &gt; &gt; &gt; &gt; &gt; have
&gt; &gt; &gt; &gt; &gt; &gt; &gt; a
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; lot while some modules have few. So I suggest the standard
&gt; &gt; &gt; would
&gt; &gt; &gt; &gt; &gt; &gt; depend
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; on
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; the volume of the changes. Before working, any subtask
&gt; should
&gt; &gt; &gt; &gt; find
&gt; &gt; &gt; &gt; &gt; a
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; committer as a mentor who would judge and approve the scope
&gt; &gt; is
&gt; &gt; &gt; &gt; &gt; &gt; &gt; suitable.
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; What do you think?
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Any comments and suggestions would be appreciated.
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Best,
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; Vino
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt;
&gt; &gt; &gt;
&gt; &gt;
&gt;

Re:Re: [DISCUSS] Unify Hudi code cleanup and improvement

Posted by lamberken <la...@163.com>.

Hi @Vinoth @Vino


IMO, we can use SonarQube[1] and Sonarlint[2] tools to help us detect and fix quality issues.


Local env, follow below steps:
------------------------------------------------------------------------------------------
1, docker run -d --name sonarqube -p 9000:9000 sonarqube
2, mvn sonar:sonar
3, http://localhost:9000
------------------------------------------------------------------------------------------


[1] https://www.sonarqube.org
[2] https://www.sonarlint.org




Thanks
Lamber-Ken




At 2020-01-27 03:25:02, "Vinoth Chandar" <vi...@apache.org> wrote:
>Hi Vino,
>
>You raise a valid point on what "MINOR" PR should be. All JIRAs start out
>in "NEW" state and committers have to "Accept" the issue already (to force
>early conversations like this).
>
>May be we should draw some bounds on it like, "cannot be more than 50
>lines", "No functionality changes" .. etc? WDYT?  This seems to be the core
>of the issue.
>
>On Thu, Jan 23, 2020 at 4:17 PM vino yang <ya...@gmail.com> wrote:
>
>> Hi Vinoth,
>>
>> Thank you for your thoughts, I agree that focusing on some higher priority
>> work is more valuable.
>>
>> This discussion is to sort out and manage the work that the community is
>> already doing. There are currently some PRs working on this type of work,
>> such as PR[1][2][3][4]. The community has not given guidance on these
>> tasks. I think it's not very appropriate to open a "MINOR" PR directly. So,
>> I want to hear from the community and how to manage them more effectively.
>> The discussion does not encourage to give a higher priority to such work.
>>
>> We haven't stopped this kind of work, so we should provide effective
>> guidance and organization so that it doesn't look disorganized. WYDT?
>>
>> Best,
>> Vino
>>
>> [1]: https://github.com/apache/incubator-hudi/pull/1237
>> [2]: https://github.com/apache/incubator-hudi/pull/1139
>> [3]: https://github.com/apache/incubator-hudi/pull/1137
>> [4]: https://github.com/apache/incubator-hudi/pull/1136
>>
>> Vinoth Chandar <vi...@apache.org> 于2020年1月23日周四 下午1:20写道:
>>
>> > Hi,
>> >
>> > Thanks everyone for sharing your views!
>> >
>> > Some of this conversation is starting to feel like boiling the ocean. I
>> > believe in refactoring with purpose and discussing class-by-class or
>> > module-by-module does not make sense to me. Can we first list down what
>> we
>> > want to achieve? So far, I have only heard fixing IDE/IntelliJ warnings.
>> > Also instead of focussing on new work, how about looking at the pending
>> > JIRAs under "Testing" "Code Cleanup" components first and see if those
>> are
>> > worth tackling.
>> >
>> > We went down this path for code formatting and today we still have
>> > inconsistencies. Looking back, I feel we should have clearly defined end
>> > goals for the cleanups and we can then rank them based on ROI.
>> >
>> > Thanks
>> > Vinoth
>> >
>> > On Wed, Jan 22, 2020 at 7:05 PM vino yang <ya...@gmail.com> wrote:
>> >
>> > > Hi Shiyan and Bhavani:
>> > >
>> > > Thanks for sharing your thoughts.
>> > >
>> > > As I originally stated. The advantage of using modules as a unit to
>> split
>> > > work is that the decomposition is clear, but the disadvantage is that
>> the
>> > > volume of changes may be huge, which brings huge risks (considering
>> that
>> > > Hudi's test coverage is still not very high) and the workload of
>> review.
>> > > The advantage of splitting by class is that the volume of changes is
>> > small
>> > > and the review is more convenient, but the disadvantages are too many
>> > tasks
>> > > and high maintenance costs.
>> > >
>> > >
>> > > *In addition, we need to define the boundaries of the "code cleanup" I
>> > > expressed in this topic: it is limited to the smart tips shown by
>> > Intellij
>> > > IDEA. If the boundaries are too wide, then this discussion will lose
>> > > control.*
>> > > I agree with Bhavani that we don't take it as the actual goal. But we
>> are
>> > > not opposed to the community to help improve the quality of the code
>> > > (basically, these tips given by the IDE are more reasonable).
>> > >
>> > >
>> > > So, I still give my thoughts: We manage this work with Jira. Before we
>> > > start working, we need to find a committer as a mentor. The mentor must
>> > > decide whether the scale of the subtasks is reasonable and whether
>> > > additional unit tests need to be added to verify the changes. And the
>> > > mentor should be responsible for merged changes.
>> > >
>> > > What do you think?
>> > >
>> > > Best,
>> > > Vino
>> > >
>> > > Bhavani Sudha <bh...@gmail.com> 于2020年1月22日周三 下午2:22写道:
>> > >
>> > > > Hi @vinoyang thanks for bringing this to discussion. I feel it would
>> be
>> > > > less disruptive to clean up code as part of individual classes being
>> > > > touched for a specific goal rather than code cleanup being the actual
>> > > goal.
>> > > > This would narrow the touch point and ensure test coverage (both unit
>> > and
>> > > > integration tests)  catches any accidental/unintentional changes.
>> Also
>> > it
>> > > > would give chance to change any documentation quoting/referencing
>> that
>> > > > code. Wanted to share my personal opinion.
>> > > >
>> > > > Thanks,
>> > > > Sudha
>> > > >
>> > > >
>> > > >
>> > > > On Tue, Jan 21, 2020 at 11:36 AM Shiyan Xu <
>> > xu.shiyan.raymond@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > The clean-up work can actually be split by modules.
>> > > > >
>> > > > > Though it is generally a good practice to follow, my concern is the
>> > > > > clean-up is likely to cause conflicts with some on-going changes.
>> If
>> > I
>> > > > may
>> > > > > suggest, the dedicated clean-up tasks should avoid
>> > > > > - modules that are undergoing multiple feature changes/PRs
>> > > > > - modules that are planned to have major refactoring due to design
>> > > > changes
>> > > > > (since clean-up can be done altogether during refactoring)
>> > > > >
>> > > > > On Tue, Jan 21, 2020 at 4:17 AM Vinoth Chandar <vi...@apache.org>
>> > > > wrote:
>> > > > >
>> > > > > > Not sure if I fully agree with sweeping statements being made.
>> But,
>> > > +1
>> > > > > for
>> > > > > > structuring this work via Jiras and having some committer
>> “accept”
>> > > the
>> > > > > > issue first.  Some of these tend to be subjective and we do need
>> to
>> > > > make
>> > > > > > different tradeoffs.
>> > > > > >
>> > > > > > On Tue, Jan 21, 2020 at 1:28 AM vino yang <yanghua1127@gmail.com
>> >
>> > > > wrote:
>> > > > > >
>> > > > > > > Hi Pratyaksh,
>> > > > > > >
>> > > > > > > Thanks for your thought.
>> > > > > > >
>> > > > > > > Let's listen to others' comments. If there is no objection, we
>> > will
>> > > > > > follow
>> > > > > > > this way.
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > Vino
>> > > > > > >
>> > > > > > >
>> > > > > > > Pratyaksh Sharma <pr...@gmail.com> 于2020年1月21日周二
>> 下午4:56写道:
>> > > > > > >
>> > > > > > > > Hi Vino,
>> > > > > > > >
>> > > > > > > > Big +1 for this initiative. I have done this code cleanup for
>> > > test
>> > > > > > > classes
>> > > > > > > > in the past and strongly feel there is a need to do the same
>> at
>> > > > other
>> > > > > > > > places as well. I would definitely like to volunteer for
>> this.
>> > > > > > > >
>> > > > > > > > On Tue, Jan 21, 2020 at 1:52 PM vino yang <
>> > yanghua1127@gmail.com
>> > > >
>> > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi folks,
>> > > > > > > > >
>> > > > > > > > > Currently, the code quality of some Hudi module is not very
>> > > well.
>> > > > > As
>> > > > > > > many
>> > > > > > > > > developers have seen, the Intellij IDEA has shown many
>> > > > intellisense
>> > > > > > > about
>> > > > > > > > > cleanup and improvement. The community does not object to
>> > doing
>> > > > the
>> > > > > > > > cleanup
>> > > > > > > > > and improvement work and the work has been started via some
>> > > > direct
>> > > > > > > > "minor"
>> > > > > > > > > PRs by some volunteers. The current way is unorganized and
>> > hard
>> > > > to
>> > > > > > > > manage.
>> > > > > > > > > For tracking this work, I prefer to manage this work with
>> the
>> > > > Jira
>> > > > > > > issue.
>> > > > > > > > > We can create an umbrella issue. Then, split the work into
>> > > > several
>> > > > > > > > > subtasks.
>> > > > > > > > >
>> > > > > > > > > Since those "bad smell" lays anywhere in the whole project.
>> > > It's
>> > > > > > > > difficult
>> > > > > > > > > to give a standard to split the subtasks. For example, some
>> > > files
>> > > > > > have
>> > > > > > > a
>> > > > > > > > > lot while some modules have few. So I suggest the standard
>> > > would
>> > > > > > depend
>> > > > > > > > on
>> > > > > > > > > the volume of the changes. Before working, any subtask
>> should
>> > > > find
>> > > > > a
>> > > > > > > > > committer as a mentor who would judge and approve the scope
>> > is
>> > > > > > > suitable.
>> > > > > > > > >
>> > > > > > > > > What do you think?
>> > > > > > > > >
>> > > > > > > > > Any comments and suggestions would be appreciated.
>> > > > > > > > >
>> > > > > > > > > Best,
>> > > > > > > > > Vino
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>

Re: [DISCUSS] Unify Hudi code cleanup and improvement

Posted by Vinoth Chandar <vi...@apache.org>.
Hi Vino,

You raise a valid point on what "MINOR" PR should be. All JIRAs start out
in "NEW" state and committers have to "Accept" the issue already (to force
early conversations like this).

May be we should draw some bounds on it like, "cannot be more than 50
lines", "No functionality changes" .. etc? WDYT?  This seems to be the core
of the issue.

On Thu, Jan 23, 2020 at 4:17 PM vino yang <ya...@gmail.com> wrote:

> Hi Vinoth,
>
> Thank you for your thoughts, I agree that focusing on some higher priority
> work is more valuable.
>
> This discussion is to sort out and manage the work that the community is
> already doing. There are currently some PRs working on this type of work,
> such as PR[1][2][3][4]. The community has not given guidance on these
> tasks. I think it's not very appropriate to open a "MINOR" PR directly. So,
> I want to hear from the community and how to manage them more effectively.
> The discussion does not encourage to give a higher priority to such work.
>
> We haven't stopped this kind of work, so we should provide effective
> guidance and organization so that it doesn't look disorganized. WYDT?
>
> Best,
> Vino
>
> [1]: https://github.com/apache/incubator-hudi/pull/1237
> [2]: https://github.com/apache/incubator-hudi/pull/1139
> [3]: https://github.com/apache/incubator-hudi/pull/1137
> [4]: https://github.com/apache/incubator-hudi/pull/1136
>
> Vinoth Chandar <vi...@apache.org> 于2020年1月23日周四 下午1:20写道:
>
> > Hi,
> >
> > Thanks everyone for sharing your views!
> >
> > Some of this conversation is starting to feel like boiling the ocean. I
> > believe in refactoring with purpose and discussing class-by-class or
> > module-by-module does not make sense to me. Can we first list down what
> we
> > want to achieve? So far, I have only heard fixing IDE/IntelliJ warnings.
> > Also instead of focussing on new work, how about looking at the pending
> > JIRAs under "Testing" "Code Cleanup" components first and see if those
> are
> > worth tackling.
> >
> > We went down this path for code formatting and today we still have
> > inconsistencies. Looking back, I feel we should have clearly defined end
> > goals for the cleanups and we can then rank them based on ROI.
> >
> > Thanks
> > Vinoth
> >
> > On Wed, Jan 22, 2020 at 7:05 PM vino yang <ya...@gmail.com> wrote:
> >
> > > Hi Shiyan and Bhavani:
> > >
> > > Thanks for sharing your thoughts.
> > >
> > > As I originally stated. The advantage of using modules as a unit to
> split
> > > work is that the decomposition is clear, but the disadvantage is that
> the
> > > volume of changes may be huge, which brings huge risks (considering
> that
> > > Hudi's test coverage is still not very high) and the workload of
> review.
> > > The advantage of splitting by class is that the volume of changes is
> > small
> > > and the review is more convenient, but the disadvantages are too many
> > tasks
> > > and high maintenance costs.
> > >
> > >
> > > *In addition, we need to define the boundaries of the "code cleanup" I
> > > expressed in this topic: it is limited to the smart tips shown by
> > Intellij
> > > IDEA. If the boundaries are too wide, then this discussion will lose
> > > control.*
> > > I agree with Bhavani that we don't take it as the actual goal. But we
> are
> > > not opposed to the community to help improve the quality of the code
> > > (basically, these tips given by the IDE are more reasonable).
> > >
> > >
> > > So, I still give my thoughts: We manage this work with Jira. Before we
> > > start working, we need to find a committer as a mentor. The mentor must
> > > decide whether the scale of the subtasks is reasonable and whether
> > > additional unit tests need to be added to verify the changes. And the
> > > mentor should be responsible for merged changes.
> > >
> > > What do you think?
> > >
> > > Best,
> > > Vino
> > >
> > > Bhavani Sudha <bh...@gmail.com> 于2020年1月22日周三 下午2:22写道:
> > >
> > > > Hi @vinoyang thanks for bringing this to discussion. I feel it would
> be
> > > > less disruptive to clean up code as part of individual classes being
> > > > touched for a specific goal rather than code cleanup being the actual
> > > goal.
> > > > This would narrow the touch point and ensure test coverage (both unit
> > and
> > > > integration tests)  catches any accidental/unintentional changes.
> Also
> > it
> > > > would give chance to change any documentation quoting/referencing
> that
> > > > code. Wanted to share my personal opinion.
> > > >
> > > > Thanks,
> > > > Sudha
> > > >
> > > >
> > > >
> > > > On Tue, Jan 21, 2020 at 11:36 AM Shiyan Xu <
> > xu.shiyan.raymond@gmail.com>
> > > > wrote:
> > > >
> > > > > The clean-up work can actually be split by modules.
> > > > >
> > > > > Though it is generally a good practice to follow, my concern is the
> > > > > clean-up is likely to cause conflicts with some on-going changes.
> If
> > I
> > > > may
> > > > > suggest, the dedicated clean-up tasks should avoid
> > > > > - modules that are undergoing multiple feature changes/PRs
> > > > > - modules that are planned to have major refactoring due to design
> > > > changes
> > > > > (since clean-up can be done altogether during refactoring)
> > > > >
> > > > > On Tue, Jan 21, 2020 at 4:17 AM Vinoth Chandar <vi...@apache.org>
> > > > wrote:
> > > > >
> > > > > > Not sure if I fully agree with sweeping statements being made.
> But,
> > > +1
> > > > > for
> > > > > > structuring this work via Jiras and having some committer
> “accept”
> > > the
> > > > > > issue first.  Some of these tend to be subjective and we do need
> to
> > > > make
> > > > > > different tradeoffs.
> > > > > >
> > > > > > On Tue, Jan 21, 2020 at 1:28 AM vino yang <yanghua1127@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Hi Pratyaksh,
> > > > > > >
> > > > > > > Thanks for your thought.
> > > > > > >
> > > > > > > Let's listen to others' comments. If there is no objection, we
> > will
> > > > > > follow
> > > > > > > this way.
> > > > > > >
> > > > > > > Best,
> > > > > > > Vino
> > > > > > >
> > > > > > >
> > > > > > > Pratyaksh Sharma <pr...@gmail.com> 于2020年1月21日周二
> 下午4:56写道:
> > > > > > >
> > > > > > > > Hi Vino,
> > > > > > > >
> > > > > > > > Big +1 for this initiative. I have done this code cleanup for
> > > test
> > > > > > > classes
> > > > > > > > in the past and strongly feel there is a need to do the same
> at
> > > > other
> > > > > > > > places as well. I would definitely like to volunteer for
> this.
> > > > > > > >
> > > > > > > > On Tue, Jan 21, 2020 at 1:52 PM vino yang <
> > yanghua1127@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi folks,
> > > > > > > > >
> > > > > > > > > Currently, the code quality of some Hudi module is not very
> > > well.
> > > > > As
> > > > > > > many
> > > > > > > > > developers have seen, the Intellij IDEA has shown many
> > > > intellisense
> > > > > > > about
> > > > > > > > > cleanup and improvement. The community does not object to
> > doing
> > > > the
> > > > > > > > cleanup
> > > > > > > > > and improvement work and the work has been started via some
> > > > direct
> > > > > > > > "minor"
> > > > > > > > > PRs by some volunteers. The current way is unorganized and
> > hard
> > > > to
> > > > > > > > manage.
> > > > > > > > > For tracking this work, I prefer to manage this work with
> the
> > > > Jira
> > > > > > > issue.
> > > > > > > > > We can create an umbrella issue. Then, split the work into
> > > > several
> > > > > > > > > subtasks.
> > > > > > > > >
> > > > > > > > > Since those "bad smell" lays anywhere in the whole project.
> > > It's
> > > > > > > > difficult
> > > > > > > > > to give a standard to split the subtasks. For example, some
> > > files
> > > > > > have
> > > > > > > a
> > > > > > > > > lot while some modules have few. So I suggest the standard
> > > would
> > > > > > depend
> > > > > > > > on
> > > > > > > > > the volume of the changes. Before working, any subtask
> should
> > > > find
> > > > > a
> > > > > > > > > committer as a mentor who would judge and approve the scope
> > is
> > > > > > > suitable.
> > > > > > > > >
> > > > > > > > > What do you think?
> > > > > > > > >
> > > > > > > > > Any comments and suggestions would be appreciated.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Vino
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Unify Hudi code cleanup and improvement

Posted by vino yang <ya...@gmail.com>.
Hi Vinoth,

Thank you for your thoughts, I agree that focusing on some higher priority
work is more valuable.

This discussion is to sort out and manage the work that the community is
already doing. There are currently some PRs working on this type of work,
such as PR[1][2][3][4]. The community has not given guidance on these
tasks. I think it's not very appropriate to open a "MINOR" PR directly. So,
I want to hear from the community and how to manage them more effectively.
The discussion does not encourage to give a higher priority to such work.

We haven't stopped this kind of work, so we should provide effective
guidance and organization so that it doesn't look disorganized. WYDT?

Best,
Vino

[1]: https://github.com/apache/incubator-hudi/pull/1237
[2]: https://github.com/apache/incubator-hudi/pull/1139
[3]: https://github.com/apache/incubator-hudi/pull/1137
[4]: https://github.com/apache/incubator-hudi/pull/1136

Vinoth Chandar <vi...@apache.org> 于2020年1月23日周四 下午1:20写道:

> Hi,
>
> Thanks everyone for sharing your views!
>
> Some of this conversation is starting to feel like boiling the ocean. I
> believe in refactoring with purpose and discussing class-by-class or
> module-by-module does not make sense to me. Can we first list down what we
> want to achieve? So far, I have only heard fixing IDE/IntelliJ warnings.
> Also instead of focussing on new work, how about looking at the pending
> JIRAs under "Testing" "Code Cleanup" components first and see if those are
> worth tackling.
>
> We went down this path for code formatting and today we still have
> inconsistencies. Looking back, I feel we should have clearly defined end
> goals for the cleanups and we can then rank them based on ROI.
>
> Thanks
> Vinoth
>
> On Wed, Jan 22, 2020 at 7:05 PM vino yang <ya...@gmail.com> wrote:
>
> > Hi Shiyan and Bhavani:
> >
> > Thanks for sharing your thoughts.
> >
> > As I originally stated. The advantage of using modules as a unit to split
> > work is that the decomposition is clear, but the disadvantage is that the
> > volume of changes may be huge, which brings huge risks (considering that
> > Hudi's test coverage is still not very high) and the workload of review.
> > The advantage of splitting by class is that the volume of changes is
> small
> > and the review is more convenient, but the disadvantages are too many
> tasks
> > and high maintenance costs.
> >
> >
> > *In addition, we need to define the boundaries of the "code cleanup" I
> > expressed in this topic: it is limited to the smart tips shown by
> Intellij
> > IDEA. If the boundaries are too wide, then this discussion will lose
> > control.*
> > I agree with Bhavani that we don't take it as the actual goal. But we are
> > not opposed to the community to help improve the quality of the code
> > (basically, these tips given by the IDE are more reasonable).
> >
> >
> > So, I still give my thoughts: We manage this work with Jira. Before we
> > start working, we need to find a committer as a mentor. The mentor must
> > decide whether the scale of the subtasks is reasonable and whether
> > additional unit tests need to be added to verify the changes. And the
> > mentor should be responsible for merged changes.
> >
> > What do you think?
> >
> > Best,
> > Vino
> >
> > Bhavani Sudha <bh...@gmail.com> 于2020年1月22日周三 下午2:22写道:
> >
> > > Hi @vinoyang thanks for bringing this to discussion. I feel it would be
> > > less disruptive to clean up code as part of individual classes being
> > > touched for a specific goal rather than code cleanup being the actual
> > goal.
> > > This would narrow the touch point and ensure test coverage (both unit
> and
> > > integration tests)  catches any accidental/unintentional changes. Also
> it
> > > would give chance to change any documentation quoting/referencing that
> > > code. Wanted to share my personal opinion.
> > >
> > > Thanks,
> > > Sudha
> > >
> > >
> > >
> > > On Tue, Jan 21, 2020 at 11:36 AM Shiyan Xu <
> xu.shiyan.raymond@gmail.com>
> > > wrote:
> > >
> > > > The clean-up work can actually be split by modules.
> > > >
> > > > Though it is generally a good practice to follow, my concern is the
> > > > clean-up is likely to cause conflicts with some on-going changes. If
> I
> > > may
> > > > suggest, the dedicated clean-up tasks should avoid
> > > > - modules that are undergoing multiple feature changes/PRs
> > > > - modules that are planned to have major refactoring due to design
> > > changes
> > > > (since clean-up can be done altogether during refactoring)
> > > >
> > > > On Tue, Jan 21, 2020 at 4:17 AM Vinoth Chandar <vi...@apache.org>
> > > wrote:
> > > >
> > > > > Not sure if I fully agree with sweeping statements being made. But,
> > +1
> > > > for
> > > > > structuring this work via Jiras and having some committer “accept”
> > the
> > > > > issue first.  Some of these tend to be subjective and we do need to
> > > make
> > > > > different tradeoffs.
> > > > >
> > > > > On Tue, Jan 21, 2020 at 1:28 AM vino yang <ya...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi Pratyaksh,
> > > > > >
> > > > > > Thanks for your thought.
> > > > > >
> > > > > > Let's listen to others' comments. If there is no objection, we
> will
> > > > > follow
> > > > > > this way.
> > > > > >
> > > > > > Best,
> > > > > > Vino
> > > > > >
> > > > > >
> > > > > > Pratyaksh Sharma <pr...@gmail.com> 于2020年1月21日周二 下午4:56写道:
> > > > > >
> > > > > > > Hi Vino,
> > > > > > >
> > > > > > > Big +1 for this initiative. I have done this code cleanup for
> > test
> > > > > > classes
> > > > > > > in the past and strongly feel there is a need to do the same at
> > > other
> > > > > > > places as well. I would definitely like to volunteer for this.
> > > > > > >
> > > > > > > On Tue, Jan 21, 2020 at 1:52 PM vino yang <
> yanghua1127@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi folks,
> > > > > > > >
> > > > > > > > Currently, the code quality of some Hudi module is not very
> > well.
> > > > As
> > > > > > many
> > > > > > > > developers have seen, the Intellij IDEA has shown many
> > > intellisense
> > > > > > about
> > > > > > > > cleanup and improvement. The community does not object to
> doing
> > > the
> > > > > > > cleanup
> > > > > > > > and improvement work and the work has been started via some
> > > direct
> > > > > > > "minor"
> > > > > > > > PRs by some volunteers. The current way is unorganized and
> hard
> > > to
> > > > > > > manage.
> > > > > > > > For tracking this work, I prefer to manage this work with the
> > > Jira
> > > > > > issue.
> > > > > > > > We can create an umbrella issue. Then, split the work into
> > > several
> > > > > > > > subtasks.
> > > > > > > >
> > > > > > > > Since those "bad smell" lays anywhere in the whole project.
> > It's
> > > > > > > difficult
> > > > > > > > to give a standard to split the subtasks. For example, some
> > files
> > > > > have
> > > > > > a
> > > > > > > > lot while some modules have few. So I suggest the standard
> > would
> > > > > depend
> > > > > > > on
> > > > > > > > the volume of the changes. Before working, any subtask should
> > > find
> > > > a
> > > > > > > > committer as a mentor who would judge and approve the scope
> is
> > > > > > suitable.
> > > > > > > >
> > > > > > > > What do you think?
> > > > > > > >
> > > > > > > > Any comments and suggestions would be appreciated.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Vino
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Unify Hudi code cleanup and improvement

Posted by Vinoth Chandar <vi...@apache.org>.
Hi,

Thanks everyone for sharing your views!

Some of this conversation is starting to feel like boiling the ocean. I
believe in refactoring with purpose and discussing class-by-class or
module-by-module does not make sense to me. Can we first list down what we
want to achieve? So far, I have only heard fixing IDE/IntelliJ warnings.
Also instead of focussing on new work, how about looking at the pending
JIRAs under "Testing" "Code Cleanup" components first and see if those are
worth tackling.

We went down this path for code formatting and today we still have
inconsistencies. Looking back, I feel we should have clearly defined end
goals for the cleanups and we can then rank them based on ROI.

Thanks
Vinoth

On Wed, Jan 22, 2020 at 7:05 PM vino yang <ya...@gmail.com> wrote:

> Hi Shiyan and Bhavani:
>
> Thanks for sharing your thoughts.
>
> As I originally stated. The advantage of using modules as a unit to split
> work is that the decomposition is clear, but the disadvantage is that the
> volume of changes may be huge, which brings huge risks (considering that
> Hudi's test coverage is still not very high) and the workload of review.
> The advantage of splitting by class is that the volume of changes is small
> and the review is more convenient, but the disadvantages are too many tasks
> and high maintenance costs.
>
>
> *In addition, we need to define the boundaries of the "code cleanup" I
> expressed in this topic: it is limited to the smart tips shown by Intellij
> IDEA. If the boundaries are too wide, then this discussion will lose
> control.*
> I agree with Bhavani that we don't take it as the actual goal. But we are
> not opposed to the community to help improve the quality of the code
> (basically, these tips given by the IDE are more reasonable).
>
>
> So, I still give my thoughts: We manage this work with Jira. Before we
> start working, we need to find a committer as a mentor. The mentor must
> decide whether the scale of the subtasks is reasonable and whether
> additional unit tests need to be added to verify the changes. And the
> mentor should be responsible for merged changes.
>
> What do you think?
>
> Best,
> Vino
>
> Bhavani Sudha <bh...@gmail.com> 于2020年1月22日周三 下午2:22写道:
>
> > Hi @vinoyang thanks for bringing this to discussion. I feel it would be
> > less disruptive to clean up code as part of individual classes being
> > touched for a specific goal rather than code cleanup being the actual
> goal.
> > This would narrow the touch point and ensure test coverage (both unit and
> > integration tests)  catches any accidental/unintentional changes. Also it
> > would give chance to change any documentation quoting/referencing that
> > code. Wanted to share my personal opinion.
> >
> > Thanks,
> > Sudha
> >
> >
> >
> > On Tue, Jan 21, 2020 at 11:36 AM Shiyan Xu <xu...@gmail.com>
> > wrote:
> >
> > > The clean-up work can actually be split by modules.
> > >
> > > Though it is generally a good practice to follow, my concern is the
> > > clean-up is likely to cause conflicts with some on-going changes. If I
> > may
> > > suggest, the dedicated clean-up tasks should avoid
> > > - modules that are undergoing multiple feature changes/PRs
> > > - modules that are planned to have major refactoring due to design
> > changes
> > > (since clean-up can be done altogether during refactoring)
> > >
> > > On Tue, Jan 21, 2020 at 4:17 AM Vinoth Chandar <vi...@apache.org>
> > wrote:
> > >
> > > > Not sure if I fully agree with sweeping statements being made. But,
> +1
> > > for
> > > > structuring this work via Jiras and having some committer “accept”
> the
> > > > issue first.  Some of these tend to be subjective and we do need to
> > make
> > > > different tradeoffs.
> > > >
> > > > On Tue, Jan 21, 2020 at 1:28 AM vino yang <ya...@gmail.com>
> > wrote:
> > > >
> > > > > Hi Pratyaksh,
> > > > >
> > > > > Thanks for your thought.
> > > > >
> > > > > Let's listen to others' comments. If there is no objection, we will
> > > > follow
> > > > > this way.
> > > > >
> > > > > Best,
> > > > > Vino
> > > > >
> > > > >
> > > > > Pratyaksh Sharma <pr...@gmail.com> 于2020年1月21日周二 下午4:56写道:
> > > > >
> > > > > > Hi Vino,
> > > > > >
> > > > > > Big +1 for this initiative. I have done this code cleanup for
> test
> > > > > classes
> > > > > > in the past and strongly feel there is a need to do the same at
> > other
> > > > > > places as well. I would definitely like to volunteer for this.
> > > > > >
> > > > > > On Tue, Jan 21, 2020 at 1:52 PM vino yang <yanghua1127@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Hi folks,
> > > > > > >
> > > > > > > Currently, the code quality of some Hudi module is not very
> well.
> > > As
> > > > > many
> > > > > > > developers have seen, the Intellij IDEA has shown many
> > intellisense
> > > > > about
> > > > > > > cleanup and improvement. The community does not object to doing
> > the
> > > > > > cleanup
> > > > > > > and improvement work and the work has been started via some
> > direct
> > > > > > "minor"
> > > > > > > PRs by some volunteers. The current way is unorganized and hard
> > to
> > > > > > manage.
> > > > > > > For tracking this work, I prefer to manage this work with the
> > Jira
> > > > > issue.
> > > > > > > We can create an umbrella issue. Then, split the work into
> > several
> > > > > > > subtasks.
> > > > > > >
> > > > > > > Since those "bad smell" lays anywhere in the whole project.
> It's
> > > > > > difficult
> > > > > > > to give a standard to split the subtasks. For example, some
> files
> > > > have
> > > > > a
> > > > > > > lot while some modules have few. So I suggest the standard
> would
> > > > depend
> > > > > > on
> > > > > > > the volume of the changes. Before working, any subtask should
> > find
> > > a
> > > > > > > committer as a mentor who would judge and approve the scope is
> > > > > suitable.
> > > > > > >
> > > > > > > What do you think?
> > > > > > >
> > > > > > > Any comments and suggestions would be appreciated.
> > > > > > >
> > > > > > > Best,
> > > > > > > Vino
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Unify Hudi code cleanup and improvement

Posted by vino yang <ya...@gmail.com>.
Hi Shiyan and Bhavani:

Thanks for sharing your thoughts.

As I originally stated. The advantage of using modules as a unit to split
work is that the decomposition is clear, but the disadvantage is that the
volume of changes may be huge, which brings huge risks (considering that
Hudi's test coverage is still not very high) and the workload of review.
The advantage of splitting by class is that the volume of changes is small
and the review is more convenient, but the disadvantages are too many tasks
and high maintenance costs.


*In addition, we need to define the boundaries of the "code cleanup" I
expressed in this topic: it is limited to the smart tips shown by Intellij
IDEA. If the boundaries are too wide, then this discussion will lose
control.*
I agree with Bhavani that we don't take it as the actual goal. But we are
not opposed to the community to help improve the quality of the code
(basically, these tips given by the IDE are more reasonable).


So, I still give my thoughts: We manage this work with Jira. Before we
start working, we need to find a committer as a mentor. The mentor must
decide whether the scale of the subtasks is reasonable and whether
additional unit tests need to be added to verify the changes. And the
mentor should be responsible for merged changes.

What do you think?

Best,
Vino

Bhavani Sudha <bh...@gmail.com> 于2020年1月22日周三 下午2:22写道:

> Hi @vinoyang thanks for bringing this to discussion. I feel it would be
> less disruptive to clean up code as part of individual classes being
> touched for a specific goal rather than code cleanup being the actual goal.
> This would narrow the touch point and ensure test coverage (both unit and
> integration tests)  catches any accidental/unintentional changes. Also it
> would give chance to change any documentation quoting/referencing that
> code. Wanted to share my personal opinion.
>
> Thanks,
> Sudha
>
>
>
> On Tue, Jan 21, 2020 at 11:36 AM Shiyan Xu <xu...@gmail.com>
> wrote:
>
> > The clean-up work can actually be split by modules.
> >
> > Though it is generally a good practice to follow, my concern is the
> > clean-up is likely to cause conflicts with some on-going changes. If I
> may
> > suggest, the dedicated clean-up tasks should avoid
> > - modules that are undergoing multiple feature changes/PRs
> > - modules that are planned to have major refactoring due to design
> changes
> > (since clean-up can be done altogether during refactoring)
> >
> > On Tue, Jan 21, 2020 at 4:17 AM Vinoth Chandar <vi...@apache.org>
> wrote:
> >
> > > Not sure if I fully agree with sweeping statements being made. But,  +1
> > for
> > > structuring this work via Jiras and having some committer “accept” the
> > > issue first.  Some of these tend to be subjective and we do need to
> make
> > > different tradeoffs.
> > >
> > > On Tue, Jan 21, 2020 at 1:28 AM vino yang <ya...@gmail.com>
> wrote:
> > >
> > > > Hi Pratyaksh,
> > > >
> > > > Thanks for your thought.
> > > >
> > > > Let's listen to others' comments. If there is no objection, we will
> > > follow
> > > > this way.
> > > >
> > > > Best,
> > > > Vino
> > > >
> > > >
> > > > Pratyaksh Sharma <pr...@gmail.com> 于2020年1月21日周二 下午4:56写道:
> > > >
> > > > > Hi Vino,
> > > > >
> > > > > Big +1 for this initiative. I have done this code cleanup for test
> > > > classes
> > > > > in the past and strongly feel there is a need to do the same at
> other
> > > > > places as well. I would definitely like to volunteer for this.
> > > > >
> > > > > On Tue, Jan 21, 2020 at 1:52 PM vino yang <ya...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi folks,
> > > > > >
> > > > > > Currently, the code quality of some Hudi module is not very well.
> > As
> > > > many
> > > > > > developers have seen, the Intellij IDEA has shown many
> intellisense
> > > > about
> > > > > > cleanup and improvement. The community does not object to doing
> the
> > > > > cleanup
> > > > > > and improvement work and the work has been started via some
> direct
> > > > > "minor"
> > > > > > PRs by some volunteers. The current way is unorganized and hard
> to
> > > > > manage.
> > > > > > For tracking this work, I prefer to manage this work with the
> Jira
> > > > issue.
> > > > > > We can create an umbrella issue. Then, split the work into
> several
> > > > > > subtasks.
> > > > > >
> > > > > > Since those "bad smell" lays anywhere in the whole project. It's
> > > > > difficult
> > > > > > to give a standard to split the subtasks. For example, some files
> > > have
> > > > a
> > > > > > lot while some modules have few. So I suggest the standard would
> > > depend
> > > > > on
> > > > > > the volume of the changes. Before working, any subtask should
> find
> > a
> > > > > > committer as a mentor who would judge and approve the scope is
> > > > suitable.
> > > > > >
> > > > > > What do you think?
> > > > > >
> > > > > > Any comments and suggestions would be appreciated.
> > > > > >
> > > > > > Best,
> > > > > > Vino
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Unify Hudi code cleanup and improvement

Posted by Bhavani Sudha <bh...@gmail.com>.
Hi @vinoyang thanks for bringing this to discussion. I feel it would be
less disruptive to clean up code as part of individual classes being
touched for a specific goal rather than code cleanup being the actual goal.
This would narrow the touch point and ensure test coverage (both unit and
integration tests)  catches any accidental/unintentional changes. Also it
would give chance to change any documentation quoting/referencing that
code. Wanted to share my personal opinion.

Thanks,
Sudha



On Tue, Jan 21, 2020 at 11:36 AM Shiyan Xu <xu...@gmail.com>
wrote:

> The clean-up work can actually be split by modules.
>
> Though it is generally a good practice to follow, my concern is the
> clean-up is likely to cause conflicts with some on-going changes. If I may
> suggest, the dedicated clean-up tasks should avoid
> - modules that are undergoing multiple feature changes/PRs
> - modules that are planned to have major refactoring due to design changes
> (since clean-up can be done altogether during refactoring)
>
> On Tue, Jan 21, 2020 at 4:17 AM Vinoth Chandar <vi...@apache.org> wrote:
>
> > Not sure if I fully agree with sweeping statements being made. But,  +1
> for
> > structuring this work via Jiras and having some committer “accept” the
> > issue first.  Some of these tend to be subjective and we do need to make
> > different tradeoffs.
> >
> > On Tue, Jan 21, 2020 at 1:28 AM vino yang <ya...@gmail.com> wrote:
> >
> > > Hi Pratyaksh,
> > >
> > > Thanks for your thought.
> > >
> > > Let's listen to others' comments. If there is no objection, we will
> > follow
> > > this way.
> > >
> > > Best,
> > > Vino
> > >
> > >
> > > Pratyaksh Sharma <pr...@gmail.com> 于2020年1月21日周二 下午4:56写道:
> > >
> > > > Hi Vino,
> > > >
> > > > Big +1 for this initiative. I have done this code cleanup for test
> > > classes
> > > > in the past and strongly feel there is a need to do the same at other
> > > > places as well. I would definitely like to volunteer for this.
> > > >
> > > > On Tue, Jan 21, 2020 at 1:52 PM vino yang <ya...@gmail.com>
> > wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > Currently, the code quality of some Hudi module is not very well.
> As
> > > many
> > > > > developers have seen, the Intellij IDEA has shown many intellisense
> > > about
> > > > > cleanup and improvement. The community does not object to doing the
> > > > cleanup
> > > > > and improvement work and the work has been started via some direct
> > > > "minor"
> > > > > PRs by some volunteers. The current way is unorganized and hard to
> > > > manage.
> > > > > For tracking this work, I prefer to manage this work with the Jira
> > > issue.
> > > > > We can create an umbrella issue. Then, split the work into several
> > > > > subtasks.
> > > > >
> > > > > Since those "bad smell" lays anywhere in the whole project. It's
> > > > difficult
> > > > > to give a standard to split the subtasks. For example, some files
> > have
> > > a
> > > > > lot while some modules have few. So I suggest the standard would
> > depend
> > > > on
> > > > > the volume of the changes. Before working, any subtask should find
> a
> > > > > committer as a mentor who would judge and approve the scope is
> > > suitable.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Any comments and suggestions would be appreciated.
> > > > >
> > > > > Best,
> > > > > Vino
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Unify Hudi code cleanup and improvement

Posted by Shiyan Xu <xu...@gmail.com>.
The clean-up work can actually be split by modules.

Though it is generally a good practice to follow, my concern is the
clean-up is likely to cause conflicts with some on-going changes. If I may
suggest, the dedicated clean-up tasks should avoid
- modules that are undergoing multiple feature changes/PRs
- modules that are planned to have major refactoring due to design changes
(since clean-up can be done altogether during refactoring)

On Tue, Jan 21, 2020 at 4:17 AM Vinoth Chandar <vi...@apache.org> wrote:

> Not sure if I fully agree with sweeping statements being made. But,  +1 for
> structuring this work via Jiras and having some committer “accept” the
> issue first.  Some of these tend to be subjective and we do need to make
> different tradeoffs.
>
> On Tue, Jan 21, 2020 at 1:28 AM vino yang <ya...@gmail.com> wrote:
>
> > Hi Pratyaksh,
> >
> > Thanks for your thought.
> >
> > Let's listen to others' comments. If there is no objection, we will
> follow
> > this way.
> >
> > Best,
> > Vino
> >
> >
> > Pratyaksh Sharma <pr...@gmail.com> 于2020年1月21日周二 下午4:56写道:
> >
> > > Hi Vino,
> > >
> > > Big +1 for this initiative. I have done this code cleanup for test
> > classes
> > > in the past and strongly feel there is a need to do the same at other
> > > places as well. I would definitely like to volunteer for this.
> > >
> > > On Tue, Jan 21, 2020 at 1:52 PM vino yang <ya...@gmail.com>
> wrote:
> > >
> > > > Hi folks,
> > > >
> > > > Currently, the code quality of some Hudi module is not very well. As
> > many
> > > > developers have seen, the Intellij IDEA has shown many intellisense
> > about
> > > > cleanup and improvement. The community does not object to doing the
> > > cleanup
> > > > and improvement work and the work has been started via some direct
> > > "minor"
> > > > PRs by some volunteers. The current way is unorganized and hard to
> > > manage.
> > > > For tracking this work, I prefer to manage this work with the Jira
> > issue.
> > > > We can create an umbrella issue. Then, split the work into several
> > > > subtasks.
> > > >
> > > > Since those "bad smell" lays anywhere in the whole project. It's
> > > difficult
> > > > to give a standard to split the subtasks. For example, some files
> have
> > a
> > > > lot while some modules have few. So I suggest the standard would
> depend
> > > on
> > > > the volume of the changes. Before working, any subtask should find a
> > > > committer as a mentor who would judge and approve the scope is
> > suitable.
> > > >
> > > > What do you think?
> > > >
> > > > Any comments and suggestions would be appreciated.
> > > >
> > > > Best,
> > > > Vino
> > > >
> > >
> >
>

Re: [DISCUSS] Unify Hudi code cleanup and improvement

Posted by Vinoth Chandar <vi...@apache.org>.
Not sure if I fully agree with sweeping statements being made. But,  +1 for
structuring this work via Jiras and having some committer “accept” the
issue first.  Some of these tend to be subjective and we do need to make
different tradeoffs.

On Tue, Jan 21, 2020 at 1:28 AM vino yang <ya...@gmail.com> wrote:

> Hi Pratyaksh,
>
> Thanks for your thought.
>
> Let's listen to others' comments. If there is no objection, we will follow
> this way.
>
> Best,
> Vino
>
>
> Pratyaksh Sharma <pr...@gmail.com> 于2020年1月21日周二 下午4:56写道:
>
> > Hi Vino,
> >
> > Big +1 for this initiative. I have done this code cleanup for test
> classes
> > in the past and strongly feel there is a need to do the same at other
> > places as well. I would definitely like to volunteer for this.
> >
> > On Tue, Jan 21, 2020 at 1:52 PM vino yang <ya...@gmail.com> wrote:
> >
> > > Hi folks,
> > >
> > > Currently, the code quality of some Hudi module is not very well. As
> many
> > > developers have seen, the Intellij IDEA has shown many intellisense
> about
> > > cleanup and improvement. The community does not object to doing the
> > cleanup
> > > and improvement work and the work has been started via some direct
> > "minor"
> > > PRs by some volunteers. The current way is unorganized and hard to
> > manage.
> > > For tracking this work, I prefer to manage this work with the Jira
> issue.
> > > We can create an umbrella issue. Then, split the work into several
> > > subtasks.
> > >
> > > Since those "bad smell" lays anywhere in the whole project. It's
> > difficult
> > > to give a standard to split the subtasks. For example, some files have
> a
> > > lot while some modules have few. So I suggest the standard would depend
> > on
> > > the volume of the changes. Before working, any subtask should find a
> > > committer as a mentor who would judge and approve the scope is
> suitable.
> > >
> > > What do you think?
> > >
> > > Any comments and suggestions would be appreciated.
> > >
> > > Best,
> > > Vino
> > >
> >
>

Re: [DISCUSS] Unify Hudi code cleanup and improvement

Posted by vino yang <ya...@gmail.com>.
Hi Pratyaksh,

Thanks for your thought.

Let's listen to others' comments. If there is no objection, we will follow
this way.

Best,
Vino


Pratyaksh Sharma <pr...@gmail.com> 于2020年1月21日周二 下午4:56写道:

> Hi Vino,
>
> Big +1 for this initiative. I have done this code cleanup for test classes
> in the past and strongly feel there is a need to do the same at other
> places as well. I would definitely like to volunteer for this.
>
> On Tue, Jan 21, 2020 at 1:52 PM vino yang <ya...@gmail.com> wrote:
>
> > Hi folks,
> >
> > Currently, the code quality of some Hudi module is not very well. As many
> > developers have seen, the Intellij IDEA has shown many intellisense about
> > cleanup and improvement. The community does not object to doing the
> cleanup
> > and improvement work and the work has been started via some direct
> "minor"
> > PRs by some volunteers. The current way is unorganized and hard to
> manage.
> > For tracking this work, I prefer to manage this work with the Jira issue.
> > We can create an umbrella issue. Then, split the work into several
> > subtasks.
> >
> > Since those "bad smell" lays anywhere in the whole project. It's
> difficult
> > to give a standard to split the subtasks. For example, some files have a
> > lot while some modules have few. So I suggest the standard would depend
> on
> > the volume of the changes. Before working, any subtask should find a
> > committer as a mentor who would judge and approve the scope is suitable.
> >
> > What do you think?
> >
> > Any comments and suggestions would be appreciated.
> >
> > Best,
> > Vino
> >
>

Re: [DISCUSS] Unify Hudi code cleanup and improvement

Posted by Pratyaksh Sharma <pr...@gmail.com>.
Hi Vino,

Big +1 for this initiative. I have done this code cleanup for test classes
in the past and strongly feel there is a need to do the same at other
places as well. I would definitely like to volunteer for this.

On Tue, Jan 21, 2020 at 1:52 PM vino yang <ya...@gmail.com> wrote:

> Hi folks,
>
> Currently, the code quality of some Hudi module is not very well. As many
> developers have seen, the Intellij IDEA has shown many intellisense about
> cleanup and improvement. The community does not object to doing the cleanup
> and improvement work and the work has been started via some direct "minor"
> PRs by some volunteers. The current way is unorganized and hard to manage.
> For tracking this work, I prefer to manage this work with the Jira issue.
> We can create an umbrella issue. Then, split the work into several
> subtasks.
>
> Since those "bad smell" lays anywhere in the whole project. It's difficult
> to give a standard to split the subtasks. For example, some files have a
> lot while some modules have few. So I suggest the standard would depend on
> the volume of the changes. Before working, any subtask should find a
> committer as a mentor who would judge and approve the scope is suitable.
>
> What do you think?
>
> Any comments and suggestions would be appreciated.
>
> Best,
> Vino
>