You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by Hyukjin Kwon <gu...@gmail.com> on 2017/07/13 08:16:39 UTC

Re: Crowdsourced triage Scapegoat compiler plugin warnings

Hi all,


Another gentle ping for help.

Probably, let me open up a JIRA and proceed this after a couple of weeks if
no one is going to do this although I hope someone takes this.


Thanks.

2017-06-18 2:16 GMT+09:00 Sean Owen <so...@cloudera.com>:

> Looks like a whole lot of the results have been analyzed. I suspect
> there's more than enough to act on already. I think we should wait until
> after 2.2 is done.
> Anybody prefer how to proceed here -- just open a JIRA to take care of a
> batch of related types of issues and go for it?
>
> On Sat, Jun 17, 2017 at 4:45 PM Hyukjin Kwon <gu...@gmail.com> wrote:
>
>> Gentle ping to dev for help. I hope this effort is not abandoned.
>>
>>
>> On 25 May 2017 9:41 am, "Josh Rosen" <jo...@databricks.com> wrote:
>>
>> I'm interested in using the Scapegoat
>> <https://github.com/sksamuel/scapegoat> Scala compiler plugin to find
>> potential bugs and performance problems in Spark. Scapegoat has a useful
>> built-in set of inspections and is pretty easy to extend with custom ones.
>> For example, I added an inspection to spot places where we call
>> *.apply()* on a Seq which is not an IndexedSeq
>> <https://github.com/sksamuel/scapegoat/pull/159> in order to make it
>> easier to spot potential O(n^2) performance bugs.
>>
>> There are lots of false-positives and benign warnings (as with any linter
>> / static analyzer) so I don't think it's feasible to us to include this as
>> a blocking step in our regular build. I am planning to build tooling to
>> surface only new warnings so going forward this can become a useful
>> code-review aid.
>>
>> The current codebase has roughly 1700 warnings that I would like to
>> triage and categorize as false-positives or real bugs. I can't do this
>> alone, so here's how you can help:
>>
>>    - Visit the Google Docs spreadsheet at https://docs.google.com/
>>    spreadsheets/d/1z7xNMjx7VCJLCiHOHhTth7Hh4R0F6LwcGjEwCDzrCiM/edit?usp=
>>    sharing
>>    <https://docs.google.com/spreadsheets/d/1z7xNMjx7VCJLCiHOHhTth7Hh4R0F6LwcGjEwCDzrCiM/edit?usp=sharing> and
>>    find an un-triaged warning.
>>    - In the columns at the right of the sheet, enter your name in the
>>    appropriate column to mark a warning as a false-positive or as a real bug
>>    and/or performance issue. If think a warning is a real issue then use the
>>    "comments" column for providing additional detail.
>>    - Please don't file JIRAs or PRs for individual warnings; I suspect
>>    that we'll find clusters of issues which are best fixed in a few larger PRs
>>    vs. lots of smaller ones. Certain warnings are probably simply style issues
>>    so we should discuss those before trying to fix them.
>>
>> The sheet has hidden columns capturing the Spark revision and Scapegoat
>> revision. I can use this to programmatically update the sheet and remap
>> lines after updating either Scapegoat (to suppress false-positives) or
>> Spark (to incorporate fixes and surface new warnings). For those who are
>> interested, the sheet was produced with this script: https://gist.github.
>> com/JoshRosen/1ae12a979880d9a98988aa87d70ff2a8
>>
>> Depending on the results of this experiment we might want to integrate a
>> high-signal subset of the Scapegoat warnings into our build. I'm also
>> hoping that we'll be able to build a useful corpus of triaged warnings in
>> order to help improve Scapegoat itself and eliminate common false-positives.
>>
>> Thanks and happy bug-hunting,
>> Josh Rosen
>>
>>
>>

Re: Crowdsourced triage Scapegoat compiler plugin warnings

Posted by Holden Karau <ho...@pigscanfly.ca>.
I'm happy to help out in the places I'm more familiar with some starting
next week. I suspect things probably just fell to the wayside for some
folks during the 2.2.0 release.

On Thu, Jul 13, 2017 at 1:16 AM, Hyukjin Kwon <gu...@gmail.com> wrote:

> Hi all,
>
>
> Another gentle ping for help.
>
> Probably, let me open up a JIRA and proceed this after a couple of weeks
> if no one is going to do this although I hope someone takes this.
>
>
> Thanks.
>
> 2017-06-18 2:16 GMT+09:00 Sean Owen <so...@cloudera.com>:
>
>> Looks like a whole lot of the results have been analyzed. I suspect
>> there's more than enough to act on already. I think we should wait until
>> after 2.2 is done.
>> Anybody prefer how to proceed here -- just open a JIRA to take care of a
>> batch of related types of issues and go for it?
>>
>> On Sat, Jun 17, 2017 at 4:45 PM Hyukjin Kwon <gu...@gmail.com> wrote:
>>
>>> Gentle ping to dev for help. I hope this effort is not abandoned.
>>>
>>>
>>> On 25 May 2017 9:41 am, "Josh Rosen" <jo...@databricks.com> wrote:
>>>
>>> I'm interested in using the Scapegoat
>>> <https://github.com/sksamuel/scapegoat> Scala compiler plugin to find
>>> potential bugs and performance problems in Spark. Scapegoat has a useful
>>> built-in set of inspections and is pretty easy to extend with custom ones.
>>> For example, I added an inspection to spot places where we call
>>> *.apply()* on a Seq which is not an IndexedSeq
>>> <https://github.com/sksamuel/scapegoat/pull/159> in order to make it
>>> easier to spot potential O(n^2) performance bugs.
>>>
>>> There are lots of false-positives and benign warnings (as with any
>>> linter / static analyzer) so I don't think it's feasible to us to include
>>> this as a blocking step in our regular build. I am planning to build
>>> tooling to surface only new warnings so going forward this can become a
>>> useful code-review aid.
>>>
>>> The current codebase has roughly 1700 warnings that I would like to
>>> triage and categorize as false-positives or real bugs. I can't do this
>>> alone, so here's how you can help:
>>>
>>>    - Visit the Google Docs spreadsheet at https://docs.google.com/spread
>>>    sheets/d/1z7xNMjx7VCJLCiHOHhTth7Hh4R0F6LwcGjEwCDzrCiM/edit?
>>>    usp=sharing
>>>    <https://docs.google.com/spreadsheets/d/1z7xNMjx7VCJLCiHOHhTth7Hh4R0F6LwcGjEwCDzrCiM/edit?usp=sharing> and
>>>    find an un-triaged warning.
>>>    - In the columns at the right of the sheet, enter your name in the
>>>    appropriate column to mark a warning as a false-positive or as a real bug
>>>    and/or performance issue. If think a warning is a real issue then use the
>>>    "comments" column for providing additional detail.
>>>    - Please don't file JIRAs or PRs for individual warnings; I suspect
>>>    that we'll find clusters of issues which are best fixed in a few larger PRs
>>>    vs. lots of smaller ones. Certain warnings are probably simply style issues
>>>    so we should discuss those before trying to fix them.
>>>
>>> The sheet has hidden columns capturing the Spark revision and Scapegoat
>>> revision. I can use this to programmatically update the sheet and remap
>>> lines after updating either Scapegoat (to suppress false-positives) or
>>> Spark (to incorporate fixes and surface new warnings). For those who are
>>> interested, the sheet was produced with this script:
>>> https://gist.github.com/JoshRosen/1ae12a979880d9a98988aa87d70ff2a8
>>>
>>> Depending on the results of this experiment we might want to integrate a
>>> high-signal subset of the Scapegoat warnings into our build. I'm also
>>> hoping that we'll be able to build a useful corpus of triaged warnings in
>>> order to help improve Scapegoat itself and eliminate common false-positives.
>>>
>>> Thanks and happy bug-hunting,
>>> Josh Rosen
>>>
>>>
>>>
>


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: Crowdsourced triage Scapegoat compiler plugin warnings

Posted by Sean Owen <so...@cloudera.com>.
Possibly, but it will take some degree of sophistication and understanding
of Spark to evaluate these. I'd want committers or at least Spark
developers to look at this rather than users.

On Thu, Jul 13, 2017, 15:00 Jorge Sánchez <jo...@gmail.com> wrote:

> It might be worth sharing this with the user list, there must be people
> willing to collaborate who are not on the dev list.
>
> 2017-07-13 10:00 GMT+01:00 Sean Owen <so...@cloudera.com>:
>
>> I don't think everything needs to be triaged. There are a ton of useful
>> changes that have been identified. I think you could just pick some warning
>> types where they've all been triaged and go fix them.
>>
>> On Thu, Jul 13, 2017 at 9:16 AM Hyukjin Kwon <gu...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>>
>>> Another gentle ping for help.
>>>
>>> Probably, let me open up a JIRA and proceed this after a couple of weeks
>>> if no one is going to do this although I hope someone takes this.
>>>
>>>
>>>
>

Re: Crowdsourced triage Scapegoat compiler plugin warnings

Posted by Jorge Sánchez <jo...@gmail.com>.
It might be worth sharing this with the user list, there must be people
willing to collaborate who are not on the dev list.

2017-07-13 10:00 GMT+01:00 Sean Owen <so...@cloudera.com>:

> I don't think everything needs to be triaged. There are a ton of useful
> changes that have been identified. I think you could just pick some warning
> types where they've all been triaged and go fix them.
>
> On Thu, Jul 13, 2017 at 9:16 AM Hyukjin Kwon <gu...@gmail.com> wrote:
>
>> Hi all,
>>
>>
>> Another gentle ping for help.
>>
>> Probably, let me open up a JIRA and proceed this after a couple of weeks
>> if no one is going to do this although I hope someone takes this.
>>
>>
>>

Re: Crowdsourced triage Scapegoat compiler plugin warnings

Posted by Sean Owen <so...@cloudera.com>.
I don't think everything needs to be triaged. There are a ton of useful
changes that have been identified. I think you could just pick some warning
types where they've all been triaged and go fix them.

On Thu, Jul 13, 2017 at 9:16 AM Hyukjin Kwon <gu...@gmail.com> wrote:

> Hi all,
>
>
> Another gentle ping for help.
>
> Probably, let me open up a JIRA and proceed this after a couple of weeks
> if no one is going to do this although I hope someone takes this.
>
>
>