You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Peter Vary <pv...@cloudera.com.INVALID> on 2022/06/15 09:04:54 UTC

[DISCUSS] Spellcheck on CI

Hi Team,

I have seen that the CI test are continuously failing with the following errors:

Check warning on line 18 in serde/src/java/org/apache/hadoop/hive/serde2/esriJson/deserializer/SpatialReferenceJsonDeserializer.java
GitHub Actions
/ Spell checking
serde/src/java/org/apache/hadoop/hive/serde2/esriJson/deserializer/SpatialReferenceJsonDeserializer.java#L18
`esri` is not a recognized word.

Sourabh Badhya kindly helped me identifying the commit which caused this issue:
https://github.com/apache/hive/commit/0099b14aa6a50d4470b057e93a95a7391b74add7 <https://github.com/apache/hive/commit/0099b14aa6a50d4470b057e93a95a7391b74add7>

The jira for the change is:
https://issues.apache.org/jira/browse/HIVE-25733 <https://issues.apache.org/jira/browse/HIVE-25733>

To fix the issues I pushed an addendum:
https://github.com/apache/hive/commit/0b4e466866fe07a160b0e4b0c27d2b3fb7613c45 <https://github.com/apache/hive/commit/0b4e466866fe07a160b0e4b0c27d2b3fb7613c45>

Hopefully after a rebase all of the tests should be green now (the first tests are running now).


The new tests add a spellcheck for the CI. Currently it runs only for the `serde` package. The check parses the code files, and if it founds some unrecognisable / non-dictionary words, then it throws an error.

This could be useful for catching spelling errors not recognised by the reviewers, but keeping the exception list (in .github/actions/spelling/expect.txt) might become cumbersome. Currently we have 452 entries in the file for the `serde` package alone.

I would like to hear the community’s opinion whether we would like to keep it and use it for other modules as well.

Thanks,
Peter

Re: [DISCUSS] Spellcheck on CI

Posted by Stamatis Zampetakis <za...@gmail.com>.
It's the first time that I see a spellchecker in action in the projects
that I contribute to so I can't formulate an opinion before I use it for
sometime.

Activating in every module or removing altogether may be a bit premature at
this stage.
I would suggest keeping things as they are right now and revisit the
situation in the near future.
If others want to act now that is fine with me as well.

Best,
Stamatis

On Wed, Jun 15, 2022 at 11:32 AM Ayush Saxena <ay...@gmail.com> wrote:

> Doesn’t make sense to me untill and unless that typo can create any major
> impact, like it is there in the docs, in the name of metrics/UDF or system
> tables or at places where correcting the name might become an incompatible
> change.
> Checking the except.txt it has entries like dfs, Hdfs and so as well.
> If we try to cover up other FileSystems and all it would have
> ofs,o3fs,s3fs,abfs and many more cases in that case this except.txt will
> become huge and very tough to manage.
> Secondly, it has some strange entries as well, decoding their meaning
> isn’t very possible for everyone..
>
> Not blocking, but we shouldn’t in general consdier extending this to whole
> project may be just pick and choose the relevant places/packages only if we
> plan to increase the coverage
>
> -Ayush
>
> > On 15-Jun-2022, at 2:35 PM, Peter Vary <pv...@cloudera.com.invalid>
> wrote:
> >
> > Hi Team,
> >
> > I have seen that the CI test are continuously failing with the following
> errors:
> >
> > Check warning on line 18 in
> serde/src/java/org/apache/hadoop/hive/serde2/esriJson/deserializer/SpatialReferenceJsonDeserializer.java
> > GitHub Actions
> > / Spell checking
> >
> serde/src/java/org/apache/hadoop/hive/serde2/esriJson/deserializer/SpatialReferenceJsonDeserializer.java#L18
> > `esri` is not a recognized word.
> >
> > Sourabh Badhya kindly helped me identifying the commit which caused this
> issue:
> >
> https://github.com/apache/hive/commit/0099b14aa6a50d4470b057e93a95a7391b74add7
> <
> https://github.com/apache/hive/commit/0099b14aa6a50d4470b057e93a95a7391b74add7
> >
> >
> > The jira for the change is:
> > https://issues.apache.org/jira/browse/HIVE-25733 <
> https://issues.apache.org/jira/browse/HIVE-25733>
> >
> > To fix the issues I pushed an addendum:
> >
> https://github.com/apache/hive/commit/0b4e466866fe07a160b0e4b0c27d2b3fb7613c45
> <
> https://github.com/apache/hive/commit/0b4e466866fe07a160b0e4b0c27d2b3fb7613c45
> >
> >
> > Hopefully after a rebase all of the tests should be green now (the first
> tests are running now).
> >
> >
> > The new tests add a spellcheck for the CI. Currently it runs only for
> the `serde` package. The check parses the code files, and if it founds some
> unrecognisable / non-dictionary words, then it throws an error.
> >
> > This could be useful for catching spelling errors not recognised by the
> reviewers, but keeping the exception list (in
> .github/actions/spelling/expect.txt) might become cumbersome. Currently we
> have 452 entries in the file for the `serde` package alone.
> >
> > I would like to hear the community’s opinion whether we would like to
> keep it and use it for other modules as well.
> >
> > Thanks,
> > Peter
>

Re: [DISCUSS] Spellcheck on CI

Posted by Ayush Saxena <ay...@gmail.com>.
Doesn’t make sense to me untill and unless that typo can create any major impact, like it is there in the docs, in the name of metrics/UDF or system tables or at places where correcting the name might become an incompatible change.
Checking the except.txt it has entries like dfs, Hdfs and so as well.
If we try to cover up other FileSystems and all it would have ofs,o3fs,s3fs,abfs and many more cases in that case this except.txt will become huge and very tough to manage.
Secondly, it has some strange entries as well, decoding their meaning isn’t very possible for everyone..

Not blocking, but we shouldn’t in general consdier extending this to whole project may be just pick and choose the relevant places/packages only if we plan to increase the coverage 

-Ayush

> On 15-Jun-2022, at 2:35 PM, Peter Vary <pv...@cloudera.com.invalid> wrote:
> 
> Hi Team,
> 
> I have seen that the CI test are continuously failing with the following errors:
> 
> Check warning on line 18 in serde/src/java/org/apache/hadoop/hive/serde2/esriJson/deserializer/SpatialReferenceJsonDeserializer.java
> GitHub Actions
> / Spell checking
> serde/src/java/org/apache/hadoop/hive/serde2/esriJson/deserializer/SpatialReferenceJsonDeserializer.java#L18
> `esri` is not a recognized word.
> 
> Sourabh Badhya kindly helped me identifying the commit which caused this issue:
> https://github.com/apache/hive/commit/0099b14aa6a50d4470b057e93a95a7391b74add7 <https://github.com/apache/hive/commit/0099b14aa6a50d4470b057e93a95a7391b74add7>
> 
> The jira for the change is:
> https://issues.apache.org/jira/browse/HIVE-25733 <https://issues.apache.org/jira/browse/HIVE-25733>
> 
> To fix the issues I pushed an addendum:
> https://github.com/apache/hive/commit/0b4e466866fe07a160b0e4b0c27d2b3fb7613c45 <https://github.com/apache/hive/commit/0b4e466866fe07a160b0e4b0c27d2b3fb7613c45>
> 
> Hopefully after a rebase all of the tests should be green now (the first tests are running now).
> 
> 
> The new tests add a spellcheck for the CI. Currently it runs only for the `serde` package. The check parses the code files, and if it founds some unrecognisable / non-dictionary words, then it throws an error.
> 
> This could be useful for catching spelling errors not recognised by the reviewers, but keeping the exception list (in .github/actions/spelling/expect.txt) might become cumbersome. Currently we have 452 entries in the file for the `serde` package alone.
> 
> I would like to hear the community’s opinion whether we would like to keep it and use it for other modules as well.
> 
> Thanks,
> Peter