Posted to dev@tika.apache.org by Konstantin Gribov <gr...@gmail.com> on 2017/03/29 17:39:40 UTC

[DISCUSS] Contribution guide & style enforcement

Hi, folks.

Currently, parts of a contribution guide are scattered across several places
(I was thinking of [1] and [2], and Chris also mentioned [3]), each covering
a different facet of contributing to Apache Tika.

One thing that upsets me is that our codebase is very inconsistent in
style, formatting, and dependency management. This seems inevitable at some
stage of any popular open source project developed by many contributors.
But we can make it more consistent, and then maintain that state with
moderate effort.

I propose:

   1. make a single source of truth for the contribution guide and then
   automatically mirror it to README.md/CONTRIBUTING.md for GitHub, publish it
   on tika.a.o, etc.;
   2. add info about logging in tika-core and other packages to this
   contribution guide, to make all contributions consistent with the current
   policy (with examples of how logging should be used in different modules):
      1. JUL in tika-core;
      2. SLF4J via a `private static final Logger LOG` field in all other
      modules;
      3. allow using a logging backend (log4j) in tests (e.g. for tuning log
      levels of upstream libraries) and in standalone applications (e.g. to
      support `--quiet` and `--verbose` CLI keys);
      4. document logging configuration for the case when the OSGi bundle is
      used;
   3. add info about dependency handling (e.g. the "no additional deps in
   tika-core" policy, exclusion of commons-logging/commons-logging-api/log4j
   from dependencies, etc.);
   4. integrate the Checkstyle plugin [5], [6] into the Maven build so that
   contributors can easily check that their code conforms to a simple policy
   to start with (4-space indent, no TABs, spaces before opening braces,
   spaces after if/else/try/catch/finally, Egyptian-style braces);
   5. add documentation on configuring Checkstyle [5] in IDEs to simplify
   its usage (I can write one for JetBrains IDEA at least).
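As a rough illustration of the kind of example items 2 and 4 could add to the guide, here is a minimal Java sketch. The class `ExampleCoreComponent` and its `countBytes` method are hypothetical and not part of Tika. The sketch logs through JUL, as item 2.1 proposes for tika-core; it shows the SLF4J field from item 2.2 only in a comment, because SLF4J is not part of the JDK. It is also laid out according to the simple Checkstyle policy in item 4: 4-space indentation, no tabs, Egyptian braces, and a space after `if`/`else`/`try`/`catch`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical tika-core class, for illustration only.
public final class ExampleCoreComponent {

    // Item 2.1: tika-core logs via java.util.logging (JUL) from the JDK,
    // so no extra dependency is pulled in. Per item 2.2, other modules
    // would use SLF4J instead, e.g.:
    //   private static final org.slf4j.Logger LOG =
    //           org.slf4j.LoggerFactory.getLogger(ExampleCoreComponent.class);
    private static final Logger LOG =
            Logger.getLogger(ExampleCoreComponent.class.getName());

    // Counts the bytes in a stream; on an I/O error it logs a warning
    // and returns the number of bytes counted so far.
    public static long countBytes(InputStream stream) {
        long total = 0;
        byte[] buffer = new byte[4096];
        try {
            int read;
            while ((read = stream.read(buffer)) != -1) {
                total += read;
            }
        } catch (IOException e) {
            LOG.log(Level.WARNING, "Failed to read stream", e);
        }
        if (total == 0) {
            LOG.fine("No bytes read");
        } else {
            LOG.log(Level.FINE, "Read {0} bytes", total);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(new ByteArrayInputStream(data))); // prints 5
    }
}
```

Running `main` prints `5`; the FINE-level messages stay silent under the default JUL configuration, which only logs INFO and above.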

The main points are to bring the Tika codebase to a more consistent and clear
state, simplify its maintenance, and make it easier for contributors to
produce clean and pretty patches. The Checkstyle configuration should be as
simple as possible, so that refactoring the existing code to conform to it
stays realistic.

Also, these items should be integrated gradually, step by step.

What do you think, folks?
Would it be a good thing for Tika and its community?
Would it bring any serious challenges that I've forgotten?

[1]: http://tika.apache.org/contribute.html
[2]: https://wiki.apache.org/tika/DeveloperResources
[3]: https://github.com/apache/tika/#contributing-via-github
[4]: https://issues.apache.org/jira/browse/TIKA-2316 (tracking issue)
[5]: http://checkstyle.sourceforge.net/
[6]: https://maven.apache.org/plugins/maven-checkstyle-plugin/



-- 

Best regards,
Konstantin Gribov

Re: [DISCUSS] Contribution guide & style enforcement

Posted by "Mattmann, Chris A (3010)" <ch...@jpl.nasa.gov>.
Konstantin,

I’ve read through this and I am +1 to try. Trying to organize and improve the
codebase makes sense to me. I don’t want to make it scary for people to
contribute, but based on your proposal, I think these are guidelines and tools
to help us.

So this is great. +1

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 

On 3/29/17, 1:39 PM, "Konstantin Gribov" <gr...@gmail.com> wrote:



Re: [DISCUSS] Contribution guide & style enforcement

Posted by Oleg Tikhonov <ol...@apache.org>.
Definitely true, +1

On Wed, Mar 29, 2017 at 9:19 PM, Allison, Timothy B. <ta...@mitre.org>
wrote:

> +1  Y, thank you!

RE: [DISCUSS] Contribution guide & style enforcement

Posted by "Allison, Timothy B." <ta...@mitre.org>.
+1  Y, thank you!

-----Original Message-----
From: Ken Krugler [mailto:kkrugler_lists@transpac.com] 
Sent: Wednesday, March 29, 2017 2:07 PM
To: dev@tika.apache.org
Subject: Re: [DISCUSS] Contribution guide & style enforcement

Hi Konstantin,

Thanks for the thoughtful and detailed writeup.

And yes, +1 to all 5 top-level suggestions.

— Ken

> On Mar 29, 2017, at 10:39am, Konstantin Gribov <gr...@gmail.com> wrote:




Re: [DISCUSS] Contribution guide & style enforcement

Posted by Ken Krugler <kk...@transpac.com>.
Hi Konstantin,

Thanks for the thoughtful and detailed writeup.

And yes, +1 to all 5 top-level suggestions.

— Ken

> On Mar 29, 2017, at 10:39am, Konstantin Gribov <gr...@gmail.com> wrote:

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr