You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by ???????? ?????? <ma...@hotmail.com> on 2017/11/15 15:15:17 UTC

Issues fixed in Apache Jena

Dear developers,

I am a PhD student in the university of Groningen and the topic of my PhD is the evolution of Technical Debt (TD) in open-source development.
I have analyzed some projects from the Apache Foundation (using SonarQube) and I realized that your project has a tremendous number (405,700) of fixed issues, when we compare it to other projects from Apache.
I would like to ask you the following 3 questions:

  1.  Why had been introduced so many issues of TD into your project?
  2.  The fixing of those issues was in purpose or a coincidence?
  3.  Do you use SonarQube (or SonarLint) in order to detect and fix the issues?

Thank you in advance!

With kind regards,

George Digkas

Re: Issues fixed in Apache Jena

Posted by Rob Vesse <rv...@dotnetrdf.org>.
I think what is being counted is any issue that SonarQube TD reports, and this is being done on every single commit and summed together. This doesn’t seem like a particularly meaningful statistic since you would inevitably count the same issue N times where N is the number of commits between where an issue was introduced and where it was fixed. It seems like there should really be some attempts to perform de-duplication.

I am not sure that this analysis is meaningful in any sense. I imagine it is flagging a lot of trivial stylistic issues of which we have many given the length of time the Code Base has existed for. 

It also sounds like it doesn’t make any attempts to account for common development practices i.e. New code often develops over a series of commits with developers implementing outlines first and then refining and cleaning up a feature and cleaning up a feature as it matures.

Rob

On 16/11/2017, 10:40, "Lorenz B." <co...@googlemail.com> wrote:

    I don't get the meaning of this number:
    
    # issues: 405,700
    
    What is a single issue in your context?
    
    
    On 16.11.2017 11:15, Γεώργιος Δίγκας wrote:
    > Dear All,
    >
    > First of all I would like to thank you very much for your fast response.
    > I have created a spreadsheet (https://docs.google.com/spreadsheets/d/1DloQ_GS9l2KS6ldgdHOQkjsCB1J_rrMyUauHC_Ymgfk/edit?usp=sharing) with all the projects that I have analyzed. Below, I describe briefly the first two sheets.
    > Projects: It contains that stats (Project Name, # of Java Files, # of commits, # of weeks, # of issues over time, and # of FIXED issues over time Fixing Rate) for all the analyzed projects. As the sheet shows the fixing rate for Jena is 90.2%. Which means 90.2% of the issues that were introduced in the project over time, were fixed.
    > Jena: Open Issues - October 7, 2017: On this sheet there all the issues that are still open (42,258) in the last analyzed commit (SHA: 030398cb0f4bef4dad2b7313e8fce171a6179839 and DATE: October 7, 2017).
    >
    >>> How did you go about making these calculations? What span of time does > your analysis concern?
    > I analyzed the last commit for each week of the selected project over their history.
    >>> What are you counting as technical debt (anything > that SonarQube claims is "technical debt")?
    > In this study yes. Everything that SonarQube claims as TD. But, for Apache Jena, the most impressive thing is the fixing rate. You have fixed 90% of the issues that SonarQube detects as TD.
    >>> Are you comparing Jena to > other projects with a similar lifespan? Are you comparing Jena to  > projects that have a similar contribution history? etc. etc.
    > I would say yes. As the Projects sheet shows there are other projects with the same characteristics.
    >>> Where does that number come from?
    > The number (405,700) represents the fixed issues over time. On the last analyzed commit (SHA: 030398cb0f4bef4dad2b7313e8fce171a6179839) there were 481,590 lines of java code.
    >
    > Thank you in advance!
    >
    >
    > With kind regards,
    >
    > George Digkas
    >
    >
    >
    > ________________________________
    > From: Andy Seaborne <an...@apache.org>
    > Sent: Wednesday, November 15, 2017 5:42 PM
    > To: dev@jena.apache.org
    > Subject: Re: Issues fixed in Apache Jena
    >
    >>> a tremendous number (405,700) of fixed issues
    > Where does that number come from?
    >
    > There are "only" 481,512 lines of java code in the entire codebase!
    >
    > (When Jena was imported into Apache, the whole of the SVN history came
    > over as well.)
    >
    >      Andy
    >
    > cloc =>
    >
    > ------------------------------------------------------------------------
    > Language              files          blank        comment           code
    > ------------------------------------------------------------------------
    > HTML                   2669            871         112007         525790
    > Java                   5637         107373         227331         481512
    > XML                     504           1016           1144          52578
    > Maven                   293           3684           6706          52355
    > JavaScript               71           4515           5929          26718
    > Bourne Shell             62            982           1219           4459
    > Bourne Again Shell       51            579            967           2389
    > XSLT                      6            495            225           2010
    > Ruby                      7            307            297           1788
    > CSS                      20            219            194           1629
    > DOS Batch                46            245             27           1026
    > Perl                      6            251            275            651
    > Markdown                 12            132              0            287
    > Smarty                    7             16              0            218
    > DTD                       2             91            147            176
    > INI                       2             19              0             65
    > AspectJ                   1              8             46             36
    > XSD                       1              6             13             34
    > Elixir                    1             12             42              9
    > YAML                      1              0             95              0
    > ------------------------------------------------------------------------
    > SUM:                   9399         120821         356664        1153730
    > ------------------------------------------------------------------------
    >
    >
    >
    > On 15/11/17 15:31, ajs6f@apache.org wrote:
    >> It's not really clear to me how to answer these question without more
    >> context.
    >>
    >> How did you go about making these calculations? What span of time does
    >> your analysis concern? What are you counting as technical debt (anything
    >> that SonarQube claims is "technical debt")? Are you comparing Jena to
    >> other projects with a similar lifespan? Are you comparing Jena to
    >> projects that have a similar contribution history? etc. etc.
    >>
    >>
    >>
    >> ajs6f
    >>
    >> ???????? ?????? wrote on 11/15/17 10:15 AM:
    >>> Dear developers,
    >>>
    >>> I am a PhD student in the university of Groningen and the topic of my
    >>> PhD is the evolution of Technical Debt (TD) in open-source development.
    >>> I have analyzed some projects from the Apache Foundation (using
    >>> SonarQube) and I realized that your project has a tremendous number
    >>> (405,700) of fixed issues, when we compare it to other projects from
    >>> Apache.
    >>> I would like to ask you the following 3 questions:
    >>>
    >>>   1.  Why had been introduced so many issues of TD into your project?
    >>>   2.  The fixing of those issues was in purpose or a coincidence?
    >>>   3.  Do you use SonarQube (or SonarLint) in order to detect and fix
    >>> the issues?
    >>>
    >>> Thank you in advance!
    >>>
    >>> With kind regards,
    >>>
    >>> George Digkas
    >>>
    
    





Re: Issues fixed in Apache Jena

Posted by "Lorenz B." <co...@googlemail.com>.
I don't get the meaning of this number:

# issues: 405,700

What is a single issue in your context?


On 16.11.2017 11:15, Γεώργιος Δίγκας wrote:
> Dear All,
>
> First of all I would like to thank you very much for your fast response.
> I have created a spreadsheet (https://docs.google.com/spreadsheets/d/1DloQ_GS9l2KS6ldgdHOQkjsCB1J_rrMyUauHC_Ymgfk/edit?usp=sharing) with all the projects that I have analyzed. Below, I describe briefly the first two sheets.
> Projects: It contains that stats (Project Name, # of Java Files, # of commits, # of weeks, # of issues over time, and # of FIXED issues over time Fixing Rate) for all the analyzed projects. As the sheet shows the fixing rate for Jena is 90.2%. Which means 90.2% of the issues that were introduced in the project over time, were fixed.
> Jena: Open Issues - October 7, 2017: On this sheet there all the issues that are still open (42,258) in the last analyzed commit (SHA: 030398cb0f4bef4dad2b7313e8fce171a6179839 and DATE: October 7, 2017).
>
>>> How did you go about making these calculations? What span of time does > your analysis concern?
> I analyzed the last commit for each week of the selected project over their history.
>>> What are you counting as technical debt (anything > that SonarQube claims is "technical debt")?
> In this study yes. Everything that SonarQube claims as TD. But, for Apache Jena, the most impressive thing is the fixing rate. You have fixed 90% of the issues that SonarQube detects as TD.
>>> Are you comparing Jena to > other projects with a similar lifespan? Are you comparing Jena to  > projects that have a similar contribution history? etc. etc.
> I would say yes. As the Projects sheet shows there are other projects with the same characteristics.
>>> Where does that number come from?
> The number (405,700) represents the fixed issues over time. On the last analyzed commit (SHA: 030398cb0f4bef4dad2b7313e8fce171a6179839) there were 481,590 lines of java code.
>
> Thank you in advance!
>
>
> With kind regards,
>
> George Digkas
>
>
>
> ________________________________
> From: Andy Seaborne <an...@apache.org>
> Sent: Wednesday, November 15, 2017 5:42 PM
> To: dev@jena.apache.org
> Subject: Re: Issues fixed in Apache Jena
>
>>> a tremendous number (405,700) of fixed issues
> Where does that number come from?
>
> There are "only" 481,512 lines of java code in the entire codebase!
>
> (When Jena was imported into Apache, the whole of the SVN history came
> over as well.)
>
>      Andy
>
> cloc =>
>
> ------------------------------------------------------------------------
> Language              files          blank        comment           code
> ------------------------------------------------------------------------
> HTML                   2669            871         112007         525790
> Java                   5637         107373         227331         481512
> XML                     504           1016           1144          52578
> Maven                   293           3684           6706          52355
> JavaScript               71           4515           5929          26718
> Bourne Shell             62            982           1219           4459
> Bourne Again Shell       51            579            967           2389
> XSLT                      6            495            225           2010
> Ruby                      7            307            297           1788
> CSS                      20            219            194           1629
> DOS Batch                46            245             27           1026
> Perl                      6            251            275            651
> Markdown                 12            132              0            287
> Smarty                    7             16              0            218
> DTD                       2             91            147            176
> INI                       2             19              0             65
> AspectJ                   1              8             46             36
> XSD                       1              6             13             34
> Elixir                    1             12             42              9
> YAML                      1              0             95              0
> ------------------------------------------------------------------------
> SUM:                   9399         120821         356664        1153730
> ------------------------------------------------------------------------
>
>
>
> On 15/11/17 15:31, ajs6f@apache.org wrote:
>> It's not really clear to me how to answer these question without more
>> context.
>>
>> How did you go about making these calculations? What span of time does
>> your analysis concern? What are you counting as technical debt (anything
>> that SonarQube claims is "technical debt")? Are you comparing Jena to
>> other projects with a similar lifespan? Are you comparing Jena to
>> projects that have a similar contribution history? etc. etc.
>>
>>
>>
>> ajs6f
>>
>> ???????? ?????? wrote on 11/15/17 10:15 AM:
>>> Dear developers,
>>>
>>> I am a PhD student in the university of Groningen and the topic of my
>>> PhD is the evolution of Technical Debt (TD) in open-source development.
>>> I have analyzed some projects from the Apache Foundation (using
>>> SonarQube) and I realized that your project has a tremendous number
>>> (405,700) of fixed issues, when we compare it to other projects from
>>> Apache.
>>> I would like to ask you the following 3 questions:
>>>
>>>   1.  Why had been introduced so many issues of TD into your project?
>>>   2.  The fixing of those issues was in purpose or a coincidence?
>>>   3.  Do you use SonarQube (or SonarLint) in order to detect and fix
>>> the issues?
>>>
>>> Thank you in advance!
>>>
>>> With kind regards,
>>>
>>> George Digkas
>>>


Re: Issues fixed in Apache Jena

Posted by Rob Vesse <rv...@dotnetrdf.org>.
Comments inline:

On 20/11/2017, 14:47, "Γεώργιος Δίγκας" <ma...@hotmail.com> wrote:

    Dear Andy and All,
    
    Thank you very much for the information that you have provided to me. You really helped me a lot.
    Below I list some of the most frequent issue types that I have found while I was examining the evolution of Jena.
    
    Issue Name      Issues over time        Fixed   Currently Open
    The members of an interface declaration or class should appear in a pre-defined order   52875   47530   5345
    Sections of code should not be "commented out"  35465   31905   3560
    Method names should comply with a naming convention     32809   29448   3361
    Constant names should comply with a naming convention   18397   16572   1825
    String literals should not be duplicated<https://sbforge.org/sonar/rules/show/squid:S1192?layout=false> 18106   16390   1716
    Standard outputs should not be used directly to log anything<https://sbforge.org/sonar/rules/show/squid:S106?layout=false>      16174   14506   1668
    Exception handlers should preserve the original exceptions<https://sbforge.org/sonar/rules/show/squid:S1166?layout=false>       10763   9747    1016
    Source files should not have any duplicated blocks      9141    8078    1063
    <https://sbforge.org/sonar/rules/show/squid:S1186?layout=false>Methods should not be empty<https://sbforge.org/sonar/rules/show/squid:S1186?layout=false>       8653    7796    857
    switch case clauses should not have too many lines of code<https://sbforge.org/sonar/rules/show/squid:S1151?layout=false>       8280    7491    789
    throws declarations should not be superfluous<https://sbforge.org/sonar/rules/show/squid:RedundantThrowsDeclarationCheck?layout=false>  7012    6138    874
    Class variable fields should not have public accessibility<https://sbforge.org/sonar/rules/show/squid:ClassVariableVisibilityCheck?layout=false>        6593    5954    639
    
    
1. The issue that appears the most frequently is: The members of an interface declaration or class should appear in a pre-defined order. "According to the Java Code Conventions as defined by Oracle, the members of a class or interface declaration should appear in the following order in the source files: 1. Class and instance variables, 2.Constructors, and 3. Methods."

 This is pretty nitpicky, I have rarely seen this enforced because quite frankly it is annoying to enforce and with a modern IDE is somewhat irrelevant anyway

2. The second most frequent is: Sections of code should not be "commented out". "Programmers should not comment out code as it bloats programs and reduces readability.  Unused code should be deleted and can be retrieved from source control history if required."

For projects with a long history, Jena started in 2001, this is pretty common. Often a programmer will leave a section of code commented out because they rewrote that section and can’t verify that they didn’t break something and may want to easily go back to the old version should bugs be reported.   Also sometimes commented out code is used to illustrate a naïve/simple implementation that makes it easier for programmers to understand the intent of the code but then the actual implementation may reflect a more performant, but less understandable, implementation.

      3.  Method names should comply with a naming convention. "Shared naming conventions allow teams to collaborate efficiently. This rule checks that all method names match the default provided regular expression ^[a-z][a-zA-Z0-9]*$"
      4.  Constant names should comply with a naming convention. "Shared coding conventions allow teams to collaborate efficiently. This rule checks that all constant names match the default regular expression ^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$"

These are both personal preference type things. Some projects enforce this while others do not, again with a history as long as ours enforcing this is undesirable.

 I know that you are using the default here. If you are using the same approach to analyse other projects you may be wanting to customise these rules to match the actual conventions that a given project uses.

      5.  String literals should not be duplicated. "Duplicated string literals make the process of refactoring error-prone, since you must be sure to update all occurrences.  On the other hand, constants can be referenced from many places, but only need to be updated in a single place."

 I suspect that some of this comes from the fact that Jena consists of many modules and not every module depends on every other module so there will inevitably be some duplication across modules. Also string literals are used heavily in test cases where there will be plenty of intentional duplication since each set of tests should be independent of each other.

      6.  Standard outputs should not be used directly to log anything. "When logging a message there are several important requirements which must be fulfilled:
    * The user must be able to easily retrieve the logs
    * The format of all logged message must be uniform to allow the user to easily read the log
    * Logged data must actually be recorded
    * Sensitive data must only be logged securely
    If a program directly writes to the standard outputs, there is absolutely no way to comply with those requirements. That's why defining and using a dedicated logger is highly recommended."

 We do predominantly use a logging framework. However, we also provide a suite of command line tools which must interact with Standard output to be useful to the end user.

      7.  Exception handlers should preserve the original exceptions. "When handling a caught exception, the original exception's message and stack trace should be logged or passed forward."

 While this is an ideal there are many cases where actually you do want to suppress errors. Either because the underlying error is not helpful or because you are explicitly trying to recover from an error and you don’t want your cleanup code to be derailed by further errors. We use this pattern extensively in relation to IO, when something goes wrong with IO you want to make sure that resource handles are not leaked by closing streams etc but you don’t know in advance that the attempts to close might also fail i.e. you want to make a best efforts to clean up and the further errors are irrelevant to the original error.

      8.  Source files should not have any duplicated blocks. "An issue is created on a file as soon as there is at least one block of duplicated code on this file"
      9.  Methods should not be empty. "There are several reasons for a method not to have a method body:
    * It is an unintentional omission, and should be fixed to prevent an unexpected behavior in production.
    * It is not yet, or never will be, supported. In this case an UnsupportedOperationException should be thrown.
    * The method is an intentionally-blank override. In this case a nested comment should explain the reason for the blank override."
      10. switch case clauses should not have too many lines of code. "The switch statement should be used only to clearly define some new branches in the control flow. As soon as a case clause contains too many statements this highly decreases the readability of the overall control flow statement. In such case, the content of the case clause should be extracted into a dedicated method."
      11.
    "throws" declarations should not be superfluous. "An exception in a throws declaration in Java is superfluous if it is:
    * listed multiple times
    * a subclass of another listed exception
    * a RuntimeException, or one of its descendants
    * completely unnecessary because the declared exception type cannot actually be thrown"
      12. Class variable fields should not have public accessibility. "Public class variable fields do not respect the encapsulation principle and has three main disadvantages:
    * Additional behavior such as validation cannot be added.
    * The internal representation is exposed, and cannot be changed afterwards.
    * Member values are subject to change from anywhere in the code and may not meet the programmer's assumptions.
    By using private attributes and accessor methods (set and get), unauthorized modifications are prevented."

 Again these are mostly stylistic preference and history combined. We try to avoid making unnecessary changes to our code base, and with 11 and 12 these form part of our public API which we have to take great care in making changes to in order to ensure we don’t create problems for our users when they upgrade Versions.
    
    I would like to ask you:
    
      *   If you agree/disagree (and why)
      *   Are you aware of those?
      *   Why do you think that they introduced?

Comments with regards to your first three questions are inline above.

I think my general comment is that much of these “issues” are stylistic preferences in my personal opinion.  The Oracle Java code style referenced by a few of those rules is a specific example of a style, while projects may follow some aspects of it each project tends to evolve its own style and may incorporate several different styles as it evolves over time. There are other base styles but I don’t think I have ever worked on a single project, neither open source or commercial, that has wholesale adopted one of those. Even in project that strictly enforced styles those styles have been tailored to the project in question.

      *   Do you think that all those introduced because Jena does not enforce a "squash-before-merging" policy?

This has zero relevance in my opinion. This is primarily down to the long history of the code base and the fact that we haven’t ever enforced a specific code style. While we try and avoid introducing these issues they are very much a minor concern compared to addressing actual functional bug reports.

      *   Are you planing to fix some issues of the above categories?

Probably not. As with all open source projects Jena depends on volunteers to donate their time. Volunteers typically choose to spend their time on feature and bug work, people are not interested on spending their time on cleaning up years of stylistic inconsistency which as I stated are often personal preference issues anyway.

Also where things form the public API we make a concerted effort to avoid unnecessary changes. And where we do consider the changes necessary we follow a deprecation process that means old methods and fields often continue to exist for several releases alongside their new equivalents before being removed.

Rob

      *   All the currently open issues are listed on the sheet: Jena: Open Issues - October 7, 2017 of this<https://docs.google.com/spreadsheets/d/1DloQ_GS9l2KS6ldgdHOQkjsCB1J_rrMyUauHC_Ymgfk/edit?usp=sharing> spreadsheet
    
    Thank you in advance!
    
    
    With kind regards,
    
    
    George Digkas
    
    ________________________________
    From: Andy Seaborne <an...@apache.org>
    Sent: Saturday, November 18, 2017 6:43 PM
    To: Γεώργιος Δίγκας; dev@jena.apache.org
    Subject: Re: Issues fixed in Apache Jena
    
    Jena was registered at SourceForge 2001-11-20
    
    I found this in ASF SVN:
    
    [[
    Added Mon Nov 26 17:41:44 2001 UTC (15 years, 11 months ago) by der
    ]]
    on copyright.txt.
    
    so it looks like we have full history somewhere in the Apache
    infrastructure.
    
    CVS:SF->SVN:SF->SVN:ASF
    
    ASF git does not include the pre-git history.
    
         Andy
    
    On 16/11/17 11:55, Andy Seaborne wrote:
    > Do not take git as complete!
    >
    > Jena started in 2000.
    > https://lists.w3.org/Archives/Public/www-rdf-interest/2000Aug/0128.html
    >
    > Jena 2.0 was released 2003-08-28.
    > A whole 40M including dependencies! A 14.7M zip file!
    > https://sourceforge.net/projects/jena/files/
    >
    > The whole of SF SVN history was imported by the Apache infrastructure
    > team (a herculean effort) into Apache SVN. I don't know how to get to it
    > from git, it may not be there and only in SVN.
    >
    > The earliest git root commit is for the move to Apache from SF
    > [4298106f1e], 6 years ago. (There are 4 root commits due to merges)
    
    
    
    
    >
    > ---
    >
    > It's an interesting start and to make the analysis usefully inform the
    > reader as to the state of the project I suggest treating different kinds
    > of issues different, not uniformly important.
    >
    > There are many (, many) minor things and they outweigh the major
    > problems. Calling them all "issues" gives them equal weight. Some are
    > about canonicalization of the code.
    >
    > Yet reformatting the whole code base (if practical, which it arguable)
    > then greatly decreases the usefulness of git history. That would be a
    > huge loss.
    >
    > (NB the "issue" word has a specific meaning for JIRA which a lot of
    > Apache projects use. Jena's current total, now, is 1424.)
    >
    >      Andy
    >
    >>
    >> Thank you in advance!
    >>
    >>
    >> With kind regards,
    >>
    >> George Digkas
    





Re: Issues fixed in Apache Jena

Posted by Γεώργιος Δίγκας <ma...@hotmail.com>.
Dear Andy and All,

Thank you very much for the information that you have provided to me. You really helped me a lot.
Below I list some of the most frequent issue types that I have found while I was examining the evolution of Jena.

Issue Name      Issues over time        Fixed   Currently Open
The members of an interface declaration or class should appear in a pre-defined order   52875   47530   5345
Sections of code should not be "commented out"  35465   31905   3560
Method names should comply with a naming convention     32809   29448   3361
Constant names should comply with a naming convention   18397   16572   1825
String literals should not be duplicated<https://sbforge.org/sonar/rules/show/squid:S1192?layout=false> 18106   16390   1716
Standard outputs should not be used directly to log anything<https://sbforge.org/sonar/rules/show/squid:S106?layout=false>      16174   14506   1668
Exception handlers should preserve the original exceptions<https://sbforge.org/sonar/rules/show/squid:S1166?layout=false>       10763   9747    1016
Source files should not have any duplicated blocks      9141    8078    1063
<https://sbforge.org/sonar/rules/show/squid:S1186?layout=false>Methods should not be empty<https://sbforge.org/sonar/rules/show/squid:S1186?layout=false>       8653    7796    857
switch case clauses should not have too many lines of code<https://sbforge.org/sonar/rules/show/squid:S1151?layout=false>       8280    7491    789
throws declarations should not be superfluous<https://sbforge.org/sonar/rules/show/squid:RedundantThrowsDeclarationCheck?layout=false>  7012    6138    874
Class variable fields should not have public accessibility<https://sbforge.org/sonar/rules/show/squid:ClassVariableVisibilityCheck?layout=false>        6593    5954    639


  1.  The issue that appears the most frequently is: The members of an interface declaration or class should appear in a pre-defined order. "According to the Java Code Conventions as defined by Oracle, the members of a class or interface declaration should appear in the following order in the source files: 1. Class and instance variables, 2.Constructors, and 3. Methods."
  2.  The second most frequent is: Sections of code should not be "commented out". "Programmers should not comment out code as it bloats programs and reduces readability.  Unused code should be deleted and can be retrieved from source control history if required."
  3.  Method names should comply with a naming convention. "Shared naming conventions allow teams to collaborate efficiently. This rule checks that all method names match the default provided regular expression ^[a-z][a-zA-Z0-9]*$"
  4.  Constant names should comply with a naming convention. "Shared coding conventions allow teams to collaborate efficiently. This rule checks that all constant names match the default regular expression ^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$"
  5.  String literals should not be duplicated. "Duplicated string literals make the process of refactoring error-prone, since you must be sure to update all occurrences.  On the other hand, constants can be referenced from many places, but only need to be updated in a single place."
  6.  Standard outputs should not be used directly to log anything. "When logging a message there are several important requirements which must be fulfilled:
* The user must be able to easily retrieve the logs
* The format of all logged message must be uniform to allow the user to easily read the log
* Logged data must actually be recorded
* Sensitive data must only be logged securely
If a program directly writes to the standard outputs, there is absolutely no way to comply with those requirements. That's why defining and using a dedicated logger is highly recommended."
  7.  Exception handlers should preserve the original exceptions. "When handling a caught exception, the original exception's message and stack trace should be logged or passed forward."
  8.  Source files should not have any duplicated blocks. "An issue is created on a file as soon as there is at least one block of duplicated code on this file"
  9.  Methods should not be empty. "There are several reasons for a method not to have a method body:
* It is an unintentional omission, and should be fixed to prevent an unexpected behavior in production.
* It is not yet, or never will be, supported. In this case an UnsupportedOperationException should be thrown.
* The method is an intentionally-blank override. In this case a nested comment should explain the reason for the blank override."
  10. switch case clauses should not have too many lines of code. "The switch statement should be used only to clearly define some new branches in the control flow. As soon as a case clause contains too many statements this highly decreases the readability of the overall control flow statement. In such case, the content of the case clause should be extracted into a dedicated method."
  11.
"throws" declarations should not be superfluous. "An exception in a throws declaration in Java is superfluous if it is:
* listed multiple times
* a subclass of another listed exception
* a RuntimeException, or one of its descendants
* completely unnecessary because the declared exception type cannot actually be thrown"
  12. Class variable fields should not have public accessibility. "Public class variable fields do not respect the encapsulation principle and has three main disadvantages:
* Additional behavior such as validation cannot be added.
* The internal representation is exposed, and cannot be changed afterwards.
* Member values are subject to change from anywhere in the code and may not meet the programmer's assumptions.
By using private attributes and accessor methods (set and get), unauthorized modifications are prevented."

I would like to ask you:

  *   If you agree/disagree (and why)
  *   Are you aware of those?
  *   Why do you think that they introduced?
  *   Do you think that all those introduced because Jena does not enforce a "squash-before-merging" policy?
  *   Are you planing to fix some issues of the above categories?
  *   All the currently open issues are listed on the sheet: Jena: Open Issues - October 7, 2017 of this<https://docs.google.com/spreadsheets/d/1DloQ_GS9l2KS6ldgdHOQkjsCB1J_rrMyUauHC_Ymgfk/edit?usp=sharing> spreadsheet

Thank you in advance!


With kind regards,


George Digkas

________________________________
From: Andy Seaborne <an...@apache.org>
Sent: Saturday, November 18, 2017 6:43 PM
To: Γεώργιος Δίγκας; dev@jena.apache.org
Subject: Re: Issues fixed in Apache Jena

Jena was registered at SourceForge 2001-11-20

I found this in ASF SVN:

[[
Added Mon Nov 26 17:41:44 2001 UTC (15 years, 11 months ago) by der
]]
on copyright.txt.

so it looks like we have full history somewhere in the Apache
infrastructure.

CVS:SF->SVN:SF->SVN:ASF

ASF git does not include the pre-git history.

     Andy

On 16/11/17 11:55, Andy Seaborne wrote:
> Do not take git as complete!
>
> Jena started in 2000.
> https://lists.w3.org/Archives/Public/www-rdf-interest/2000Aug/0128.html
>
> Jena 2.0 was released 2003-08-28.
> A whole 40M including dependencies! A 14.7M zip file!
> https://sourceforge.net/projects/jena/files/
>
> The whole of SF SVN history was imported by the Apache infrastructure
> team (a herculean effort) into Apache SVN. I don't know how to get to it
> from git, it may not be there and only in SVN.
>
> The earliest git root commit is for the move to Apache from SF
> [4298106f1e], 6 years ago. (There are 4 root commits due to merges)




>
> ---
>
> It's an interesting start and to make the analysis usefully inform the
> reader as to the state of the project I suggest treating different kinds
> of issues different, not uniformly important.
>
> There are many (, many) minor things and they outweigh the major
> problems. Calling them all "issues" gives them equal weight. Some are
> about canonicalization of the code.
>
> Yet reformatting the whole code base (if practical, which it arguable)
> then greatly decreases the usefulness of git history. That would be a
> huge loss.
>
> (NB the "issue" word has a specific meaning for JIRA which a lot of
> Apache projects use. Jena's current total, now, is 1424.)
>
>      Andy
>
>>
>> Thank you in advance!
>>
>>
>> With kind regards,
>>
>> George Digkas

Re: Issues fixed in Apache Jena

Posted by Andy Seaborne <an...@apache.org>.
Jena was registered at SourceForge 2001-11-20

I found this in ASF SVN:

[[
Added Mon Nov 26 17:41:44 2001 UTC (15 years, 11 months ago) by der
]]
on copyright.txt.

so it looks like we have full history somewhere in the Apache 
infrastructure.

CVS:SF->SVN:SF->SVN:ASF

ASF git does not include the pre-git history.

     Andy

On 16/11/17 11:55, Andy Seaborne wrote:
> Do not take git as complete!
> 
> Jena started in 2000.
> https://lists.w3.org/Archives/Public/www-rdf-interest/2000Aug/0128.html
> 
> Jena 2.0 was released 2003-08-28.
> A whole 40M including dependencies! A 14.7M zip file!
> https://sourceforge.net/projects/jena/files/
> 
> The whole of SF SVN history was imported by the Apache infrastructure 
> team (a herculean effort) into Apache SVN. I don't know how to get to it 
> from git, it may not be there and only in SVN.
> 
> The earliest git root commit is for the move to Apache from SF 
> [4298106f1e], 6 years ago. (There are 4 root commits due to merges)




> 
> ---
> 
> It's an interesting start and to make the analysis usefully inform the 
> reader as to the state of the project I suggest treating different kinds 
> of issues different, not uniformly important.
> 
> There are many (, many) minor things and they outweigh the major 
> problems. Calling them all "issues" gives them equal weight. Some are 
> about canonicalization of the code.
> 
> Yet reformatting the whole code base (if practical, which it arguable) 
> then greatly decreases the usefulness of git history. That would be a 
> huge loss.
> 
> (NB the "issue" word has a specific meaning for JIRA which a lot of 
> Apache projects use. Jena's current total, now, is 1424.)
> 
>      Andy
> 
>>
>> Thank you in advance!
>>
>>
>> With kind regards,
>>
>> George Digkas

Re: Issues fixed in Apache Jena

Posted by ajs6f <aj...@apache.org>.
One possibility: Jena does not (and I assume never did) enforce a "squash-before-merging" policy.

That is to say, if I write a PR with ten commits, and it is approved, and we merge it, it will normally go in as all ten commits. Some projects demand that such a PR be "squashed" (all ten commits be reduced into one with the sum of changes present) before merging. 

If that is part of the difference, I suppose it should show up in the same way as a difference between Jena and other projects in the number of commits per time unit in the main branch.

ajs6f

> On Nov 16, 2017, at 7:55 AM, Γεώργιος Δίγκας <ma...@hotmail.com> wrote:
> 
> Dear All,
> 
> I would like to thank you for your replies!
> 
>>> What is a single issue in your context?
> SonarQube uses a set of coding rules<https://docs.sonarqube.org/display/SONAR/Issues> in order to measure the TD. While running an analysis, it raises an issue every time a piece of code breaks a coding rule.
>>> I think what is being counted is any issue that SonarQube TD reports, and this is being done on every single commit and summed together. This doesn’t seem like a particularly meaningful statistic since you would inevitably count the same issue N times where N is the number of commits between where an issue was introduced and where it was fixed. It seems like there should really be some attempts to perform de-duplication.
> The number refers to unique issues and it does not include any duplication. (If one issue was fixed and then after some time the same issue appeared in the same piece of code I count as new).
>>> It also sounds like it doesn’t make any attempts to account for common development practices i.e. New code often develops over a series of commits with developers implementing outlines first and then refining and cleaning up a feature and cleaning up a feature as it matures.
> I totally agree with the last sentence. As I said on my previous e-mails the cleaning up rate on your project is the highest among the Apache projects that I analyzed and I am wondering why is that. What practices do you follow? Is it a coincidence?
>>> There are many (, many) minor things and they outweigh the major problems. Calling them all "issues" gives them equal weight. Some are about canonicalization of the code.
> I have updated the previously sent spreadsheet (https://docs.google.com/spreadsheets/d/1DloQ_GS9l2KS6ldgdHOQkjsCB1J_rrMyUauHC_Ymgfk/edit?usp=sharing). Now on the sheet: Jena: Open Issues - October 7, 2017 I have added the Severity and the Type of each issue and you can filter them based on these two criteria (they are based on SoanrQube's default classification).
>>> NB the "issue" word has a specific meaning for JIRA which a lot of Apache projects use. Jena's current total, now, is 1424.
> Thank you for the clarification. I should had mention in my first e-mail that I refer to SonarQube's Issues and to to Jira.
> 
> With kind regards,
> 
> George Digkas
> ________________________________
> From: Andy Seaborne <an...@apache.org>
> Sent: Thursday, November 16, 2017 12:55 PM
> To: Γεώργιος Δίγκας; dev@jena.apache.org
> Subject: Re: Issues fixed in Apache Jena
> 
> Do not take git as complete!
> 
> Jena started in 2000.
> https://lists.w3.org/Archives/Public/www-rdf-interest/2000Aug/0128.html
> 
> Jena 2.0 was released 2003-08-28.
> A whole 40M including dependencies! A 14.7M zip file!
> https://sourceforge.net/projects/jena/files/
> 
> The whole of SF SVN history was imported by the Apache infrastructure
> team (a herculean effort) into Apache SVN. I don't know how to get to it
> from git, it may not be there and only in SVN.
> 
> The earliest git root commit is for the move to Apache from SF
> [4298106f1e], 6 years ago. (There are 4 root commits due to merges)
> 
> ---
> 
> It's an interesting start and to make the analysis usefully inform the
> reader as to the state of the project I suggest treating different kinds
> of issues different, not uniformly important.
> 
> There are many (, many) minor things and they outweigh the major
> problems. Calling them all "issues" gives them equal weight. Some are
> about canonicalization of the code.
> 
> Yet reformatting the whole code base (if practical, which it arguable)
> then greatly decreases the usefulness of git history. That would be a
> huge loss.
> 
> (NB the "issue" word has a specific meaning for JIRA which a lot of
> Apache projects use. Jena's current total, now, is 1424.)
> 
>     Andy
> 
>> 
>> Thank you in advance!
>> 
>> 
>> With kind regards,
>> 
>> George Digkas


Re: Issues fixed in Apache Jena

Posted by Γεώργιος Δίγκας <ma...@hotmail.com>.
Dear All,

I would like to thank you for your replies!

>> What is a single issue in your context?
SonarQube uses a set of coding rules<https://docs.sonarqube.org/display/SONAR/Issues> in order to measure the TD. While running an analysis, it raises an issue every time a piece of code breaks a coding rule.
>> I think what is being counted is any issue that SonarQube TD reports, and this is being done on every single commit and summed together. This doesn’t seem like a particularly meaningful statistic since you would inevitably count the same issue N times where N is the number of commits between where an issue was introduced and where it was fixed. It seems like there should really be some attempts to perform de-duplication.
The number refers to unique issues and it does not include any duplication. (If one issue was fixed and then after some time the same issue appeared in the same piece of code I count as new).
>>  It also sounds like it doesn’t make any attempts to account for common development practices i.e. New code often develops over a series of commits with developers implementing outlines first and then refining and cleaning up a feature and cleaning up a feature as it matures.
I totally agree with the last sentence. As I said on my previous e-mails the cleaning up rate on your project is the highest among the Apache projects that I analyzed and I am wondering why is that. What practices do you follow? Is it a coincidence?
>> There are many (, many) minor things and they outweigh the major problems. Calling them all "issues" gives them equal weight. Some are about canonicalization of the code.
I have updated the previously sent spreadsheet (https://docs.google.com/spreadsheets/d/1DloQ_GS9l2KS6ldgdHOQkjsCB1J_rrMyUauHC_Ymgfk/edit?usp=sharing). Now on the sheet: Jena: Open Issues - October 7, 2017 I have added the Severity and the Type of each issue and you can filter them based on these two criteria (they are based on SoanrQube's default classification).
>> NB the "issue" word has a specific meaning for JIRA which a lot of Apache projects use. Jena's current total, now, is 1424.
Thank you for the clarification. I should had mention in my first e-mail that I refer to SonarQube's Issues and to to Jira.

With kind regards,

George Digkas
________________________________
From: Andy Seaborne <an...@apache.org>
Sent: Thursday, November 16, 2017 12:55 PM
To: Γεώργιος Δίγκας; dev@jena.apache.org
Subject: Re: Issues fixed in Apache Jena

Do not take git as complete!

Jena started in 2000.
https://lists.w3.org/Archives/Public/www-rdf-interest/2000Aug/0128.html

Jena 2.0 was released 2003-08-28.
A whole 40M including dependencies! A 14.7M zip file!
https://sourceforge.net/projects/jena/files/

The whole of SF SVN history was imported by the Apache infrastructure
team (a herculean effort) into Apache SVN. I don't know how to get to it
from git, it may not be there and only in SVN.

The earliest git root commit is for the move to Apache from SF
[4298106f1e], 6 years ago. (There are 4 root commits due to merges)

---

It's an interesting start and to make the analysis usefully inform the
reader as to the state of the project I suggest treating different kinds
of issues different, not uniformly important.

There are many (, many) minor things and they outweigh the major
problems. Calling them all "issues" gives them equal weight. Some are
about canonicalization of the code.

Yet reformatting the whole code base (if practical, which it arguable)
then greatly decreases the usefulness of git history. That would be a
huge loss.

(NB the "issue" word has a specific meaning for JIRA which a lot of
Apache projects use. Jena's current total, now, is 1424.)

     Andy

>
> Thank you in advance!
>
>
> With kind regards,
>
> George Digkas

Re: Issues fixed in Apache Jena

Posted by Andy Seaborne <an...@apache.org>.
Do not take git as complete!

Jena started in 2000.
https://lists.w3.org/Archives/Public/www-rdf-interest/2000Aug/0128.html

Jena 2.0 was released 2003-08-28.
A whole 40M including dependencies! A 14.7M zip file!
https://sourceforge.net/projects/jena/files/

The whole of SF SVN history was imported by the Apache infrastructure 
team (a herculean effort) into Apache SVN. I don't know how to get to it 
from git, it may not be there and only in SVN.

The earliest git root commit is for the move to Apache from SF 
[4298106f1e], 6 years ago. (There are 4 root commits due to merges)

---

It's an interesting start and to make the analysis usefully inform the 
reader as to the state of the project I suggest treating different kinds 
of issues different, not uniformly important.

There are many (, many) minor things and they outweigh the major 
problems. Calling them all "issues" gives them equal weight. Some are 
about canonicalization of the code.

Yet reformatting the whole code base (if practical, which it arguable) 
then greatly decreases the usefulness of git history. That would be a 
huge loss.

(NB the "issue" word has a specific meaning for JIRA which a lot of 
Apache projects use. Jena's current total, now, is 1424.)

     Andy

> 
> Thank you in advance!
> 
> 
> With kind regards,
> 
> George Digkas

Re: Issues fixed in Apache Jena

Posted by Γεώργιος Δίγκας <ma...@hotmail.com>.
Dear All,

First of all I would like to thank you very much for your fast response.
I have created a spreadsheet (https://docs.google.com/spreadsheets/d/1DloQ_GS9l2KS6ldgdHOQkjsCB1J_rrMyUauHC_Ymgfk/edit?usp=sharing) with all the projects that I have analyzed. Below, I describe briefly the first two sheets.
Projects: It contains that stats (Project Name, # of Java Files, # of commits, # of weeks, # of issues over time, and # of FIXED issues over time Fixing Rate) for all the analyzed projects. As the sheet shows the fixing rate for Jena is 90.2%. Which means 90.2% of the issues that were introduced in the project over time, were fixed.
Jena: Open Issues - October 7, 2017: On this sheet there all the issues that are still open (42,258) in the last analyzed commit (SHA: 030398cb0f4bef4dad2b7313e8fce171a6179839 and DATE: October 7, 2017).

>> How did you go about making these calculations? What span of time does > your analysis concern?
I analyzed the last commit for each week of the selected project over their history.
>> What are you counting as technical debt (anything > that SonarQube claims is "technical debt")?
In this study yes. Everything that SonarQube claims as TD. But, for Apache Jena, the most impressive thing is the fixing rate. You have fixed 90% of the issues that SonarQube detects as TD.
>> Are you comparing Jena to > other projects with a similar lifespan? Are you comparing Jena to  > projects that have a similar contribution history? etc. etc.
I would say yes. As the Projects sheet shows there are other projects with the same characteristics.
>> Where does that number come from?
The number (405,700) represents the fixed issues over time. On the last analyzed commit (SHA: 030398cb0f4bef4dad2b7313e8fce171a6179839) there were 481,590 lines of java code.

Thank you in advance!


With kind regards,

George Digkas



________________________________
From: Andy Seaborne <an...@apache.org>
Sent: Wednesday, November 15, 2017 5:42 PM
To: dev@jena.apache.org
Subject: Re: Issues fixed in Apache Jena

>> a tremendous number (405,700) of fixed issues

Where does that number come from?

There are "only" 481,512 lines of java code in the entire codebase!

(When Jena was imported into Apache, the whole of the SVN history came
over as well.)

     Andy

cloc =>

------------------------------------------------------------------------
Language              files          blank        comment           code
------------------------------------------------------------------------
HTML                   2669            871         112007         525790
Java                   5637         107373         227331         481512
XML                     504           1016           1144          52578
Maven                   293           3684           6706          52355
JavaScript               71           4515           5929          26718
Bourne Shell             62            982           1219           4459
Bourne Again Shell       51            579            967           2389
XSLT                      6            495            225           2010
Ruby                      7            307            297           1788
CSS                      20            219            194           1629
DOS Batch                46            245             27           1026
Perl                      6            251            275            651
Markdown                 12            132              0            287
Smarty                    7             16              0            218
DTD                       2             91            147            176
INI                       2             19              0             65
AspectJ                   1              8             46             36
XSD                       1              6             13             34
Elixir                    1             12             42              9
YAML                      1              0             95              0
------------------------------------------------------------------------
SUM:                   9399         120821         356664        1153730
------------------------------------------------------------------------



On 15/11/17 15:31, ajs6f@apache.org wrote:
> It's not really clear to me how to answer these question without more
> context.
>
> How did you go about making these calculations? What span of time does
> your analysis concern? What are you counting as technical debt (anything
> that SonarQube claims is "technical debt")? Are you comparing Jena to
> other projects with a similar lifespan? Are you comparing Jena to
> projects that have a similar contribution history? etc. etc.
>
>
>
> ajs6f
>
> ???????? ?????? wrote on 11/15/17 10:15 AM:
>> Dear developers,
>>
>> I am a PhD student in the university of Groningen and the topic of my
>> PhD is the evolution of Technical Debt (TD) in open-source development.
>> I have analyzed some projects from the Apache Foundation (using
>> SonarQube) and I realized that your project has a tremendous number
>> (405,700) of fixed issues, when we compare it to other projects from
>> Apache.
>> I would like to ask you the following 3 questions:
>>
>>   1.  Why had been introduced so many issues of TD into your project?
>>   2.  The fixing of those issues was in purpose or a coincidence?
>>   3.  Do you use SonarQube (or SonarLint) in order to detect and fix
>> the issues?
>>
>> Thank you in advance!
>>
>> With kind regards,
>>
>> George Digkas
>>

Re: Issues fixed in Apache Jena

Posted by Andy Seaborne <an...@apache.org>.
 >> a tremendous number (405,700) of fixed issues

Where does that number come from?

There are "only" 481,512 lines of java code in the entire codebase!

(When Jena was imported into Apache, the whole of the SVN history came 
over as well.)

     Andy

cloc =>

------------------------------------------------------------------------
Language              files          blank        comment           code
------------------------------------------------------------------------
HTML                   2669            871         112007         525790
Java                   5637         107373         227331         481512
XML                     504           1016           1144          52578
Maven                   293           3684           6706          52355
JavaScript               71           4515           5929          26718
Bourne Shell             62            982           1219           4459
Bourne Again Shell       51            579            967           2389
XSLT                      6            495            225           2010
Ruby                      7            307            297           1788
CSS                      20            219            194           1629
DOS Batch                46            245             27           1026
Perl                      6            251            275            651
Markdown                 12            132              0            287
Smarty                    7             16              0            218
DTD                       2             91            147            176
INI                       2             19              0             65
AspectJ                   1              8             46             36
XSD                       1              6             13             34
Elixir                    1             12             42              9
YAML                      1              0             95              0
------------------------------------------------------------------------
SUM:                   9399         120821         356664        1153730
------------------------------------------------------------------------



On 15/11/17 15:31, ajs6f@apache.org wrote:
> It's not really clear to me how to answer these question without more 
> context.
> 
> How did you go about making these calculations? What span of time does 
> your analysis concern? What are you counting as technical debt (anything 
> that SonarQube claims is "technical debt")? Are you comparing Jena to 
> other projects with a similar lifespan? Are you comparing Jena to 
> projects that have a similar contribution history? etc. etc.
> 
> 
> 
> ajs6f
> 
> ???????? ?????? wrote on 11/15/17 10:15 AM:
>> Dear developers,
>>
>> I am a PhD student in the university of Groningen and the topic of my 
>> PhD is the evolution of Technical Debt (TD) in open-source development.
>> I have analyzed some projects from the Apache Foundation (using 
>> SonarQube) and I realized that your project has a tremendous number 
>> (405,700) of fixed issues, when we compare it to other projects from 
>> Apache.
>> I would like to ask you the following 3 questions:
>>
>>   1.  Why had been introduced so many issues of TD into your project?
>>   2.  The fixing of those issues was in purpose or a coincidence?
>>   3.  Do you use SonarQube (or SonarLint) in order to detect and fix 
>> the issues?
>>
>> Thank you in advance!
>>
>> With kind regards,
>>
>> George Digkas
>>

Re: Issues fixed in Apache Jena

Posted by aj...@apache.org.
It's not really clear to me how to answer these question without more context.

How did you go about making these calculations? What span of time does your analysis concern? What are you counting as 
technical debt (anything that SonarQube claims is "technical debt")? Are you comparing Jena to other projects with a 
similar lifespan? Are you comparing Jena to projects that have a similar contribution history? etc. etc.



ajs6f

???????? ?????? wrote on 11/15/17 10:15 AM:
> Dear developers,
>
> I am a PhD student in the university of Groningen and the topic of my PhD is the evolution of Technical Debt (TD) in open-source development.
> I have analyzed some projects from the Apache Foundation (using SonarQube) and I realized that your project has a tremendous number (405,700) of fixed issues, when we compare it to other projects from Apache.
> I would like to ask you the following 3 questions:
>
>   1.  Why had been introduced so many issues of TD into your project?
>   2.  The fixing of those issues was in purpose or a coincidence?
>   3.  Do you use SonarQube (or SonarLint) in order to detect and fix the issues?
>
> Thank you in advance!
>
> With kind regards,
>
> George Digkas
>