You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by "Ted Dunning (JIRA)" <ji...@apache.org> on 2009/08/22 21:04:14 UTC

[jira] Commented: (MAHOUT-166) Potpourri 2

    [ https://issues.apache.org/jira/browse/MAHOUT-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746490#action_12746490 ] 

Ted Dunning commented on MAHOUT-166:
------------------------------------


I would suspect that Patterns compiled from constant strings are cached by the system.  This would make several of these changes moot.  Since this is a prospective performance tweak, the key test should be readability first, profiling second.  IF the region of code containing the regex usage does not constitute the majority of the time, then it probably should be left in whatever is the most readable form.

A good example of a change that I would see as less readable would be the modification of this:
{code}
     String label = labelFeaturePair.split(",")[0];
{code}
into this:
{code}
    int comma = labelFeaturePair.indexOf(',');
    String label = comma < 0 ? labelFeaturePair : labelFeaturePair.substring(0, comma);
{code}
In canopy/InputMapper.java, I see a change for the better with this:
{code}
    protected Constructor<?> constructor;
{code}
Can this be further changed to Constructor<? extends Vector> to avoid a cast shortly thereafter?

As a random questions, why is this still transient if the class is not serializable?
{code}
  private static final transient Logger log = LoggerFactory.getLogger(WikipediaDatasetCreatorDriver.class);
{code}

In the category of gratuitous strangeness, why the single quotes here?  This can't be better in any way that I can think of (concatenating string constants might be forced out of compile time, this is a test class and so on).
{code}
           "   @RELATION Mahout\n" +
          '\n' +
           "   @ATTRIBUTE foo  NUMERIC\n" +
{code}


> Potpourri 2
> -----------
>
>                 Key: MAHOUT-166
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-166
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 0.2
>
>         Attachments: MAHOUT-166.patch
>
>
> Another large changelist constructed from FindBugs and IntelliJ analysis. It's big enough I figured I'd run it by the list. Key changes:
> - Making stuff final, private that can be
> - Dead code elimination
> - Simplifying JUnit assertions -- "assertTrue(a.equals(b) == true)" could be "assertEquals(a, b)" for instance. Also fixed some expected/actual value issues
> - Not compiling a Pattern object millions of times -- String.split() and replace()/replaceAll() do this and can be profitably replaced with a precompiled Pattern.
> - Small bug fixes picked up by analysis

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.