Posted to dev@nutch.apache.org by "AppChecker (JIRA)" <ji...@apache.org> on 2017/06/12 22:27:00 UTC

[jira] [Created] (NUTCH-2394) Possible bugs in the source code

AppChecker created NUTCH-2394:
---------------------------------

             Summary: Possible bugs in the source code
                 Key: NUTCH-2394
                 URL: https://issues.apache.org/jira/browse/NUTCH-2394
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 1.13
            Reporter: AppChecker


Hi!
I've checked your project with the static analyzer [AppChecker|https://npo-echelon.ru/en/solutions/appchecker.php] and it found several suspicious code fragments:
1) [src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java#L56]

{code:java}
heading.trim();
{code}
heading is not modified. java.lang.String is immutable, so String.trim() returns a new string, and here that result is discarded.
It should probably be:
{code:java}
heading = heading.trim();
{code}

See also (similar fragments):
* [src/plugin/urlnormalizer-host/src/java/org/apache/nutch/net/urlnormalizer/host/HostURLNormalizer.java#L78|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/urlnormalizer-host/src/java/org/apache/nutch/net/urlnormalizer/host/HostURLNormalizer.java#L78]
* [src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java#L115|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java#L115]
* [src/plugin/urlnormalizer-protocol/src/java/org/apache/nutch/net/urlnormalizer/protocol/ProtocolURLNormalizer.java#L76|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/urlnormalizer-protocol/src/java/org/apache/nutch/net/urlnormalizer/protocol/ProtocolURLNormalizer.java#L76]
* [src/plugin/urlnormalizer-slash/src/java/org/apache/nutch/net/urlnormalizer/slash/SlashURLNormalizer.java#L78|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/urlnormalizer-slash/src/java/org/apache/nutch/net/urlnormalizer/slash/SlashURLNormalizer.java#L78]
* [src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java#L326|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java#L326]
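The effect is easy to reproduce in isolation. A minimal sketch; the class name TrimDemo, the helper cleanHeading and the sample string are hypothetical and not taken from HeadingsParseFilter. The bare trim() call leaves the variable untouched, and only the reassignment keeps the trimmed copy.

{code:java}
public class TrimDemo {
  // Hypothetical helper: keeps the result of trim() instead of discarding it.
  static String cleanHeading(String heading) {
    heading = heading.trim(); // reassign: trim() never modifies the original String
    return heading;
  }

  public static void main(String[] args) {
    String heading = "  Introduction  ";
    heading.trim(); // the returned copy is discarded, so heading is unchanged
    System.out.println("[" + heading + "]");               // prints "[  Introduction  ]"
    System.out.println("[" + cleanHeading(heading) + "]"); // prints "[Introduction]"
  }
}
{code}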

2) [src/java/org/apache/nutch/crawl/URLPartitioner.java#L84|https://github.com/apache/nutch/blob/2b93a66f0472e93223c69053d5482dcbef26de6d/src/java/org/apache/nutch/crawl/URLPartitioner.java#L84]

{code:java}
if (mode.equals(PARTITION_MODE_DOMAIN) && url != null)
  ...
else if ...
  ...
  // url is not checked for null on this path
  InetAddress address = InetAddress.getByName(url.getHost());
  ...
{code}
The url != null check guards only the first branch. If url is null, execution can still enter the else-if branch, where url.getHost() is invoked and a NullPointerException will be thrown.
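One way to close the gap is to check url before every branch that dereferences it. The sketch below is a standalone approximation, not Nutch's URLPartitioner: the class PartitionSketch, the method partitionHash and the mode strings are hypothetical, and only the IP-based branch is modelled.

{code:java}
import java.net.InetAddress;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.UnknownHostException;

public class PartitionSketch {
  // Hypothetical mode names for this sketch; not Nutch's actual constants.
  static final String MODE_HOST = "byHost";
  static final String MODE_IP = "byIP";

  /** Returns a partition hash code without ever dereferencing a null URL. */
  static int partitionHash(String urlString, String mode) {
    int hashCode = urlString.hashCode(); // fallback used for malformed URLs
    URL url = null;
    try {
      url = new URL(urlString);
      hashCode = url.getHost().hashCode();
    } catch (MalformedURLException e) {
      // url stays null; keep the fallback hash
    }
    // Guard the IP branch too, not only the first branch.
    if (url != null && MODE_IP.equals(mode)) {
      try {
        InetAddress address = InetAddress.getByName(url.getHost());
        hashCode = address.getHostAddress().hashCode();
      } catch (UnknownHostException e) {
        // keep the host-based hash
      }
    }
    return hashCode;
  }

  public static void main(String[] args) {
    // A malformed URL in IP mode no longer throws NullPointerException.
    System.out.println(partitionHash("not a url", MODE_IP) == "not a url".hashCode()); // prints "true"
  }
}
{code}

With a malformed URL, new URL(...) throws, url stays null, and the method falls back to the hash of the raw string instead of failing.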


3) [src/java/org/apache/nutch/tools/CommonCrawlDataDumper.java#L346|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/java/org/apache/nutch/tools/CommonCrawlDataDumper.java#L346]

{code:java}
String[] fullPathLevels = fullDir.split(File.separator);
{code}
Using File.separator as a regular expression may throw java.util.regex.PatternSyntaxException: on Windows-based systems it is "\", and a lone backslash is an incomplete escape sequence in a regex.
Possible correction:
{code:java}
String[] fullPathLevels = fullDir.split(Pattern.quote(File.separator));
{code}
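The failure can be reproduced on any OS by passing the Windows separator explicitly. A minimal sketch; the class SplitDemo, the helper splitPath, the constant WINDOWS_SEPARATOR and the sample path are hypothetical, chosen only to simulate the Windows value of File.separator.

{code:java}
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SplitDemo {
  // Simulates File.separator on Windows so the demo behaves the same on every OS.
  static final String WINDOWS_SEPARATOR = "\\";

  // Pattern.quote makes the separator a literal, so "\" is not parsed as an escape.
  static String[] splitPath(String path, String separator) {
    return path.split(Pattern.quote(separator));
  }

  public static void main(String[] args) {
    String fullDir = "C:\\crawl\\dump\\segment";
    try {
      fullDir.split(WINDOWS_SEPARATOR); // unquoted: a lone "\" is an invalid regex
    } catch (PatternSyntaxException e) {
      System.out.println("unquoted split failed: " + e.getDescription());
    }
    System.out.println(Arrays.toString(splitPath(fullDir, WINDOWS_SEPARATOR)));
    // prints [C:, crawl, dump, segment]
  }
}
{code}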



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)