Posted to dev@nutch.apache.org by "AppChecker (JIRA)" <ji...@apache.org> on 2017/06/12 22:27:00 UTC
[jira] [Created] (NUTCH-2394) Possible bugs in the source code
AppChecker created NUTCH-2394:
---------------------------------
Summary: Possible bugs in the source code
Key: NUTCH-2394
URL: https://issues.apache.org/jira/browse/NUTCH-2394
Project: Nutch
Issue Type: Bug
Affects Versions: 1.13
Reporter: AppChecker
Hi!
I've checked your project with the static analyzer [AppChecker|https://npo-echelon.ru/en/solutions/appchecker.php] and it found several suspicious code fragments:
1) [src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java#L56]
{code:java}
heading.trim();
{code}
heading is not changed, because java.lang.String.trim() returns a new string. Java strings are immutable, so the result has to be assigned back.
Probably, it should be:
{code:java}
heading = heading.trim();
{code}
See also these similar cases:
* [src/plugin/urlnormalizer-host/src/java/org/apache/nutch/net/urlnormalizer/host/HostURLNormalizer.java#L78|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/urlnormalizer-host/src/java/org/apache/nutch/net/urlnormalizer/host/HostURLNormalizer.java#L78]
* [src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java#L115|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java#L115]
* [src/plugin/urlnormalizer-protocol/src/java/org/apache/nutch/net/urlnormalizer/protocol/ProtocolURLNormalizer.java#L76|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/urlnormalizer-protocol/src/java/org/apache/nutch/net/urlnormalizer/protocol/ProtocolURLNormalizer.java#L76]
* [src/plugin/urlnormalizer-slash/src/java/org/apache/nutch/net/urlnormalizer/slash/SlashURLNormalizer.java#L78|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/urlnormalizer-slash/src/java/org/apache/nutch/net/urlnormalizer/slash/SlashURLNormalizer.java#L78]
* [src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java#L326|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java#L326]
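The following standalone sketch reproduces the pattern from item 1 outside Nutch. TrimExample, buggy and fixed are hypothetical names used only for illustration; this is not the HeadingsParseFilter code itself.

{code:java}
// Hypothetical illustration of the reported pattern; not Nutch code.
class TrimExample {

  // Buggy variant: the string returned by trim() is discarded,
  // so the original, untrimmed string is returned unchanged.
  static String buggy(String heading) {
    heading.trim();
    return heading;
  }

  // Fixed variant: reassign the new string returned by trim().
  static String fixed(String heading) {
    heading = heading.trim();
    return heading;
  }

  public static void main(String[] args) {
    String raw = "  Introduction \n";
    System.out.println("[" + buggy(raw) + "]"); // surrounding whitespace is kept
    System.out.println("[" + fixed(raw) + "]"); // prints [Introduction]
  }
}
{code}

Static analyzers flag the first form because calling a side-effect-free method and ignoring its result is almost always a mistake.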
2) [src/java/org/apache/nutch/crawl/URLPartitioner.java#L84|https://github.com/apache/nutch/blob/2b93a66f0472e93223c69053d5482dcbef26de6d/src/java/org/apache/nutch/crawl/URLPartitioner.java#L84]
{code:java}
if (mode.equals(PARTITION_MODE_DOMAIN) && url != null)
...
else if ..
...
InetAddress address = InetAddress.getByName(url.getHost());
...
{code}
The null check guards only the domain branch: if url is null and the else-if branch is taken, url.getHost() is still invoked, so a NullPointerException will be thrown.
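A simplified, hypothetical sketch of one possible fix for item 2: the IP branch gets its own url != null check, so a malformed URL falls back to the string hash instead of throwing. PartitionSketch, partitionHash and the mode values are assumptions for illustration, and the host hash in the domain branch is only a stand-in for Nutch's domain extraction, which is omitted here.

{code:java}
import java.net.InetAddress;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.UnknownHostException;

// Hypothetical, simplified version of the quoted branch; not Nutch's actual API.
class PartitionSketch {
  static final String PARTITION_MODE_DOMAIN = "byDomain"; // assumed value
  static final String PARTITION_MODE_IP = "byIP";         // assumed value

  static int partitionHash(String urlString, String mode) {
    URL url = null;
    int hashCode = urlString.hashCode();
    try {
      url = new URL(urlString);
      hashCode = url.getHost().hashCode();
    } catch (MalformedURLException e) {
      // url stays null; keep the hash of the raw string
    }

    if (mode.equals(PARTITION_MODE_DOMAIN) && url != null) {
      // Stand-in for domain extraction (hypothetical simplification)
      hashCode = url.getHost().hashCode();
    } else if (mode.equals(PARTITION_MODE_IP) && url != null) {
      // The added null check keeps url.getHost() from throwing an NPE
      try {
        InetAddress address = InetAddress.getByName(url.getHost());
        hashCode = address.getHostAddress().hashCode();
      } catch (UnknownHostException e) {
        // keep the host-based hash
      }
    }
    return hashCode;
  }

  public static void main(String[] args) {
    // Without the null check, this input in IP mode would call getHost() on null.
    System.out.println(partitionHash("not a url", PARTITION_MODE_IP) == "not a url".hashCode());
  }
}
{code}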
3) [src/java/org/apache/nutch/tools/CommonCrawlDataDumper.java#L346|https://github.com/apache/nutch/blob/e53b34b2322f2d071981a72577644a225642ecbc/src/java/org/apache/nutch/tools/CommonCrawlDataDumper.java#L346]
{code:java}
String[] fullPathLevels = fullDir.split(File.separator);
{code}
Using File.separator as a regular expression may throw java.util.regex.PatternSyntaxException, because it is "\" on Windows-based systems, and a lone backslash is an incomplete escape sequence in a regex.
Possible correction:
{code:java}
// requires: import java.util.regex.Pattern;
String[] fullPathLevels = fullDir.split(Pattern.quote(File.separator));
{code}
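A minimal sketch of item 3 that behaves the same on any OS: it passes a single backslash explicitly (the value File.separator has on Windows) instead of reading File.separator. SplitSketch, splitRaw and splitQuoted are hypothetical names used only for illustration.

{code:java}
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical illustration; not the CommonCrawlDataDumper code.
class SplitSketch {

  // Mirrors the reported code: the separator is used as a raw regex.
  static String[] splitRaw(String path, String separator) {
    return path.split(separator);
  }

  // Suggested fix: quote the separator so it is matched literally.
  static String[] splitQuoted(String path, String separator) {
    return path.split(Pattern.quote(separator));
  }

  public static void main(String[] args) {
    String windowsSep = "\\"; // a single backslash, as File.separator is on Windows
    String path = "C:\\crawl\\segments\\20170612";
    try {
      splitRaw(path, windowsSep);
    } catch (PatternSyntaxException e) {
      System.out.println("raw split failed: " + e.getDescription());
    }
    System.out.println(Arrays.toString(splitQuoted(path, windowsSep)));
    // prints [C:, crawl, segments, 20170612]
  }
}
{code}

On Unix-like systems File.separator is "/", which has no special meaning in a regex, so the unquoted version only fails on Windows; quoting is harmless on both.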
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)