Posted to dev@nutch.apache.org by "Kim Whitehall (JIRA)" <ji...@apache.org> on 2015/09/16 02:10:45 UTC
[jira] [Closed] (NUTCH-2100) Nutch dump command doesn't dump anything
[ https://issues.apache.org/jira/browse/NUTCH-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kim Whitehall closed NUTCH-2100.
--------------------------------
Resolution: Invalid
The command was used incorrectly. There is no bug.
> Nutch dump command doesn't dump anything
> ----------------------------------------
>
> Key: NUTCH-2100
> URL: https://issues.apache.org/jira/browse/NUTCH-2100
> Project: Nutch
> Issue Type: Bug
> Reporter: Kim Whitehall
> Assignee: Chris A. Mattmann
>
> When running the command
> nutch dump -segment segment -outputDir dumpFolder -mimeStats
> I receive the following:
> Dumper File Stats:
> TOTAL Stats:
> [
> ]
> The log indicates that segments are being skipped.
> Note that if I use nutch readseg -dump, I can see that the segment does contain content.
> The log is shown below:
> 2015-09-15 20:10:56,142 INFO tools.FileDumper - Accepting all mimetypes.
> 2015-09-15 20:10:56,782 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2015-09-15 20:10:57,057 INFO tools.FileDumper - Processing segment: [/.../segments/20150915195411/crawl_generate]
> 2015-09-15 20:10:57,057 WARN tools.FileDumper - Skipping segment: [/.../segments/20150915195411/crawl_generate/content/part-00000/data]: no data directory present
> 2015-09-15 20:10:57,057 INFO tools.FileDumper - Processing segment: [/.../segments/20150915195411/crawl_fetch]
> 2015-09-15 20:10:57,057 WARN tools.FileDumper - Skipping segment: [/.../segments/20150915195411/crawl_fetch/content/part-00000/data]: no data directory present
> 2015-09-15 20:10:57,058 INFO tools.FileDumper - Processing segment: [/.../segments/20150915195411/content]
> 2015-09-15 20:10:57,058 WARN tools.FileDumper - Skipping segment: [/.../segments/20150915195411/content/content/part-00000/data]: no data directory present
> 2015-09-15 20:10:57,058 INFO tools.FileDumper - Processing segment: [/.../segments/20150915195411/parse_text]
> 2015-09-15 20:10:57,058 WARN tools.FileDumper - Skipping segment: [/.../segments/20150915195411/parse_text/content/part-00000/data]: no data directory present
> 2015-09-15 20:10:57,058 INFO tools.FileDumper - Processing segment: [/.../segments/20150915195411/parse_data]
> 2015-09-15 20:10:57,058 WARN tools.FileDumper - Skipping segment: [/.../segments/20150915195411/parse_data/content/part-00000/data]: no data directory present
> 2015-09-15 20:10:57,058 INFO tools.FileDumper - Processing segment: [/.../segments/20150915195411/crawl_parse]
> 2015-09-15 20:10:57,058 WARN tools.FileDumper - Skipping segment: [/.../segments/20150915195411/crawl_parse/content/part-00000/data]: no data directory present
> 2015-09-15 20:10:57,059 INFO tools.FileDumper - Dumper File Stats:
> TOTAL Stats:
> [
> ]
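Reading the log above: each "Processing segment" line names a subdirectory of the path given to -segment, and each "Skipping segment" line appends content/part-00000/data to that subdirectory. That pattern suggests the -segment argument is treated as a directory whose children are the segments, so pointing it at a single segment (here 20150915195411) makes every lookup miss. The following is a minimal Python sketch of that lookup as inferred from the log only. It is not the FileDumper source, and the names find_dumpable_segments and make_fake_segment are hypothetical helpers introduced for illustration.

```python
import tempfile
from pathlib import Path

# Relative path checked under each candidate segment, as it appears in the WARN lines.
CONTENT_DATA = Path("content") / "part-00000" / "data"

# Subdirectories of one segment, as they appear in the INFO lines of the log.
SEGMENT_PARTS = ["crawl_generate", "crawl_fetch", "content",
                 "parse_text", "parse_data", "crawl_parse"]


def find_dumpable_segments(segment_dir):
    """Treat every child directory of segment_dir as a segment (assumed behaviour).

    Returns (dumpable, skipped): names of children that do / do not contain
    content/part-00000/data.
    """
    dumpable, skipped = [], []
    for child in sorted(Path(segment_dir).iterdir()):
        if not child.is_dir():
            continue
        if (child / CONTENT_DATA).exists():
            dumpable.append(child.name)
        else:
            skipped.append(child.name)
    return dumpable, skipped


def make_fake_segment(segments_dir, name):
    """Build an empty stand-in for a segment layout (hypothetical test helper)."""
    segment = Path(segments_dir) / name
    for part in SEGMENT_PARTS:
        (segment / part).mkdir(parents=True, exist_ok=True)
    data = segment / CONTENT_DATA
    data.parent.mkdir(parents=True, exist_ok=True)
    data.touch()
    return segment


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    segment = make_fake_segment(root / "segments", "20150915195411")
    # Pointing at the single segment: all six children are skipped, as in the log.
    print(find_dumpable_segments(segment))
    # Pointing at the parent directory: the segment itself is found.
    print(find_dumpable_segments(root / "segments"))
```

If this reading is right, passing the parent segments directory to -segment may be the intended usage; the thread itself only states that the command was used incorrectly and does not say how.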
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)