Posted to dev@nutch.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/09/21 18:53:33 UTC
[jira] [Comment Edited] (NUTCH-1844) testresources/testcrawl not referenced anywhere in code.
[ https://issues.apache.org/jira/browse/NUTCH-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142510#comment-14142510 ]
Chris A. Mattmann edited comment on NUTCH-1844 at 9/21/14 4:52 PM:
-------------------------------------------------------------------
After examining the Nutch 1.2 CrawlDbConverter:
http://nutch.apache.org/apidocs/apidocs-1.2/org/apache/nutch/tools/compat/CrawlDbConverter.html
And running it:
{noformat}
[chipotle:~/tmp/nutch1.2] mattmann% java -Djava.ext.dirs=build:lib org.apache.nutch.tools.compat.CrawlDbConverter ../nutch/src/testresources/testcrawl/crawldb foo -withMetadata
[chipotle:~/tmp/nutch1.2] mattmann% ls
CHANGES.txt LICENSE.txt README.txt build/ conf/ default.properties hadoop.log lib/ src/
KEYS NOTICE.txt bin/ build.xml contrib/ docs/ index.html site/
[chipotle:~/tmp/nutch1.2] mattmann% ls foo
ls: foo: No such file or directory
[chipotle:~/tmp/nutch1.2] mattmann% ls ../nutch/src/testresources/testcrawl/
crawldb/ index/ indexes/ linkdb/ segments/
[chipotle:~/tmp/nutch1.2] mattmann% ls ../nutch/src/testresources/
fetch-test-site/ test-mime-util/ testcrawl/
[chipotle:~/tmp/nutch1.2] mattmann% ls ../nutch/src/testresources/
fetch-test-site/ test-mime-util/ testcrawl/
[chipotle:~/tmp/nutch1.2] mattmann% java -Djava.ext.dirs=build:lib org.apache.nutch.tools.compat.CrawlDbConverter ../nutch/src/testresources/testcrawl/crawldb ../nutch/src/testresources/testcrawl/crawldb2 -withMetadata
[chipotle:~/tmp/nutch1.2] mattmann% ls ../nutch/src/testresources/
fetch-test-site/ test-mime-util/ testcrawl/
[chipotle:~/tmp/nutch1.2] mattmann% ls ../nutch/src/testresources/test
ls: ../nutch/src/testresources/test: No such file or directory
[chipotle:~/tmp/nutch1.2] mattmann% ls ../nutch/src/testresources/testcrawl/
crawldb/ index/ indexes/ linkdb/ segments/
[chipotle:~/tmp/nutch1.2] mattmann% java -Djava.ext.dirs=build:lib org.apache.nutch.tools.compat.CrawlDbConverter ../nutch/src/testresources/testcrawl/crawldb ../nutch/src/testresources/testcrawl/crawldb2
[chipotle:~/tmp/nutch1.2] mattmann% ls ../nutch/src/testresources/testcrawl/
crawldb/ index/ indexes/ linkdb/ segments/
{noformat}
Against each of:
* crawldb
* whole crawl dir
* segments
etc., it produces no output, and I can't figure out how to use it. So, rather than invest more time here, I propose the following: if I don't hear objections within 48 hours, I'll delete testresources/testcrawl, since it's not referenced anywhere in the code.
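One way to double-check the "not referenced anywhere" claim is to search the source tree for the directory name. This is a minimal sketch, assuming a POSIX shell and a grep that supports -r. The helper name find_refs is hypothetical, and the list of locations is an assumption about the checkout layout, not something confirmed in this issue.

```shell
# Hypothetical helper: list files under a path whose contents mention a name.
find_refs() {
  # $1 = file or directory to search, $2 = literal string to look for
  # -r recurse, -l print file names only, -F treat the pattern as a fixed string
  grep -rlF -- "$2" "$1" 2>/dev/null || true
}

# Assumed locations in a Nutch checkout; adjust to the actual layout.
# src/testresources itself is left out on purpose.
for path in src/java src/test src/plugin conf build.xml; do
  find_refs "$path" testcrawl
done
```

If the loop prints nothing, none of the searched files mention testcrawl. Missing paths are silently skipped, so an empty result is only meaningful when run from the root of the checkout.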
> testresources/testcrawl not referenced anywhere in code.
> --------------------------------------------------------
>
> Key: NUTCH-1844
> URL: https://issues.apache.org/jira/browse/NUTCH-1844
> Project: Nutch
> Issue Type: Bug
> Components: test
> Reporter: Chris A. Mattmann
> Assignee: Chris A. Mattmann
> Fix For: 1.10
>
>
> While working on NUTCH-1526 in Review Board https://reviews.apache.org/r/9119/ [~lewismc] tried to test out the ./bin/nutch dump tool on src/testresources/testcrawl and found that it failed due to an old o.a.h.io.UTF8 key type (instead of the o.a.h.io.Text type).
> I looked into this - how were Nutch tests passing using this old code? I found that Andrzej a long time ago wrote a tool to update the index from the old UTF8 key format to Text - I also found that *nowhere in the Nutch code* is the testcrawl referenced.
> My suggestion:
> * we remove the testcrawl (it's not used)
> * if we don't remove it, we at least run Andrzej's tool on it and then upgrade it to use o.a.h.io.Text keys.
> I'll take care of this.
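To see which key type a given data file was written with, without running any Nutch tool, it may be enough to look at the start of the file. A Hadoop SequenceFile begins with the magic bytes "SEQ" followed by the key and value class names as readable text. The sketch below assumes a POSIX shell and a grep that supports -a and -o. The helper name seq_header_classes, the 512-byte window, and the example path are all assumptions for illustration.

```shell
# Hypothetical helper: print class names found in the first bytes of a file.
# Assumes the file is a Hadoop SequenceFile, whose header stores the key and
# value class names as plain text shortly after the "SEQ" magic bytes.
seq_header_classes() {
  # $1 = path to the data file; 512 bytes is an arbitrary, generous window
  # -a treats binary input as text, -o prints only the matching parts
  head -c 512 -- "$1" | LC_ALL=C grep -a -o 'org\.apache\.[A-Za-z0-9_.$]*' || true
}

# Example call (the path below the crawldb directory is an assumption):
#   seq_header_classes src/testresources/testcrawl/crawldb/current/part-00000/data
```

A first match of org.apache.hadoop.io.UTF8 rather than org.apache.hadoop.io.Text would be consistent with the failure described above. Because this is a plain byte scan and not a real header parser, a length byte that happens to be a letter or digit can show up as a stray trailing character in the output.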