Posted to user@nutch.apache.org by kan001 <ka...@yahoo.com> on 2007/03/06 05:48:56 UTC

Re: [SOLVED] moving crawled db from windows to linux

Thanks for the immediate reply.

Please find the output of the du -h crawl/ command and the logs below:
32K     crawl/crawldb/current/part-00000
36K     crawl/crawldb/current
40K     crawl/crawldb
120K    crawl/index
128K    crawl/indexes/part-00000
132K    crawl/indexes
52K     crawl/linkdb/current/part-00000
56K     crawl/linkdb/current
60K     crawl/linkdb
40K     crawl/segments/20070228143239/content/part-00000
44K     crawl/segments/20070228143239/content
20K     crawl/segments/20070228143239/crawl_fetch/part-00000
24K     crawl/segments/20070228143239/crawl_fetch
12K     crawl/segments/20070228143239/crawl_generate
12K     crawl/segments/20070228143239/crawl_parse
20K     crawl/segments/20070228143239/parse_data/part-00000
24K     crawl/segments/20070228143239/parse_data
24K     crawl/segments/20070228143239/parse_text/part-00000
28K     crawl/segments/20070228143239/parse_text
148K    crawl/segments/20070228143239
136K    crawl/segments/20070228143249/content/part-00000
140K    crawl/segments/20070228143249/content
20K     crawl/segments/20070228143249/crawl_fetch/part-00000
24K     crawl/segments/20070228143249/crawl_fetch
12K     crawl/segments/20070228143249/crawl_generate
28K     crawl/segments/20070228143249/crawl_parse
32K     crawl/segments/20070228143249/parse_data/part-00000
36K     crawl/segments/20070228143249/parse_data
44K     crawl/segments/20070228143249/parse_text/part-00000
48K     crawl/segments/20070228143249/parse_text
292K    crawl/segments/20070228143249
20K     crawl/segments/20070228143327/content/part-00000
24K     crawl/segments/20070228143327/content
20K     crawl/segments/20070228143327/crawl_fetch/part-00000
24K     crawl/segments/20070228143327/crawl_fetch
16K     crawl/segments/20070228143327/crawl_generate
12K     crawl/segments/20070228143327/crawl_parse
20K     crawl/segments/20070228143327/parse_data/part-00000
24K     crawl/segments/20070228143327/parse_data
20K     crawl/segments/20070228143327/parse_text/part-00000
24K     crawl/segments/20070228143327/parse_text
128K    crawl/segments/20070228143327
20K     crawl/segments/20070228143434/content/part-00000
24K     crawl/segments/20070228143434/content
20K     crawl/segments/20070228143434/crawl_fetch/part-00000
24K     crawl/segments/20070228143434/crawl_fetch
16K     crawl/segments/20070228143434/crawl_generate
12K     crawl/segments/20070228143434/crawl_parse
20K     crawl/segments/20070228143434/parse_data/part-00000
24K     crawl/segments/20070228143434/parse_data
20K     crawl/segments/20070228143434/parse_text/part-00000
24K     crawl/segments/20070228143434/parse_text
128K    crawl/segments/20070228143434
700K    crawl/segments
1.1M    crawl/
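
To compare the copy on Linux against the original on Windows, a minimal sketch (not part of Nutch) that checks the directory layout could look like the following. The expected directory names are simply the ones that appear in the du listing above, and the function name missing_crawl_dirs is made up for illustration. It only checks that the directories exist, so it cannot tell whether the files inside them were copied intact.

```python
import os
import sys

# Directory names taken from the du listing above (assumed layout,
# not an official Nutch specification).
TOP_LEVEL = ["crawldb", "index", "indexes", "linkdb", "segments"]
SEGMENT_PARTS = ["content", "crawl_fetch", "crawl_generate",
                 "crawl_parse", "parse_data", "parse_text"]


def missing_crawl_dirs(crawl_root):
    """Return the expected directories (relative paths) that are absent."""
    missing = [d for d in TOP_LEVEL
               if not os.path.isdir(os.path.join(crawl_root, d))]

    segments_dir = os.path.join(crawl_root, "segments")
    if os.path.isdir(segments_dir):
        # Every timestamped segment should contain the same six parts.
        for seg in sorted(os.listdir(segments_dir)):
            seg_path = os.path.join(segments_dir, seg)
            if not os.path.isdir(seg_path):
                continue
            for part in SEGMENT_PARTS:
                if not os.path.isdir(os.path.join(seg_path, part)):
                    missing.append(os.path.join("segments", seg, part))
    return missing


if __name__ == "__main__":
    # Usage: python check_crawl.py /home/nutch-0.8/crawl
    root = sys.argv[1] if len(sys.argv) > 1 else "crawl"
    for path in missing_crawl_dirs(root):
        print("missing:", path)
```

Running it against both copies and comparing the output would at least rule out a directory that was lost during the transfer.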

 INFO [TP-Processor1] (Configuration.java:397) - parsing
jar:file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/lib/hadoop-0.4.0.jar!/hadoop-default.xml
 INFO [TP-Processor1] (Configuration.java:397) - parsing
file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/nutch-default.xml
 INFO [TP-Processor1] (Configuration.java:397) - parsing
file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/nutch-site.xml
 INFO [TP-Processor1] (Configuration.java:397) - parsing
file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/hadoop-site.xml
 INFO [TP-Processor1] (PluginManifestParser.java:81) - Plugins: looking in:
/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/plugins
 INFO [TP-Processor1] (PluginRepository.java:333) - Plugin Auto-activation
mode: [true]
 INFO [TP-Processor1] (PluginRepository.java:334) - Registered Plugins:
 INFO [TP-Processor1] (PluginRepository.java:341) -     CyberNeko HTML
Parser (lib-nekohtml)
 INFO [TP-Processor1] (PluginRepository.java:341) -     Site Query Filter
(query-site)
 INFO [TP-Processor1] (PluginRepository.java:341) -     Html Parse Plug-in
(parse-html)
 INFO [TP-Processor1] (PluginRepository.java:341) -     Regex URL Filter
Framework (lib-regex-filter)
 INFO [TP-Processor1] (PluginRepository.java:341) -     Basic Indexing
Filter (index-basic)
 INFO [TP-Processor1] (PluginRepository.java:341) -     Basic Summarizer
Plug-in (summary-basic)
 INFO [TP-Processor1] (PluginRepository.java:341) -     Text Parse Plug-in
(parse-text)
 INFO [TP-Processor1] (PluginRepository.java:341) -     JavaScript Parser
(parse-js)
 INFO [TP-Processor1] (PluginRepository.java:341) -     Regex URL Filter
(urlfilter-regex)
 INFO [TP-Processor1] (PluginRepository.java:341) -     Basic Query Filter
(query-basic)
 INFO [TP-Processor1] (PluginRepository.java:341) -     HTTP Framework
(lib-http)
 INFO [TP-Processor1] (PluginRepository.java:341) -     URL Query Filter
(query-url)
 INFO [TP-Processor1] (PluginRepository.java:341) -     Http Protocol
Plug-in (protocol-http)
 INFO [TP-Processor1] (PluginRepository.java:341) -     the nutch core
extension points (nutch-extensionpoints)
 INFO [TP-Processor1] (PluginRepository.java:341) -     OPIC Scoring Plug-in
(scoring-opic)
 INFO [TP-Processor1] (PluginRepository.java:345) - Registered
Extension-Points:
 INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch Summarizer
(org.apache.nutch.searcher.Summarizer)
 INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch Scoring
(org.apache.nutch.scoring.ScoringFilter)
 INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch Protocol
(org.apache.nutch.protocol.Protocol)
 INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch URL Filter
(org.apache.nutch.net.URLFilter)
 INFO [TP-Processor1] (PluginRepository.java:352) -     HTML Parse Filter
(org.apache.nutch.parse.HtmlParseFilter)
 INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch Online Search
Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
 INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch Indexing
Filter (org.apache.nutch.indexer.IndexingFilter)
 INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch Content Parser
(org.apache.nutch.parse.Parser)
 INFO [TP-Processor1] (PluginRepository.java:352) -     Ontology Model
Loader (org.apache.nutch.ontology.Ontology)
 INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch Analysis
(org.apache.nutch.analysis.NutchAnalyzer)
 INFO [TP-Processor1] (PluginRepository.java:352) -     Nutch Query Filter
(org.apache.nutch.searcher.QueryFilter)
 INFO [TP-Processor1] (NutchBean.java:69) - creating new bean
 INFO [TP-Processor1] (NutchBean.java:121) - opening indexes in
/home/nutch-0.8/crawl/indexes
 INFO [TP-Processor1] (Configuration.java:360) - found resource
common-terms.utf8 at
file:/usr/java/tomcat-5.5/webapps/ROOT/WEB-INF/classes/common-terms.utf8
 INFO [TP-Processor1] (NutchBean.java:143) - opening segments in
/home/nutch-0.8/crawl/segments
 INFO [TP-Processor1] (SummarizerFactory.java:52) - Using the first
summarizer extension found: Basic Summarizer
 INFO [TP-Processor1] (NutchBean.java:154) - opening linkdb in
/home/nutch-0.8/crawl/linkdb
 INFO [TP-Processor1] (search_jsp.java:108) - query request from
192.168.1.64
 INFO [TP-Processor1] (search_jsp.java:151) - query:
 INFO [TP-Processor1] (search_jsp.java:152) - lang:
 INFO [TP-Processor1] (NutchBean.java:247) - searching for 20 raw hits
 INFO [TP-Processor1] (search_jsp.java:337) - total hits: 0
 
 INFO [TP-Processor5] (search_jsp.java:108) - query request from
192.168.1.64
 INFO [TP-Processor5] (search_jsp.java:151) - query: ads
 INFO [TP-Processor5] (search_jsp.java:152) - lang: en
 INFO [TP-Processor5] (NutchBean.java:247) - searching for 20 raw hits
 INFO [TP-Processor5] (search_jsp.java:337) - total hits: 0

 



kan001 wrote:
> 
> When I copied the crawled db from Windows to Linux and tried to search
> through Tomcat on Linux, it returned 0 hits.
> But on Windows the search screen returns results. Any idea? I have
> given root permissions to the crawled db.
> The logs show "opening segments ..." but 0 hits!
> 

-- 
View this message in context: http://www.nabble.com/moving-crawled-db-from-windows-to-linux-tf3350448.html#a9326034