Posted to user@nutch.apache.org by Uygar BAYAR <uy...@beriltech.com> on 2008/01/31 10:49:02 UTC

linkdb problem

Hi, I got the latest Nutch release through svn. I crawled and indexed some
sites without problems. When I tried to extract links into the linkdb, I saw
these lines:

cat linkdb/current/part-00000/data  
SEQorg.apache.hadoop.io.Text
org.apache.nutch.crawl.Inlinks*org.apache.hadoop.io.compress.DefaultCodec�'(&����ytT��7T
cat /linkdb/current/part-00000/index
SEQorg.apache.hadoop.io.Text!org.apache.hadoop.io.LongWritable*org.apache.hadoop.io.compress.DefaultCodec�X�A�x��Q�nern#

-- 
View this message in context: http://www.nabble.com/linkdb-problem-tp15201252p15201252.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: linkdb problem

Posted by Uygar BAYAR <uy...@beriltech.com>.
The problem is that when I run ../bin/nutch readlinkdb ready/otomotiv/linkdb/
-dump alo, there is nothing in the output of cat alo/part-00000.
I fetched 20,000 URLs without problems.


Dennis Kubes-2 wrote:
> 
> You are showing the cat output of linkdb which is composed of binary 
> files.  What is your problem?



Re: linkdb problem

Posted by Dennis Kubes <ku...@apache.org>.
You are showing the cat output of linkdb which is composed of binary 
files.  What is your problem?
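
As a rough illustration of why cat prints unreadable bytes here: the data and
index files start with the Hadoop SequenceFile magic "SEQ", followed by a
version byte and length-prefixed class names, which matches the fragments
visible in the cat output. The Python sketch below parses only that header.
It assumes a version 5 or later header layout and class names shorter than
128 bytes (so each length fits in one byte). The names
read_sequence_file_header and _name, and the hand-built sample bytes
(including the version byte and the two flag values), are made up for this
example and are not part of Nutch or Hadoop.

```python
import io


def _read_exact(stream, count):
    """Read exactly `count` bytes or fail loudly on a truncated header."""
    data = stream.read(count)
    if len(data) != count:
        raise ValueError("truncated SequenceFile header")
    return data


def read_sequence_file_header(stream):
    """Parse the leading fields of a SequenceFile header (sketch only)."""
    if _read_exact(stream, 3) != b"SEQ":
        raise ValueError("not a SequenceFile: missing SEQ magic")
    version = _read_exact(stream, 1)[0]

    def read_class_name():
        # Assumption: the length prefix is a single byte (name < 128 bytes).
        length = _read_exact(stream, 1)[0]
        if length > 127:
            raise ValueError("multi-byte length not handled in this sketch")
        return _read_exact(stream, length).decode("utf-8")

    header = {
        "version": version,
        "key_class": read_class_name(),
        "value_class": read_class_name(),
    }
    # Two one-byte flags follow: record compression, then block compression.
    header["compressed"] = _read_exact(stream, 1)[0] != 0
    header["block_compressed"] = _read_exact(stream, 1)[0] != 0
    # The codec class name is only present when compression is enabled.
    header["codec"] = read_class_name() if header["compressed"] else None
    return header


def _name(text):
    """Encode a class name with a one-byte length prefix (hypothetical helper)."""
    raw = text.encode("utf-8")
    return bytes([len(raw)]) + raw


# Hand-built bytes that mimic the start of linkdb/current/part-00000/data.
# The version byte (6) and the flag bytes are assumptions for illustration.
sample = (
    b"SEQ\x06"
    + _name("org.apache.hadoop.io.Text")
    + _name("org.apache.nutch.crawl.Inlinks")
    + b"\x01\x00"
    + _name("org.apache.hadoop.io.compress.DefaultCodec")
)

header = read_sequence_file_header(io.BytesIO(sample))
print(header["key_class"], header["value_class"], header["codec"])
```

Everything after the header is compressed record data, so it is not meant to
be read with cat. The readlinkdb -dump command is the usual way to get a
readable text view of the linkdb.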
