Posted to user@nutch.apache.org by Richard Braman <rb...@bramantax.com> on 2006/03/05 02:45:19 UTC

RE: url shown instead of title.

Whoops, I hit send by accident :(
 
Any idea why
http://24.75.221.234:8080/search.jsp?query=e-file+site%3Awww.irs.gov&hitsPerPage=10&hitsPerSite=0&clustering=
returns a list of hits where the URL is shown instead of the title of
the page? The pages do have titles.
 
 

-----Original Message-----
From: Richard Braman [mailto:rbraman@bramantax.com] 
Sent: Saturday, March 04, 2006 8:43 PM
To: 'nutch-user@lucene.apache.org'
Subject: 


Any idea why

 

Richard Braman
mailto:rbraman@taxcodesoftware.org
561.748.4002 (voice) 

http://www.taxcodesoftware.org
Free Open Source Tax Software

 


Re: url shown instead of title.

Posted by Doug Cutting <cu...@apache.org>.
Richard Braman wrote:
> Any idea why
> http://24.75.221.234:8080/search.jsp?query=e-file+site%3Awww.irs.gov&hitsPerPage=10&hitsPerSite=0&clustering=
> returns a list of hits where the URL is shown instead of the title of
> the page? The pages do have titles.

The "explain" button also shows a null title and the cache does not 
include these files.  Are you sure they were fetched?  Perhaps they only 
have links.  What version of Nutch are you using?  0.8 does not support 
indexing pages with only links, but I think 0.7 may have.  If 0.8, then 
I'd suspect the parser.  Try re-parsing these pages (e.g., by crawling 
only these pages in a test crawl).  Maybe put some print statements in 
the parser to see what's going on?

Doug
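
As a rough companion to the "print statements" suggestion, here is a minimal standalone sketch for checking whether saved copies of the affected pages contain a non-empty <title>. It is not Nutch code and does not call any Nutch API; it uses only Python's standard html.parser, and the names TitleExtractor and extract_title are hypothetical, chosen for illustration. If it reports a title for a page that Nutch indexed with a null title, that would point toward the fetch or parse step rather than the pages themselves.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text inside the first <title> element (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False   # True while between <title> and </title>
        self._done = False       # True once the first title has been closed
        self._parts = []         # raw text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TITLE> is matched as well.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace; report None when nothing usable was found.
        text = " ".join("".join(self._parts).split())
        return text or None


def extract_title(html):
    """Return the page title as a string, or None if it is missing or empty."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title


if __name__ == "__main__":
    # Made-up sample documents, not the real pages from the thread.
    with_title = "<html><head><title> e-file  options </title></head><body></body></html>"
    links_only = "<html><body><a href='x'>link</a></body></html>"
    print(repr(extract_title(with_title)))
    print(repr(extract_title(links_only)))
```

To use it on the real pages, save each page's HTML to a file and pass the file contents to extract_title.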