Posted to dev@nutch.apache.org by Zein Shaheen <ze...@gmail.com> on 2015/03/09 00:12:07 UTC

title inside body problem

Hello,
I am using Nutch 2.3 and have run into a problem with some Arabic-content sites.
The affected URL places its title in a tag inside the <body>,
but the getTitle code stops after </head> and concludes that there is no
title.
I have thought a lot about a good way to get this title, and figured out that I
can modify "getTextHelper" in the parser-html plugin so that it returns two
StringBuilders, content and title, which removes the need for the getTitle function.
I thought I should report this to you.
Thank you for everything.
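
To illustrate the idea above, here is a minimal sketch of a single DOM pass that fills two StringBuilders, one for the text and one for the title, without stopping at </head>. It is not the actual parser-html code: the class TitleInBody and the methods collect and parse are hypothetical names, it assumes the tag in the body is a <title> element, and it uses the JDK XML parser on well-formed markup for the demo rather than the HTML parser that Nutch uses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch.
class TitleInBody {

    // Walks the whole tree once. The first non-empty <title> found anywhere
    // (head or body) goes into 'title'; all other text goes into 'text'.
    static void collect(Node node, StringBuilder text, StringBuilder title) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "title".equalsIgnoreCase(node.getNodeName())) {
            if (title.length() == 0) {
                title.append(node.getTextContent().trim());
            }
            return; // assumption: title text is kept out of the content
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String value = node.getNodeValue().trim();
            if (!value.isEmpty()) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(value);
            }
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            collect(child, text, title);
        }
    }

    // Demo-only parser for well-formed markup; real pages need an HTML parser.
    static Document parse(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head></head><body>"
                + "<title>Forum thread</title><p>Post text</p></body></html>";
        StringBuilder text = new StringBuilder();
        StringBuilder title = new StringBuilder();
        collect(parse(page), text, title);
        System.out.println("title: " + title); // prints "title: Forum thread"
        System.out.println("text: " + text);   // prints "text: Post text"
    }
}
```

Because the title is picked up during the same traversal that gathers the text, a page with an empty <head> still yields a title, and no separate title lookup is needed.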

Re: title inside body problem

Posted by Zein Shaheen <ze...@gmail.com>.
http://www.syriapath.com/forum/archive/index.php/t-23381.html
This is the URL with the missing-title problem.

On 9 March 2015 at 01:12, Zein Shaheen <ze...@gmail.com> wrote:

> Hello,
> I am using Nutch 2.3 and have run into a problem with some Arabic-content sites.
> The affected URL places its title in a tag inside the <body>,
> but the getTitle code stops after </head> and concludes that there is no
> title.
> I have thought a lot about a good way to get this title, and figured out that I
> can modify "getTextHelper" in the parser-html plugin so that it returns two
> StringBuilders, content and title, which removes the need for the getTitle function.
> I thought I should report this to you.
> Thank you for everything.
>