Posted to dev@nutch.apache.org by axi <ax...@gmail.com> on 2010/01/20 17:16:27 UTC

Alt text of images as anchor text

After several tests, I have noticed that Nutch ignores the alt text of images
inside <a href="..."> tags.
So this feature isn't implemented yet, right?


Thanks in advance,
-- 
View this message in context: http://old.nabble.com/Alt-text-of-images-as-anchor-text-tp27244358p27244358.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.


Re: Alt text of images as anchor text

Posted by axi <ax...@gmail.com>.
I'll try that, 
but the real anchor text is in  
On Wed, Jan 20, 2010 at 8:11 PM, axi <ax...@gmail.com> wrote:
>
> If you put image as link, is commonly known that alt text of that image is
> equivalent to the anchor text of text link. Now if you put an image with alt
> text inside a link, anchor text for that link is empty and no image alt text
> is counted.

are you crawling for images? or

http://svn.apache.org/repos/asf/lucene/nutch/trunk/conf/crawl-urlfilter.txt.template

# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$

>
> Nutch Newbie wrote:
>>
>> On Wed, Jan 20, 2010 at 4:16 PM, axi <ax...@gmail.com> wrote:
>>>
>>> after several test, I have noticed that nutch ignores alt text of images
>>> inside <a href=" tags.
>>> So, this feature isn't implemented yet right?
>>
>> what exactly you want nutch should do to the "alt text" index it?
>> tokenize it? make this field available as query i.e. "img_alt:my alt
>> tags" or?



-- 
View this message in context: http://old.nabble.com/Alt-text-of-images-as-anchor-text-tp27244358p27249488.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.


Re: Alt text of images as anchor text

Posted by Nutch Newbie <nu...@gmail.com>.
On Wed, Jan 20, 2010 at 8:11 PM, axi <ax...@gmail.com> wrote:
>
> If you put image as link, is commonly known that alt text of that image is
> equivalent to the anchor text of text link. Now if you put an image with alt
> text inside a link, anchor text for that link is empty and no image alt text
> is counted.

Are you crawling for images? Or is the default suffix rule below still in your URL filter?

http://svn.apache.org/repos/asf/lucene/nutch/trunk/conf/crawl-urlfilter.txt.template

# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
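
To make the effect of that rule concrete, here is a minimal Python sketch (not Nutch code) that applies the quoted pattern to a few URLs. It assumes the usual convention of the URL filter file, where a leading "-" excludes any URL in which the regular expression is found; the helper name `is_skipped` and the example URLs are hypothetical. Note that the rule acts on URLs only: it decides whether an image file is fetched, while the alt attribute itself lives in the HTML page that contains the link.

```python
import re

# Rule copied from the template quoted above; the leading "-" marks an exclude rule.
RULE = (r"-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz"
        r"|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$")


def is_skipped(url, rule=RULE):
    """Return True if the exclude rule matches somewhere in the URL (hypothetical helper)."""
    sign, pattern = rule[0], rule[1:]
    return sign == "-" and re.search(pattern, url) is not None


if __name__ == "__main__":
    # Example URLs are made up for illustration.
    for url in ("http://example.com/logo.png",
                "http://example.com/index.html",
                "http://example.com/logo.png?size=2"):
        print(url, "skipped" if is_skipped(url) else "kept")
```

Because the pattern is anchored at the end of the URL with `$`, an image URL followed by a query string is not matched by this particular rule.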

>
> Nutch Newbie wrote:
>>
>> On Wed, Jan 20, 2010 at 4:16 PM, axi <ax...@gmail.com> wrote:
>>>
>>> after several test, I have noticed that nutch ignores alt text of images
>>> inside <a href=" tags.
>>> So, this feature isn't implemented yet right?
>>
>> what exactly you want nutch should do to the "alt text" index it?
>> tokenize it? make this field available as query i.e. "img_alt:my alt
>> tags" or?

Re: Alt text of images as anchor text

Posted by axi <ax...@gmail.com>.
If you use an image as a link, it is commonly understood that the alt text of
that image is equivalent to the anchor text of a text link. Currently, if you
put an image with alt text inside a link, the anchor text for that link is
empty and the image's alt text is not counted.
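
The behaviour being asked for can be sketched in a few lines of Python. This is only an illustration of the idea, not how Nutch parses pages: the class `AnchorTextExtractor` and the function `extract_anchors` are hypothetical names, and the sketch relies on Python's standard `html.parser` module. When a link contains no text of its own, the alt text of any images inside it is used as the anchor text.

```python
from html.parser import HTMLParser


class AnchorTextExtractor(HTMLParser):
    """Collect (href, anchor text) pairs, falling back to img alt text for empty links."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, anchor_text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside the open <a>
        self._alts = []     # alt values of <img> tags inside the open <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._href = attrs["href"]
            self._text, self._alts = [], []
        elif tag == "img" and self._href is not None:
            alt = (attrs.get("alt") or "").strip()
            if alt:
                self._alts.append(alt)

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalise whitespace in the visible link text.
            text = " ".join("".join(self._text).split())
            if not text:
                # No visible text: use the alt text of the images instead.
                text = " ".join(self._alts)
            self.links.append((self._href, text))
            self._href = None


def extract_anchors(html):
    parser = AnchorTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = ('<a href="/home"><img src="h.png" alt="Home page"></a>'
            '<a href="/about">About us</a>')
    print(extract_anchors(page))
```

Preferring visible link text and using alt text only as a fallback is one possible design choice; concatenating both would be another.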


Nutch Newbie wrote:
> 
> On Wed, Jan 20, 2010 at 4:16 PM, axi <ax...@gmail.com> wrote:
>>
>> after several test, I have noticed that nutch ignores alt text of images
>> inside <a href=" tags.
>> So, this feature isn't implemented yet right?
> 
> what exactly you want nutch should do to the "alt text" index it?
> tokenize it? make this field available as query i.e. "img_alt:my alt
> tags" or?

-- 
View this message in context: http://old.nabble.com/Alt-text-of-images-as-anchor-text-tp27244358p27247820.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.


Re: Alt text of images as anchor text

Posted by Nutch Newbie <nu...@gmail.com>.
On Wed, Jan 20, 2010 at 4:16 PM, axi <ax...@gmail.com> wrote:
>
> after several test, I have noticed that nutch ignores alt text of images
> inside <a href=" tags.
> So, this feature isn't implemented yet right?

What exactly do you want Nutch to do with the alt text? Index it?
Tokenize it? Make the field available in queries, e.g. "img_alt:my alt
tags", or something else?

