Posted to user@any23.apache.org by Tim Potter <te...@yahoo-inc.com> on 2012/04/30 13:15:59 UTC

Microdata empty content handling.

Hi,
   We've noticed that, while extracting Microdata propItems, elements with missing, empty or whitespace-only string values result in an IllegalArgumentException being thrown.

HTML that triggers this could be any of:

<span itemprop="nameA"></span>


<meta itemprop="nameB">

<meta itemprop="nameC" content="  ">

The code explicitly checks for this condition in ItemPropValue.itemPropValue():


if (content instanceof String && ((String) content).trim().length() == 0) {
    throw new IllegalArgumentException("Invalid content '" + content + "'");
}

Is this the correct behavior?  Could some properties not legally have an empty string as a value?  I looked at the specification, and it seems that a missing value should be treated as an empty string; it doesn't state that empty values are illegal.

http://www.w3.org/TR/html5/microdata.html#values
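
To make the question concrete, here is a minimal sketch of the lenient behavior I have in mind, assuming my reading of the values section is right: a missing value becomes the empty string, and empty or whitespace-only strings are kept as they are. The class LenientPropValue and both of its methods are hypothetical names used only for illustration; they are not part of Any23.

```java
// Hypothetical helper for illustration only; not part of the Any23 API.
final class LenientPropValue {

    private LenientPropValue() {}

    /** Maps a missing value (null) to "" and leaves every other string untouched. */
    static String valueOrEmpty(String rawContent) {
        return rawContent == null ? "" : rawContent;
    }

    /** Mirrors the condition quoted above, to show which inputs the current check rejects. */
    static boolean rejectedByCurrentCheck(Object content) {
        return content instanceof String && ((String) content).trim().length() == 0;
    }

    public static void main(String[] args) {
        // Missing attribute, empty content, whitespace-only content, ordinary content.
        String[] samples = { null, "", "  ", "Tim" };
        for (String raw : samples) {
            String value = valueOrEmpty(raw);
            System.out.println("'" + value + "' rejected by current check: "
                    + rejectedByCurrentCheck(value));
        }
    }
}
```

With this approach the three HTML snippets above would each yield a string value ("" for the first two, "  " for the third) instead of an exception. Whether whitespace-only values should be preserved or trimmed is a separate decision that I have left open here.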

Regards,
  Tim P.

Re: Microdata empty content handling.

Posted by Michele Mostarda <mi...@gmail.com>.
Hi Tim,


On 30 April 2012 13:15, Tim Potter <te...@yahoo-inc.com> wrote:

> Hi,
>    We've noticed that, while extracting Microdata propItems, elements with
> missing, empty or whitespace-only string values result in an
> IllegalArgumentException being thrown.
>

Thanks for reporting.


> HTML that triggers this could be any of:
>
> <span itemprop="nameA"></span>
>
> <meta itemprop="nameB">
>
> <meta itemprop="nameC" content="  ">
>
> The code explicitly checks for this condition in
> ItemPropValue.itemPropValue():
>
> if (content instanceof String && ((String) content).trim().length() == 0) {
>     throw new IllegalArgumentException("Invalid content '" + content + "'");
> }
>
> Is this the correct behavior?  Could some properties not legally have an
> empty string as a value?  I looked at the specification, and it seems that
> a missing value should be treated as an empty string; it doesn't state
> that empty values are illegal.
>

As noted in [1], the Microdata extractor must be updated to the latest
RDF mapping specification, which evolved after the first MicrodataExtractor
implementation was written.
I am adding a note about this case to [1].


> http://www.w3.org/TR/html5/microdata.html#values
>
> Regards,
>   Tim P.
>


All the best,

Mic

[1] https://issues.apache.org/jira/browse/ANY23-67

-- 
Michele Mostarda
Senior Software Engineer
skype: michele.mostarda
twitter: micmos
mail: me@michelemostarda.com
site : http://www.michelemostarda.com