Posted to dev@any23.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2013/08/26 09:22:52 UTC

[jira] [Commented] (ANY23-115) Empty spans seem to break ANY23

    [ https://issues.apache.org/jira/browse/ANY23-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749880#comment-13749880 ] 

Hudson commented on ANY23-115:
------------------------------

SUCCESS: Integrated in Any23-trunk #764 (See [https://builds.apache.org/job/Any23-trunk/764/])
ANY23-115 Empty spans seem to break ANY23 (lewismc: https://git-wip-us.apache.org/repos/asf?p=any23.git&a=commit&h=5195ebaa806d108791bb7ce449644ed93b62e882)
* core/src/main/java/org/apache/any23/extractor/microdata/ItemPropValue.java

                
> Empty spans seem to break ANY23
> -------------------------------
>
>                 Key: ANY23-115
>                 URL: https://issues.apache.org/jira/browse/ANY23-115
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: html-scraper, microdata
>    Affects Versions: 0.7.0
>         Environment: Any23.org public scraper
>            Reporter: Christophe Dupriez
>             Fix For: 0.9.0
>
>         Attachments: 0001-ANY23-115-Empty-spans-seem-to-break-ANY23.patch, json-pretty-printer.html
>
>
> One of the 2,000 URLs with the problem:
> http://www.oceanexpert.net/viewMemberRecord.php?&memberID=20045
> The piece of HTML creating the problem seems to be:
> <h1>
> 				Details of<span itemprop="name"> <span itemprop="honorificPrefix"></span>&nbsp;<span itemprop="givenName">Laury</span>&nbsp; <span itemprop="familyName">Miller</span></span>
> 							</h1>
> (this may disappear, as we may work around the problem)
> Error message:
> Internal error.
> ================================================================
> java.lang.IllegalArgumentException: Invalid content ''
> 	at org.apache.any23.extractor.microdata.ItemPropValue.<init>(ItemPropValue.java:89)
> 	at org.apache.any23.extractor.microdata.MicrodataParser.getPropertyValue(MicrodataParser.java:341)
> 	at org.apache.any23.extractor.microdata.MicrodataParser.getItemProps(MicrodataParser.java:394)
> 	at org.apache.any23.extractor.microdata.MicrodataParser.getItemScope(MicrodataParser.java:471)
> 	at org.apache.any23.extractor.microdata.MicrodataParser.getMicrodata(MicrodataParser.java:186)
> 	at org.apache.any23.extractor.microdata.MicrodataParser.getMicrodata(MicrodataParser.java:203)
> 	at org.apache.any23.extractor.microdata.MicrodataExtractor.run(MicrodataExtractor.java:100)
> 	at org.apache.any23.extractor.microdata.MicrodataExtractor.run(MicrodataExtractor.java:62)
> 	at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:477)
> 	at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:260)
> 	at org.apache.any23.Any23.extract(Any23.java:294)
> 	at org.apache.any23.Any23.extract(Any23.java:446)
> 	at org.apache.any23.servlet.WebResponder.runExtraction(WebResponder.java:113)
> 	at org.apache.any23.servlet.Servlet.doGet(Servlet.java:74)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
> 	at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
> 	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
> 	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> 	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> 	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> 	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> 	at com.googlecode.psiprobe.Tomcat60AgentValve.invoke(Tomcat60AgentValve.java:30)
> 	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> 	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> 	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> 	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
> 	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
> 	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
> 	at java.lang.Thread.run(Thread.java:662)
> ================================================================
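
For readers following the trace, here is a minimal sketch of the failure mode and one possible guard. It is not the committed change to ItemPropValue.java, whose contents are not shown in this message. The names EmptyItemPropSketch, StrictValue and collectValues are hypothetical, and the sketch assumes that skipping an empty property is acceptable behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the Any23 code base.
class EmptyItemPropSketch {

    // Stand-in for a value type that rejects empty content,
    // mirroring the "Invalid content ''" failure in the report.
    static final class StrictValue {
        final String content;

        StrictValue(String content) {
            if (content == null || content.trim().isEmpty()) {
                throw new IllegalArgumentException("Invalid content '" + content + "'");
            }
            this.content = content;
        }
    }

    // Builds values for the given itemprop texts, skipping empty ones
    // instead of letting the constructor throw (an assumed policy).
    static List<StrictValue> collectValues(List<String> texts) {
        List<StrictValue> values = new ArrayList<>();
        for (String text : texts) {
            if (text == null || text.trim().isEmpty()) {
                continue; // e.g. <span itemprop="honorificPrefix"></span>
            }
            values.add(new StrictValue(text));
        }
        return values;
    }

    public static void main(String[] args) {
        // Text content of the three inner spans from the reported HTML.
        List<String> texts = Arrays.asList("", "Laury", "Miller");
        for (StrictValue value : collectValues(texts)) {
            System.out.println(value.content);
        }
    }
}
```

Skipping is only one option. An extractor could instead keep the empty string as a legitimate value, which is a design choice this sketch does not try to settle.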

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira