Posted to dev@lucene.apache.org by "Jan Høydahl (JIRA)" <ji...@apache.org> on 2013/02/28 14:25:12 UTC

[jira] [Commented] (SOLR-772) malformed XML updates w/Resin's Stax parser doesn't trigger errors

    [ https://issues.apache.org/jira/browse/SOLR-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589530#comment-13589530 ] 

Jan Høydahl commented on SOLR-772:
----------------------------------

Is anyone running Solr in Resin who could do a quick test and confirm (or refute) this ancient bug by running one of the curl commands above?
                
> malformed XML updates w/Resin's Stax parser doesn't trigger errors
> ------------------------------------------------------------------
>
>                 Key: SOLR-772
>                 URL: https://issues.apache.org/jira/browse/SOLR-772
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>
> Originally noted by yonik on the mailing list...
> {quote}
> Then I tried Resin 3.1.1 and 3.1.6....
> Things *seem* to mostly work... until you get to updating:
>    ...
> Now here is another really weird thing... post any garbage to the
> update URL, and you still get a success!  It successfully fails on
> jetty.  Mangled query requests correctly fail.  This perhaps initially
> points to something specific to the XML config in jetty?
> {quote}
> Followup from Hoss...
> {quote}
> Skimming the code in XmlUpdateRequestHandler, and testing out various inputs, this seems like a bug in com.caucho.xml.stream.XMLStreamReaderImpl.
> Using curl as yonik described...
> curl -i http://localhost:8080/solr/update --data-binary 'crap' -H 'Content-type:text/xml; charset=utf-8'
> ...resin-3.1.6 (on Linux) returns a success (incorrectly) but the request 
> handler doesn't log any action taken. If we alter the payload ('crap') 
> above we can see some different behaviors...
> 1) 'crap<add><doc><field name="id">hoss</field></doc></add>'
> Solr adds the doc, ignorant of the crap before the add command
> 2) 'crap<add><doc></doc></add>'
> Solr correctly complains about the missing id field (example configs require it)
> 3) 'crap<add>'
> Solr returns success even though it's not legal XML
> 4) 'crap<add'
> Get the following exception...
> {noformat}
> javax.xml.stream.XMLStreamException: :1:7 Expected > at 0xffffffff
>         at com.caucho.xml.stream.XMLStreamReaderImpl.error(XMLStreamReaderImpl.java:1268)
>         at com.caucho.xml.stream.XMLStreamReaderImpl.readElementBegin(XMLStreamReaderImpl.java:689)
>         at com.caucho.xml.stream.XMLStreamReaderImpl.readNext(XMLStreamReaderImpl.java:653)
>         at com.caucho.xml.stream.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:594)
>         at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:148)
> {noformat}
> 5) '<add><doc>'
> This appears to hang ... the connection seems to be left open as if it's waiting for more data.
> ...
> None of these 5 things happen when testing with Jetty.
> I'm not really very familiar with this StAX stuff -- but I suspect what's happening here is that on "wacky" input Caucho's XMLStreamReaderImpl.next() is returning values we're not expecting instead of throwing exceptions ... and depending on the input, this is either causing the XmlUpdateRequestHandler.processUpdate loop/switch to ignore the garbage data, or get stuck in an infinite loop (when there is no END_DOCUMENT).
> The question is: Are we doing the right thing, and com.caucho.xml.stream.XMLStreamReaderImpl is broken; or is XMLStreamReaderImpl producing a legal sequence of parse events for those bad inputs and we're not dealing with it properly?
> FWIW: adding the following line to our web.xml seems to make everything "work" (by which I mean: "fail") as expected...
> <system-property javax.xml.stream.XMLInputFactory="com.ctc.wstx.stax.WstxInputFactory" />
> ...do we want to commit this?
> (It wouldn't be the first time we've had to put in settings to force Resin to use the XML Library we want because something doesn't work with theirs.)
> {quote}
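
For anyone who wants to reproduce the parser behaviour outside of Solr, here is a minimal sketch (not Solr code) that feeds payloads like the ones in the description to whatever StAX implementation the JVM resolves. The class name StaxWellFormedCheck and its methods are hypothetical names made up for illustration. It assumes a conformant parser throws XMLStreamException on malformed input, and it loops on hasNext() rather than waiting for END_DOCUMENT, on the assumption that this avoids spinning if END_DOCUMENT never arrives. It prints the factory class in use so you can tell whether the container's parser or another one (e.g. Woodstox) was picked up.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only; not part of Solr or Resin.
class StaxWellFormedCheck {

    // Class name of the XMLInputFactory implementation the JVM resolved.
    static String factoryName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    // Returns true only if the parser consumed the whole payload without
    // an XMLStreamException and actually reported END_DOCUMENT.
    static boolean isWellFormed(String payload) {
        XMLStreamReader reader = null;
        try {
            reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(payload));
            boolean sawEndDocument = false;
            // Loop on hasNext() instead of while(true), so a parser that
            // never emits END_DOCUMENT cannot trap us in the loop as long
            // as it reports hasNext() == false at end of input.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.END_DOCUMENT) {
                    sawEndDocument = true;
                }
            }
            return sawEndDocument;
        } catch (XMLStreamException e) {
            // A conformant parser is expected to land here for garbage input.
            return false;
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // Nothing useful to do if close() itself fails.
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("StAX factory in use: " + factoryName());
        String[] payloads = {
            "crap",
            "crap<add><doc><field name=\"id\">hoss</field></doc></add>",
            "crap<add>",
            "crap<add",
            "<add><doc>",
            "<add><doc><field name=\"id\">hoss</field></doc></add>"
        };
        for (String payload : payloads) {
            System.out.println(isWellFormed(payload) + "  " + payload);
        }
    }
}
```

If this is run inside the container (or with the container's XML jars on the classpath) and reports any of the first five payloads as well-formed, that would point at the parser rather than at XmlUpdateRequestHandler.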
