Posted to dev@lucene.apache.org by Kevin Cunningham <kc...@telligent.com> on 2013/10/01 00:06:46 UTC

No longer allowed to store html in a 'string' type

We have been using Solr for a while now and went from 1.4 to 3.6. While running some tests on 4.4, we can no longer store raw HTML in a document's field of type 'string', which we used to be able to do. Has something changed here? Now we get the following error: Undeclared general entity \"nbsp\"\r\n at [row,col {unknown-source}]: [11,53]

I understand what it's saying and can change the way we store and extract the HTML if I must, but I would like to understand what changed. It sounds like something just became stricter about adhering to the rules.

<doc>
<str name="rawcontent">
<p>Testing <a href="/sample_group/b/sample_weblog/archive/tags/bananas/default.aspx" class="tag hash-tag" data-tags="bananas">#bananas</a>&nbsp;tag</p> <p></p> <p>document document document document document document</p><div style="clear:both;"></div>
</str>
<str name="type">blog</str>
</doc>
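The failure can be reproduced outside Solr with any conforming XML parser. &nbsp; is an HTML entity, not one of the five entities XML predefines (&amp; &lt; &gt; &quot; &apos;), so a document that uses it without declaring it is not well-formed XML. The sketch below is only an illustration: it assumes Python's standard xml.etree.ElementTree rather than the parser Solr 4.4 uses, so the wording of the error differs from the Solr message. The helper name try_parse is hypothetical.

```python
import xml.etree.ElementTree as ET

# The <doc> from the message above, sent as-is: the HTML is pasted raw into
# the XML body, so "&nbsp;" is read as an (undeclared) entity reference.
RAW_DOC = """<doc>
<str name="rawcontent">
<p>Testing <a href="/sample_group/b/sample_weblog/archive/tags/bananas/default.aspx" class="tag hash-tag" data-tags="bananas">#bananas</a>&nbsp;tag</p> <p></p> <p>document document document document document document</p><div style="clear:both;"></div>
</str>
<str name="type">blog</str>
</doc>"""


def try_parse(xml_text):
    """Return None if xml_text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)


# Prints an "undefined entity" error that points at &nbsp;.
print(try_parse(RAW_DOC))
```

Everything else in the fragment happens to be well-formed, so the entity reference is the first and only thing the parser rejects.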



RE: No longer allowed to store html in a 'string' type

Posted by Uwe Schindler <uw...@thetaphi.de>.
You have to correctly escape your XML-like HTML inside the XML you send to Solr, either by wrapping it in <![CDATA[ … ]]> or by escaping it with &amp; &lt; &gt; &quot;. Otherwise Solr would be open to HTML injection.
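A minimal sketch of the two options described above, plus a third variant that lets an XML library do the escaping, using only Python's standard library. The function names are hypothetical, and the <doc>/<str name="rawcontent"> layout simply mirrors the document in the original question rather than any particular Solr request format. Each variant is parsed back to check that the field text comes out identical to the raw HTML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Fragment of the HTML from the original question, including the &nbsp; entity.
RAW_HTML = (
    '<p>Testing <a href="/sample_group/b/sample_weblog/archive/tags/bananas/default.aspx" '
    'class="tag hash-tag" data-tags="bananas">#bananas</a>&nbsp;tag</p>'
)


def doc_with_escaping(html):
    """Option 1 (hypothetical helper): entity-escape the HTML before embedding it.

    escape() replaces "&" first, so "&nbsp;" becomes "&amp;nbsp;" and is no
    longer an entity reference; "<" and ">" become "&lt;" and "&gt;", and the
    extra mapping turns '"' into "&quot;".
    """
    body = escape(html, {'"': "&quot;"})
    return '<doc><str name="rawcontent">' + body + "</str></doc>"


def doc_with_cdata(html):
    """Option 2 (hypothetical helper): wrap the HTML in a CDATA section.

    Everything inside CDATA is literal text, except that "]]>" would close the
    section early, so any occurrence is split across two CDATA sections.
    """
    body = html.replace("]]>", "]]]]><![CDATA[>")
    return '<doc><str name="rawcontent"><![CDATA[' + body + "]]></str></doc>"


def doc_built_with_elementtree(html):
    """Option 3 (hypothetical helper): build the XML with a library that escapes text itself."""
    doc = ET.Element("doc")
    field = ET.SubElement(doc, "str", name="rawcontent")
    field.text = html
    return ET.tostring(doc, encoding="unicode")


def extract(xml_text):
    """Parse a document and return the text of its rawcontent field."""
    root = ET.fromstring(xml_text)
    return root.find("str[@name='rawcontent']").text


for build in (doc_with_escaping, doc_with_cdata, doc_built_with_elementtree):
    # Each variant parses cleanly and round-trips to the original HTML.
    assert extract(build(RAW_HTML)) == RAW_HTML
```

Letting an XML library build the request, as in the third function, is often the simplest choice, because text content is escaped automatically and there is no CDATA edge case to handle by hand.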

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: uwe@thetaphi.de



RE: No longer allowed to store html in a 'string' type

Posted by Kevin Cunningham <kc...@telligent.com>.
Whoops, wrong alias. I have posted this to the user list instead.
