Posted to users@solr.apache.org by Shawn Heisey <ap...@elyograg.org> on 2022/12/01 04:44:38 UTC

Re: SOLR adding ,&#8203 to strings erroneously

On 11/30/22 13:44, Matthew Castrigno wrote:
> Using SOLR 9.0 and the ScriptUpdateProcessor, it appears SOLR is erroneously adding "  ,&#8203 " in the middle of a string field.
>
> The script just logs the fields. If you compare the curl request with what is logged you see the addition of many instances of ,&#8203  in the content field.

This just happens on the logging tab of the admin UI.  In the javascript 
file at server/solr-webapp/webapp/js/angular/controllers/logging.js I 
found the following line:

           event.message = event.message.replace(/,/g, ',&#8203;');

HTML character code 8203 is the Unicode "zero width space" character.  I 
think the admin UI code is trying to make long comma-separated lists in 
log entries word-wrap better, and somehow the browser is treating that 
as literal text rather than an HTML entity.  This is NOT in the data 
being indexed, it is just in the log.  It's definitely a display bug, 
but doesn't affect the data being indexed.
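
A minimal sketch of what that line does to a message, runnable in Node.js 
outside of Solr.  The sample message is made up, and the idea that the UI 
then shows the result as plain text rather than parsing it as HTML is an 
assumption, not something verified against the admin UI code:

```javascript
// Hypothetical log message; only the replace() call comes from logging.js.
const logMessage = 'fields=id,title,content';

// The replacement is the literal 7-character string "&#8203;".  JavaScript
// does not decode HTML entities; only an HTML parser would turn this into
// the U+200B character.
const replaced = logMessage.replace(/,/g, ',&#8203;');

console.log(replaced);                    // fields=id,&#8203;title,&#8203;content
console.log(replaced.includes('\u200B')); // false
```

If the result is displayed as text, the entity shows up verbatim, which 
matches what the logging tab displays.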

Here you can see the same thing happening with my server running 
9.2.0-SNAPSHOT:

https://www.dropbox.com/s/77yc9bovxwaauu6/solr-logging-html-8203.png?dl=0

I checked solr.log and that text is NOT there.  I bet if you check 
solr.log you will also find that it is not there.

Requests to the URL in my screenshot that do not come from specific IP 
addresses are blocked.  Those requests never get beyond the reverse proxy.

Thanks,
Shawn


Re: SOLR adding ,&#8203 to strings erroneously

Posted by Thomas Corthals <th...@klascement.net>.
On Thu, 1 Dec 2022 at 17:12, dmitri maziuk <dm...@gmail.com> wrote:

> On 2022-12-01 6:41 AM, Eric Pugh wrote:
> > Shawn,
> >
> > Have we received a couple of mentions of this?  Or am I misremembering?
> > Do we need to open a JIRA and change how logging.js works?
>
>
> https://issues.apache.org/jira/projects/SOLR/issues/SOLR-16469
>
> It's interesting that the browser doesn't interpret the character
> correctly, but why not just add an actual space instead of a zero-width
> space?
>
> Dima
>
>
If the browser doesn't interpret it, it's probably used in a place
that expects text content instead of HTML.

The fix could be as simple as replacing it with ',\u200B' to insert the
actual Unicode character.
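
A minimal sketch of that suggestion, runnable in Node.js.  The sample 
message is hypothetical, and this only illustrates the idea; it is not 
the patch attached to SOLR-16469:

```javascript
// Hypothetical log message used only for illustration.
const sample = 'fields=id,title,content';

// '\u200B' is a JavaScript string escape, so the result holds the real
// zero-width space code point and needs no HTML parsing to take effect.
const fixed = sample.replace(/,/g, ',\u200B');

console.log(fixed.length - sample.length); // 2 (one extra character per comma)
console.log(fixed.codePointAt(fixed.indexOf(',') + 1).toString(16)); // 200b
```

Because the character itself is in the string, it should behave the same 
whether the message is rendered as text or as HTML.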

Thomas

Re: SOLR adding ,&#8203 to strings erroneously

Posted by Michael Conrad <mi...@newsrx.com>.
We use the zero-width space trick here to ensure links break properly 
when formatting them for PDF and HTML display in our articles with extra 
long URLs. Inserting a regular space would work, but also would display 
incorrectly for human parsing.
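
A rough sketch of that trick in JavaScript.  The helper name and the 
choice to allow a break after every "/" are assumptions made for this 
example, not a description of the actual code used:

```javascript
// Hypothetical helper: add U+200B after each "/" so a renderer may wrap a
// long URL there without showing a visible space.
function addBreakOpportunities(url) {
  return url.replace(/\//g, '/\u200B');
}

const longUrl = 'https://issues.apache.org/jira/projects/SOLR/issues/SOLR-16469';
const displayText = addBreakOpportunities(longUrl);

console.log(displayText.length - longUrl.length);           // 7 (one per slash)
console.log(displayText.replace(/\u200B/g, '') === longUrl); // true
```

Only the visible link text should be treated this way; the link target 
itself would keep the unmodified URL.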


Re: SOLR adding ,&#8203 to strings erroneously

Posted by dmitri maziuk <dm...@gmail.com>.
On 2022-12-01 6:41 AM, Eric Pugh wrote:
> Shawn,
> 
> Have we received a couple of mentions of this?  Or am I misremembering?  Do we need to open a JIRA and change how logging.js works?


https://issues.apache.org/jira/projects/SOLR/issues/SOLR-16469

It's interesting that the browser doesn't interpret the character 
correctly, but why not just add an actual space instead of a zero-width 
space?

Dima


Re: SOLR adding ,&#8203 to strings erroneously

Posted by Thomas Corthals <th...@klascement.net>.
On Sat, 3 Dec 2022 at 18:47, Shawn Heisey <ap...@elyograg.org> wrote:

> On 12/3/22 10:38, dmitri maziuk wrote:
> > On 2022-12-02 7:41 PM, Shawn Heisey wrote:
> >
> >> I'm curious as to why those entities are displaying as text instead
> >> of being interpreted by the browser as a zero-width space.
> >
> > I am curious as to why Matthew and I are apparently the only people
> > seeing it.
>
> I see it on my install, 9.2.0-SNAPSHOT compiled 2022/11/30, and it was
> also happening on a version compiled a few days earlier.  I have no idea
> when it first started happening.  I tend to glance at the logs every now
> and then, and only look closer at logs that pertain to whatever I am
> working on at that moment.  And I use solr.log a lot more than the
> logging tab in the UI ... this problem does not occur in the actual
> logfile.
>
> Thanks,
> Shawn
>

Earliest version I have running is 8.4.0 and I'm seeing it there in the
admin UI as well.

Thomas

Re: SOLR adding ,&#8203 to strings erroneously

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/3/22 10:38, dmitri maziuk wrote:
> On 2022-12-02 7:41 PM, Shawn Heisey wrote:
>
>> I'm curious as to why those entities are displaying as text instead 
>> of being interpreted by the browser as a zero-width space.
>
> I am curious as to why Matthew and I are apparently the only people 
> seeing it.

I see it on my install, 9.2.0-SNAPSHOT compiled 2022/11/30, and it was 
also happening on a version compiled a few days earlier.  I have no idea 
when it first started happening.  I tend to glance at the logs every now 
and then, and only look closer at logs that pertain to whatever I am 
working on at that moment.  And I use solr.log a lot more than the 
logging tab in the UI ... this problem does not occur in the actual logfile.

Thanks,
Shawn


Re: SOLR adding ,&#8203 to strings erroneously

Posted by dmitri maziuk <dm...@gmail.com>.
On 2022-12-02 7:41 PM, Shawn Heisey wrote:

> I'm curious as to why those entities are displaying as text instead of 
> being interpreted by the browser as a zero-width space.

I am curious as to why Matthew and I are apparently the only people 
seeing it.

Dima


Re: SOLR adding ,&#8203 to strings erroneously

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/1/22 05:41, Eric Pugh wrote:
> Shawn,
>
> Have we received a couple of mentions of this?  Or am I misremembering?  Do we need to open a JIRA and change how logging.js works?

This is the first mention of this that I can recall seeing; apparently I 
missed Dmitri's issue.

I'm curious as to why those entities are displaying as text instead of 
being interpreted by the browser as a zero-width space.

Thanks,
Shawn


Re: SOLR adding ,&#8203 to strings erroneously

Posted by Eric Pugh <ep...@opensourceconnections.com>.
Shawn, 

Have we received a couple of mentions of this?  Or am I misremembering?  Do we need to open a JIRA and change how logging.js works?

Eric



_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>