Posted to users@solr.apache.org by Shawn Heisey <ap...@elyograg.org> on 2022/12/01 04:44:38 UTC
Re: SOLR adding , to strings erroneously
On 11/30/22 13:44, Matthew Castrigno wrote:
> Using SOLR 9.0 and the ScriptUpdateProcessor, it appears SOLR is erroneously adding " ,&#8203; " in the middle of a string field.
>
> The script just logs the fields. If you compare the curl request with what is logged you see the addition of many instances of ,&#8203; in the content field.
This just happens on the logging tab of the admin UI. In the javascript
file at server/solr-webapp/webapp/js/angular/controllers/logging.js I
found the following line:
event.message = event.message.replace(/,/g, ',&#8203;');
HTML character code 8203 is the Unicode "zero width space" character. I
think the admin UI code is trying to make long comma-separated lists in
log entries word-wrap better, and somehow the browser is treating that
as literal text rather than an HTML entity. This is NOT in the data
being indexed, it is just in the log. It's definitely a display bug,
but it doesn't affect the data being indexed.
Here you can see the same thing happening with my server running
9.2.0-SNAPSHOT:
https://www.dropbox.com/s/77yc9bovxwaauu6/solr-logging-html-8203.png?dl=0
I checked solr.log and that text is NOT there. I bet if you check
solr.log you will also find that it is not there.
Requests to the URL in my screenshot that do not come from specific IP
addresses are blocked. Those requests never get beyond the reverse proxy.
Thanks,
Shawn
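A minimal sketch of the behaviour described in the message above, in plain JavaScript. The replace call inserts the HTML entity for character code 8203 after every comma; the helper name addBreakHints and the sample message are made up for illustration and are not part of Solr. As long as the result is handled as plain text, the entity is just seven ordinary characters, which would explain why it shows up literally in the logging tab.

```javascript
// Hypothetical helper wrapping the kind of replace call found in logging.js.
function addBreakHints(message) {
  return message.replace(/,/g, ',&#8203;');
}

// Made-up sample message, not taken from a real Solr log.
const sampleMessage = 'fields=id,title,content';
const hintedMessage = addBreakHints(sampleMessage);

// Printed as text, the entity is visible character by character.
console.log(hintedMessage);

// Each comma gained the 7 characters "&#8203;", none of which is U+200B.
console.log(hintedMessage.length - sampleMessage.length);
console.log(hintedMessage.includes('\u200B'));
```

The entity would only turn into an invisible zero-width space if the string were later parsed as HTML rather than inserted as text.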
Re: SOLR adding , to strings erroneously
Posted by Thomas Corthals <th...@klascement.net>.
On Thu, Dec 1, 2022 at 17:12, dmitri maziuk <dm...@gmail.com> wrote:
> On 2022-12-01 6:41 AM, Eric Pugh wrote:
> > Shawn,
> >
> > Have we received a couple of mentions of this? Or am I misremembering?
> Do we need to open a JIRA and change how logging.js works?
>
>
> https://issues.apache.org/jira/projects/SOLR/issues/SOLR-16469
>
> It's interesting that the browser doesn't interpret the character
> correctly, but why not just add an actual space instead of a zero-length
> space?
>
> Dima
>
>
If the browser doesn't interpret it, it's probably used in a place
that expects text content instead of HTML.
The fix could be as simple as replacing it with ',\u200B' to insert the
actual Unicode character.
Thomas
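The suggestion above could look roughly like the following sketch. It is not the actual patch for SOLR-16469; the helper name addBreakHintsUnicode and the sample message are assumptions made for illustration. The point is that the '\u200B' escape puts the real zero-width space character into the string, so it no longer matters whether the result is later treated as text or as HTML.

```javascript
// Hypothetical variant of the replace call, using the Unicode escape
// instead of an HTML entity.
function addBreakHintsUnicode(message) {
  return message.replace(/,/g, ',\u200B');
}

// Made-up sample message, not taken from a real Solr log.
const plainMessage = 'fields=id,title,content';
const wrappedMessage = addBreakHintsUnicode(plainMessage);

// One invisible character is added per comma.
console.log(wrappedMessage.length - plainMessage.length);

// Removing the zero-width spaces gives back the original message.
console.log(wrappedMessage.split('\u200B').join('') === plainMessage);
```

One trade-off to keep in mind: text copied from the page would then carry the invisible characters along with it.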
Re: SOLR adding , to strings erroneously
Posted by Michael Conrad <mi...@newsrx.com>.
We use the zero-width space trick here to ensure links break properly
when formatting them for PDF and HTML display in our articles with extra
long URLs. Inserting a regular space would work, but also would display
incorrectly for human parsing.
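As a rough illustration of the trick described above (the message does not say how it is implemented, so everything here is an assumption), a display-only helper could add a zero-width space after each slash of a long URL. The name softBreakUrl is hypothetical.

```javascript
// Hypothetical helper: add a zero-width space (U+200B) after every "/"
// so a renderer is allowed to wrap the URL there. The result is meant
// for display only; it is not a valid URL for an href or for copying.
function softBreakUrl(url) {
  return url.replace(/\//g, '/\u200B');
}

const longUrl = 'https://issues.apache.org/jira/projects/SOLR/issues/SOLR-16469';
const displayUrl = softBreakUrl(longUrl);

// Stripping the zero-width spaces recovers the original URL.
console.log(displayUrl.split('\u200B').join('') === longUrl);
```

The link target itself should keep the unmodified URL; only the visible text gets the break hints.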
On 12/1/22 11:12, dmitri maziuk wrote:
> On 2022-12-01 6:41 AM, Eric Pugh wrote:
>> Shawn,
>>
>> Have we received a couple of mentions of this? Or am I
>> misremembering? Do we need to open a JIRA and change how logging.js
>> works?
>
>
> https://issues.apache.org/jira/projects/SOLR/issues/SOLR-16469
>
> It's interesting that the browser doesn't interpret the character
> correctly, but why not just add an actual space instead of a
> zero-length space?
>
> Dima
>
Re: SOLR adding , to strings erroneously
Posted by dmitri maziuk <dm...@gmail.com>.
On 2022-12-01 6:41 AM, Eric Pugh wrote:
> Shawn,
>
> Have we received a couple of mentions of this? Or am I misremembering? Do we need to open a JIRA and change how logging.js works?
https://issues.apache.org/jira/projects/SOLR/issues/SOLR-16469
It's interesting that the browser doesn't interpret the character
correctly, but why not just add an actual space instead of a zero-length
space?
Dima
Re: SOLR adding , to strings erroneously
Posted by Thomas Corthals <th...@klascement.net>.
On Sat, Dec 3, 2022 at 18:47, Shawn Heisey <ap...@elyograg.org> wrote:
> On 12/3/22 10:38, dmitri maziuk wrote:
> > On 2022-12-02 7:41 PM, Shawn Heisey wrote:
> >
> >> I'm curious as to why those entities are displaying as text instead
> >> of being interpreted by the browser as a zero-width space.
> >
> > I am curious as to why Matthew and I are apparently the only people
> > seeing it.
>
> I see it on my install, 9.2.0-SNAPSHOT compiled 2022/11/30, and it was
> also happening on a version compiled a few days earlier. I have no idea
> when it first started happening. I tend to glance at the logs every now
> and then, and only look closer at logs that pertain to whatever I am
> working on at that moment. And I use solr.log a lot more than the
> logging tab in the UI ... this problem does not occur in the actual
> logfile.
>
> Thanks,
> Shawn
>
Earliest version I have running is 8.4.0 and I'm seeing it there in the
admin UI as well.
Thomas
Re: SOLR adding , to strings erroneously
Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/3/22 10:38, dmitri maziuk wrote:
> On 2022-12-02 7:41 PM, Shawn Heisey wrote:
>
>> I'm curious as to why those entities are displaying as text instead
>> of being interpreted by the browser as a zero-width space.
>
> I am curious as to why Matthew and I are apparently the only people
> seeing it.
I see it on my install, 9.2.0-SNAPSHOT compiled 2022/11/30, and it was
also happening on a version compiled a few days earlier. I have no idea
when it first started happening. I tend to glance at the logs every now
and then, and only look closer at logs that pertain to whatever I am
working on at that moment. And I use solr.log a lot more than the
logging tab in the UI ... this problem does not occur in the actual logfile.
Thanks,
Shawn
Re: SOLR adding , to strings erroneously
Posted by dmitri maziuk <dm...@gmail.com>.
On 2022-12-02 7:41 PM, Shawn Heisey wrote:
> I'm curious as to why those entities are displaying as text instead of
> being interpreted by the browser as a zero-width space.
I am curious as to why Matthew and I are apparently the only people
seeing it.
Dima
Re: SOLR adding , to strings erroneously
Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/1/22 05:41, Eric Pugh wrote:
> Shawn,
>
> Have we received a couple of mentions of this? Or am I misremembering? Do we need to open a JIRA and change how logging.js works?
This is the first mention of this that I can recall seeing; apparently I
missed Dmitri's issue.
I'm curious as to why those entities are displaying as text instead of
being interpreted by the browser as a zero-width space.
Thanks,
Shawn
Re: SOLR adding , to strings erroneously
Posted by Eric Pugh <ep...@opensourceconnections.com>.
Shawn,
Have we received a couple of mentions of this? Or am I misremembering? Do we need to open a JIRA and change how logging.js works?
Eric
> On Nov 30, 2022, at 11:44 PM, Shawn Heisey <ap...@elyograg.org> wrote:
>
> On 11/30/22 13:44, Matthew Castrigno wrote:
>> Using SOLR 9.0 and the ScriptUpdateProcessor, it appears SOLR is erroneously adding " ,&#8203; " in the middle of a string field.
>>
>> The script just logs the fields. If you compare the curl request with what is logged you see the addition of many instances of ,&#8203; in the content field.
>
> This just happens on the logging tab of the admin UI. In the javascript file at server/solr-webapp/webapp/js/angular/controllers/logging.js I found the following line:
>
> event.message = event.message.replace(/,/g, ',&#8203;');
>
> HTML character code 8203 is the Unicode "zero width space" character. I think the admin UI code is trying to make long comma-separated lists in log entries word-wrap better, and somehow the browser is treating that as literal text rather than an HTML entity. This is NOT in the data being indexed, it is just in the log. It's definitely a display bug, but it doesn't affect the data being indexed.
>
> Here you can see the same thing happening with my server running 9.2.0-SNAPSHOT:
>
> https://www.dropbox.com/s/77yc9bovxwaauu6/solr-logging-html-8203.png?dl=0
>
> I checked solr.log and that text is NOT there. I bet if you check solr.log you will also find that it is not there.
>
> Requests to the URL in my screenshot that do not come from specific IP addresses are blocked. Those requests never get beyond the reverse proxy.
>
> Thanks,
> Shawn
>