Posted to solr-user@lucene.apache.org by Teresa McMains <te...@t14-consulting.com> on 2020/07/29 14:34:20 UTC

solr query returns items with spaces removed

I am sure I'm doing something silly. Basically it looks like my data is being altered upon search.

This is my fieldType:

<fieldType name="TrimmedString" class="solr.TextField" omitNorms="true">
    <analyzer>
        <!-- Removes anything that isn't a letter or digit -->
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([^A-Za-z0-9])" replacement=""/>
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <!-- Normalizes token text to upper case -->
        <filter class="solr.UpperCaseFilterFactory" />
    </analyzer>
</fieldType>

I have a field called "INSTRUCTIONS" that uses this field type. A typical value looks like this:

ABC_D= PAYMENT FOR CONTRACT AX3764-MP-000-37

With a URL like the one below, I return a bunch of columns of data:

/solr/aml/select?q=TRANSACTION_REFERENCE_NUMBER%3A%22${transactionReferenceNumber}%22&fq=doc_type%3Atrxn&wt=json&fl=_1_Trigger:def(TRIGGER_IND,%22N%22),_2_Transaction_No:TRANSACTION_REFERENCE_NUMBER,_3_Date:TRANSACTION_DATE,_4_Amount:CURRENCY_AMOUNT,_20_Instructions_And_Notes:def(INSTRUCTIONS,%22%22),_21_Transaction_Type:def(TRANSACTION_CDI_DESC,%22%22)&rows=100000

But the data being returned for "INSTRUCTIONS" looks like this:
ABCDPAYMENTFORCONTRACTAX3764MP00037

All spaces and special characters have been removed. I thought the field type's filters would affect only the index and query-time matching, not the stored data that gets returned.
What's even weirder is that other fields that also use this field type (like transaction reference number) do not show the same behavior.

For example, a TRANSACTION_REFERENCE_NUMBER such as 123-456-7890 is returned correctly.
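To see where the returned value comes from, here is a rough Python approximation of the TrimmedString analysis chain (the Analysis screen in the Solr Admin UI is the authoritative way to check this). It is only a sketch: the function name is made up, and it assumes Python's re.sub and str.upper() behave like the Java char filter and UpperCaseFilter for this ASCII-only pattern. Note that 123-456-7890 would also be indexed as 1234567890; in the query above that field is requested as a plain alias (_2_Transaction_No:TRANSACTION_REFERENCE_NUMBER), while INSTRUCTIONS is wrapped in def(), which, per Erick's reply in this thread, reads the indexed value.

```python
import re

# Same pattern as the charFilter: pattern="([^A-Za-z0-9])" replacement=""
NON_ALNUM = re.compile(r"([^A-Za-z0-9])")


def analyze_trimmed_string(value: str) -> str:
    """Approximate the single token the TrimmedString analyzer indexes (hypothetical helper)."""
    # PatternReplaceCharFilterFactory: drop every character that isn't A-Z, a-z or 0-9
    filtered = NON_ALNUM.sub("", value)
    # KeywordTokenizerFactory: the whole filtered input becomes one token
    token = filtered
    # UpperCaseFilterFactory: normalize the token to upper case
    return token.upper()


print(analyze_trimmed_string("ABC_D= PAYMENT FOR CONTRACT AX3764-MP-000-37"))
# ABCDPAYMENTFORCONTRACTAX3764MP00037
print(analyze_trimmed_string("123-456-7890"))
# 1234567890
```

The first result matches the INSTRUCTIONS value returned by the query, which suggests the response is showing the indexed token rather than the stored text.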


Can anyone please help me understand or troubleshoot?

Thank you so much,
Teresa



Re: solr query returns items with spaces removed

Posted by Erick Erickson <er...@gmail.com>.
In high-throughput situations that can be a problem. The entire
packet has to be assembled and transmitted over the network, which
can cause grief in many situations.

Not to mention that for “regular” queries, say using the /select or /query
handlers, and assuming you’re returning one or more stored-but-not-docValues
fields, that means 2M+ disk seeks, decompressing 2M+ 16K blocks, creating
the entire 2M+-document packet in memory and transmitting it to the client.
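As a back-of-the-envelope sketch of that cost, the snippet below multiplies the 2+ million documents David mentions by the 16K-per-block figure above. The function name is hypothetical, and it assumes one seek and one separately decompressed block per document, which is a worst case: in practice several small documents share a compressed block and the OS page cache absorbs many reads.

```python
def worst_case_stored_field_cost(num_docs: int, block_bytes: int = 16 * 1024) -> dict:
    """Upper-bound cost of fetching stored fields for num_docs documents.

    Assumes (hypothetically) that every document lives in its own compressed
    block, so each one costs a seek plus a full block decompression.
    """
    return {
        "seeks": num_docs,
        "decompressed_bytes": num_docs * block_bytes,
    }


cost = worst_case_stored_field_cost(2_000_000)
print(cost["seeks"], cost["decompressed_bytes"], cost["decompressed_bytes"] / 2**30)
# 2000000 32768000000 30.517578125
```

Under those assumptions, 2 million documents means about 32.8 GB (roughly 30.5 GiB) of decompression for a single response.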

This doesn’t apply to, say, the /export handler upon which streaming
is built.

In low-query-volume situations, where there are just a few simultaneous
queries, you can get away with it. But it’s still an anti-pattern.

Again, though, none of that is relevant if you’re using anything built on
the /export handler, which includes almost all of the streaming capabilities.

Best,
Erick

> On Jul 29, 2020, at 4:59 PM, David Hastings <ha...@gmail.com> wrote:
> 
> "Oh, and returning 100K docs is an anti-pattern, if you really need that
> many docs consider cursorMark and/or Streaming."
> 
> er, I routinely ask for 2+ million records into a single file based on a
> query.  I mean not into a web application or anything, it's meant to be
> processed after the fact, but solr has no issue doing this


Re: solr query returns items with spaces removed

Posted by David Hastings <ha...@gmail.com>.
"Oh, and returning 100K docs is an anti-pattern, if you really need that
many docs consider cursorMark and/or Streaming."

er, I routinely ask for 2+ million records into a single file based on a
query.  I mean not into a web application or anything, it's meant to be
processed after the fact, but solr has no issue doing this



On Wed, Jul 29, 2020 at 4:53 PM Erick Erickson <er...@gmail.com>
wrote:

> I don’t think there’s really a canned way to do what you’re asking. A
> custom DocTransformer would probably do the trick though.
>
> You could also create a custom QueryComponent that examined the docs being
> returned and inserted a blank field for a selected number of fields
> (possibly configurable in solrconfig.xml).
>
> Oh, and returning 100K docs is an anti-pattern, if you really need that
> many docs consider cursorMark and/or Streaming.
>
> Best,
> Erick
>
> > On Jul 29, 2020, at 2:55 PM, Teresa McMains <te...@t14-consulting.com>
> wrote:
> >
> > Thanks so much.  Is there any other way to return the data value if it
> exists, otherwise an empty string?  I'm integrating this with a 3rd party
> app which I can't change. When the field is null it isn't showing up in the
> output.
> >
> > -----Original Message-----
> > From: Erick Erickson <er...@gmail.com>
> > Sent: Wednesday, July 29, 2020 12:49 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: solr query returns items with spaces removed
> >
> > The “def” function goes after the _indexed_ value, so that’s what you’re
> getting back. Try just specifying “fl=INSTRUCTIONS”, and if the value is
> stored that should return the original field value before any analysis is
> done.
> >
> > Why are you using the def function? If the field is absent from the doc,
> nothing will be returned for that field, not even the name. Are you trying
> to insure that a blank field is returned if the field isn’t in the
> document? You can handle that on the client side if so…
> >
> > Best,
> > Erick
> >
> >> On Jul 29, 2020, at 10:34 AM, Teresa McMains <te...@t14-consulting.com>
> wrote:
> >>
> >> _20_Instructions_And_Notes:def(INSTRUCTIONS,%22%22)
> >
>
>

Re: solr query returns items with spaces removed

Posted by Erick Erickson <er...@gmail.com>.
I don’t think there’s really a canned way to do what you’re asking. A custom DocTransformer would probably do the trick though.

You could also create a custom QueryComponent that examined the docs being returned and inserted a blank field for a selected number of fields (possibly configurable in solrconfig.xml).
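A custom DocTransformer or QueryComponent would be Java code running inside Solr. As an illustration of the same logic outside Solr, here is a Python sketch that post-processes a wt=json select response and inserts an empty string for each expected field a document lacks, for example in a small proxy sitting between Solr and the third-party app. The proxy idea, the function name and the EXPECTED_FIELDS list are assumptions for illustration, not part of Solr.

```python
from typing import Any, Dict, Iterable

# Hypothetical: output keys the third-party app expects on every document
EXPECTED_FIELDS = ("_20_Instructions_And_Notes", "_21_Transaction_Type")


def fill_missing_fields(
    solr_response: Dict[str, Any],
    expected: Iterable[str] = EXPECTED_FIELDS,
    default: str = "",
) -> Dict[str, Any]:
    """Return a copy of a wt=json response where every doc has each expected key."""
    expected = tuple(expected)  # allow any iterable, reuse it for every doc
    filled = []
    for doc in solr_response["response"]["docs"]:
        out = dict(doc)  # copy so the original response is left untouched
        for name in expected:
            out.setdefault(name, default)  # only adds keys that are absent
        filled.append(out)
    result = dict(solr_response)
    result["response"] = dict(solr_response["response"], docs=filled)
    return result
```

Existing values pass through unchanged; only absent keys get the default, which mirrors what Erick describes a server-side component doing.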

Oh, and returning 100K docs is an anti-pattern, if you really need that many docs consider cursorMark and/or Streaming.
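For the cursorMark route, a minimal paging loop might look like the sketch below. It assumes the usual cursorMark contract: the first request sends cursorMark=*, the sort must include the uniqueKey field (assumed here to be a hypothetical "id" field), and paging stops when nextCursorMark comes back unchanged. The fetch callable stands in for whatever HTTP client you use; fake_fetch is a stub so the example runs without a Solr server.

```python
from typing import Callable, Dict, Iterator, List


def iterate_with_cursor(
    fetch: Callable[[Dict], Dict], base_params: Dict, page_size: int = 1000
) -> Iterator[Dict]:
    """Yield every matching doc using cursorMark deep paging (hypothetical helper)."""
    cursor = "*"  # initial cursor value
    while True:
        params = dict(base_params, rows=page_size, cursorMark=cursor)
        response = fetch(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor means no more results
            return
        cursor = next_cursor


# Stub standing in for GET /solr/aml/select; here the cursor is just an offset,
# whereas real Solr cursors are opaque strings.
DOCS: List[Dict] = [{"id": str(i)} for i in range(5)]


def fake_fetch(params: Dict) -> Dict:
    start = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
    page = DOCS[start : start + params["rows"]]
    next_cursor = str(start + len(page)) if page else params["cursorMark"]
    return {"response": {"docs": page}, "nextCursorMark": next_cursor}


base = {"q": "doc_type:trxn", "sort": "id asc", "fl": "id"}
collected = list(iterate_with_cursor(fake_fetch, base, page_size=2))
print(len(collected))
# 5
```

Each request only materializes one page, so memory and packet size stay bounded by page_size rather than by the total hit count.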

Best,
Erick

> On Jul 29, 2020, at 2:55 PM, Teresa McMains <te...@t14-consulting.com> wrote:
> 
> Thanks so much.  Is there any other way to return the data value if it exists, otherwise an empty string?  I'm integrating this with a 3rd party app which I can't change. When the field is null it isn't showing up in the output.


RE: solr query returns items with spaces removed

Posted by Teresa McMains <te...@t14-consulting.com>.
Thanks so much.  Is there any other way to return the data value if it exists, otherwise an empty string?  I'm integrating this with a 3rd party app which I can't change. When the field is null it isn't showing up in the output.

-----Original Message-----
From: Erick Erickson <er...@gmail.com> 
Sent: Wednesday, July 29, 2020 12:49 PM
To: solr-user@lucene.apache.org
Subject: Re: solr query returns items with spaces removed

The “def” function goes after the _indexed_ value, so that’s what you’re getting back. Try just specifying “fl=INSTRUCTIONS”, and if the value is stored that should return the original field value before any analysis is done.

Why are you using the def function? If the field is absent from the doc, nothing will be returned for that field, not even the name. Are you trying to ensure that a blank field is returned if the field isn’t in the document? You can handle that on the client side if so…

Best,
Erick



Re: solr query returns items with spaces removed

Posted by Erick Erickson <er...@gmail.com>.
The “def” function goes after the _indexed_ value, so that’s what you’re getting back. Try just specifying “fl=INSTRUCTIONS”, and if the value is stored that should return the original field value before any analysis is done.

Why are you using the def function? If the field is absent from the doc, nothing will be returned for that field, not even the name. Are you trying to ensure that a blank field is returned if the field isn’t in the document? You can handle that on the client side if so…

Best,
Erick

> On Jul 29, 2020, at 10:34 AM, Teresa McMains <te...@t14-consulting.com> wrote:
> 
> _20_Instructions_And_Notes:def(INSTRUCTIONS,%22%22)
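Following that suggestion, here is a sketch of the original query with INSTRUCTIONS requested as a plain aliased field instead of through def(). The aliases, field names and /solr/aml/select path come from the original URL; the helper names are hypothetical, and this only returns the original text if INSTRUCTIONS is stored. The other def() fields (TRIGGER_IND, TRANSACTION_CDI_DESC) are left as they were; if their field types also analyze the text, they may show the same symptom. Documents without INSTRUCTIONS will simply omit the key, which is the trade-off discussed above.

```python
from urllib.parse import urlencode

# fl entries from the original query; only the INSTRUCTIONS entry changes
FL_FIELDS = [
    '_1_Trigger:def(TRIGGER_IND,"N")',
    "_2_Transaction_No:TRANSACTION_REFERENCE_NUMBER",
    "_3_Date:TRANSACTION_DATE",
    "_4_Amount:CURRENCY_AMOUNT",
    # Plain alias: returns the stored value instead of def()'s indexed term
    "_20_Instructions_And_Notes:INSTRUCTIONS",
    '_21_Transaction_Type:def(TRANSACTION_CDI_DESC,"")',
]


def build_select_params(transaction_reference_number: str, rows: int = 100000) -> dict:
    """Query parameters for /select; rows defaults to the original 100000."""
    return {
        "q": f'TRANSACTION_REFERENCE_NUMBER:"{transaction_reference_number}"',
        "fq": "doc_type:trxn",
        "wt": "json",
        "fl": ",".join(FL_FIELDS),
        "rows": rows,
    }


def build_select_url(transaction_reference_number: str, rows: int = 100000) -> str:
    # urlencode percent-encodes ':', '"', '(' and ',' for us
    return "/solr/aml/select?" + urlencode(build_select_params(transaction_reference_number, rows))


print(build_select_url("123-456-7890"))
```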