Posted to solr-user@lucene.apache.org by Zheng Lin Edwin Yeo <ed...@gmail.com> on 2017/07/12 06:31:03 UTC

Is JSON facet output removing characters like \t from output

Hi,

I would like to check: does the JSON facet output remove characters like \t
from its output?

We have found that if the result is not in the last result set, characters
like \t are removed from the output. However, if it is in the last result
set, the \t is not removed.

As there is a discrepancy in the results being returned, is this considered
a bug in the JSON facet output?

I'm using Solr 6.5.1.

Snapshot of output when \t is not removed:

  "description":{
        "buckets":[{
           "val":"detaildescription\t\t\t\t",
            "count":1}]},

Snapshot of output when \t is removed:

  "description":{
        "buckets":[{
           "val":"detaildescription        ",
            "count":1}]},
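In the first snapshot `\t` is a JSON escape for a real tab character. The second snapshot, as it appears here, ends in ordinary spaces, so it may be worth checking the raw response before concluding that the tabs were dropped. A minimal Python sketch for checking which characters a bucket value actually ends with, assuming the raw response text is at hand (`trailing_whitespace` is a hypothetical helper, not part of Solr):

```python
import json


def trailing_whitespace(val: str) -> str:
    """Return the run of whitespace characters at the end of a facet value."""
    return val[len(val.rstrip()):]


# Bucket values copied from the two snapshots. json.loads turns the JSON
# escape \t into a real tab character; the second value ends in plain spaces.
kept = json.loads('"detaildescription\\t\\t\\t\\t"')
stripped = json.loads('"detaildescription        "')

for label, val in (("not removed", kept), ("removed", stripped)):
    tail = trailing_whitespace(val)
    print(label, "tabs:", tail.count("\t"), "spaces:", tail.count(" "))
```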

Regards,
Edwin

Re: Is JSON facet output removing characters like \t from output

Posted by Erick Erickson <er...@gmail.com>.
Then you'll have to scrub the data on the way in.
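Scrubbing on the way in means cleaning the values in the indexing client before they reach Solr. A minimal Python sketch, assuming documents are built as plain dicts and that leading/trailing whitespace (tabs included) is never meaningful for these fields; `scrub_value` and `scrub_doc` are hypothetical helpers, not part of any Solr client:

```python
def scrub_value(value):
    """Trim leading/trailing whitespace (tabs included) from string values."""
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, list):  # multivalued field
        return [scrub_value(v) for v in value]
    return value  # numbers, dates, etc. are left untouched


def scrub_doc(doc: dict) -> dict:
    """Return a copy of a Solr document with every field value scrubbed."""
    return {field: scrub_value(value) for field, value in doc.items()}


doc = {"id": "1", "description": "detaildescription\t\t\t\t"}
print(scrub_doc(doc))
```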

Or change the field type to a text type that uses KeywordTokenizer, and use
PatternReplaceCharFilter(Factory) to get rid of the unwanted characters.
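The second option could look roughly like the following Schema API request body, built in Python for convenience. This is a sketch, assuming a managed schema that accepts Schema API edits; the type name `string_trimmed` and the trailing-whitespace regex `\s+$` are illustrative choices, not from the thread. The char filter runs before the tokenizer, and KeywordTokenizerFactory keeps the whole cleaned value as a single indexed term, which is what terms facet buckets are built from; the stored value returned with documents is not changed.

```python
import json

# Hypothetical field type: a char filter strips trailing whitespace
# (tabs included), then KeywordTokenizerFactory emits the whole value
# as one token. Name and pattern are assumptions for illustration.
payload = {
    "add-field-type": {
        "name": "string_trimmed",
        "class": "solr.TextField",
        "analyzer": {
            "charFilters": [{
                "class": "solr.PatternReplaceCharFilterFactory",
                "pattern": r"\s+$",
                "replacement": "",
            }],
            "tokenizer": {"class": "solr.KeywordTokenizerFactory"},
        },
    }
}

body = json.dumps(payload, indent=2)
print(body)
```

The body would be POSTed to the collection's `/schema` endpoint (for example `http://localhost:8983/solr/mycollection/schema`, collection name hypothetical); the `description` field would then need to use the new type, and the data would need to be reindexed.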

Best,
Erick

On Wed, Jul 12, 2017 at 7:07 PM, Zheng Lin Edwin Yeo
<ed...@gmail.com> wrote:
> The field which I am bucketing is indexed using String field, and does not
> pass through any tokenizers.

Re: Is JSON facet output removing characters like \t from output

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
The field I am bucketing on is indexed as a String field and does not pass
through any tokenizers.

Regards,
Edwin

On 12 July 2017 at 21:52, Susheel Kumar <su...@gmail.com> wrote:

> I checked on 6.6 and don't see any such issues. I assume the field you are
> bucketing on is string/keywordtokenizer not text/analyzed field.

Re: Is JSON facet output removing characters like \t from output

Posted by Susheel Kumar <su...@gmail.com>.
I checked on 6.6 and don't see any such issue. I assume the field you are
bucketing on is a string/KeywordTokenizer field, not a text/analyzed field.


===

"facets":{
    "count":5,
    "myfacet":{
      "buckets":[{
          "val":"A\t\t\t",
          "count":2},
        {
          "val":"L\t\t\t",
          "count":1},
        {
          "val":"P\t\t\t",
          "count":1},
        {
          "val":"Z\t\t\t",
          "count":1}]}}}
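A similar check can be reproduced by sending a terms facet through the `json.facet` request parameter. A minimal Python sketch that only builds the request URL, assuming a local Solr at `http://localhost:8983/solr/mycollection` and a string field named `description` (both hypothetical here); comparing the bucket `val` entries in the response would show whether the tabs survive.

```python
import json
from urllib.parse import urlencode


def facet_request_url(base_url: str, field: str, facet_name: str = "myfacet") -> str:
    """Build a /select URL carrying a JSON Facet terms facet on `field`."""
    facet = {facet_name: {"type": "terms", "field": field}}
    params = {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = facet_request_url("http://localhost:8983/solr/mycollection", "description")
print(url)
```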

On Wed, Jul 12, 2017 at 2:31 AM, Zheng Lin Edwin Yeo <ed...@gmail.com>
wrote:

> Would like to check, does JSON facet output remove characters like \t from
> its output?