Posted to solr-user@lucene.apache.org by Zheng Lin Edwin Yeo <ed...@gmail.com> on 2017/07/12 06:31:03 UTC
Is JSON facet output removing characters like \t from output
Hi,
I would like to check: does the JSON facet remove characters like \t from
its output?
Currently, we find that if a result is not in the last result set,
characters like \t are removed from the output. However, if it is in the
last result set, the \t is not removed.
As there is a discrepancy in the results being returned, is this considered
a bug in the JSON facet output?
I'm using Solr 6.5.1.
Snapshot of output when \t is not removed:
"description":{
"buckets":[{
"val":"detaildescription\t\t\t\t",
"count":1}]},
Snapshot of output when \t is removed:
"description":{
"buckets":[{
"val":"detaildescription ",
"count":1}]},
Regards,
Edwin
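To tell the two snapshots apart programmatically, a minimal Python sketch could parse a facet response and flag bucket values that still contain tab characters. Note that `\t` in the JSON text is an escape for a real tab character, so a value that kept its tabs decodes to a string containing a tab. The helper name `buckets_with_tabs` and the wrapping of each snapshot into a complete JSON object are assumptions for illustration only.

```python
import json


def buckets_with_tabs(response_text, facet_name):
    """Return (val, contains_tab) for every bucket of the named facet.

    Assumes response_text is a JSON object whose facet_name entry holds a
    "buckets" list, in the same shape as the snapshots in the question.
    """
    facet = json.loads(response_text)[facet_name]
    return [(b["val"], "\t" in b["val"]) for b in facet["buckets"]]


# The two snapshots, wrapped in braces (an assumption) so each parses as a
# complete JSON document. Raw strings keep the JSON "\t" escapes intact.
kept = r'{"description":{"buckets":[{"val":"detaildescription\t\t\t\t","count":1}]}}'
stripped = r'{"description":{"buckets":[{"val":"detaildescription ","count":1}]}}'

print(buckets_with_tabs(kept, "description"))      # → [('detaildescription\t\t\t\t', True)]
print(buckets_with_tabs(stripped, "description"))  # → [('detaildescription ', False)]
```

Running the same check over each page of results would show which result sets lose the tabs.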
Re: Is JSON facet output removing characters like \t from output
Posted by Erick Erickson <er...@gmail.com>.
Then you'll have to scrub the data on the way in.
Or change the field type to a TextField analyzed with KeywordTokenizer and
use PatternReplaceCharFilter(Factory) to get rid of the unwanted characters.
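A rough Python sketch of both suggestions. `scrub` is one possible ingest-side clean-up; the rule it applies (collapse runs of tab/CR/LF into one space, then trim) is an assumption, not something stated in the thread. `keyword_with_char_filter` is a hypothetical stand-in, not Solr code, that mimics what a `PatternReplaceCharFilterFactory` placed in front of a `KeywordTokenizerFactory` would do: rewrite the raw text with a regex, then emit the whole remaining string as a single token. In a `solr.TextField` analyzer this would roughly correspond to `<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\t+" replacement=""/>` followed by `<tokenizer class="solr.KeywordTokenizerFactory"/>`; verify against the reference guide for your Solr version.

```python
import re

# Hypothetical ingest-side rule: collapse runs of tabs/CR/LF into one space
# and trim, so "a\tb" stays readable as "a b" instead of becoming "ab".
_CONTROL_WS = re.compile(r"[\t\r\n]+")


def scrub(value):
    """Clean a facet value before sending it to Solr."""
    return _CONTROL_WS.sub(" ", value).strip()


def keyword_with_char_filter(value, pattern=r"\t+", replacement=""):
    """Emulate a regex char filter followed by a keyword tokenizer.

    The char filter rewrites the raw text; the keyword tokenizer then
    emits the entire remaining string as a single token.
    """
    filtered = re.sub(pattern, replacement, value)
    return [filtered]


print(scrub("detaildescription\t\t\t\t"))  # → detaildescription
print(keyword_with_char_filter("A\t\t\t"))   # → ['A']
```

Scrubbing on the way in keeps the schema unchanged; the char-filter route moves the clean-up into Solr, at the cost of faceting on a TextField instead of a string field.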
Best,
Erick
On Wed, Jul 12, 2017 at 7:07 PM, Zheng Lin Edwin Yeo
<ed...@gmail.com> wrote:
Re: Is JSON facet output removing characters like \t from output
Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
The field I am bucketing on is indexed as a string field and does not
pass through any tokenizer.
Regards,
Edwin
On 12 July 2017 at 21:52, Susheel Kumar <su...@gmail.com> wrote:
Re: Is JSON facet output removing characters like \t from output
Posted by Susheel Kumar <su...@gmail.com>.
I checked on 6.6 and don't see any such issue. I assume the field you are
bucketing on is a string/KeywordTokenizer field, not a text/analyzed field.
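The assumption about the field type matters because trailing tabs can only appear in bucket values if the whole value is indexed as a single term. A toy Python contrast follows; both functions are hypothetical models rather than Solr code, and a plain whitespace split stands in for a real analyzer. The string-field model keeps `A\t\t\t` as one term, matching the 6.6 output, while the analyzed model reduces it to `A`.

```python
def string_field_terms(value):
    """Toy model of a string field: the whole value becomes one term."""
    return [value]


def whitespace_text_field_terms(value):
    """Crude stand-in for an analyzed text field: split on whitespace.

    str.split() with no argument treats tabs as separators and drops the
    empty pieces, so trailing tabs never become part of any term.
    """
    return value.split()


for raw in ["A\t\t\t", "detaildescription\t\t\t\t"]:
    # Print the raw value, then the terms each model would index.
    print(repr(raw), string_field_terms(raw), whitespace_text_field_terms(raw))
```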
===
"facets":{
"count":5,
"myfacet":{
"buckets":[{
"val":"A\t\t\t",
"count":2},
{
"val":"L\t\t\t",
"count":1},
{
"val":"P\t\t\t",
"count":1},
{
"val":"Z\t\t\t",
"count":1}]}}}
On Wed, Jul 12, 2017 at 2:31 AM, Zheng Lin Edwin Yeo <ed...@gmail.com>
wrote: