Posted to solr-user@lucene.apache.org by Shawn Heisey <ap...@elyograg.org> on 2015/05/19 02:56:37 UTC

Is it possible to search for the empty string?

Can I search for the empty string?  This is distinct from searching for
documents that don't have a certain field at all, which I can already do
with a clause of "*:* -field:[* TO *]" in my query.

Thanks,
Shawn


Re: Is it possible to search for the empty string?

Posted by Lianyi Han <li...@gmail.com>.
It would be interesting to test this.  I only tried empty-string search
recently, on 5.10, and it seems that *field:""* (double quotes) works, but
*field:''* (single quotes) does not.

Best,

-lianyi
"Unity of knowing and doing"
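
For anyone who wants to try both clauses from a script, here is a minimal
Python sketch that builds the two select URLs with proper URL encoding.
The host, port, core name, and the field name "field" are assumptions for
illustration; only the two query clauses come from this thread.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed base URL; adjust host, port, and core name for your installation.
BASE = "http://localhost:8983/solr/techproducts/select"


def select_url(query):
    """Return a select URL with the query string safely URL-encoded."""
    return BASE + "?" + urlencode({"q": query, "wt": "json"})


# Clause reported to match an empty-string value (double quotes, not single).
empty_url = select_url('field:""')

# Clause from the original post: documents that lack the field entirely.
missing_url = select_url("*:* -field:[* TO *]")

if __name__ == "__main__":
    print(empty_url)
    print(missing_url)
```

Encoding the clause in code sidesteps shell quoting problems with the
double quotes, brackets, and spaces.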


Re: Is it possible to search for the empty string?

Posted by Shawn Heisey <el...@elyograg.org>.
On 5/18/2015 7:16 PM, Lianyi Han wrote:
> Would field:"" work in your case?
> 
> Best,
> On Mon, May 18, 2015 at 8:56 PM Shawn Heisey <ap...@elyograg.org> wrote:
> 
>> Can I search for the empty string?  This is distinct from searching for
>> documents that don't have a certain field at all, which I can already do
>> with a clause of "*:* -field:[* TO *]" in my query.
>>
>> Thanks,
>> Shawn

I haven't tried it, because right now my indexing code barfs on an empty
string, so there are no empty strings in the index.  I'll need to
manually put a document in the index and try that.

A possible further complication: The field is whitespace tokenized and
multivalued, where one value may be the empty string but other values
could be non-empty strings.

Thanks,
Shawn


Re: Is it possible to search for the empty string?

Posted by Lianyi Han <li...@gmail.com>.
Would field:"" work in your case?

Best,

Re: Is it possible to search for the empty string?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/18/2015 7:34 PM, Walter Underwood wrote:
> Not out of the box.
> 
> Fields are parsed into tokens and queries search on tokens. An empty string has no tokens for that field and a missing field has no tokens for that field.
> 
> If you really need to do this, then you’ll need to turn the empty string into a special token that means “empty string”, choosing a token that won’t conflict with any real token. At this point, we’re moving into Ugly Hack Land, but sometimes that is the best we can do.
> 
> For example, you could create an update request processor that checked for a field with an empty value, then replaced that with a rare character, perhaps a composed, compatibility Unicode character, like Angstrom (same as circle capital A), or one of the TV Guide symbols (numbers in TV-shaped surrounds), or my personal favorite, the IPA symbol for “audible gnashing of teeth”, which could be an appropriate response to this request.
> 
> That character is U+02AD, “LATIN LETTER BIDENTAL PERCUSSIVE” (http://www.fileformat.info/info/unicode/char/2ad/index.htm).
> 
> Then you would need to craft a query using this special token to mean “empty string”.
> 
> Of course, none of this works if some upstream processing of the update document strips fields with empty values.

This field is already being handled by a custom update processor that I
wrote.  The empty string in the source data is leading to an
IndexOutOfBounds exception because of the way my parsing works on the
info that is sent to Solr.  Instead of throwing the exception, I could
trap it and replace the empty string with something we can look for in a
query.

I have added an additional parameter to my update processor config that
makes it replace empty strings with a specific string.

Thanks,
Shawn
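
The replacement step described above can be sketched outside of Solr.
This is a minimal Python illustration, not the actual update processor
(which is not shown in this thread); the marker string "__EMPTY__", the
function name, and the field name "tags" are all hypothetical.

```python
# Hypothetical placeholder; pick any string that cannot occur in real data.
EMPTY_MARKER = "__EMPTY__"


def replace_empty_values(doc, field, marker=EMPTY_MARKER):
    """Return a copy of doc where empty strings in `field` become `marker`.

    Handles both single-valued (str) and multivalued (list of str) fields.
    A document that lacks the field is returned unchanged, so "missing"
    and "empty" stay distinguishable.
    """
    if field not in doc:
        return dict(doc)
    value = doc[field]
    if isinstance(value, list):
        new_value = [marker if v == "" else v for v in value]
    else:
        new_value = marker if value == "" else value
    return {**doc, field: new_value}


if __name__ == "__main__":
    print(replace_empty_values({"id": "1", "tags": ["a", "", "b"]}, "tags"))
```

A query for the marker (for example tags:__EMPTY__) would then stand in
for "empty string", assuming the field's analysis chain leaves the marker
intact as a single token.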


Re: Is it possible to search for the empty string?

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: Re: Is it possible to search for the empty string?
: 
: Not out of the box.
: 
: Fields are parsed into tokens and queries search on tokens. An empty 
: string has no tokens for that field and a missing field has no tokens 
: for that field.

that's a misleading oversimplification of what *normally* happens.

it is absolutely possible to have documents with fields whose indexed 
terms consist of the empty string, and to search for those empty 
strings -- the most trivial way being with a simple StrField -- but using 
TextField with some creative analyzers it's also very possible.


$ curl 'http://localhost:8983/solr/techproducts/select?q=*:*&facet=true&facet.field=foo_s&wt=json&indent=true&omitHeader=true'
{
  "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"foo_blank",
        "foo_s":"",
        "_version_":1501816569733316608},
      {
        "id":"foo_non_blank",
        "foo_s":"bar",
        "_version_":1501816583564034048},
      {
        "id":"foo_missing",
        "_version_":1501816591383265280}]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "foo_s":[
        "",1,
        "bar",1]},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_heatmaps":{}}}

$ curl 'http://localhost:8983/solr/techproducts/select?q=foo_s:""&wt=json&indent=true&omitHeader=true'
{
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"foo_blank",
        "foo_s":"",
        "_version_":1501816569733316608}]
  }}

$ curl 'http://localhost:8983/solr/techproducts/select?q=foo_s:*&wt=json&indent=true&omitHeader=true'
{
  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"foo_blank",
        "foo_s":"",
        "_version_":1501816569733316608},
      {
        "id":"foo_non_blank",
        "foo_s":"bar",
        "_version_":1501816583564034048}]
  }}

$ curl 'http://localhost:8983/solr/techproducts/select?q=-foo_s:*&wt=json&indent=true&omitHeader=true'
{
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"foo_missing",
        "_version_":1501816591383265280}]
  }}


-Hoss
http://www.lucidworks.com/
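
The three-way distinction in the output above (empty value, non-empty
value, missing field) can be restated as a toy in-memory model. This is
only a Python sketch of the matching semantics shown by the curl results;
it is not Solr code, and the helper names are made up for illustration.

```python
# Toy model of the three example documents; "foo_missing" has no foo_s key.
DOCS = [
    {"id": "foo_blank", "foo_s": ""},
    {"id": "foo_non_blank", "foo_s": "bar"},
    {"id": "foo_missing"},
]


def ids_with_value(docs, field, value):
    """Model of field:"value" on a string field: exact match on the term."""
    return [d["id"] for d in docs if field in d and d[field] == value]


def ids_with_field(docs, field):
    """Model of field:* : any document with a term in the field, even ""."""
    return [d["id"] for d in docs if field in d]


def ids_without_field(docs, field):
    """Model of -field:* : documents with no term in the field at all."""
    return [d["id"] for d in docs if field not in d]


if __name__ == "__main__":
    print(ids_with_value(DOCS, "foo_s", ""))
    print(ids_with_field(DOCS, "foo_s"))
    print(ids_without_field(DOCS, "foo_s"))
```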

Re: Is it possible to search for the empty string?

Posted by Walter Underwood <wu...@wunderwood.org>.
Not out of the box.

Fields are parsed into tokens and queries search on tokens. An empty string has no tokens for that field and a missing field has no tokens for that field.

If you really need to do this, then you’ll need to turn the empty string into a special token that means “empty string”, choosing a token that won’t conflict with any real token. At this point, we’re moving into Ugly Hack Land, but sometimes that is the best we can do.

For example, you could create an update request processor that checked for a field with an empty value, then replaced that with a rare character, perhaps a composed, compatibility Unicode character, like Angstrom (same as circle capital A), or one of the TV Guide symbols (numbers in TV-shaped surrounds), or my personal favorite, the IPA symbol for “audible gnashing of teeth”, which could be an appropriate response to this request.

That character is U+02AD, “LATIN LETTER BIDENTAL PERCUSSIVE” (http://www.fileformat.info/info/unicode/char/2ad/index.htm).

Then you would need to craft a query using this special token to mean “empty string”.

Of course, none of this works if some upstream processing of the update document strips fields with empty values.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)
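
The token argument above applies to tokenized fields, and a small Python
sketch may make it concrete. It assumes plain whitespace splitting as a
stand-in for a whitespace tokenizer and a dictionary as a stand-in for an
inverted index; the function names and the field name "body" are
hypothetical.

```python
from collections import defaultdict

SENTINEL = "\u02ad"  # U+02AD, the character suggested above


def whitespace_tokens(value):
    """Split on whitespace; None stands for a document without the field."""
    return [] if value is None else value.split()


def build_index(docs, field):
    """Map each token to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        for token in whitespace_tokens(doc.get(field)):
            index[token].add(doc["id"])
    return index


docs = [
    {"id": "empty", "body": ""},         # empty string: produces no tokens
    {"id": "missing"},                   # no field: produces no tokens
    {"id": "marked", "body": SENTINEL},  # sentinel: produces one token
]
index = build_index(docs, "body")

if __name__ == "__main__":
    print(dict(index))
```

In this model the "empty" and "missing" documents leave no trace in the
index, while the document carrying the sentinel can be found by its token.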
