Posted to java-user@lucene.apache.org by Jeff Davis <ja...@gmail.com> on 2005/07/19 17:19:51 UTC

QueryParser handling of backslash characters

Hi,

I'm seeing some strange behavior in the way the QueryParser handles
consecutive backslash characters.  I know that backslash is the escape
character in Lucene, and so I would expect "\\\\" to match fields that
have two consecutive backslashes, but this does not seem to be the
case.

The fields I'm searching are UNC paths, e.g. "\\192.168.0.15\public". 
The only way I can get my query to find the record containing that
value is to type "FieldName:\\\192.168.0.15\\public" (three backslashes).
Why is the third backslash character not treated as an escape?  Is it
just that any backslash that is preceded by a backslash is interpreted
as a literal backslash character, regardless of whether the "escape"
backslash was itself escaped?

I can code around this, but it seems inconsistent with the way that
escape characters usually work.  Is this a bug, or is it intentional,
or am I missing something?

Thanks,
Jeff

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: QueryParser handling of backslash characters

Posted by Jeff Davis <ja...@gmail.com>.
That fix works perfectly, as far as I can tell.

As for the unit test, it should actually be:
assertEquals("\\\\192.168.0.15\\public",
    discardEscapeChar("\\\\\\\\192.168.0.15\\\\public"));
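
The escaping in that assertion happens at two levels: once for the Java string literal and once for the query syntax. As a minimal sketch (the class name UncEscapeLevels is made up for illustration, and the method body is a copy of the fix quoted below), the following prints each level and checks that they line up:

```java
class UncEscapeLevels {

    // Copy of the proposed fix from this thread: each backslash is dropped
    // and the character that follows it is kept unconditionally.
    static String discardEscapeChar(String input) {
        char[] caSource = input.toCharArray();
        char[] caDest = new char[caSource.length];
        int j = 0;
        for (int i = 0; i < caSource.length; i++) {
            if (caSource[i] == '\\') {
                // Skip the escape character; stop if nothing follows it.
                if (caSource.length == ++i) {
                    break;
                }
            }
            caDest[j++] = caSource[i];
        }
        return new String(caDest, 0, j);
    }

    public static void main(String[] args) {
        // Java literal with 8 + 4 backslashes: query text with 4 + 2.
        String query = "\\\\\\\\192.168.0.15\\\\public";
        // Java literal with 4 + 2 backslashes: stored value with 2 + 1.
        String stored = "\\\\192.168.0.15\\public";

        System.out.println("query text  : " + query);
        System.out.println("stored value: " + stored);
        System.out.println("unescaped   : " + discardEscapeChar(query));
        System.out.println("match       : " + stored.equals(discardEscapeChar(query)));
    }
}
```

Each level halves the number of backslashes, so the eight typed in the Java source end up as the two that start the UNC path.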

Jeff


On 7/20/05, Eyal <ey...@gmail.com> wrote:
> I think this should work:
> 
> (Originally written in C#, so someone please check that it compiles; I don't
> have a Java compiler here.)
> 
>     private String discardEscapeChar(String input)
>     {
>       char[] caSource = input.toCharArray();
>       char[] caDest = new char[caSource.length];
>       int j = 0;
> 
>       for (int i = 0; i < caSource.length; i++)
>       {
>         if (caSource[i] == '\\')
>         {
>           if (caSource.length == ++i)
>             break;
>         }
>         caDest[j++]=caSource[i];
>       }
>       return new String(caDest, 0, j);
>     }
> 
> 
> Regarding your unit test - I think it's wrong:
> 
> >      assertEquals("\\\\\\\\192.168.0.15\\\\public",
> >          discardEscapeChar("\\\\192.168.0.15\\\\public"));
> 
> It should be:
> 
>      assertEquals("\\\\192.168.0.15\\\\public",
>          discardEscapeChar("\\\\\\\\192.168.0.15\\\\public"));
> 
> I would also suggest adding the following:
> String s="\\\\some.host.name\\dir+:+-!():^[]\\{}~*?";
> assertEquals(s,discardEscapeChar(escape(s)));
> 
> Eyal
> 



RE: QueryParser handling of backslash characters

Posted by Eyal <ey...@gmail.com>.
I think this should work:

(Originally written in C#, so someone please check that it compiles; I don't
have a Java compiler here.)

    private String discardEscapeChar(String input) 
    {
      char[] caSource = input.toCharArray();
      char[] caDest = new char[caSource.length];
      int j = 0;

      for (int i = 0; i < caSource.length; i++) 
      {
        if (caSource[i] == '\\')
        {
          if (caSource.length == ++i)
            break;
        }
        caDest[j++]=caSource[i];
      }
      return new String(caDest, 0, j);
    }
 

Regarding your unit test - I think it's wrong:

>      assertEquals("\\\\\\\\192.168.0.15\\\\public",
>          discardEscapeChar("\\\\192.168.0.15\\\\public"));

It should be:

     assertEquals("\\\\192.168.0.15\\\\public",
         discardEscapeChar("\\\\\\\\192.168.0.15\\\\public"));

I would also suggest adding the following:
String s="\\\\some.host.name\\dir+:+-!():^[]\\{}~*?";
assertEquals(s,discardEscapeChar(escape(s)));
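
A self-contained version of that round-trip check might look like the sketch below. The escape() method here is a hypothetical stand-in written for this example, not the QueryParser one, and the set of characters it escapes is an assumption. The sample string is modeled on the one above, with the backslash before `{` written as `\\` so that it is a valid Java literal.

```java
class EscapeRoundTrip {

    // Characters that the stand-in escape() prefixes with a backslash.
    // This set is an assumption made for the sketch.
    static final String SPECIAL = "\\+-!():^[]\"{}~*?";

    // Hypothetical stand-in for an escape routine: puts a backslash in
    // front of every special character, including the backslash itself.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Copy of the method proposed above.
    static String discardEscapeChar(String input) {
        char[] caSource = input.toCharArray();
        char[] caDest = new char[caSource.length];
        int j = 0;
        for (int i = 0; i < caSource.length; i++) {
            if (caSource[i] == '\\') {
                // Skip the escape character; stop if nothing follows it.
                if (caSource.length == ++i) {
                    break;
                }
            }
            caDest[j++] = caSource[i];
        }
        return new String(caDest, 0, j);
    }

    public static void main(String[] args) {
        String s = "\\\\some.host.name\\dir+:+-!():^[]\\{}~*?";
        String escaped = escape(s);
        System.out.println("escaped    : " + escaped);
        System.out.println("round trip : " + s.equals(discardEscapeChar(escaped)));
    }
}
```

The round trip holds for any input because every backslash that escape() adds is immediately followed by the character it protects, and discardEscapeChar() removes exactly those backslashes.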

Eyal





Re: QueryParser handling of backslash characters

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jul 19, 2005, at 11:19 AM, Jeff Davis wrote:

> Hi,
>
> I'm seeing some strange behavior in the way the QueryParser handles
> consecutive backslash characters.  I know that backslash is the escape
> character in Lucene, and so I would expect "\\\\" to match fields that
> have two consecutive backslashes, but this does not seem to be the
> case.
>
> The fields I'm searching are UNC paths, e.g. "\\192.168.0.15\public".
> The only way I can get my query to find the record containing that
> value is to type "FieldName:\\\192.168.0.15\\public" (three backslashes).
> Why is the third backslash character not treated as an escape?  Is it
> just that any backslash that is preceded by a backslash is interpreted
> as a literal backslash character, regardless of whether the "escape"
> backslash was itself escaped?
>
> I can code around this, but it seems inconsistent with the way that
> escape characters usually work.  Is this a bug, or is it intentional,
> or am I missing something?

I've waited until I had a chance to experiment with this before  
replying.  I say that this is a bug.  There is a private method in  
QueryParser called discardEscapeChar (shown below).  I copied it to a  
JUnit test case and gave it this assert:

     assertEquals("\\\\\\\\192.168.0.15\\\\public",
         discardEscapeChar("\\\\192.168.0.15\\\\public"));

This test fails with:

     Expected:\\\\192.168.0.15\\public
     Actual  :\192.168.0.15\public

The actual result is wrong in my opinion (though my head hurts thinking
about meta-escaping backslashes in Java code to make this a proper test).

The bug is isolated to the discardEscapeChar() method where it eats  
too many backslashes.  Could you have a shot at tweaking that method  
to do the right thing and submit a patch?

   private String discardEscapeChar(String input) {
     char[] caSource = input.toCharArray();
     char[] caDest = new char[caSource.length];
     int j = 0;
     for (int i = 0; i < caSource.length; i++) {
       if ((caSource[i] != '\\') || (i > 0 && caSource[i-1] == '\\')) {
         caDest[j++]=caSource[i];
       }
     }
     return new String(caDest, 0, j);
   }
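
To see the effect in isolation, here is a small sketch that feeds runs of backslashes through a copy of the method above. The class name and the backslashes() helper are made up for this example. Because the method looks only at the previous source character rather than tracking whether that character was itself an escape, a run of n backslashes should come out as n - 1, which is consistent with three typed backslashes matching two stored ones.

```java
class DiscardEscapeCharBug {

    // The method as quoted above: a backslash is copied whenever the
    // character before it in the source is also a backslash.
    static String discardEscapeChar(String input) {
        char[] caSource = input.toCharArray();
        char[] caDest = new char[caSource.length];
        int j = 0;
        for (int i = 0; i < caSource.length; i++) {
            if ((caSource[i] != '\\') || (i > 0 && caSource[i - 1] == '\\')) {
                caDest[j++] = caSource[i];
            }
        }
        return new String(caDest, 0, j);
    }

    // Hypothetical helper for this sketch: a string of n backslashes.
    static String backslashes(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('\\');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            String out = discardEscapeChar(backslashes(n) + "x");
            // out ends in "x", so out.length() - 1 backslashes survived.
            System.out.println(n + " in -> " + (out.length() - 1) + " out");
        }
    }
}
```

Under the usual escaping convention, two backslashes would collapse to one and four to two.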

Erik

