Posted to user@accumulo.apache.org by Donald Mackert <ma...@gmail.com> on 2020/03/18 23:22:32 UTC

Does Accumulo egrep support regex negative lookahead?

Hello,

    Does the Accumulo shell's egrep command support regex negative lookahead?

    We are trying to find all rows that do not contain a UUID pattern, using
the following sample command:

   egrep -c column ^\\{\"value\"\\:\\{\"itemId\"\\:\"((?\![0-9a-f]{8}-).)*$

  The following egrep, without the lookahead, returns all rows that do match the pattern:

  egrep -c column ^\\{\"value\"\\:\\{\"itemId\"\\:\"([0-9a-f]{8}-).*$

Thank you,

Don
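
One way to check the pattern itself, independent of any shell quoting, is to run it through java.util.regex directly. The sketch below is only an illustration: it assumes the pattern reaches the regex engine with the shell's backslash escapes already removed (which was not verified here), and the sample values are hypothetical JSON strings shaped like the ones the pattern implies. The class and method names are made up for this example.

```java
import java.util.regex.Pattern;

public class TemperedLookahead {
    // The pattern from the question, written as a Java string literal.
    // Assumption: this is what remains once the shell escapes are stripped.
    static final Pattern NO_UUID_PREFIX = Pattern.compile(
        "^\\{\"value\"\\:\\{\"itemId\"\\:\"((?![0-9a-f]{8}-).)*$");

    // True when the whole value matches, i.e. no "8 hex digits + dash"
    // sequence starts anywhere after the itemId prefix.
    static boolean lacksUuidPrefix(String value) {
        return NO_UUID_PREFIX.matcher(value).matches();
    }

    public static void main(String[] args) {
        // Hypothetical values; they are not taken from the thread.
        String[] samples = {
            "{\"value\":{\"itemId\":\"11aa22bb-d33d-e44e-f55f-6677889900cc\"}}",
            "{\"value\":{\"itemId\":\"not-a-uuid\"}}",
            "{\"value\":{\"itemId\":\"abc\",\"ref\":\"deadbeef-0000\"}}"
        };
        for (String s : samples) {
            System.out.println(lacksUuidPrefix(s) + "  " + s);
        }
    }
}
```

Because the lookahead is applied before every character consumed by the repeated group, the third sample is rejected as well: the check fires on any "8 hex digits + dash" run after the prefix, not only on one directly after itemId.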

Re: Does Accumulo egrep support regex negative lookahead?

Posted by Donald Mackert <ma...@gmail.com>.
Christopher,

    Thank you.  Will give these a try.

Don

Re: Does Accumulo egrep support regex negative lookahead?

Posted by Christopher <ct...@apache.org> on Thu, Mar 19, 2020 at 12:57 PM.
I inserted some sample data and was able to use a regex to find values
that matched "itemId: " followed by a valid UUID and, using negative
lookahead, values where "itemId: " is followed by anything other than a
valid UUID.

See below:

root@uno t1> scan
a b:c []    itemId: 11aa22bb-d33d-e44e-f55f-6677889900cc
b b:c []    itemId: 11aa22bbd33d2abcav34-11d25d334455
c b:c []    nope: 11aa22bbd33d2abcav34-11d25d334455
d b:c []    nope: 11aa22bb-d33d-2abc-av34-11d25d334455
root@uno t1> egrep '.*itemId: (?:[a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}).*'
a b:c []    itemId: 11aa22bb-d33d-e44e-f55f-6677889900cc
root@uno t1> egrep '.*itemId: (?\![a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}).*'
egrep '.*itemId: (?![a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}).*'
b b:c []    itemId: 11aa22bbd33d2abcav34-11d25d334455
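
The same two patterns can be exercised without the shell at all. The following is a minimal sketch in plain java.util.regex, run against the four sample values from the scan above. It assumes whole-string matching, as the leading and trailing `.*` in the patterns suggest, and it does not touch Accumulo; the class and method names are invented for this example.

```java
import java.util.regex.Pattern;

public class UuidLookahead {
    // UUID shape used in the transcript: 8 hex digits, four "-xxxx" groups,
    // then 8 more hex digits (which together give the usual 8-4-4-4-12 layout).
    static final String UUID = "[a-f0-9]{8}(?:-[a-f0-9]{4}){4}[a-f0-9]{8}";

    static final Pattern HAS_UUID = Pattern.compile(".*itemId: (?:" + UUID + ").*");
    static final Pattern NO_UUID  = Pattern.compile(".*itemId: (?!" + UUID + ").*");

    static boolean hasUuid(String value) {
        return HAS_UUID.matcher(value).matches();
    }

    static boolean hasItemIdWithoutUuid(String value) {
        return NO_UUID.matcher(value).matches();
    }

    public static void main(String[] args) {
        // Values of rows a, b, c and d from the scan output.
        String[] values = {
            "itemId: 11aa22bb-d33d-e44e-f55f-6677889900cc",
            "itemId: 11aa22bbd33d2abcav34-11d25d334455",
            "nope: 11aa22bbd33d2abcav34-11d25d334455",
            "nope: 11aa22bb-d33d-2abc-av34-11d25d334455"
        };
        for (String v : values) {
            System.out.println(hasUuid(v) + " " + hasItemIdWithoutUuid(v) + "  " + v);
        }
    }
}
```

Only the first value satisfies the positive pattern and only the second satisfies the negative lookahead, which agrees with the rows returned by the two egrep commands. The "nope:" values match neither, because both patterns require the literal "itemId: " prefix.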


Re: Does Accumulo egrep support regex negative lookahead?

Posted by Donald Mackert <ma...@gmail.com> on Thu, Mar 19, 2020 at 8:17 AM.
Christopher,

        BLUF: We want to find all rows for a given column that do not have
a valid UUID.

        Here is an example of what we do not want to match, which is a UUID
in the itemId: 11aa22bb-d33d-2abc-av34-11d25d334455

        What we are looking for is a column represented by the example column,
with the first eight characters of the UUID followed by a dash (-).

Don


Re: Does Accumulo egrep support regex negative lookahead?

Posted by Donald Mackert <ma...@gmail.com>.
Christopher,

     Thank you.  Will take a look.

Don

Re: Does Accumulo egrep support regex negative lookahead?

Posted by Christopher <ct...@apache.org> on Wed, Mar 18, 2020 at 11:25 PM.
The shell command, egrep, uses the RegExFilter[1] underneath. It
supports Java regular expressions, which do support negative
lookahead. So, it should be possible.

However, it is possible there are some quoting issues... the shell
itself uses backslash to escape, but it also uses JLine to read
input, and JLine might treat the exclamation point specially, so it
might need to be escaped twice. However, this is just a guess.

I would recommend eliminating the shell as a variable by scanning
with the Java API directly to test.

If you can supply some examples of what you want to match, and of what
you don't want to match, I could probably try it myself to see if I
can come up with a solution.

[1]: https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/iterators/user/RegExFilter.java
