Posted to user@accumulo.apache.org by Kristopher Kane <kk...@gmail.com> on 2012/07/27 19:02:06 UTC

egrep usage - 1.3.4

All,

Trying to use egrep in the shell on v1.3.4. The help says it takes a Java-format
regex, but what does that mean from the shell? egrep "asdf" -c ... doesn't
match anything, whilst grep "asdf" -c ... does. What's required for the
regex argument?

Thanks,

-Kris

Re: egrep usage - 1.3.4

Posted by Keith Turner <ke...@deenlo.com>.
I think you need to do egrep .*asdf.* to match the substring.
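A minimal sketch of why the wildcards matter, assuming the shell's egrep requires the pattern to match the whole value, as Java's Matcher.matches() does. The class, helper and sample value are hypothetical; only java.util.regex is real.

```java
import java.util.regex.Pattern;

public class EgrepSubstring {
    // Hypothetical helper: Matcher.matches() succeeds only when the
    // pattern accounts for every character of the value.
    static boolean fullMatch(String regex, String value) {
        return Pattern.compile(regex).matcher(value).matches();
    }

    public static void main(String[] args) {
        String value = "xxasdfyy"; // hypothetical cell value containing "asdf"
        // A bare "asdf" would have to equal the whole value, so this is false.
        System.out.println(fullMatch("asdf", value));
        // Leading and trailing .* absorb the surrounding characters: true.
        System.out.println(fullMatch(".*asdf.*", value));
    }
}
```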

On Fri, Jul 27, 2012 at 1:02 PM, Kristopher Kane <kk...@gmail.com> wrote:
> All,
>
> Trying to use egrep in the shell on v1.3.4 - help says it is a java format
> regex but what does that mean from the shell?  egrep "asdf" -c ... doesn't
> match anything whilst grep "asdf" -c ... does.  What's required for the
> regex argument?
>
> Thanks,
>
> -Kris

Re: egrep usage - 1.3.4

Posted by Kristopher Kane <kk...@gmail.com>.
I got it to work, but noticed some interesting things.

My RHEL egrep automatically matches when there are characters after the
matched text.

For example: matching "2009" against "2009000000000" returns a result on
RHEL but not in the shell.

To match the same value in the shell I had to add \\d* for the remaining
characters. Note the escaping of the metacharacter.

Is that just standard Java regex behavior?

-Kris
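Assuming the shell performs a full match like Matcher.matches(), the pattern must account for every character of the value, which egrep(1) on RHEL does not require. The sketch below reproduces the "2009" example. The doubled backslash is what a Java string literal needs so the regex engine receives \d; whether the shell also requires it depends on its own argument parsing, which this sketch does not model. Class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class DatePrefix {
    // Full match, as assumed for the shell's egrep.
    static boolean fullMatch(String regex, String value) {
        return Pattern.matches(regex, value);
    }

    // Substring search, closer to egrep(1) on RHEL.
    static boolean search(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        String value = "2009000000000";
        // false: the trailing digits are left unmatched.
        System.out.println(fullMatch("2009", value));
        // true: \d* consumes the remaining digits.
        System.out.println(fullMatch("2009\\d*", value));
        // true: find() only needs "2009" somewhere in the value.
        System.out.println(search("2009", value));
    }
}
```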

On Fri, Jul 27, 2012 at 1:20 PM, Kristopher Kane <kk...@gmail.com>wrote:

> The below was sent without qualification.
>
> I'm using egrep on RHEL 5 against a text file containing dates, and it
> matches the way I want. Pasting the same pattern into the shell returned
> nothing. Still working on it, as I made an initial mistake.
>
> -Kris

Re: egrep usage - 1.3.4

Posted by Kristopher Kane <kk...@gmail.com>.
The below was sent without qualification.

I'm using egrep on RHEL 5 against a text file containing dates, and it
matches the way I want. Pasting the same pattern into the shell returned
nothing. Still working on it, as I made an initial mistake.

-Kris

On Fri, Jul 27, 2012 at 1:02 PM, Kristopher Kane <kk...@gmail.com>wrote:

> All,
>
> Trying to use egrep in the shell on v1.3.4 - help says it is a java format
> regex but what does that mean from the shell?  egrep "asdf" -c ... doesn't
> match anything whilst grep "asdf" -c ... does.  What's required for the
> regex argument?
>
> Thanks,
>
> -Kris
>

Re: egrep usage - 1.3.4

Posted by Christopher Tubbs <ct...@gmail.com>.
+1 for "-g" / "--global" option.

--L


On Mon, Aug 6, 2012 at 5:37 PM, David Medinets <da...@gmail.com> wrote:
> +1 to add an option instead of using egrep2.

Re: egrep usage - 1.3.4

Posted by David Medinets <da...@gmail.com>.
+1 to add an option instead of using egrep2.

On Mon, Aug 6, 2012 at 3:41 PM, Keith Turner <ke...@deenlo.com> wrote:
> Instead of a new command, we could add an option to the egrep command,
> like -f.  When the -f option is present it will set the option on the
> RegExFilter to use find().

Re: egrep usage - 1.3.4

Posted by Keith Turner <ke...@deenlo.com>.
On Mon, Aug 6, 2012 at 3:13 PM, John Vines <vi...@apache.org> wrote:
> Yeah, that was the case I thought of as well. However, I think it would be
> worthwhile to support the improved behavior. Unfortunately, I'm stuck on
> trying to think of a better command for it, since egrep itself is the
> appropriate command and we just have a bit of a misnomer.
>
> I hate this convention, but one option is to introduce egrep2 which is the
> improved behavior, and then put in warning messages informing users that the
> egrep command will be superseded by the egrep2 functionality in the
> following release. Or we could just stick with the two egrep commands in
> perpetuity.

I was mainly thinking of the iterator when thinking of preserving
behavior, because it's used by code. An option could be added to the
RegExFilter to support find().

If you assume that only people use the egrep command in the shell,
then it may be ok to change its behavior because a person could adapt.
However, this is probably a poor assumption. I try to think of the
shell as part of the public API. Scripts could call the egrep
command, and scripts would not automatically adapt to a change in
behavior. This would also make it hard to use the same egrep-based
script against both Accumulo 1.4 and 1.5.

Instead of a new command, we could add an option to the egrep command,
like -f.  When the -f option is present it will set the option on the
RegExFilter to use find().
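A minimal sketch of how a single option could preserve the old default while enabling find(). The class, constructor and useFind flag are hypothetical stand-ins for the proposed -f option; they are not the RegExFilter API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SketchRegexFilter {
    private final Pattern pattern;
    private final boolean useFind; // hypothetical equivalent of the proposed -f option

    SketchRegexFilter(String regex, boolean useFind) {
        this.pattern = Pattern.compile(regex);
        this.useFind = useFind;
    }

    // With useFind == false the existing full-match behavior is kept,
    // so callers that never set the option see no change.
    boolean accept(String value) {
        Matcher m = pattern.matcher(value);
        return useFind ? m.find() : m.matches();
    }

    public static void main(String[] args) {
        SketchRegexFilter legacy = new SketchRegexFilter("foo", false);
        SketchRegexFilter withFind = new SketchRegexFilter("foo", true);
        System.out.println(legacy.accept("cfooa"));   // false
        System.out.println(withFind.accept("cfooa")); // true
    }
}
```

Defaulting the flag to the old behavior is what lets existing scripts and iterator configurations keep working unchanged.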


Re: egrep usage - 1.3.4

Posted by John Vines <vi...@apache.org>.
Yeah, that was the case I thought of as well. However, I think it would be
worthwhile to support the improved behavior. Unfortunately, I'm stuck on
trying to think of a better command for it, since egrep itself is the
appropriate command and we just have a bit of a misnomer.

I hate this convention, but one option is to introduce egrep2, which
implements the improved behavior, and then add warning messages informing
users that the egrep command will be superseded by the egrep2 functionality
in the following release. Or we could just keep the two egrep commands in
perpetuity.

John

On Mon, Aug 6, 2012 at 3:01 PM, Michael Flester <fl...@gmail.com> wrote:

> You are right. I had inadvertently constrained my thinking
> to patterns of the form match(".*{x}.*") == find(".*{x}.*") == find("{x}")
> but that isn't everything someone might be using it for.

Re: egrep usage - 1.3.4

Posted by Michael Flester <fl...@gmail.com>.
You are right. I had inadvertently constrained my thinking
to patterns of the form match(".*{x}.*") == find(".*{x}.*") == find("{x}")
but that isn't everything someone
might be using it for.
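For a literal substring x, the three forms above agree, provided the value contains no line terminators (by default . does not match \n). A small check with x passed through Pattern.quote so it is treated literally; match() here means Matcher.matches(), and the helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class WildcardEquivalence {
    // match(".*{x}.*")
    static boolean matchesWrapped(String x, String v) {
        return Pattern.matches(".*" + Pattern.quote(x) + ".*", v);
    }

    // find(".*{x}.*")
    static boolean findWrapped(String x, String v) {
        return Pattern.compile(".*" + Pattern.quote(x) + ".*").matcher(v).find();
    }

    // find("{x}")
    static boolean findBare(String x, String v) {
        return Pattern.compile(Pattern.quote(x)).matcher(v).find();
    }

    public static void main(String[] args) {
        String[] values = {"foo", "cfooa", "bar", "xfoo"};
        for (String v : values) {
            // All three agree on single-line values.
            System.out.println(v + ": " + matchesWrapped("foo", v) + " "
                    + findWrapped("foo", v) + " " + findBare("foo", v));
        }
        // ".*" cannot cross a line terminator, so the full match diverges here.
        String multi = "foo\nbar";
        System.out.println(matchesWrapped("foo", multi) + " " + findBare("foo", multi));
    }
}
```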



On Mon, Aug 6, 2012 at 9:26 AM, Keith Turner <ke...@deenlo.com> wrote:

> I was thinking find() will select everything that match() does and
> more.  So it may return data that someone used to the current behavior
> is not expecting, which could break existing code that uses it.   For
> example ".*foo" would select "cfooa" with find() but not with match().

Re: egrep usage - 1.3.4

Posted by Keith Turner <ke...@deenlo.com>.
I was thinking find() will select everything that match() does and
more. So it may return data that someone relying on the current behavior
does not expect, which could break existing code that uses it. For
example ".*foo" would select "cfooa" with find() but not with match().
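The ".*foo" example can be checked directly with java.util.regex, where the thread's match() corresponds to Matcher.matches(). The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindSelectsMore {
    // The whole value must match; the trailing "a" in "cfooa" is left over.
    static boolean matchesWhole(String regex, String value) {
        return Pattern.compile(regex).matcher(value).matches();
    }

    // Returns the first matching subsequence, or null if there is none.
    static String firstFind(String regex, String value) {
        Matcher m = Pattern.compile(regex).matcher(value);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole(".*foo", "cfooa")); // false
        System.out.println(firstFind(".*foo", "cfooa"));    // cfoo
    }
}
```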

On Sun, Aug 5, 2012 at 7:16 PM, Michael Flester <fl...@gmail.com> wrote:
> Keith --
>
> Switching from match to find should be no change for anyone that is
> currently using it.
> All patterns that "match" will equally "find". But new users would be able
> to take advantage
> of not adding the wildcards on both ends.
>
> Mike

Re: egrep usage - 1.3.4

Posted by Michael Flester <fl...@gmail.com>.
Keith --

Switching from match to find should be no change for anyone that is
currently using it. All patterns that "match" will equally "find". But new
users would be able to take advantage of not adding the wildcards on both ends.

Mike
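The first point holds for ordinary patterns: if matches() succeeds, the whole value is itself a matching subsequence, so find() succeeds too. The converse does not hold, which is Keith's concern in his reply. A quick check over a few hypothetical pattern/value pairs:

```java
import java.util.regex.Pattern;

public class MatchImpliesFind {
    static boolean matches(String regex, String v) {
        return Pattern.compile(regex).matcher(v).matches();
    }

    static boolean finds(String regex, String v) {
        return Pattern.compile(regex).matcher(v).find();
    }

    // True when "matches implies finds" holds for this pair.
    static boolean implicationHolds(String regex, String v) {
        return !matches(regex, v) || finds(regex, v);
    }

    public static void main(String[] args) {
        String[][] cases = {
            {".*foo.*", "cfooa"},
            {"2009\\d*", "2009000000000"},
            {".*foo", "cfooa"}, // find() true, matches() false: more results, not fewer
            {"bar", "cfooa"},
        };
        for (String[] c : cases) {
            System.out.printf("%-10s %-15s matches=%b find=%b%n",
                    c[0], c[1], matches(c[0], c[1]), finds(c[0], c[1]));
        }
    }
}
```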


On Tue, Jul 31, 2012 at 11:21 AM, Keith Turner <ke...@deenlo.com> wrote:

> Ah, it should have used the find() call when it was first written.
> Changing it now would be tricky because people who expect the current
> behavior could get unexpected results.  I think we are kinda stuck
> with the current behavior.   Could possibly add an option to use
> find() instead of match().
>

Re: egrep usage - 1.3.4

Posted by Keith Turner <ke...@deenlo.com>.
On Sun, Jul 29, 2012 at 9:47 PM, Michael Flester <fl...@gmail.com> wrote:
> Java has both Matcher#matches and Matcher#find. The latter would operate
> more like the egrep(1) command without requiring the wildcards on both ends.

Ah, it should have used the find() call when it was first written.
Changing it now would be tricky because people who expect the current
behavior could get unexpected results.  I think we are kinda stuck
with the current behavior.   Could possibly add an option to use
find() instead of match().

Re: egrep usage - 1.3.4

Posted by Michael Flester <fl...@gmail.com>.
On Sat, Jul 28, 2012 at 7:57 PM, John Vines <vi...@apache.org> wrote:

> And when dealing with java, it does full matches, so adding the .* to
> start and end is necessary.
>
>
Java has both Matcher#matches and Matcher#find. The latter would operate
more like the egrep(1) command, without requiring the wildcards on both ends.

Re: egrep usage - 1.3.4

Posted by John Vines <vi...@apache.org>.
What Keith said. grep is implemented as a simple String.contains() check,
whereas egrep compiles the argument with Pattern.compile(arg). And the Java
matching it uses requires a full match, so adding .* to the start and end
is necessary.

John
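A minimal sketch of the difference described above, assuming grep performs a plain String.contains() check and egrep a full match on Pattern.compile(arg). The helper methods are hypothetical stand-ins, not the shell's actual command classes. Because contains() is literal, regex metacharacters in a grep argument are not interpreted.

```java
import java.util.regex.Pattern;

public class GrepVsEgrep {
    // Hypothetical stand-in for the shell's grep: literal substring test.
    static boolean grep(String term, String value) {
        return value.contains(term);
    }

    // Hypothetical stand-in for the shell's egrep: compiled regex, full match.
    static boolean egrep(String regex, String value) {
        return Pattern.compile(regex).matcher(value).matches();
    }

    public static void main(String[] args) {
        String value = "row_asdf_1"; // hypothetical value
        System.out.println(grep("asdf", value));       // true
        System.out.println(egrep("asdf", value));      // false
        System.out.println(egrep(".*asdf.*", value));  // true
        // ".*asdf.*" is not a literal substring of the value.
        System.out.println(grep(".*asdf.*", value));   // false
    }
}
```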

On Fri, Jul 27, 2012 at 2:20 PM, Keith Turner <ke...@deenlo.com> wrote:

> I think you need to do egrep .*asdf.* to match the substring.