Posted to dev@stdcxx.apache.org by Martin Sebor <se...@roguewave.com> on 2008/02/19 21:00:45 UTC

Re: svn commit: r628839 - /stdcxx/trunk/tests/self/0.braceexp.cpp

Travis Vitek wrote:
> 
> Martin,
> 
> Thank you for the additional testcases. They point out a few issues that I
> didn't interpret from the description in the Bash Reference Manual
> [http://www.gnu.org/software/bash/manual/bashref.html#Brace-Expansion]. Note
> that below I refer to paragraphs from this documentation.

In case it wasn't clear from my comments, most of the new test cases
(all those in run_bash_tests()), including the expected output, came
from the Bash test suite.

> 
> I do have a few issues with the expectations you've laid out. Comments
> follow...
> 
> 
> sebor-2 wrote:
>> +
>> +    TEST ("foo {1,2} bar", "foo 1 2 bar");
>> +
>>
> 
> This isn't a brace expansion. It is a literal string, followed by a brace
> expansion, followed by a literal string. When you run 'echo foo {1,2} bar'
> in the shell, each of the args are brace expanded individually, so the only
> thing that is brace expanded is the '{1,2}' and everything else is written
> literally. I believe this testcase is invalid.

So rw_brace_expand() only handles a single brace expression? I.e.,
a pattern like this:

  pattern  ::= [ <preamble> ] '{' <list> | <seq-expr> '}' [ <postfix> ]
  list     ::= string [ , <list> ] | string
  seq-expr ::= <char> .. <char> | <number> .. <number>

If that's so, what's the definition of <preamble> and <postfix>?

FWIW, the grammar I suggested here http://tinyurl.com/2rs3he allows
multiple brace expressions:

  string     ::= <brace-expr> | [ <chars> ]
  brace-expr ::= <string> '{' <brace-list> '}' <string> | <string>
  brace-list ::= <string> ',' <brace-list> | <string>
  chars      ::= <pcs-char> <string> | <pcs-char>
  pcs-char   ::= character in the Portable Character Set

We don't need to follow it but if we choose not to we should document
what grammar we do follow.

> 
> 
> sebor-2 wrote:
>> +    // we don't have eval
>> +    // TEST ("`zecho foo {1,2} bar`",  "foo 1 2 bar");
>> +    // TEST ("$(zecho foo {1,2} bar)", "foo 1 2 bar");
>>
> 
> Same problem here.

I left these in place just for completeness but I don't expect us to
ever implement eval. Otherwise, I agree they're the same as the test
case above.

> 
> 
> sebor-2 wrote:
>> +#if 0   // not implemented yet
>> +
>> +    // set the three variables
>> +    rw_putenv ("var=baz:varx=vx:vary=vy");
>> +
>> +    TEST ("foo{bar,${var}.}", "foobar foobaz.");
>> +    TEST ("foo{bar,${var}}",  "foobar foobaz");
>> +
>> +    TEST ("${var}\"{x,y}",    "bazx bazy");
>> +    TEST ("$var{x,y}",        "vx vy");
>> +    TEST ("${var}{x,y}",      "bazx bazy");
>> +
>> +    // unset all three variables
>> +    rw_putenv ("var=:varx=:vary=");
>> +
>> +#endif   // 0
>>
> 
> I don't expect this functionality to ever be implemented inside
> rw_brace_expand(). As mentioned in paragraph 4, the brace expansion itself
> is done before other expansions, and it does not interpret the text between
> the braces.
> 
> Given this, I feel that the environment variable expansion must be done at some
> later stage, by some other function, and the above test block is
> inappropriate for this test.

Okay. Again, I included them for completeness (I didn't want to just
completely remove some test cases), but if it makes sense to do this
expansion at a later stage in some other function that calls
rw_brace_expand() that's fine with me.

> 
> 
> sebor-2 wrote:
>> +
>> +    TEST ("{1..10}", "1 2 3 4 5 6 7 8 9 10");
>> +
>>
> 
> This is a case that I should be handling. I need to go back and add complete
> support for integer ranges, specifically ranges that include multidigit
> numbers and sign.
> 
> 
> sebor-2 wrote:
>> +    // this doesn't work in Bash 3.2
>> +    // TEST ("{0..10,braces}", "0 1 2 3 4 5 6 7 8 9 10 braces");
>> +
>>
> 
> I don't know how anyone could expect this to work. The first subexpression
> of the brace expansion list is '0..10', which itself is not a brace
> expansion, so it should not be expanded. It should be left as a literal.
> This happens to be the behavior I see with Bash 3.0.

Yes, but from the comment above the test case in the braces.test file
it looks like they plan to make it work. The comment says: # this
doesn't work yet. I don't think it's an important use case and I have
no problem with not implementing it (for now).
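
For reference, here is a minimal sketch of how the body of a sequence
expression might be recognized. It is an illustration only, not the
rw_brace_expand() code: parse_int() and expand_int_sequence() are names
made up for this example, it uses C++11 for brevity, and it ignores
overflow. It accepts multidigit and signed numbers, counts down when the
first number is the larger one, and rejects a body such as 0..10,braces
so that a caller can leave that text as a literal.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of stdcxx): parses an optionally signed
// decimal integer that occupies all of s. Overflow is not checked.
static bool parse_int(const std::string& s, long& out)
{
    std::size_t i = 0;
    bool neg = false;
    if (i < s.size() && (s[i] == '-' || s[i] == '+')) {
        neg = s[i] == '-';
        ++i;
    }
    if (i == s.size())
        return false;               // empty, or a sign with no digits
    long value = 0;
    for (; i < s.size(); ++i) {
        if (s[i] < '0' || s[i] > '9')
            return false;           // anything but a digit disqualifies it
        value = value * 10 + (s[i] - '0');
    }
    out = neg ? -value : value;
    return true;
}

// Hypothetical helper: expands the text between the braces of a sequence
// expression, e.g. "1..10" or "3..-2". Returns false and leaves out
// untouched when body is not of the form <number>..<number>.
bool expand_int_sequence(const std::string& body,
                         std::vector<std::string>& out)
{
    const std::size_t dots = body.find("..");
    if (dots == std::string::npos)
        return false;

    long first = 0, last = 0;
    if (!parse_int(body.substr(0, dots), first)
        || !parse_int(body.substr(dots + 2), last))
        return false;

    const long step = first <= last ? 1 : -1;
    for (long n = first; ; n += step) {
        out.push_back(std::to_string(n));
        if (n == last)
            break;
    }
    assert(!out.empty());           // a valid range has at least one element
    return true;
}
```

With this sketch, the body "1..10" produces the ten strings "1" through
"10", while "0..10,braces" is rejected because the text after the dots
is not a number.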

> 
> 
> sebor-2 wrote:
>> +    // but this does
>> +    TEST ("{{0..10},braces}", "0 1 2 3 4 5 6 7 8 9 10 braces");
>> +    TEST ("x{{0..10},braces}y",
>> +          "x0y x1y x2y x3y x4y x5y x6y x7y x8y x9y x10y xbracesy");
>> +
>>
> 
> Obviously, both of these are valid versions of the previous test expression.
> 
> 
> sebor-2 wrote:
>> +    TEST ("{a..A}",
>> +          "a ` _ ^ ]  [ Z Y X W V U T S R Q P O N M L K J I H G F E D C B A");
>> +    TEST ("{A..a}",
>> +          "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [  ] ^ _ ` a");
>> +
>>
> 
> Interesting. I didn't think it would make sense to allow mixing of lower and
> uppercase characters in the sequence expression because of the characters
> between 'Z' and 'a'. Obviously I was wrong. BTW, any idea what happened to
> ASCII 92? It is the backslash character that should appear between '[' and
> ']'.

No idea. To me this is one of the most dubious of the features since
it assumes ASCII. I wouldn't be at all upset if we didn't implement
it (at least not until we actually need it for something ;-)
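
To make the ASCII assumption concrete, here is a small sketch that simply
walks the character codes between the two endpoints. The name
expand_char_sequence() is hypothetical, not a stdcxx function. On an
ASCII system the walk from 'A' to 'a' passes through [ \ ] ^ _ `, so the
backslash is part of the range; the sketch makes no attempt to reproduce
the two spaces that appear in its place in the expected strings above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: lists every character from first to last
// inclusive, in code order, separated by single spaces. The result is
// only meaningful for letters if the execution character set orders
// them the way ASCII does, which is an assumption of this sketch.
std::string expand_char_sequence(char first, char last)
{
    const int lo = static_cast<unsigned char>(first);
    const int hi = static_cast<unsigned char>(last);
    const int step = lo <= hi ? 1 : -1;   // count down for "{a..A}"

    std::string out;
    for (int c = lo; ; c += step) {
        if (!out.empty())
            out += ' ';
        out += static_cast<char>(c);
        if (c == hi)
            break;
    }
    assert(!out.empty());                 // the range always contains first
    return out;
}
```

On an ASCII platform expand_char_sequence('Z', ']') gives "Z [ \ ]",
which is where the missing character would be expected to show up.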

> 
> 
> sebor-2 wrote:
>> +
>> +    TEST ("0{1..9} {10..20}",
>> +          "01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20");
>>
> 
> This has the same problem as the first issue I brought up. This is actually
> two separate brace expansions, the first is '0{1..9}' and the second is
> '{10..20}'. This is how the shell handles them, and this is how I handle
> them.
> 
> If they were treated as one brace expansion by the shell, I would expect the
> postscript '{10..20}' expanded for each prefix/body expansion, much like you
> would see if you escaped the space.

I realize those are two brace expressions but I don't understand why
the function shouldn't be able to handle them as such. That's what
the expected output assumes, isn't it? I expect this to come up in
our uses of the feature so if rw_brace_expand() doesn't implement
it we'll have to implement it somewhere else. Do you see a problem
with implementing it in rw_brace_expand()?

> 
> 
> sebor-2 wrote:
>> +    // weirdly-formed brace expansions -- fixed in post-bash-3.1
>> +    TEST ("a-{b{d,e}}-c",    "a-{bd}-c a-{be}-c");
>>
> 
> I don't understand how this could be interpreted as valid brace expansion at
> all. The body of the expansion is '{b{d,e}}'. Paragraph 5 [and paragraph 1
> for that matter] require a correctly-formed brace expansion have unquoted
> [unescaped?] opening and closing braces, and at least one unquoted comma or
> a valid sequence expression. The body does not meet either of these
> requirements, so it must be invalid.
> 
> To get the result shown, the obvious thing to do is to escape the outer
> braces. This would give us the valid expression 'a-\{b{d,e}\}-c', that
> happens to work with previous versions of bash as well.
> 
> 
> sebor-2 wrote:
>> +    TEST ("a-{bdef-{g,i}-c", "a-{bdef-g-c a-{bdef-i-c");
>>
> 
> Again, this does not seem correct according to the requirements of paragraph
> 5 [and 1].

Obviously, these last two are corner cases (as the comment above
them indicates). I don't think we'll ever try to do anything as
bizarre as this. What is interesting about them is the fact that
the Bash maintainers cared enough about them to include them in
their (otherwise pretty small) test suite.

> 
> If the body is supposed to be between a pair of braces, shouldn't the first
> unescaped opening brace match the first unescaped close brace at the same
> brace depth? If it is, then the outer brace expansion isn't valid because it
> doesn't have a terminating close brace. Even if one was added, the resulting
> expression has the same problem as the previous example. The nested
> expression 'bdef-{g,i}-c' isn't a series of comma-separated strings or a
> sequence expression. 
> 
> If you wanted the first brace to be ignored, as it is in the test, then it
> should be escaped. Then we would have 'a-\{bdef-{g,i}-c'. That expression
> follows the requirements outlined in the manual, and works with old versions
> of bash, and a human can pretty easily figure out what the expected result
> would be.
> 
> Now I suppose that since invalid brace expansions are to be left unchanged,
> you could say that the first brace expansion is copied literally because it
> is invalid, but the second is valid and should be expanded. This almost
> explains how bash 3.2 gets these results, but it still seems wrong. If a
> subexpression is invalid it seems that the whole expression is invalid.
> 
> 
> sebor-2 wrote:
>> +    TEST ("{",     "{");
>> +    TEST ("}",     "}");
>> +    TEST ("{}",    "{}");
>> +    TEST ("{ }",   "{ }");
>> +    TEST ("{  }",  "{  }");   // is this right?
>>
> 
> I sure think it is. Again, the requirements say that these are not valid
> brace expansions, so they should be left unchanged. I'm wondering if the
> shell is doing some sort of whitespace collapse. Everything seems to work
> fine if you escape the spaces, so I'm thinking that is why you see the
> behavior that you do.

Yes, I believe that's exactly what it does, and I wondered if
rw_brace_expand() should do the same thing. I didn't spend too much
time contemplating whether it makes sense at this level or if it's
just a consequence of the shell tokenization.

> 
> So, with all that said, I've got a few thoughts.
> 
> 1. I don't really like the idea of trying to emulate all behavior of the
> shell in rw_brace_expand. If we want that, then we should have made a bug
> entitled 'provide a complete implementation of bash'.
> 2. I don't feel comfortable trying to maintain compatibility with version
> 3.2 of bash. It doesn't seem to follow the documented requirements, and I
> believe that the odd behavior may be difficult to implement. The bash 3.0
> implementation seems much more sane and that is what I tried to emulate when
> writing this code.
> 3. If you, er, we want to do brace expansion exactly like you see within
> bash, then we should write another function that tokenizes a string on
> whitespace and does brace expansion on each token. I was expecting the
> caller of rw_brace_expand() to expect the function to do brace expansion,
> not complete shell emulation.

Well, I thought since we were implementing a Bash feature the Bash
test cases would be useful. If we make a conscious decision to either
deviate from the Bash behavior or to not implement the features we
don't expect to use that's okay with me as long as we document what
our behavior is. One (IMO easy) way to do it is in the test suite
in the form of test cases, with an explanation for any differences
with Bash. I tend to use the test cases for the test driver when I
need to know how something works.

Martin

Re: svn commit: r628839 - /stdcxx/trunk/tests/self/0.braceexp.cpp

Posted by Martin Sebor <se...@roguewave.com>.

Travis Vitek wrote:
> 
[...]
> The function rw_brace_expand() does brace expansion on whatever string you
> pass. There is no special treatment for whitespace [as required by the Bash
> Reference Manual]. That is a feature of the shell. The difference is very
> significant...

Okay, that was an important point I missed. The question then is:
which is more useful? Or, if we think both are, we might want to
consider implementing both.

> 
> If you pass the string "first {1,2} last" to the shell for expansion, you
> will get "first 1 2 last" as output. The shell is brace expanding each of
> the three tokens separately. If you brace expand this as one string, which
> you can do by escaping the spaces, you get a different result. You should
> get "first 1 last first 2 last".
> 
> I think much of the confusion we are having is because we are testing our
> rw_brace_expand() function with the bash test suite. Our function only
> implements one of the many features of bash, but the test suite verifies
> several of the bash features are working together correctly.

You're probably right. My confusion, if you want to call it that,
probably does stem from my expectation to use the function the
same way I use the shell. In my view, the whole "raison d'etre"
for the function is that we don't have a UNIX shell on Windows.
If we did, some of the features in the driver, including the
locale query, and most of those in the harness (the exec utility)
would probably be relegated to it and we wouldn't need to
(re)implement it ourselves.

I wonder if it would help us move forward to outline the major
use cases for the function. Here are the ones at the top of my
list:

  *  to help with locale matching in the locale query string, i.e.,
     as part of the mechanism behind the rw_locale_query() function

  *  to help with the matching of platforms in the expected failures
     project, i.e., again, as one of the essential pieces, along with
     rw_fnmatch(), behind an rw_match_platform() or some such function
     to match the platform string from xfail.txt
     against the actual platform

  *  to help generate input strings for some of our tests (I don't
     have a clear picture of how to use it here)

Are there any others?

In the first two cases, I envision using the function pretty much
exactly how I would use the shell + grep (e.g., to process the
xfail.txt file). In the last case, I can imagine using it a little
or maybe even a lot differently, i.e., with various extensions or
differences from how the shell behaves.

How do you expect to use it?

In all cases, I think we will get the biggest bang for our buck if
we design and implement rw_brace_expand() in a way that will make
coding the bigger features as easy as possible by handling as much
of the shared use cases in it instead of duplicating the same thing
at the outer layers. Does this make sense?

> 
> 
[...]
> I wish I could get inside the heads of the guys writing this stuff. By
> introducing such a change, they are potentially breaking existing user code,
> and there appears to be no net benefit for making such a change. As we've
> already established, it is quite easy to stick an additional set of braces
> in there.

I can't help you there, but you can get some insight by reading the
archives of the bug-bash list or ask your questions on it:
   http://lists.gnu.org/archive/html/bug-bash/
But I don't think the corner cases are terribly important to get
exactly the same as in Bash or even implemented at all, just as
long as we have a solid implementation of the common ones.

> 
[...]
> I could, but I think it would be better to implement something on top of
> rw_brace_expand() that expands each of the whitespace separated tokens
> separately, just like bash does. If rw_brace_expand() becomes a private
> implementation function, then so be it.

Whatever works. It all depends on how we end up using it. We haven't
spent much time talking about the top level interface yet. Maybe it
would help if we did? What do you expect the rw_locale_query()
interface will look like? And how do you envision it to be implemented
in terms of rw_brace_expand() and rw_fnmatch()? (I should probably do
the same exercise for rw_match_platform().)

> 
[...]
> Unfortunately the bash testcases rely heavily on additional behavior
> implemented in bash [environment variable expansion, whitespace collapse,
> and more]. I understand the motivation, I just want to make sure we all
> understand that some of the cases don't work because we're not implementing
> bash. We're implementing a single feature of bash.

Yes, but the only way to use that feature is in conjunction with at
least some of the others, such as collapsing whitespace. I do agree
that the environment variable expansion can be done separately. I'm
not 100% convinced that it necessarily has to be, but I'm not at all
opposed to it, certainly not if it makes the implementation cleaner
and easier to maintain and enhance.

Martin

Re: svn commit: r628839 - /stdcxx/trunk/tests/self/0.braceexp.cpp

Posted by Travis Vitek <vi...@roguewave.com>.



Martin Sebor wrote:
> 
>> 
>> sebor-2 wrote:
>>> +
>>> +    TEST ("foo {1,2} bar", "foo 1 2 bar");
>>> +
>>>
>> 
>> This isn't a brace expansion. It is a literal string, followed by a brace
>> expansion, followed by a literal string. When you run 'echo foo {1,2} bar'
>> in the shell, each of the args are brace expanded individually, so the only
>> thing that is brace expanded is the '{1,2}' and everything else is written
>> literally. I believe this testcase is invalid.
> 
> So rw_brace_expand() only handles a single brace expression? I.e.,
> a pattern like this:
> 
>   pattern  ::= [ <preamble> ] '{' <list> | <seq-expr> '}' [ <postfix> ]
>   list     ::= string [ , <list> ] | string
>   seq-expr ::= <char> .. <char> | <number> .. <number>
> 
> If that's so, what's the definition of <preamble> and <postfix>?
> 
> FWIW, the grammar I suggested here http://tinyurl.com/2rs3he allows
> multiple brace expressions:
> 
>   string     ::= <brace-expr> | [ <chars> ]
>   brace-expr ::= <string> '{' <brace-list> '}' <string> | <string>
>   brace-list ::= <string> ',' <brace-list> | <string>
>   chars      ::= <pcs-char> <string> | <pcs-char>
>   pcs-char   ::= character in the Portable Character Set
> 
> We don't need to follow it but if we choose not to we should document
> what grammar we do follow.
> 

Neither of the definitions you provide above is complete. We implement the
second grammar with the addition of the sequence expression that is shown
above.

The function rw_brace_expand() does brace expansion on whatever string you
pass. There is no special treatment for whitespace [as required by the Bash
Reference Manual]. That is a feature of the shell. The difference is very
significant...

If you pass the string "first {1,2} last" to the shell for expansion, you
will get "first 1 2 last" as output. The shell is brace expanding each of
the three tokens separately. If you brace expand this as one string, which
you can do by escaping the spaces, you get a different result. You should
get "first 1 last first 2 last".
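
A sketch of the two behaviors, for illustration. expand_simple() is a
deliberately minimal stand-in and is not rw_brace_expand() or its real
signature: it handles only unnested comma lists, with no sequences and no
escapes. expand_per_token() shows the shell-like layer that splits on
whitespace first. All names here are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<std::string> Strings;

// Minimal stand-in (hypothetical): expands the first "{a,b,...}" group
// in s and recurses on each rebuilt string. Whitespace is ordinary text.
Strings expand_simple(const std::string& s)
{
    const std::size_t npos  = std::string::npos;
    const std::size_t open  = s.find('{');
    const std::size_t close = open == npos ? npos : s.find('}', open);
    const std::size_t comma = open == npos ? npos : s.find(',', open);
    if (close == npos || comma == npos || comma > close)
        return Strings(1, s);       // not a brace expansion: keep as-is
    assert(open < comma && comma < close);

    const std::string pre  = s.substr(0, open);
    const std::string body = s.substr(open + 1, close - open - 1);
    const std::string post = s.substr(close + 1);

    Strings out;
    for (std::size_t start = 0; ; ) {
        const std::size_t end = body.find(',', start);
        const std::string item =
            body.substr(start, end == npos ? npos : end - start);
        // the postfix may hold further groups, so expand the result again
        const Strings more = expand_simple(pre + item + post);
        out.insert(out.end(), more.begin(), more.end());
        if (end == npos)
            break;
        start = end + 1;
    }
    return out;
}

// Joins words with single spaces.
std::string join(const Strings& words)
{
    std::string out;
    for (std::size_t i = 0; i != words.size(); ++i) {
        if (i)
            out += ' ';
        out += words[i];
    }
    return out;
}

// Shell-like layer (hypothetical): tokenize on whitespace, expand each
// token on its own, and gather the results into one string.
std::string expand_per_token(const std::string& line)
{
    std::istringstream in(line);
    Strings all;
    for (std::string token; in >> token; ) {
        const Strings words = expand_simple(token);
        all.insert(all.end(), words.begin(), words.end());
    }
    return join(all);
}
```

With these definitions, join(expand_simple("first {1,2} last")) gives
"first 1 last first 2 last", while expand_per_token("first {1,2} last")
gives "first 1 2 last".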

I think much of the confusion we are having is because we are testing our
rw_brace_expand() function with the bash test suite. Our function only
implements one of the many features of bash, but the test suite verifies
several of the bash features are working together correctly.


Martin Sebor wrote:
> 
>> sebor-2 wrote:
>>> +    // this doesn't work in Bash 3.2
>>> +    // TEST ("{0..10,braces}", "0 1 2 3 4 5 6 7 8 9 10 braces");
>>> +
>>>
>> 
>> I don't know how anyone could expect this to work. The first subexpression
>> of the brace expansion list is '0..10', which itself is not a brace
>> expansion, so it should not be expanded. It should be left as a literal.
>> This happens to be the behavior I see with Bash 3.0.
> 
> Yes, but from the comment above the test case in the braces.test file
> it looks like they plan to make it work. The comment says: # this
> doesn't work yet. I don't think it's an important use case and I have
> no problem with not implementing it (for now).
> 

I wish I could get inside the heads of the guys writing this stuff. By
introducing such a change, they are potentially breaking existing user code,
and there appears to be no net benefit for making such a change. As we've
already established, it is quite easy to stick an additional set of braces
in there.


Martin Sebor wrote:
> 
>> 
>> 
>> sebor-2 wrote:
>>> +    TEST ("{a..A}",
>>> +          "a ` _ ^ ]  [ Z Y X W V U T S R Q P O N M L K J I H G F E D C B A");
>>> +    TEST ("{A..a}",
>>> +          "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [  ] ^ _ ` a");
>>> +
>>>
>> 
>> Interesting. I didn't think it would make sense to allow mixing of lower and
>> uppercase characters in the sequence expression because of the characters
>> between 'Z' and 'a'. Obviously I was wrong. BTW, any idea what happened to
>> ASCII 92? It is the backslash character that should appear between '[' and
>> ']'.
> 
> No idea. To me this is one of the most dubious of the features since
> it assumes ASCII. I wouldn't be at all upset if we didn't implement
> it (at least not until we actually need it for something ;-)
> 

Okay, I'll leave it out for now.


Martin Sebor wrote:
> 
>> 
>> 
>> sebor-2 wrote:
>>> +
>>> +    TEST ("0{1..9} {10..20}",
>>> +          "01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20");
>>>
>> 
>> This has the same problem as the first issue I brought up. This is actually
>> two separate brace expansions, the first is '0{1..9}' and the second is
>> '{10..20}'. This is how the shell handles them, and this is how I handle
>> them.
>> 
>> If they were treated as one brace expansion by the shell, I would expect the
>> postscript '{10..20}' expanded for each prefix/body expansion, much like you
>> would see if you escaped the space.
> 
> I realize those are two brace expressions but I don't understand why
> the function shouldn't be able to handle them as such. That's what
> the expected output assumes, isn't it? I expect this to come up in
> our uses of the feature so if rw_brace_expand() doesn't implement
> it we'll have to implement it somewhere else. Do you see a problem
> with implementing it in rw_brace_expand()?
> 

I could, but I think it would be better to implement something on top of
rw_brace_expand() that expands each of the whitespace separated tokens
separately, just like bash does. If rw_brace_expand() becomes a private
implementation function, then so be it.


Martin Sebor wrote:
> 
>> 
>> So, with all that said, I've got a few thoughts.
>> 
>> 1. I don't really like the idea of trying to emulate all behavior of the
>> shell in rw_brace_expand. If we want that, then we should have made a bug
>> entitled 'provide a complete implementation of bash'.
>> 2. I don't feel comfortable trying to maintain compatibility with version
>> 3.2 of bash. It doesn't seem to follow the documented requirements, and I
>> believe that the odd behavior may be difficult to implement. The bash 3.0
>> implementation seems much more sane and that is what I tried to emulate when
>> writing this code.
>> 3. If you, er, we want to do brace expansion exactly like you see within
>> bash, then we should write another function that tokenizes a string on
>> whitespace and does brace expansion on each token. I was expecting the
>> caller of rw_brace_expand() to expect the function to do brace expansion,
>> not complete shell emulation.
> 
> Well, I thought since we were implementing a Bash feature the Bash
> test cases would be useful. If we make a conscious decision to either
> deviate from the Bash behavior or to not implement the features we
> don't expect to use that's okay with me as long as we document what
> our behavior is. One (IMO easy) way to do it is in the test suite
> in the form of test cases, with an explanation for any differences
> with Bash. I tend to use the test cases for the test driver when I
> need to know how something works.
> 

Unfortunately the bash testcases rely heavily on additional behavior
implemented in bash [environment variable expansion, whitespace collapse,
and more]. I understand the motivation, I just want to make sure we all
understand that some of the cases don't work because we're not implementing
bash. We're implementing a single feature of bash.

Travis

Re: svn commit: r628839 - /stdcxx/trunk/tests/self/0.braceexp.cpp

Posted by Martin Sebor <se...@roguewave.com>.

Mark Brown wrote:
> On 2/19/08, Martin Sebor <se...@roguewave.com> wrote:
>> Travis Vitek wrote:
>>  > sebor-2 wrote:
>>  >> +    // weirdly-formed brace expansions -- fixed in post-bash-3.1
>>  >> +    TEST ("a-{b{d,e}}-c",    "a-{bd}-c a-{be}-c");
>>  >>
>>  >
>>  > I don't understand how this could be interpreted as valid brace expansion at
>>  > all. The body of the expansion is '{b{d,e}}'. Paragraph 5 [and paragraph 1
>>  > for that matter] require a correctly-formed brace expansion have unquoted
>>  > [unescaped?] opening and closing braces, and at least one unquoted comma or
>>  > a valid sequence expression. The body does not meet either of these
>>  > requirements, so it must be invalid.
>>  >
> 
> The C shell, which had brace expansion long before Bash did, outputs
> a-bd-c a-be-c as Martin expects. It doesn't require a comma at all.

Actually, the expected output (expected by the Bash test suite, not
necessarily by me :) is "a-{bd}-c a-{be}-c"

But it looks like the difference is in how each shell treats what
I called the "brace list", i.e., the text delimited by the pair of
braces: bash requires a comma while csh does not. I.e., bash doesn't
treat "{abc}" as a brace expression while csh expands it to "abc",
and so bash treats "a-{b{d,e}}-c" as a single brace list "{d,e}"
with a preamble of "a-{b" and a postfix of "}-c"

IMO, the C shell behavior makes more sense. I don't see how treating
single-element brace lists (i.e., "{abc}" vs "{abc,def}") as errors
is useful.

The C shell behavior also happens to correspond to the grammar I
proposed earlier:

   string     ::= <brace-expr> | [ <chars> ]
   brace-expr ::= <string> '{' <brace-list> '}' <string> | <string>
   brace-list ::= <string> ',' <brace-list> | <string>
   chars      ::= <pcs-char> <string> | <pcs-char>
   pcs-char   ::= character in the Portable Character Set
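
As an illustration of how much hinges on the comma rule, here is a rough
sketch of a recursive expander with the rule as a switch. It is not the
stdcxx implementation: expand_braces() and its require_comma parameter
are made up for this example, it uses C++11, and it ignores sequences,
escapes, and the error that csh reports for an unmatched brace. With
require_comma set it treats "{abc}" as plain text, as Bash does; without
it, a single-element brace list is accepted, as in the grammar above.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> Strings;

// Hypothetical sketch. Scans for the first '{' that opens an acceptable
// brace list, expands it, and recurses into each item and into the text
// after the closing brace. Braces that do not qualify stay literal.
Strings expand_braces(const std::string& s, bool require_comma)
{
    for (std::size_t open = s.find('{'); open != std::string::npos;
         open = s.find('{', open + 1)) {

        // Find the matching close brace and the commas at nesting depth 1.
        std::vector<std::size_t> cuts(1, open);
        std::size_t close = std::string::npos;
        int depth = 0;
        for (std::size_t i = open; i != s.size(); ++i) {
            if (s[i] == '{')
                ++depth;
            else if (s[i] == '}' && --depth == 0) {
                close = i;
                break;
            }
            else if (s[i] == ',' && depth == 1)
                cuts.push_back(i);
        }

        if (close == std::string::npos)
            continue;               // unmatched '{': literal, try the next
        if (require_comma && cuts.size() == 1)
            continue;               // no comma: literal under the Bash rule
        cuts.push_back(close);
        assert(cuts.size() >= 2);

        const std::string pre = s.substr(0, open);
        const Strings tails = expand_braces(s.substr(close + 1),
                                            require_comma);
        Strings out;
        for (std::size_t k = 0; k + 1 != cuts.size(); ++k) {
            const std::string item =
                s.substr(cuts[k] + 1, cuts[k + 1] - cuts[k] - 1);
            const Strings heads = expand_braces(item, require_comma);
            for (std::size_t h = 0; h != heads.size(); ++h)
                for (std::size_t t = 0; t != tails.size(); ++t)
                    out.push_back(pre + heads[h] + tails[t]);
        }
        return out;
    }
    return Strings(1, s);           // nothing to expand: leave unchanged
}
```

Under these assumptions, expand_braces("a-{b{d,e}}-c", true) yields
"a-{bd}-c" and "a-{be}-c", and the same call with false yields "a-bd-c"
and "a-be-c".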

That being said, I downloaded and installed zsh which also supports
brace expansion. It behaves the same way as Bash, i.e., it requires
at least one comma to recognize a brace expression. Consequently, it
expands "a-{b{d,e}}-c" the same way Bash does, i.e., it produces
"a-{bd}-c a-{be}-c"

> 
>>  >
>>  >
>>  > sebor-2 wrote:
>>  >> +    TEST ("a-{bdef-{g,i}-c", "a-{bdef-g-c a-{bdef-i-c");
>>  >>
>>  >
>>  > Again, this does not seem correct according to the requirements of paragraph
>>  > 5 [and 1].
> 
> The C-Shell complains about a missing brace in this expression.

Here, Zsh produces the output expected by the Bash test suite, i.e.,
"a-{bdef-g-c a-{bdef-i-c"

But I think this one is just too weird and should be left unspecified.

Martin

Re: svn commit: r628839 - /stdcxx/trunk/tests/self/0.braceexp.cpp

Posted by Martin Sebor <se...@roguewave.com>.

Travis Vitek wrote:
>  
> 
>> Martin Sebor wrote:
>>
>>
>> I agree. I just realized that the shell also allows spaces in brace
>> expansions, they just need to be escaped:
>>
>>     $ echo a{b,  }c " | " x{y,'  '}z
>>     a{b, }c  |  xyz x  z
>>
>>
> 
> You must be using bash, because csh pukes when brace expanding the token
> "a{b" because it has an unmatched unescaped open brace.

I think I was actually using zsh, but yes, the first one is
another one of those corner cases. The point of the example
was the second expansion with the embedded space, which
should be handled the same by all shells (with the space
being treated as a <string>).

Martin

RE: svn commit: r628839 - /stdcxx/trunk/tests/self/0.braceexp.cpp

Posted by Travis Vitek <Tr...@roguewave.com>.

>Martin Sebor wrote:
>
>
>I agree. I just realized that the shell also allows spaces in brace
>expansions, they just need to be escaped:
>
>     $ echo a{b,  }c " | " x{y,'  '}z
>     a{b, }c  |  xyz x  z
>
>

You must be using bash, because csh pukes when brace expanding the token
"a{b" because it has an unmatched unescaped open brace.

Travis

Re: svn commit: r628839 - /stdcxx/trunk/tests/self/0.braceexp.cpp

Posted by Martin Sebor <se...@roguewave.com>.

Travis Vitek wrote:
>  
> 
>> Martin Sebor wrote:
>>
>> Travis Vitek wrote:
>>>  
>> [...]
>>> I'm honestly leaning toward implementing the behavior of either csh or
>>> ksh, and adding support for bash style sequences.
>> This sounds perfectly reasonable to me. IMO, these are all corner
>> cases that I suspect we're unlikely to run into in our limited uses
>> of the function. (As much fun as they are to talk about.)
>>
>> The one question that we do need to decide is how the function should
>> deal with whitespace. I.e., whether <pcs-char> in the grammar includes
>> the set of whitespace characters or whether they are treated as special
>> delimiters.
>>
>> Martin
>>
> 
> Unless there are objections, I'm leaving rw_brace_expand() as-is. That
> means that <pcs-char> includes all whitespace. I plan on adding a new
> function rw_shell_expand() that will tokenize the input string on
> whitespace and apply rw_brace_expand() to each whitespace delimited
> token and gather the results into a single buffer.

I agree. I just realized that the shell also allows spaces in brace
expansions, they just need to be escaped:

     $ echo a{b,  }c " | " x{y,'  '}z
     a{b, }c  |  xyz x  z

> 
> At some point in the future, I'd expect to see us implement
> rw_variable_expand() that would do bash shell variable expansion on an
> input string. When that happens, rw_shell_expand() would be augmented to
> apply variable expansion to the brace expanded string.
> 
> This allows for easy separation of concerns, and should make testing and
> maintenance a bit simpler.

Sounds good.

Martin

RE: svn commit: r628839 - /stdcxx/trunk/tests/self/0.braceexp.cpp

Posted by Travis Vitek <Tr...@roguewave.com>.

>Martin Sebor wrote:
>
>Travis Vitek wrote:
>>  
>[...]
>> I'm honestly leaning toward implementing the behavior of either csh or
>> ksh, and adding support for bash style sequences.
>
>This sounds perfectly reasonable to me. IMO, these are all corner
>cases that I suspect we're unlikely to run into in our limited uses
>of the function. (As much fun as they are to talk about.)
>
>The one question that we do need to decide is how the function should
>deal with whitespace. I.e., whether <pcs-char> in the grammar includes
>the set of whitespace characters or whether they are treated as special
>delimiters.
>
>Martin
>

Unless there are objections, I'm leaving rw_brace_expand() as-is. That
means that <pcs-char> includes all whitespace. I plan on adding a new
function rw_shell_expand() that will tokenize the input string on
whitespace and apply rw_brace_expand() to each whitespace delimited
token and gather the results into a single buffer.

At some point in the future, I'd expect to see us implement
rw_variable_expand() that would do bash shell variable expansion on an
input string. When that happens, rw_shell_expand() would be augmented to
apply variable expansion to the brace expanded string.

This allows for easy separation of concerns, and should make testing and
maintenance a bit simpler.

Travis

Re: svn commit: r628839 - /stdcxx/trunk/tests/self/0.braceexp.cpp

Posted by Martin Sebor <se...@roguewave.com>.

Travis Vitek wrote:
>  
[...]
> I'm honestly leaning toward implementing the behavior of either csh or
> ksh, and adding support for bash style sequences.

This sounds perfectly reasonable to me. IMO, these are all corner
cases that I suspect we're unlikely to run into in our limited uses
of the function. (As much fun as they are to talk about.)

The one question that we do need to decide is how the function should
deal with whitespace. I.e., whether <pcs-char> in the grammar includes
the set of whitespace characters or whether they are treated as special
delimiters.

Martin

Re: svn commit: r628839 - /stdcxx/trunk/tests/self/0.braceexp.cpp

Posted by Mark Brown <ma...@gmail.com>.

On 2/21/08, Travis Vitek <Tr...@roguewave.com> wrote:
>
>
>  >Mark Brown wrote:
>  >
>  >On 2/19/08, Martin Sebor <se...@roguewave.com> wrote:
>  >>Travis Vitek wrote:
>  >>> sebor-2 wrote:
>  >>>> +    // weirdly-formed brace expansions -- fixed in post-bash-3.1
>  >>>> +    TEST ("a-{b{d,e}}-c",    "a-{bd}-c a-{be}-c");
>  >>>>
>  >>>
>  >>> I don't understand how this could be interpreted as valid
>  >>> brace expansion at all. The body of the expansion is '{b{d,e}}'.
>  >>> Paragraph 5 [and paragraph 1 for that matter] require a
>  >>> correctly-formed brace expansion have unquoted [unescaped?]
>  >>> opening and closing braces, and at least one unquoted comma or
>  >>> a valid sequence expression. The body does not meet either of
>  >>> these requirements, so it must be invalid.
>  >>>
>  >
>  >The C-Shell that had brace expansion long before Bash did outputs
>  >a-bd-c a-be-c as Martin expects. It doesn't require a comma at all.
>
>
> Yes, but "a-bd-c a-be-c" is very different from "a-{bd}-c a-{be}-c",
>  which the test expects.

Mea culpa! My eyesight must be going. I completely overlooked the braces.

>
>  Many of the shells implement brace expansion in one way or another. One
>  problem that I see with bash is that the documentation appears to be out
>  of date or incomplete. The man pages [and the reference manual]
>  explicitly say...
>
>     A correctly-formed brace expansion must contain unquoted opening
>     and closing braces, and at least one unquoted comma or a valid
>     sequence expression. Any incorrectly formed brace expansion is
>     left unchanged.

According to the Bash FAQ this is supposed to be the only difference.
http://www.unixguide.net/unix/bash/D2.shtml

-- Mark

RE: svn commit: r628839 - /stdcxx/trunk/tests/self/0.braceexp.cpp

Posted by Travis Vitek <Tr...@roguewave.com>.

>Mark Brown wrote:
>
>On 2/19/08, Martin Sebor <se...@roguewave.com> wrote:
>>Travis Vitek wrote:
>>> sebor-2 wrote:
>>>> +    // weirdly-formed brace expansions -- fixed in post-bash-3.1
>>>> +    TEST ("a-{b{d,e}}-c",    "a-{bd}-c a-{be}-c");
>>>>
>>>
>>> I don't understand how this could be interpreted as valid 
>>> brace expansion at all. The body of the expansion is '{b{d,e}}'.
>>> Paragraph 5 [and paragraph 1 for that matter] require a
>>> correctly-formed brace expansion have unquoted [unescaped?]
>>> opening and closing braces, and at least one unquoted comma or
>>> a valid sequence expression. The body does not meet either of
>>> these requirements, so it must be invalid.
>>>
>
>The C-Shell that had brace expansion long before Bash did outputs
>a-bd-c a-be-c as Martin expects. It doesn't require a comma at all.

Yes, but "a-bd-c a-be-c" is very different from "a-{bd}-c a-{be}-c",
which the test expects.

Many of the shells implement brace expansion in one way or another. One
problem that I see with bash is that the documentation appears to be out
of date or incomplete. The man pages [and the reference manual]
explicitly say...

    A correctly-formed brace expansion must contain unquoted opening
    and closing braces, and at least one unquoted comma or a valid
    sequence expression. Any incorrectly formed brace expansion is
    left unchanged.

The above example clearly violates the behavior that one would expect
after reading the documentation.
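
As a rough illustration of that rule, the following C++ sketch checks
the text between one pair of braces. It assumes my reading of the
documentation: a comma nested inside an inner pair of braces does not
count for the outer pair, a backslash quotes the next character, and a
sequence is <char>..<char> or <number>..<number>. All function names
are hypothetical and this is not what bash itself implements.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if `s` is an optionally negative decimal integer.
bool is_number (const std::string &s)
{
    std::size_t i = (!s.empty () && s [0] == '-') ? 1 : 0;
    if (i == s.size ())
        return false;   // empty, or a lone '-'
    for (; i < s.size (); ++i)
        if (!std::isdigit ((unsigned char)s [i]))
            return false;
    return true;
}

// True if `body` looks like <char>..<char> or <number>..<number>.
bool is_sequence (const std::string &body)
{
    const std::size_t dots = body.find ("..");
    if (dots == std::string::npos || 0 == dots || dots + 2 >= body.size ())
        return false;

    const std::string lhs = body.substr (0, dots);
    const std::string rhs = body.substr (dots + 2);

    if (   1 == lhs.size () && 1 == rhs.size ()
        && std::isalpha ((unsigned char)lhs [0])
        && std::isalpha ((unsigned char)rhs [0]))
        return true;

    return is_number (lhs) && is_number (rhs);
}

// True if `body` has an unquoted comma outside any nested braces.
bool has_top_level_comma (const std::string &body)
{
    int depth = 0;
    for (std::size_t i = 0; i < body.size (); ++i) {
        const char c = body [i];
        if (c == '\\')
            ++i;   // skip the quoted character
        else if (c == '{')
            ++depth;
        else if (c == '}' && depth > 0)
            --depth;
        else if (c == ',' && 0 == depth)
            return true;
    }
    return false;
}

// `body` is the text between one pair of braces, braces excluded.
bool is_well_formed (const std::string &body)
{
    return has_top_level_comma (body) || is_sequence (body);
}
```

Under these assumptions the outer body of a-{b{d,e}}-c, which is
"b{d,e}", is rejected, while the inner body "d,e" is accepted.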

Not only that, but the bash behavior is inconsistent with similar
testcases. Here is the output of a few common shells...

    [csh-6.13.00]$ echo a-{b}-c
    a-b-c
    [csh-6.13.00]$ echo a-{b{d,e}}-c
    a-bd-c a-be-c

    [ksh-5.2.14]$ echo a-{b}-c
    a-{b}-c
    [ksh-5.2.14]$ echo a-{b{d,e}}-c
    a-{b{d,e}}-c

    [bash-3.00.15]$ echo a-{b}-c
    a-{b}-c
    [bash-3.00.15]$ echo a-{b{d,e}}-c
    a-bd-c a-be-c

The csh shell consistently expands the contents of the brace list
regardless of the number of subexpressions in the list, and the ksh
shell consistently rejects brace lists that don't have more than one
element. Finally, the bash shell swings both ways, and newer versions of
bash have even weirder behavior as they appear to expand the nested
subexpression, and leave the outer braces in place for one reason or
another.
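
The csh/ksh difference for a single-element list can be sketched as a
policy switch. This C++ fragment is only an illustration of the two
consistent behaviors shown in the transcripts above: it handles one
brace group with no nesting, and the names Policy, csh_like, ksh_like
and expand_flat() are made up for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum Policy { csh_like, ksh_like };   // hypothetical names

// Expands the first {...} group in `word`, which is assumed to contain
// no nested braces. Returns the resulting words in order.
std::vector<std::string> expand_flat (const std::string &word, Policy policy)
{
    const std::size_t open  = word.find ('{');
    const std::size_t close =
        open == std::string::npos ? open : word.find ('}', open);

    // no complete group; an empty {} is also left alone here
    // (an assumption of this sketch)
    if (close == std::string::npos || close == open + 1)
        return std::vector<std::string>(1, word);

    // split the text between the braces on commas
    std::vector<std::string> items;
    std::size_t beg = open + 1;
    for (std::size_t i = beg; i <= close; ++i) {
        if (i == close || word [i] == ',') {
            items.push_back (word.substr (beg, i - beg));
            beg = i + 1;
        }
    }

    // ksh-like: a list needs more than one element to be expanded
    if (ksh_like == policy && items.size () < 2)
        return std::vector<std::string>(1, word);

    // csh-like (or a real list): drop the braces, emit each element
    std::vector<std::string> out;
    for (std::size_t k = 0; k != items.size (); ++k)
        out.push_back (word.substr (0, open) + items [k]
                       + word.substr (close + 1));
    return out;
}
```

For "a-{b}-c" the csh_like policy gives "a-b-c" and the ksh_like policy
gives "a-{b}-c" unchanged. Both give two words for "a-{b,d}-c".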

I'm honestly leaning toward implementing the behavior of either csh or
ksh, and adding support for bash style sequences.

The problem isn't that I can't implement one or the other. Well,
honestly I'd like to avoid implementing the bash 3.2 behavior at all,
but that is another issue. The problem is that our measuring stick
[bash] is inconsistent, and it appears to be changing with every
version.

Travis

Re: svn commit: r628839 - /stdcxx/trunk/tests/self/0.braceexp.cpp

Posted by Mark Brown <ma...@gmail.com>.

On 2/19/08, Martin Sebor <se...@roguewave.com> wrote:
> Travis Vitek wrote:
>  > sebor-2 wrote:
>  >> +    // weirdly-formed brace expansions -- fixed in post-bash-3.1
>  >> +    TEST ("a-{b{d,e}}-c",    "a-{bd}-c a-{be}-c");
>  >>
>  >
>  > I don't understand how this could be interpreted as valid brace expansion at
>  > all. The body of the expansion is '{b{d,e}}'. Paragraph 5 [and paragraph 1
>  > for that matter] require a correctly-formed brace expansion have unquoted
>  > [unescaped?] opening and closing braces, and at least one unquoted comma or
>  > a valid sequence expression. The body does not meet either of these
>  > requirements, so it must be invalid.
>  >

The C-Shell, which had brace expansion long before Bash did, outputs
a-bd-c a-be-c as Martin expects. It doesn't require a comma at all.

>  >
>  >
>  > sebor-2 wrote:
>  >> +    TEST ("a-{bdef-{g,i}-c", "a-{bdef-g-c a-{bdef-i-c");
>  >>
>  >
>  > Again, this does not seem correct according to the requirements of paragraph
>  > 5 [and 1].

The C-Shell complains about a missing brace in this expression.

-- Mark