Posted to dev@polygene.apache.org by Niclas Hedhman <ni...@hedhman.org> on 2017/07/05 07:26:43 UTC

Why I don't like regular expressions...

We have

private static final Pattern DESCRIPTOR_TEXTUAL_REGEXP = Pattern.compile(
    "^"
    + Pattern.quote( DESCRIPTOR_COMPONENT_SEPARATOR_START ) + "(.*)"
    + Pattern.quote( DESCRIPTOR_COMPONENT_SEPARATOR_END )
    + Pattern.quote( DESCRIPTOR_COMPONENT_SEPARATOR_START ) + "(.*)"
    + Pattern.quote( DESCRIPTOR_COMPONENT_SEPARATOR_END )
    + Pattern.quote( DESCRIPTOR_COMPONENT_SEPARATOR_START ) + "(" + "[^"
    + Pattern.quote( DESCRIPTOR_COMPONENT_SEPARATOR_END + DESCRIPTOR_TYPE_SEPARATOR )
    + "]+)" + Pattern.quote( DESCRIPTOR_COMPONENT_SEPARATOR_END ) + "$" );


in org.apache.polygene.index.sql.support.skeletons.AbstractSQLStartup

And in method stringToCompositeDescriptor()

Matcher matcher = DESCRIPTOR_TEXTUAL_REGEXP.matcher( str );
if( !matcher.matches() )
{
    throw new IllegalArgumentException( "Descriptor textual description " + str
                                        + " was invalid." );
}

and of course it says that it doesn't match with

{Domain Layer}{Organization Module}{interface com.sensetif.sink.model.organization.CreditLimit,interface org.apache.polygene.api.value.ValueComposite}


so where is the problem?


The RegExp "Pattern" prints out to

^\Q{\E(.*)\Q}\E\Q{\E(.*)\Q}\E\Q{\E([^\Q},\E]+)\Q}\E$

as if that helps...

So I realize that the \Q and \E are escaping markers, so it is basically
saying

^{(.*)}{(.*)}{([^},]+)}$

(where the comma and the curly braces are ordinary characters)
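
A minimal sketch to check that reading, assuming the three separator constants are "{", "}" and "," (inferred from the printed pattern above, not taken from the actual source). It rebuilds the pattern string with Pattern.quote() and compares its behaviour with a hand-escaped version. As far as I know, Java rejects a bare "{" at that position, so the hand-written form escapes the braces instead of using them literally.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Assumed values, inferred from the printed pattern; not copied from AbstractSQLStartup.
    static final String START = "{";
    static final String END = "}";
    static final String TYPE_SEP = ",";

    // Same concatenation as DESCRIPTOR_TEXTUAL_REGEXP, returned as a plain string.
    static String quotedPattern() {
        return "^"
            + Pattern.quote(START) + "(.*)" + Pattern.quote(END)
            + Pattern.quote(START) + "(.*)" + Pattern.quote(END)
            + Pattern.quote(START) + "(" + "[^" + Pattern.quote(END + TYPE_SEP) + "]+)"
            + Pattern.quote(END) + "$";
    }

    // Hand-escaped form of the same expression, without \Q...\E.
    static final String HAND_ESCAPED = "^\\{(.*)\\}\\{(.*)\\}\\{([^},]+)\\}$";

    // True if both forms give the same verdict for the given input.
    static boolean agreeOn(String input) {
        boolean quoted = Pattern.compile(quotedPattern()).matcher(input).matches();
        boolean escaped = Pattern.compile(HAND_ESCAPED).matcher(input).matches();
        return quoted == escaped;
    }

    public static void main(String[] args) {
        System.out.println(quotedPattern());
        System.out.println(agreeOn("{a}{b}{c}"));
        System.out.println(agreeOn("{a}{b}{c,d}"));
    }
}
```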

But the third group shouldn't work at all...

([^},]+)

So, that is a group of one or more characters, none of which may be a
comma or a closing brace, so a comma-separated list of types can never
match...
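
A small sketch of that observation, using a hand-escaped equivalent of the pattern. The single-type descriptor is a hypothetical input made up for comparison; the two-type descriptor is the one from this mail.

```java
import java.util.regex.Pattern;

class ThirdGroupDemo {
    // Hand-escaped equivalent of the pattern as it is currently built.
    static final Pattern ORIGINAL =
        Pattern.compile("^\\{(.*)\\}\\{(.*)\\}\\{([^},]+)\\}$");

    static boolean matches(String descriptor) {
        return ORIGINAL.matcher(descriptor).matches();
    }

    public static void main(String[] args) {
        // Hypothetical descriptor with exactly one type.
        String single = "{Domain Layer}{Organization Module}"
            + "{interface com.sensetif.sink.model.organization.CreditLimit}";
        // The descriptor that triggered the exception: two comma-separated types.
        String multi = "{Domain Layer}{Organization Module}"
            + "{interface com.sensetif.sink.model.organization.CreditLimit,"
            + "interface org.apache.polygene.api.value.ValueComposite}";

        System.out.println(matches(single)); // true: no comma in the last component
        System.out.println(matches(multi));  // false: the comma is excluded by [^},]
    }
}
```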


Has this ever worked? Because a few lines later, the sequence of types is
being extracted, so it was intended to have multiple types.


I am at a loss.


Cheers
-- 
Niclas Hedhman, Software Developer
http://polygene.apache.org - New Energy for Java

Re: Why I don't like regular expressions...

Posted by Niclas Hedhman <ni...@hedhman.org>.
Stan, thanks for the clarification.
It seems to happen when the indexer starts up, and somehow determines that
re-indexing is needed, and that is probably why it has not been caught
before, as we don't have tests for that AFAIK.

I will investigate this extension further. I am not sure that it is
actually doing the right thing with respect to multi-layered apps, or
perhaps that is only during the re-indexing...

Worst case: if I can't get this operational in my trial app, then I would
like to drop it from the 3.0 release, so as not to hold that up.


Cheers
Niclas

On Wed, Jul 5, 2017 at 4:14 PM, Stanislav Muhametsin <
stanislav.muhametsin@zest.mail.kapsi.fi> wrote:

> My ancient code, I think.
> Probably a brain fart, or some "preliminary version here and I fix it
> later" and 'later' never happened. :D
> Too many years have passed to remember the actual reason.
>
> But yeah, it is definitely not working for multiple types...
> The last group should probably be:
>
> {([^},]+)(,[^},]+)*}



Re: Why I don't like regular expressions...

Posted by Stanislav Muhametsin <st...@zest.mail.kapsi.fi>.
My ancient code, I think.
Probably a brain fart, or some "preliminary version here and I fix it 
later" and 'later' never happened. :D
Too many years have passed to remember the actual reason.

But yeah, it is definitely not working for multiple types...
The last group should probably be:

{([^},]+)(,[^},]+)*}
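
A rough sketch of how that could look in Java, under a couple of assumptions. The braces are hand-escaped, and the whole type list is wrapped in a single capturing group, because a repeated group such as (,[^},]+)* only retains its last repetition. Splitting group 3 on the comma is a hypothetical way to get the individual types; the real extraction code in AbstractSQLStartup is not shown in this thread and may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiTypeDemo {
    // Third component: one type, followed by zero or more ",type" repetitions.
    // The inner repetition is non-capturing so that group 3 holds the whole list.
    static final Pattern FIXED = Pattern.compile(
        "^\\{(.*)\\}\\{(.*)\\}\\{([^},]+(?:,[^},]+)*)\\}$");

    // Hypothetical helper: returns the type names of a textual descriptor.
    static List<String> types(String descriptor) {
        Matcher matcher = FIXED.matcher(descriptor);
        if (!matcher.matches()) {
            throw new IllegalArgumentException(
                "Descriptor textual description " + descriptor + " was invalid.");
        }
        return Arrays.asList(matcher.group(3).split(","));
    }

    public static void main(String[] args) {
        String multi = "{Domain Layer}{Organization Module}"
            + "{interface com.sensetif.sink.model.organization.CreditLimit,"
            + "interface org.apache.polygene.api.value.ValueComposite}";
        // Prints each type on its own line.
        for (String type : types(multi)) {
            System.out.println(type);
        }
    }
}
```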

