Posted to user@cayenne.apache.org by Davide Vecchi <dv...@amc.dk> on 2014/12/01 11:10:56 UTC

RE: Extracting tokens from an expression and matching an object against that expression without parsing twice

Just for the record, I solved my problem by keeping the first step (creating the tokens). In the second step (matching objects against that expression) I reuse the existing tokens to do the match myself when the expression is simple enough, basically when it contains no parenthesized grouping; otherwise I still call Expression.fromString, which tokenizes and parses the string again.
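
A minimal sketch of that kind of fast path, under stated assumptions: token images are represented as plain strings rather than Cayenne Token objects, the object being matched is a simple Map, and "simple enough" is narrowed to the three-token shape field = value. The names TokenMatchSketch, matches and fullParseAndMatch are hypothetical and not part of Cayenne.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

class TokenMatchSketch {

    /**
     * Matches a record against tokens that were already extracted.
     * Only the shape [field, "=", value] is handled directly; anything
     * else is delegated to the supplied fallback, which stands in for
     * a full re-parse of the expression string.
     */
    static boolean matches(List<String> tokens,
                           Map<String, Object> record,
                           BooleanSupplier fullParseAndMatch) {
        boolean simple = tokens.size() == 3 && "=".equals(tokens.get(1));
        if (simple) {
            // Compare the textual form of the field value with the token image.
            Object actual = record.get(tokens.get(0));
            return actual != null && actual.toString().equals(tokens.get(2));
        }
        // Grouping, boolean operators, etc.: let the real parser handle it.
        return fullParseAndMatch.getAsBoolean();
    }
}
```

In the setting of this thread, the fallback would presumably be something like () -> Expression.fromString(where).match(object), so the second parse only happens for expressions the token-based path cannot handle.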

The execution time of the test suites in the Jenkins build went from ~45 minutes back to the usual ~20 minutes, so in this specific case avoiding the double parsing was a substantial optimization.
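
Another way to avoid parsing the same string repeatedly, in the spirit of the "cache expression strings" TODO quoted further down the thread, is to memoize the parse result per expression string. The sketch below is generic and does not depend on Cayenne; the class name ParsedExpressionCache is hypothetical, and it assumes the parsed objects are not modified after being cached.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class ParsedExpressionCache<E> {

    private final Map<String, E> cache = new ConcurrentHashMap<>();
    private final Function<String, E> parser;

    ParsedExpressionCache(Function<String, E> parser) {
        this.parser = parser;
    }

    /** Returns the cached parse result, parsing only on first use of a string. */
    E get(String expressionString) {
        return cache.computeIfAbsent(expressionString, parser);
    }
}
```

With Cayenne this could presumably be constructed as new ParsedExpressionCache<Expression>(Expression::fromString), so that each distinct string is parsed once and later matches reuse the same Expression.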

> Yeah, I still don't understand why would the code care to poke inside the parser and deal directly with tokens.

I had not explained that.
As I said, the design of the application I'm modifying was already based on tokens and I was not supposed to redesign the application; I was just asked to improve the parsing, which in my opinion makes a lot of sense, whether this is right or wrong from one's perspective.

I don't think it should necessarily be considered wrong for an application that parses expressions to also want to, for example, show or store the resulting tokens, especially when the purpose of the application isn't known.
 
However, I accept that although the token-related methods I'm calling (ExpressionParser.getToken(int) and ExpressionParser.getNextToken()) are public, they were not intended to be called by an application. I probably didn't read the Cayenne documentation carefully enough to realize that sooner. Next time I need just parsing and matching I will not use Cayenne, which I now realize is intended for a much wider purpose than that. But in this case I will keep the Cayenne-based solution because it's doing the job very well.



-----Original Message-----
From: Andrus Adamchik [mailto:andrus@objectstyle.org] 
Sent: Monday, November 17, 2014 13:13
To: user@cayenne.apache.org
Subject: Re: Extracting tokens from an expression and matching an object against that expression without parsing twice

> It's not easy to explain properly why I need the tokens; the general reason is that the preexisting application, written long ago by several other persons, is designed to use them, and changing its design would be too big an undertaking.

Yeah, I still don't understand why would the code care to poke inside the parser and deal directly with tokens.

> I will see if I can use Andrus' pointers to extract the tokens from the Expression instance.

I am afraid you won't find any *tokens* in an Expression instance. Expression is just a tree of objects that can be used to evaluate stuff. If you need it to match something, you can. But a parsed expression is devoid of any links to the original lexical structure. 
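
To illustrate the point with a toy model (these are hypothetical classes, not Cayenne's Expression hierarchy): the tree for something like (a = 1) and (b = 2) contains no parenthesis or operator tokens at all. Grouping is expressed purely by nesting, and matching is a recursive walk over the nodes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class ExprTreeSketch {

    /** A node of the parsed tree; it can evaluate itself against a record. */
    interface Node {
        boolean match(Map<String, Object> record);
    }

    /** Leaf comparison: field = value. */
    static final class Equals implements Node {
        final String field;
        final Object value;

        Equals(String field, Object value) {
            this.field = field;
            this.value = value;
        }

        public boolean match(Map<String, Object> record) {
            return value.equals(record.get(field));
        }
    }

    /** Conjunction: every operand must match. */
    static final class And implements Node {
        final List<Node> operands;

        And(Node... operands) {
            this.operands = Arrays.asList(operands);
        }

        public boolean match(Map<String, Object> record) {
            for (Node operand : operands) {
                if (!operand.match(record)) {
                    return false;
                }
            }
            return true;
        }
    }
}
```

Nothing in such a tree records where in the original string a node came from, which is why the token list cannot be recovered from it.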

Andrus



> On Nov 17, 2014, at 11:46 AM, Davide Vecchi <dv...@amc.dk> wrote:
> 
> Thanks for your inputs.
> 
> I'm probably showing my technological age here, but I certainly admit that I have this tendency to avoid repeating complex operations as a matter of principle when it's known in advance that the second process will produce exactly the same result as the first one. When I catch myself doing that I always feel that my design is not OK.
> 
> However in this case I am quite sure I need to get rid of the double parsing, although I did not demonstrate in a particularly strict way that that's the cause of the slowdown. It's more like a qualified (in my opinion) guess, reinforced by the fact that method Expression.fromString(String) has a TODO saying "TODO: cache expression strings, since this operation is pretty slow" (I'm using version 3.0.2). So it looks like the Cayenne coders too had reasons to worry to some extent about optimization in this area.
> 
> I just profiled the execution with JVisualVM, and the two methods where by far the most time is spent are Expression.fromString(String) and ExpressionParser.getNextToken(). Since I have to cut down the processing time, I have to focus on them first.
> 
> The situation here is that I modified a preexisting application which was doing some basic parsing and then using the resulting tokens to match the expression against objects. That parsing is basic in that it can only handle simple expressions; for example, it doesn't support parenthesized grouping.
> 
> My changes consisted of removing that parsing code from the application and replacing it with calls to Cayenne, because we need real parsing. Of course the parsing done by Cayenne is way more powerful and that might be the real and fair reason why it takes longer, but even if this is the case it's important for me not to do that parsing twice.
> 
> It's not easy to explain properly why I need the tokens; the general reason is that the preexisting application, written long ago by several other persons, is designed to use them, and changing its design would be too big an undertaking. Since all that needs to be improved is the parsing and matching I thought I'd just use a powerful tool to replace only those parts.
> 
> I will see if I can use Andrus' pointers to extract the tokens from the Expression instance.
> 
> 
> 
> -----Original Message-----
> From: Andrus Adamchik [mailto:andrus@objectstyle.org]
> Sent: Sunday, November 16, 2014 14:57
> To: user@cayenne.apache.org
> Subject: Re: Extracting tokens from an expression and matching an 
> object against that expression without parsing twice
> 
> I second John's assessment. 
> 
> BTW, what are the tokens for? Do you actually need access to the lexical structure of the String? After all, a parsed Expression object is itself a tree and gives you access to its own structure either directly ('getOperand(int)') or via the 'traverse' and 'transform' methods.
> 
> Andrus
> 
>> On Nov 14, 2014, at 9:54 PM, John Huss <jo...@gmail.com> wrote:
>> 
>> This looks like a serious micro optimization.  Is the performance for 
>> this really that critical?  Have you demonstrated that this is your 
>> application's crucial hot spot?
>> 
>> On Fri, Nov 14, 2014 at 7:35 AM, Davide Vecchi <dv...@amc.dk> wrote:
>> 
>>> Hi all,
>>> 
>>> I have an expression in a string, and I use Cayenne to parse the 
>>> expression into tokens, which are needed for a specific purpose.
>>> 
>>> However in addition to having the tokens I also need to evaluate an 
>>> object against that expression, to see if that object matches the expression.
>>> 
>>> My problem is that the way I'm doing it causes the parsing to be 
>>> done twice on the same expression, and I would like to avoid 
>>> parsing the same expression twice.
>>> 
>>> I'm creating the tokens like this:
>>> 
>>> -----------------------------------
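>>> import java.io.Reader;
>>> import java.io.StringReader;
>>> import java.util.ArrayList;
>>> import java.util.List;
>>> import org.apache.cayenne.exp.parser.ExpressionParser;
>>> import org.apache.cayenne.exp.parser.Token;
>>> 
>>> String where = "myField=0";
>>> Reader reader = new StringReader(where);
>>> ExpressionParser parser = new ExpressionParser(reader);
>>> 
>>> // Collect tokens until end of input. JavaCC-generated parsers normally
>>> // signal the end with an EOF token (kind 0) rather than null.
>>> List<Token> tokens = new ArrayList<>();
>>> Token token = parser.getNextToken();
>>> while (token != null && token.kind != 0) {
>>>    tokens.add(token);
>>>    token = parser.getNextToken();
>>> }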
>>> -----------------------------------
>>> 
>>> I'm doing the object matching like this:
>>> 
>>> -----------------------------------
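>>> import org.apache.cayenne.exp.Expression;
>>> 
>>> String where = "myField=0";
>>> 
>>> // Parses the string into an expression tree (the second parse)
>>> Expression expression = Expression.fromString(where);
>>> 
>>> // Evaluates the tree against the object
>>> boolean matches = expression.match(object);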
>>> -----------------------------------
>>> 
>>> The call to Expression.fromString made in the object matching 
>>> operation performs a parsing, but the parsing of the same expression 
>>> had already been done in the token creation operation.
>>> 
>>> Is there a way to redesign this process in order to get the tokens 
>>> and also match an object against the expression without parsing the 
>>> same expression twice ?
>>> 
>>> For example, I believe that the call to Expression.fromString must 
>>> have created the tokens, because it has parsed the string. So I 
>>> thought I could reverse the order and do the object matching first, 
>>> keep the Expression instance created in that process and use it to 
>>> extract the tokens. But I can't see how to extract the tokens from 
>>> an Expression instance instead of from an ExpressionParser instance as I'm currently doing.
>>> 
>>> Or another possibility could be that I keep creating the tokens 
>>> first, and then I match my object against them, instead of against 
>>> the string expression that generated those tokens. But I can't see 
>>> how to match an object against tokens.
>>> 
>>> So I'm looking for some ideas.
>>> 
>>> Thanks in advance.
>>> 
>>> Davide Vecchi
>>> 
> 
>

Re: Extracting tokens from an expression and matching an object against that expression without parsing twice

Posted by Andrus Adamchik <an...@objectstyle.org>.

To be clear, 'expression.match(..)' is a perfectly valid use of Cayenne. Anyways, good to hear that you found a solution.

Andrus
 
