Posted to dev@spamassassin.apache.org by bu...@issues.apache.org on 2010/08/18 02:48:40 UTC

[Bug 6483] New: request to use RE2 in place of RE2C step

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6483

           Summary: request to use RE2 in place of RE2C step
           Product: Spamassassin
           Version: unspecified
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Libraries
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: vijaye@google.com


Hi,

Google has released the RE2 library, which can build a DFA for a large number
of regular expressions (RE2::Set) and match in linear time. It would be great
if SpamAssassin could take advantage of this: Google's RE2 library supports
most PCRE constructs, and for the complicated ones it doesn't support, we can
fall back to the normal process.

http://code.google.com/p/re2/
regards
vijay
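
A minimal sketch of the fallback idea above, in Python rather than Perl and not taken from SpamAssassin. Rules whose patterns use constructs that RE2 does not support (backreferences and lookaround) are routed to the normal engine, and the rest become candidates for the set matcher. The function name, the rule names, and the detection regex are all hypothetical, and the check is a rough heuristic, not a full PCRE parser.

```python
import re

# Heuristic only (assumption: just these constructs are checked):
#   \1..\9                 numbered backreference
#   \k<name>               named backreference
#   (?=  (?!  (?<=  (?<!   lookahead / lookbehind
# An escaped backslash followed by a digit would be a false positive.
_UNSUPPORTED = re.compile(r"\\[1-9]|\\k<|\(\?=|\(\?!|\(\?<=|\(\?<!")

def partition_rules(rules):
    """Split {name: pattern} into (set_candidates, fallback) dicts."""
    candidates, fallback = {}, {}
    for name, pattern in rules.items():
        target = fallback if _UNSUPPORTED.search(pattern) else candidates
        target[name] = pattern
    return candidates, fallback

# Hypothetical rules, for illustration only.
rules = {
    "VIAGRA": r"v[i1]agra",
    "REPEAT_WORD": r"\b(\w+) \1\b",         # backreference
    "NOT_FOLLOWED": r"free(?! shipping)",   # negative lookahead
}
candidates, fallback = partition_rules(rules)
```

In practice it may be more reliable to attempt compilation with RE2 itself and treat a compile failure as the signal to fall back, instead of guessing from the pattern text.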

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

Re: [Bug 6483] request to use RE2 in place of RE2C step

Posted by Michael Parker <pa...@pobox.com>.
On Aug 18, 2010, at 10:31 AM, Matt Sergeant wrote:

> bugzilla-daemon@issues.apache.org wrote:
>> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6483
>>
>> --- Comment #2 from Mark Martinec <Ma...@ijs.si> 2010-08-18 10:46:44 UTC ---
>>> Is there a mature perl wrapper for RE2?
>>
>> http://github.com/dgl/re-engine-RE2
>>
>> (it's pretty fresh, don't know how mature it is)
>
> It works, and allows it to be a drop-in replacement for perl's regexp engine, so all you'd need to do is check if it can be loaded, and the perl version is high enough (5.10 required), and support it in the rule compiler (not even the re2c stuff).
> 

Yeah, that's one way to use it, but I think the more compelling feature would be RE2::Set for scanning. That is what I'd like to see a wrapper built around; I don't think re::engine::RE2 would necessarily be the solution for that.

Michael
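
To make the RE2::Set idea concrete, here is a rough Python sketch of the interface such a wrapper might expose: add many patterns, compile once, then ask which ones match a given text. The class name, method names, and sample patterns are hypothetical. The standard `re` module stands in for RE2, so this loops over the patterns instead of running them through a single DFA pass; it shows the shape of the API, not the performance.

```python
import re

class PatternSet:
    """Hypothetical stand-in for an RE2::Set-style wrapper (unanchored matching)."""

    def __init__(self):
        self._sources = []
        self._compiled = None

    def add(self, pattern):
        """Register a pattern and return its index in the set."""
        if self._compiled is not None:
            raise RuntimeError("cannot add patterns after compile()")
        self._sources.append(pattern)
        return len(self._sources) - 1

    def compile(self):
        # A real RE2::Set would build one combined automaton here;
        # this sketch just compiles each pattern separately.
        self._compiled = [re.compile(p) for p in self._sources]

    def match(self, text):
        """Return the indices of all patterns that match somewhere in text."""
        if self._compiled is None:
            raise RuntimeError("call compile() before match()")
        return [i for i, rx in enumerate(self._compiled) if rx.search(text)]

# Hypothetical patterns, for illustration only.
rule_set = PatternSet()
for pat in (r"v[i1]agra", r"free\s+money", r"unsubscribe"):
    rule_set.add(pat)
rule_set.compile()
hits = rule_set.match("Get free  money now, cheap v1agra")
```

The returned indices would map back to rule names, so one scan of the message body could report every rule that fired.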



Re: [Bug 6483] request to use RE2 in place of RE2C step

Posted by Matt Sergeant <ms...@messagelabs.com>.
bugzilla-daemon@issues.apache.org wrote:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6483
>
> --- Comment #2 from Mark Martinec <Ma...@ijs.si> 2010-08-18 10:46:44 UTC ---
>> Is there a mature perl wrapper for RE2?
>
> http://github.com/dgl/re-engine-RE2
>
> (it's pretty fresh, don't know how mature it is)

It works, and allows it to be a drop-in replacement for perl's regexp 
engine, so all you'd need to do is check if it can be loaded, and the 
perl version is high enough (5.10 required), and support it in the rule 
compiler (not even the re2c stuff).

There are some minor bugs in the UTF-8 support, apparently. I've asked
David if he wants to comment here.

Matt.



[Bug 6483] request to use RE2 in place of RE2C step

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6483

--- Comment #2 from Mark Martinec <Ma...@ijs.si> 2010-08-18 10:46:44 UTC ---
> Is there a mature perl wrapper for RE2?

http://github.com/dgl/re-engine-RE2

(it's pretty fresh, don't know how mature it is)


RE: [Bug 6483] request to use RE2 in place of RE2C step

Posted by Giampaolo Tomassoni <Gi...@Tomassoni.biz>.
> I know the bug report says re2c but I don't think that using RE2 or
> RE2::Set would really be a component of sa-compile, so I'm not sure the
> sa-compile component is appropriate.

I agree with you. It seems to me that RE2 and RE2::Set can't actually be
used to generate DFA parsing code: they build their DFA tables internally
and don't seem to let you "export" them as C/C++ parsing code.

However, the idea of using a DFA parser instead of an NDFA one is appealing
to me.

What about using flex instead of re2c, then?

Giampaolo


Re: [Bug 6483] request to use RE2 in place of RE2C step

Posted by Michael Parker <pa...@pobox.com>.
I know the bug report says re2c but I don't think that using RE2 or RE2::Set would really be a component of sa-compile, so I'm not sure the sa-compile component is appropriate.

IMO they would be mutually exclusive, although it would take some experimentation because in some cases RE2/RE2::Set might be a better choice than re2c.

Michael



On Aug 18, 2010, at 4:43 AM, bugzilla-daemon@issues.apache.org wrote:

> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6483
> 
> Karsten Bräckelmann <gu...@rudersport.de> changed:
> 
>           What    |Removed                     |Added
> ----------------------------------------------------------------------------
>           Priority|P2                          |P5
>          Component|Libraries                   |sa-compile


[Bug 6483] request to use RE2 in place of RE2C step

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6483

Karsten Bräckelmann <gu...@rudersport.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P5
          Component|Libraries                   |sa-compile


[Bug 6483] request to use RE2 in place of RE2C step

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6483

--- Comment #1 from Michael Parker <pa...@pobox.com> 2010-08-17 23:18:15 UTC ---
Is there a mature perl wrapper for RE2?
