Posted to dev@druid.apache.org by Nagaraj Tantri <na...@gmail.com> on 2020/02/13 10:07:24 UTC

Regex Parser change requirements

Hi,

This is in reference to the ticket:
https://github.com/apache/druid/issues/8583

I see that there have not been any further updates.

Here is my update on it again:

> I have tried a number of approaches, including copying the CSVParser style
> of implementation. Pardon me if this is vague, but I see that the code here
> <https://github.com/apache/incubator-druid/blob/master/core/src/main/java/org/apache/druid/java/util/common/parsers/RegexParser.java#L93>
> uses: if (!matcher.matches()) {}, which requires the pattern to match the
> entire text.
> I feel that defeats the purpose of a regex parser, since the pattern fails
> unless the text matches as a whole. In my opinion, while (matcher.find()) {}
> would fit the use cases better, giving us the ability to write more
> flexible regexes. With matcher.find() it is easier to replicate the usual
> regex find-and-group behaviour. A regex written to match an entire string
> almost always ends up padded with a catch-all such as (.*). Many useful
> patterns are ruled out because of this.
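
To make the difference concrete, here is a minimal standalone sketch using only java.util.regex. It is not the RegexParser code: the class name, the helper methods, and the sample line are made up for illustration, and returning an empty list on a non-match is just a choice for this demo, not a claim about what the parser does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Druid.
public class MatchVsFind {

    // Whole-input semantics: groups are returned only if the pattern
    // covers the entire line.
    static List<String> parseWithMatches(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        List<String> groups = new ArrayList<>();
        if (!matcher.matches()) {
            return groups; // demo choice: empty list when the whole line does not match
        }
        for (int i = 1; i <= matcher.groupCount(); i++) {
            groups.add(matcher.group(i));
        }
        return groups;
    }

    // Scanning semantics: collect the groups of every occurrence of the
    // pattern anywhere in the line.
    static List<String> parseWithFind(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        List<String> groups = new ArrayList<>();
        while (matcher.find()) {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                groups.add(matcher.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String line = "ts=2020-02-13 level=INFO user=alice"; // made-up sample input

        // A pattern describing one key=value pair, not the whole line.
        Pattern pair = Pattern.compile("(\\w+)=(\\S+)");
        System.out.println(parseWithMatches(pair, line)); // []
        System.out.println(parseWithFind(pair, line));    // [ts, 2020-02-13, level, INFO, user, alice]

        // With matches(), the pattern has to be padded with .* on both sides.
        Pattern padded = Pattern.compile(".*level=(\\S+).*");
        System.out.println(parseWithMatches(padded, line)); // [INFO]
    }
}
```

The first pattern yields nothing under matches() but every pair under find(); the padded pattern shows the kind of catch-all that matches() forces on the regex author.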


It would be good to get some suggestions on whether we can change the
implementation to actually use *matcher.find()*.


Please let me know about this.


Appreciate your time.


Regards,

Nagaraj