Posted to user@drill.apache.org by Nicolas Paris <ni...@gmail.com> on 2016/02/02 13:03:32 UTC

REGEX search Operator

Hello,

I can't find any reference in the documentation about a regex operator.

I would like to be able to query this way:

SELECT *
FROM xxx
WHERE  text_field   regexOperator    'regex_pattern';

Thanks for helping,

Re: REGEX search Operator

Posted by Jason Altekruse <al...@gmail.com>.
A tip for navigating large Github repos: you can type 't' when looking at the
folder structure to open a fast global search. Searching for the functions
is a little more complicated in Drill because we actually generate a bunch
of them to cover all of the types. This means that you will find them in
source control as source code templates, not pure Java source code.

Most of the functions in Drill are in the exec.expr.fn.impl package. Here
is an example of functions that are not generated [1]. You could add the
function to this class or make a new class in the same package.

[1] -
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
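
For reference, the matching logic such a function needs is small. The sketch
below is plain Java using only java.util.regex. It is not the code from
StringFunctions.java, and the names RegexContains and contains are
hypothetical. A real Drill function would wrap this logic in Drill's function
interface, which is not shown here.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Drill.
final class RegexContains {
    private RegexContains() {}

    // Returns true when the regex matches anywhere inside the input.
    // Treating a null input as "no match" is an assumption of this sketch.
    static boolean contains(String input, String regex) {
        if (input == null) {
            return false;
        }
        // find() searches for the pattern at any position in the input.
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(contains("drill.apache.org", "\\.org$")); // true
        System.out.println(contains("drill.apache.org", "^\\d+$"));  // false
    }
}
```

Compiling the pattern on every call keeps the sketch short. A function that is
evaluated once per row would probably compile the pattern once and reuse it.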

On Thu, Feb 4, 2016 at 10:03 AM, Nicolas Paris <ni...@gmail.com> wrote:

> John, Jason,
>
> 2016-02-04 18:47 GMT+01:00 John Omernik <jo...@omernik.com>:
>
> > I'd be curious how you are implementing the regex... using Java's regex
> > libraries? etc.
> >
> Yeah, I use java.util.regex.
>
>
> > I know one thing with Hive that always bothered me was the need to double
> > escape things.
> >
> > '\d\d\d\d-\d\d-\d\d'  needed to be '\\d\\d\\d\\d-\\d\\d-\\d\\d'. If we can
> > avoid that it would be AWESOME.
> >
> My guess is this comes from the way Java handles strings. All languages I
> have used need double escaping.
>
>
> > On Thu, Feb 4, 2016 at 11:37 AM, Jason Altekruse <altekrusejason@gmail.com>
> > wrote:
>
> The code is here: https://github.com/parisni/drill-simple-contains
> It's disturbing how simple it is...
>
>
>
> > > I think you should actually just put the function in Drill itself.
> > > System native functions are implemented in the same interface as UDFs,
> > > because our mechanism for evaluating them is very efficient (we code
> > > generate code blocks by linking together the bodies of the individual
> > > functions to evaluate a complete expression).
> >
> Well, the folder tree is quite impressive (https://github.com/apache/drill).
>
> What folder is supposed to be "Drill itself"?
>
>
> > > You can open a JIRA, marking it a feature request. You can open a pull
> > > request against the apache github repo, making sure you follow the
> > > standard format for your commit message, prefixing with the JIRA number
> > > in the format
> > > Example:
> > > DRILL-XXXX: Feature description
> > >
> > > This will automatically link the PR to your JIRA.
> >
> OK, I will try, thanks a lot.
>
> > > - Jason
> > >
> > > On Thu, Feb 4, 2016 at 8:44 AM, Nicolas Paris <ni...@gmail.com>
> > wrote:
> > >
> > > > Jason, I have it working,
> > > >
> > > > Just tell me the way to proceed to a PR.
> > > > 1. Where do I put my maven project? Which folder in my drill github
> > > > fork?
> > > > 2. Do I need a JIRA? How do I proceed?
> > > >
> > > > For now, I have only published it on my github account in a separate
> > > > project.
> > > >
> > > > Thanks
> > > >
> > > > 2016-02-04 16:52 GMT+01:00 Jason Altekruse <altekrusejason@gmail.com
> >:
> > > >
> > > > > Awesome, thanks!
> > > > >
> > > > > On Thu, Feb 4, 2016 at 7:44 AM, Nicolas Paris <niparisco@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > Well I am creating a udf
> > > > > > good exercise
> > > > > > I hope a PR soon
> > > > > >
> > > > > > 2016-02-04 16:37 GMT+01:00 Jason Altekruse <
> > altekrusejason@gmail.com
> > > >:
> > > > > >
> > > > > > > > I didn't realize that we were lacking this functionality. As the
> > > > > > > > repeated_contains operator handles wildcards it makes sense to add
> > > > > > > > such a function to drill.
> > > > > > > >
> > > > > > > > It should be simple to implement, would someone like to open a JIRA
> > > > > > > > and submit a PR for this?
> > > > > > >
> > > > > > > - Jason
> > > > > > >
> > > > > > > On Tue, Feb 2, 2016 at 8:56 AM, John Omernik <john@omernik.com
> >
> > > > wrote:
> > > > > > >
> > > > > > > > > I would like to see something like this as well, even if it's an
> > > > > > > > > included UDF like REGEX(field, pattern) using Java's library for
> > > > > > > > > regex like Hive does.  That would be EXTREMELY helpful.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Feb 2, 2016 at 6:55 AM, Nicolas Paris <
> > > niparisco@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > > > ANSI SQL doesn't define a regex operator.
> > > > > > > > > > > Neither does Drill.
> > > > > > > > > > >
> > > > > > > > > > Drill has SQL function extensions like "REPEATED_CONTAINS"
> > > > > > > > > > that look like they handle regex. Could the regex operator
> > > > > > > > > > be replaced with a new SQL extension?
> > > > > > > > > > I guess I could create my own functions in Java, right? Maybe
> > > > > > > > > > push them to github then?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Isn't the 'LIKE' operator enough?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Sadly not, I am looking for complex pattern matching.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > > Miura, Masahide
> > > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Nicolas Paris [mailto:niparisco@gmail.com]
> > > > > > > > > > Sent: Tuesday, February 02, 2016 9:04 PM
> > > > > > > > > > To: user@drill.apache.org
> > > > > > > > > > Subject: REGEX search Operator
> > > > > > > > > >
> > > > > > > > > > Hello,
> > > > > > > > > >
> > > > > > > > > > I can't find any reference in the documentation about a
> > regex
> > > > > > > operator.
> > > > > > > > > >
> > > > > > > > > > I would like to be able to query this way :
> > > > > > > > > >
> > > > > > > > > > SELECT *
> > > > > > > > > > FROM xxx
> > > > > > > > > > WHERE  text_field   regexOperator    'regex_pattern';
> > > > > > > > > >
> > > > > > > > > > Thanks for helping,
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: REGEX search Operator

Posted by Nicolas Paris <ni...@gmail.com>.
John,
I realized I needed to make a modification for your query to work, so I
updated the github project.
select count(1) from view_mydata where srcday = '2016-02-05' and
contains(domain_name, '\\.com$'); will work now (just redeploy the jars).

I will try to make
select count(1) from view_mydata where srcday = '2016-02-05' and
contains(domain_name, '\.com$'); work too.

I will keep you posted on the new version.
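
On the escaping question, it may help to separate the layers involved. In
Java itself, the doubled backslash is a property of string literals in source
code, not of java.util.regex: the literal "\\.com$" holds the six characters
\.com$. The sketch below shows that, plus one possible way to prepare a
pattern for an outer string syntax that also treats backslash as an escape
character. The names EscapeLayers, escapeForStringLiteral and
unescapeStringLiteral are hypothetical, and whether Drill's parser behaves
this way is an assumption, not something confirmed in this thread.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not taken from drill-simple-contains.
final class EscapeLayers {
    private EscapeLayers() {}

    // Doubles every backslash so the text survives one round of
    // backslash-unescaping by an outer string syntax (assumed here).
    static String escapeForStringLiteral(String raw) {
        return raw.replace("\\", "\\\\");
    }

    // Undoes escapeForStringLiteral for text produced by it:
    // each doubled backslash becomes a single one again.
    static String unescapeStringLiteral(String escaped) {
        return escaped.replace("\\\\", "\\");
    }

    public static void main(String[] args) {
        String pattern = "\\.com$"; // Java source literal for the regex \.com$
        System.out.println(pattern.length()); // 6
        System.out.println(Pattern.compile(pattern).matcher("example.com").find()); // true
        System.out.println(escapeForStringLiteral(pattern)); // prints \\.com$
    }
}
```

String.replace works on literal text, so no regex is needed for the
conversion itself.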


2016-02-09 19:22 GMT+01:00 Nicolas Paris <ni...@gmail.com>:

> John,
>
> About the escape, I will explore that question.
> About your query, you may try this pattern :
> select count(1) from view_mydata where srcday = '2016-02-05' and
> contains(domain_name, '.*\\.com$');
>
>
> 2016-02-09 17:19 GMT+01:00 John Omernik <jo...@omernik.com>:
>
>> I copied both files and it appears to work, but after some testing, I am
>> getting inconsistent results, see below. I ran three queries. First a like
>> looking for domain names that end in .com (domain_name like '%.com'), which
>> returned a count of 9.8 million.  Then I tried the contains, with '\.com$',
>> which means ends in dot com.... this failed (this goes to my earlier comments
>> about really wishing we did not do double escaping as normal... for users,
>> double escaping is NOT normal, thus doing that to meet Java's issues is
>> hard... not sure how to handle it, it may be a tough issue, but it really
>> seems like something worth exploring).
>>
>> I then did contains(domain_name, '\\.com$').  This took quite a bit longer,
>> and returned 0, so I am not really sure how the function is working at
>> this point.  Thoughts?
>>
>> John
>>
>>
>>
>> > select count(1) from view_mydata where srcday = '2016-02-05' and
>> domain_name like '%.com';
>> +----------+
>> |  EXPR$0  |
>> +----------+
>> | 9810609  |
>> +----------+
>> 1 row selected (123.673 seconds)
>>
>>
>> > select count(1) from view_mydata where srcday = '2016-02-05' and
>> contains(domain_name, '\.com$');
>> Error: SYSTEM ERROR: ExpressionParsingException: Expression has syntax
>> error! line 1:79:mismatched input '<EOF>' expecting CParen
>>
>> Fragment 1:13
>>
>> [Error Id: 8e46bed4-f9ba-444f-a3aa-2f57db5ae34f on node3:31010]
>> (state=,code=0)
>>
>> > select count(1) from view_mydata where srcday = '2016-02-05' and
>> contains(domain_name, '\\.com$');
>> +---------+
>> | EXPR$0  |
>> +---------+
>> | 0       |
>> +---------+
>> 1 row selected (201.391 seconds)
>>
>>
>>
>> On Tue, Feb 9, 2016 at 9:34 AM, Nicolas Paris <ni...@gmail.com>
>> wrote:
>>
>> > Hi John,
>> >
>> > There are actually two jars to put in the folder (generated in /target).
>> > Have you restarted drill afterwards?
>> >
>> >
>> >
>> >
>> >
>> > 2016-02-09 16:20 GMT+01:00 John Omernik <jo...@omernik.com>:
>> >
>> > > Nicolas, not really sure what's happening here. It compiled fine, but
>> > > when I run it I get this error. The jar is distributed to my bits, I
>> > > validated that... it's in the DRILL_HOME/jars/3rdparty folder on every
>> > > bit... do I need to do something more than that?
>> > >
>> > >
>> > >
>> > > select count(1) from view_myview where srcday = '2016-02-05' and
>> > > contains(domain_name, 'com');
>> > > Error: SYSTEM ERROR: IllegalArgumentException: resource
>> > > /org/apache/drill/contrib/function/SimpleContains.java relative to
>> > > org.apache.drill.contrib.function.SimpleContains not found.
>> > >
>> > > Fragment 1:44
>> > >
>> > > [Error Id: 30c11047-9896-4e16-a207-e3cce79c9db5 on node1:31010]
>> > >
>> > >   (java.lang.IllegalArgumentException) resource
>> > > /org/apache/drill/contrib/function/SimpleContains.java relative to
>> > > org.apache.drill.contrib.function.SimpleContains not found.
>> > >     com.google.common.base.Preconditions.checkArgument():119
>> > >     com.google.common.io.Resources.getResource():203
>> > >     org.apache.drill.exec.expr.fn.FunctionInitializer.get():127
>> > >     org.apache.drill.exec.expr.fn.FunctionInitializer.checkInit():99
>> > >     org.apache.drill.exec.expr.fn.FunctionInitializer.getMethod():81
>> > >     org.apache.drill.exec.expr.fn.DrillFuncHolder.meth():94
>> > >     org.apache.drill.exec.expr.fn.DrillSimpleFuncHolder.setupBody():50
>> > >     org.apache.drill.exec.expr.fn.DrillSimpleFuncHolder.renderEnd():80
>> > >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression():203
>> > >     org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression():1078
>> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():816
>> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():796
>> > >     org.apache.drill.common.expression.FunctionHolderExpression.accept():47
>> > >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitBooleanAnd():690
>> > >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitBooleanOperator():172
>> > >     org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitBooleanOperator():1092
>> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitBooleanOperator():836
>> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitBooleanOperator():796
>> > >     org.apache.drill.common.expression.BooleanOperator.accept():36
>> > >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitReturnValueExpression():551
>> > >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitUnknown():344
>> > >     org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitUnknown():1328
>> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():1027
>> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():796
>> > >     org.apache.drill.exec.physical.impl.filter.ReturnValueExpression.accept():56
>> > >     org.apache.drill.exec.expr.EvaluationVisitor.addExpr():105
>> > >     org.apache.drill.exec.expr.ClassGenerator.addExpr():227
>> > >     org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():187
>> > >     org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():109
>> > >     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
>> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>> > >     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>> > >     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():94
>> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>> > >     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>> > >     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
>> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>> > >     org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema():100
>> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():142
>> > >     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>> > >     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
>> > >     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>> > >     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
>> > >     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
>> > >     java.security.AccessController.doPrivileged():-2
>> > >     javax.security.auth.Subject.doAs():415
>> > >     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>> > >     org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
>> > >     org.apache.drill.common.SelfCleaningRunnable.run():38
>> > >     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>> > >     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>> > >     java.lang.Thread.run():745 (state=,code=0)
>> > >
>> > > On Fri, Feb 5, 2016 at 2:39 AM, Nicolas Paris <ni...@gmail.com>
>> > wrote:
>> > >
>> > > > John,
>> > > >
>> > > > Sorry for that, this already works as expected.
>> > > > Give it a try, it is easy to deploy.
>> > > >
>> > > > SELECT first_name FROM cp.`employee.json` WHERE
>> > > > contains(first_name,'\w+')
>> > > > LIMIT 5;
>> > > > first_name |
>> > > > -----------|
>> > > > Sheri      |
>> > > > Derrick    |
>> > > > Michael    |
>> > > > Maya       |
>> > > > Roberta    |
>> > > >
>> > > >
>> > > > 2016-02-04 20:41 GMT+01:00 John Omernik <jo...@omernik.com>:
>> > > >
>> > > > > Ya, do you see where I am coming from here? Let's let the users
>> > > > > submit regex in the pure form if possible, and code the nuances of
>> > > > > java regex behind the scenes. I think it would be a great way to make
>> > > > > Drill very accessible and desirable.  I think what happened in Hive
>> > > > > is the regex commands started with the users having to escape, and
>> > > > > now there are just too many things using the escaped regex and the
>> > > > > project doesn't want to adjust.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Thu, Feb 4, 2016 at 1:38 PM, Nicolas Paris <
>> niparisco@gmail.com>
>> > > > wrote:
>> > > > >
>> > > > > > You mean:
>> > > > > > userRegex=>javaRegex
>> > > > > > "\d" => "\\d"
>> > > > > > "\w" => "\\w"
>> > > > > > "\n" => "\n"
>> > > > > > I can probably do that with a regex.
>> > > > > > I will give it a try.
>> > > > > >
>> > > > > >
>> > > > > > 2016-02-04 19:37 GMT+01:00 John Omernik <jo...@omernik.com>:
>> > > > > >
>> > > > > > > So my question on the double escape, is there no way to handle
>> > > > > > > that so the user can use single escaped regex? I know many folks
>> > > > > > > who use big data platforms to test large complex regexes for things
>> > > > > > > like security appliances, and having to convert the regex seems
>> > > > > > > like a lot of work if you consider every user has to do that.  If
>> > > > > > > there was a way to do it in Drill, that would save countless people
>> > > > > > > hours and save many mistakes.
>> > > > > > >
>> > > > > > > On Thu, Feb 4, 2016 at 12:03 PM, Nicolas Paris <
>> > > niparisco@gmail.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > John, Jason,
>> > > > > > > >
>> > > > > > > > 2016-02-04 18:47 GMT+01:00 John Omernik <jo...@omernik.com>:
>> > > > > > > >
>> > > > > > > > > I'd be curios on how you are implemeting the regex...
>> using
>> > > > Java's
>> > > > > > > regex
>> > > > > > > > > libraries? etc.
>> > > > > > > > >
>> > > > > > > > ​Yeah, I use
>> > > > > > > > java.util.regex
>> > > > > > > > ​
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > I know one thing with Hive that always bothered me was the
>> > need
>> > > > to
>> > > > > > > double
>> > > > > > > > > escape things.
>> > > > > > > > >
>> > > > > > > > > '\d\d\d\d-\d\d-\d\d'  needed to be
>> > '\\d\\d\\d\\d-\\d\\d-\\d\\d'
>> > > > of
>> > > > > we
>> > > > > > > can
>> > > > > > > > > avoid that it would be AWESOME.
>> > > > > > > > >
>> > > > > > > > ​My guess is this comes from java way to handle strings. All
>> > > > > langages I
>> > > > > > > > have used need to double escape.​
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > On Thu, Feb 4, 2016 at 11:37 AM, Jason Altekruse <
>> > > > > > > > altekrusejason@gmail.com
>> > > > > > > > > >
>> > > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > ​code is here:
>> > https://github.com/parisni/drill-simple-contains
>> > > > > > > > It's disturbing how it is simple...
>> > > > > > > > ​
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > > > I think you should actually just put the function in
>> > > > > > > > > ​​
>> > > > > > > > > Drill itself. System
>> > > > > > > > > > native functions are implemented in the same interface
>> as
>> > > UDFs,
>> > > > > > > because
>> > > > > > > > > our
>> > > > > > > > > > mechanism for evaluating them is very efficient (we code
>> > > > generate
>> > > > > > > code
>> > > > > > > > > > blocks by linking together the bodies of the individual
>> > > > functions
>> > > > > > to
>> > > > > > > > > > evaluate a complete expression).
>> > > > > > > > >
>> > > > > > > > ​well the folder tree is quite impressive (
>> > > > > > > https://github.com/apache/drill
>> > > > > > > > ).
>> > > > > > > > ​
>> > > > > > > >
>> > > > > > > > ​what folder is supposed to be "
>> > > > > > > > ​
>> > > > > > > > Drill itself"
>> > > > > > > > ​ ?​
>> > > > > > > > ​
>> > > > > > > >
>> > > > > > > > > > You can open a JIRA, marking it a feature request. You
>> can
>> > > > open a
>> > > > > > > poll
>> > > > > > > > > > request against the apache github repo, making sure you
>> > > follow
>> > > > > the
>> > > > > > > > > standard
>> > > > > > > > > > format for your commit message, prefixing with the JIRA
>> > > number
>> > > > in
>> > > > > > the
>> > > > > > > > > > format
>> > > > > > > > > > Example:
>> > > > > > > > > > DRILL-XXXX: Feature description
>> > > > > > > > > >
>> > > > > > > > > > This will automatically link the PR to your JIRA.
>> > > > > > > > >
>> > > > > > > > ​Ok I will try thanks​
>> > > > > > > >
>> > > > > > > > ​a lot​
>> > > > > > > >
>> > > > > > > > > > - Jason
>> > > > > > > > > >
>> > > > > > > > > > On Thu, Feb 4, 2016 at 8:44 AM, Nicolas Paris <
>> > > > > niparisco@gmail.com
>> > > > > > >
>> > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Jason, I have it working,
>> > > > > > > > > > >
>> > > > > > > > > > > Just tell me the way to proceed to PR.
>> > > > > > > > > > > 1. where do I put my maven project ? Witch folder in
>> my
>> > > drill
>> > > > > > > github
>> > > > > > > > > > fork?
>> > > > > > > > > > > 2. do I need a jira ? how proceed ?
>> > > > > > > > > > >
>> > > > > > > > > > > For now, I only published it on my github account in a
>> > > > separate
>> > > > > > > > project
>> > > > > > > > > > >
>> > > > > > > > > > > Thanks
>> > > > > > > > > > >
>> > > > > > > > > > > 2016-02-04 16:52 GMT+01:00 Jason Altekruse <
>> > > > > > > altekrusejason@gmail.com
>> > > > > > > > >:
>> > > > > > > > > > >
>> > > > > > > > > > > > Awesome, thanks!
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Thu, Feb 4, 2016 at 7:44 AM, Nicolas Paris <
>> > > > > > > niparisco@gmail.com
>> > > > > > > > >
>> > > > > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Well I am creating a udf
>> > > > > > > > > > > > > good exercise
>> > > > > > > > > > > > > I hope a PR soon
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > 2016-02-04 16:37 GMT+01:00 Jason Altekruse <
>> > > > > > > > > altekrusejason@gmail.com
>> > > > > > > > > > >:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > I didn't realize that we were lacking this
>> > > > functionality.
>> > > > > > As
>> > > > > > > > the
>> > > > > > > > > > > > > > repeated_contains operator handles wildcards it
>> > makes
>> > > > > sense
>> > > > > > > to
>> > > > > > > > > add
>> > > > > > > > > > > > such a
>> > > > > > > > > > > > > > function to drill.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > It should be simple to implement, would someone
>> > like
>> > > to
>> > > > > > open
>> > > > > > > a
>> > > > > > > > > JIRA
>> > > > > > > > > > > and
>> > > > > > > > > > > > > > submit a PR for this?
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > - Jason
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On Tue, Feb 2, 2016 at 8:56 AM, John Omernik <
>> > > > > > > john@omernik.com
>> > > > > > > > >
>> > > > > > > > > > > wrote:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > I would like to see something like this as
>> well,
>> > > even
>> > > > > if
>> > > > > > > it's
>> > > > > > > > > an
>> > > > > > > > > > > > > included
>> > > > > > > > > > > > > > > UDF like REGEX(field, pattern) using Java's
>> > library
>> > > > for
>> > > > > > > regex
>> > > > > > > > > > like
>> > > > > > > > > > > > Hive
>> > > > > > > > > > > > > > > does.  That would be EXTREMELY helpful.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On Tue, Feb 2, 2016 at 6:55 AM, Nicolas Paris
>> <
>> > > > > > > > > > niparisco@gmail.com
>> > > > > > > > > > > >
>> > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > ANSI SQL doesn't define regex operator.
>> > > > > > > > > > > > > > > > > Drill neither.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > ​Drill has SQL functions extension like
>> > > > > > > > "REPEATED_CONTAINS"​
>> > > > > > > > > > that
>> > > > > > > > > > > > > looks
>> > > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > handle regex. regex operator could be
>> replaced
>> > > with
>> > > > > one
>> > > > > > > new
>> > > > > > > > > SQL
>> > > > > > > > > > > > > > > extension ?
>> > > > > > > > > > > > > > > > I guess I could create my own functions in
>> > java,
>> > > > > right
>> > > > > > ?
>> > > > > > > > > Maybe
>> > > > > > > > > > > push
>> > > > > > > > > > > > > it
>> > > > > > > > > > > > > > > into
>> > > > > > > > > > > > > > > > github then ?
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Doesn't it enough 'LIKE' operator?
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > ​Sadly not, I'am looking for complex pattern
>> > > > > matching.
>> > > > > > ​
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > --
>> > > > > > > > > > > > > > > > > Miura, Masahide
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > -----Original Message-----
>> > > > > > > > > > > > > > > > > From: Nicolas Paris [mailto:
>> > > niparisco@gmail.com]
>> > > > > > > > > > > > > > > > > Sent: Tuesday, February 02, 2016 9:04 PM
>> > > > > > > > > > > > > > > > > To: user@drill.apache.org
>> > > > > > > > > > > > > > > > > Subject: REGEX search Operator
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Hello,
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > I can't find any reference in the
>> > documentation
>> > > > > > about a
>> > > > > > > > > regex
>> > > > > > > > > > > > > > operator.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > I would like to be able to query this way
>> :
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > SELECT *
>> > > > > > > > > > > > > > > > > FROM xxx
>> > > > > > > > > > > > > > > > > WHERE  text_field   regexOperator
>> > > > > 'regex_pattern';
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Thanks for helping,
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: REGEX search Operator

Posted by Nicolas Paris <ni...@gmail.com>.
John,

About the escape, I will explore that question.
About your query, you may try this pattern:
select count(1) from view_mydata where srcday = '2016-02-05' and
contains(domain_name, '.*\\.com$');
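
One possible reason the leading .* could matter: java.util.regex offers two
ways to apply a pattern. Matcher.matches() succeeds only when the whole input
matches, while Matcher.find() succeeds when any part of it does. If a function
used matches(), a pattern such as \.com$ would need a leading .* to accept a
full domain name. Whether that is what happened in drill-simple-contains is an
assumption; the thread does not say. The class name FindVersusMatches and the
sample domains are made up for the illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two matching modes.
final class FindVersusMatches {
    private FindVersusMatches() {}

    // True only when the regex matches the entire input.
    static boolean wholeMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True when the regex matches any part of the input.
    static boolean partialMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String domain = "example.com";
        System.out.println(wholeMatch(domain, "\\.com$"));    // false
        System.out.println(wholeMatch(domain, ".*\\.com$"));  // true
        System.out.println(partialMatch(domain, "\\.com$"));  // true
    }
}
```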


2016-02-09 17:19 GMT+01:00 John Omernik <jo...@omernik.com>:

> I copied both files and it appears to work, but after some testing, I am
> getting inconsistent results, see below. I ran three queries. First a like
> looking for domain names that end in .com (domain_name like '%.com'), which
> returned a count of 9.8 million.  Then I tried the contains, with '\.com$',
> which means ends in dot com.... this failed (this goes to my earlier comments
> about really wishing we did not do double escaping as normal... for users,
> double escaping is NOT normal, thus doing that to meet Java's issues is
> hard... not sure how to handle it, it may be a tough issue, but it really
> seems like something worth exploring).
>
> I then did contains(domain_name, '\\.com$').  This took quite a bit longer,
> and returned 0, so I am not really sure how the function is working at this
> point.  Thoughts?
>
> John
>
>
>
> > select count(1) from view_mydata where srcday = '2016-02-05' and
> domain_name like '%.com';
> +----------+
> |  EXPR$0  |
> +----------+
> | 9810609  |
> +----------+
> 1 row selected (123.673 seconds)
>
>
> > select count(1) from view_mydata where srcday = '2016-02-05' and
> contains(domain_name, '\.com$');
> Error: SYSTEM ERROR: ExpressionParsingException: Expression has syntax
> error! line 1:79:mismatched input '<EOF>' expecting CParen
>
> Fragment 1:13
>
> [Error Id: 8e46bed4-f9ba-444f-a3aa-2f57db5ae34f on node3:31010]
> (state=,code=0)
>
> > select count(1) from view_mydata where srcday = '2016-02-05' and
> contains(domain_name, '\\.com$');
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> 1 row selected (201.391 seconds)
>
>
>
> On Tue, Feb 9, 2016 at 9:34 AM, Nicolas Paris <ni...@gmail.com> wrote:
>
> > Hi John,
> >
> > There are actually two jars to put in the folder (generated in /target).
> > Have you restarted drill afterwards?
> >
> >
> >
> >
> >
> > 2016-02-09 16:20 GMT+01:00 John Omernik <jo...@omernik.com>:
> >
> > > Nicolas, not really sure what's happening here. It compiled fine, but
> > > when I run it I get this error. The jar is distributed to my bits, I
> > > validated that... it's in the DRILL_HOME/jars/3rdparty folder on every
> > > bit... do I need to do something more than that?
> > >
> > >
> > >
> > > select count(1) from view_myview where srcday = '2016-02-05' and
> > > contains(domain_name, 'com');
> > > Error: SYSTEM ERROR: IllegalArgumentException: resource
> > > /org/apache/drill/contrib/function/SimpleContains.java relative to
> > > org.apache.drill.contrib.function.SimpleContains not found.
> > >
> > > Fragment 1:44
> > >
> > > [Error Id: 30c11047-9896-4e16-a207-e3cce79c9db5 on node1:31010]
> > >
> > >   (java.lang.IllegalArgumentException) resource
> > > /org/apache/drill/contrib/function/SimpleContains.java relative to
> > > org.apache.drill.contrib.function.SimpleContains not found.
> > >     com.google.common.base.Preconditions.checkArgument():119
> > >     com.google.common.io.Resources.getResource():203
> > >     org.apache.drill.exec.expr.fn.FunctionInitializer.get():127
> > >     org.apache.drill.exec.expr.fn.FunctionInitializer.checkInit():99
> > >     org.apache.drill.exec.expr.fn.FunctionInitializer.getMethod():81
> > >     org.apache.drill.exec.expr.fn.DrillFuncHolder.meth():94
> > >     org.apache.drill.exec.expr.fn.DrillSimpleFuncHolder.setupBody():50
> > >     org.apache.drill.exec.expr.fn.DrillSimpleFuncHolder.renderEnd():80
> > >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression():203
> > >     org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression():1078
> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():816
> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():796
> > >     org.apache.drill.common.expression.FunctionHolderExpression.accept():47
> > >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitBooleanAnd():690
> > >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitBooleanOperator():172
> > >     org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitBooleanOperator():1092
> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitBooleanOperator():836
> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitBooleanOperator():796
> > >     org.apache.drill.common.expression.BooleanOperator.accept():36
> > >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitReturnValueExpression():551
> > >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitUnknown():344
> > >     org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitUnknown():1328
> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():1027
> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():796
> > >     org.apache.drill.exec.physical.impl.filter.ReturnValueExpression.accept():56
> > >     org.apache.drill.exec.expr.EvaluationVisitor.addExpr():105
> > >     org.apache.drill.exec.expr.ClassGenerator.addExpr():227
> > >     org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():187
> > >     org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():109
> > >     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():162
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():119
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():109
> > >     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> > >     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():94
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():162
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():119
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():109
> > >     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> > >     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():162
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():119
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():109
> > >     org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema():100
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():142
> > >     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> > >     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
> > >     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> > >     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
> > >     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
> > >     java.security.AccessController.doPrivileged():-2
> > >     javax.security.auth.Subject.doAs():415
> > >     org.apache.hadoop.security.UserGroupInformation.doAs():1595
> > >     org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
> > >     org.apache.drill.common.SelfCleaningRunnable.run():38
> > >     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> > >     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> > >     java.lang.Thread.run():745 (state=,code=0)

Re: REGEX search Operator

Posted by John Omernik <jo...@omernik.com>.
I copied both files and it appears to work, but after some testing I am
getting inconsistent results; see below. I ran three queries. First, a LIKE
looking for domain names that end in .com (domain_name like '%.com'), which
returned a count of 9.8 million. Then I tried contains with '\.com$',
which means "ends in .com". This failed with a parsing error. (This goes back
to my earlier comments about really wishing we did not require double
escaping. For users, double escaping is NOT normal, so requiring it to
satisfy Java's string handling is hard on them. I'm not sure how to handle
it, and it may be a tough issue, but it really seems like something worth
exploring.)

I then ran contains(domain_name, '\\.com$'). This took quite a bit longer
and returned 0, so I am not really sure how the function is working at this
point. Thoughts?

John



> select count(1) from view_mydata where srcday = '2016-02-05' and
domain_name like '%.com';
+----------+
|  EXPR$0  |
+----------+
| 9810609  |
+----------+
1 row selected (123.673 seconds)


> select count(1) from view_mydata where srcday = '2016-02-05' and
contains(domain_name, '\.com$');
Error: SYSTEM ERROR: ExpressionParsingException: Expression has syntax
error! line 1:79:mismatched input '<EOF>' expecting CParen

Fragment 1:13

[Error Id: 8e46bed4-f9ba-444f-a3aa-2f57db5ae34f on node3:31010]
(state=,code=0)

> select count(1) from view_mydata where srcday = '2016-02-05' and
contains(domain_name, '\\.com$');
+---------+
| EXPR$0  |
+---------+
| 0       |
+---------+
1 row selected (201.391 seconds)
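
One possible reading of the 0 count, offered as a guess and not a confirmed
diagnosis: if the two backslashes in '\\.com$' reach java.util.regex
unchanged, the pattern no longer means "literal dot". It means "a literal
backslash, then any character, then com at the end", which no domain name
would match. The sketch below only shows how java.util.regex treats the two
pattern strings. The class name RegexEscapeSketch and the helper containsMatch
are hypothetical, and using Matcher.find() is an assumption about how the UDF
searches; I have not checked how Drill itself handles backslashes in SQL
string literals.

```java
import java.util.regex.Pattern;

class RegexEscapeSketch {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    // Assumes "contains" semantics via Matcher.find(), not a full match.
    public static boolean containsMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String domain = "example.com";

        // Java source "\\.com$" is the 6-character regex \.com$
        // (escaped dot = literal dot, then "com" at end of input).
        System.out.println(containsMatch(domain, "\\.com$"));      // true

        // Java source "\\\\.com$" is the 7-character regex \\.com$
        // (literal backslash, then any character, then "com" at end).
        System.out.println(containsMatch(domain, "\\\\.com$"));    // false

        // The doubled form only matches input that really contains a backslash.
        System.out.println(containsMatch("a\\xcom", "\\\\.com$")); // true
    }
}
```

If that is what happens here, the single-escaped form is the one the regex
engine needs to see, and the parsing error on the second query would be a
separate problem in how the literal is parsed before it reaches the function.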



On Tue, Feb 9, 2016 at 9:34 AM, Nicolas Paris <ni...@gmail.com> wrote:

> Hi John,
>
> There are actually two jars to put in the folder (generated in /target). Have
> you restarted Drill afterwards?
>
>
>
>
>
> 2016-02-09 16:20 GMT+01:00 John Omernik <jo...@omernik.com>:
>
> > Nicolas, not really sure what's happening here. it compiled fine, but when
> > I run it I get this error. The jar is distributed to my bits, I validated
> > that... it's in the DRILL_HOME/jars/3rdparty folder on every bit... do I
> > need to do something more than that?
> >
> >
> >
> > select count(1) from view_myview where srcday = '2016-02-05' and
> > contains(domain_name, 'com');
> > Error: SYSTEM ERROR: IllegalArgumentException: resource
> > /org/apache/drill/contrib/function/SimpleContains.java relative to
> > org.apache.drill.contrib.function.SimpleContains not found.
> >
> > Fragment 1:44
> >
> > [Error Id: 30c11047-9896-4e16-a207-e3cce79c9db5 on node1:31010]
> >
> >   (java.lang.IllegalArgumentException) resource
> > /org/apache/drill/contrib/function/SimpleContains.java relative to
> > org.apache.drill.contrib.function.SimpleContains not found.
> >     com.google.common.base.Preconditions.checkArgument():119
> >     com.google.common.io.Resources.getResource():203
> >     org.apache.drill.exec.expr.fn.FunctionInitializer.get():127
> >     org.apache.drill.exec.expr.fn.FunctionInitializer.checkInit():99
> >     org.apache.drill.exec.expr.fn.FunctionInitializer.getMethod():81
> >     org.apache.drill.exec.expr.fn.DrillFuncHolder.meth():94
> >     org.apache.drill.exec.expr.fn.DrillSimpleFuncHolder.setupBody():50
> >     org.apache.drill.exec.expr.fn.DrillSimpleFuncHolder.renderEnd():80
> >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression():203
> >     org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression():1078
> >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():816
> >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():796
> >     org.apache.drill.common.expression.FunctionHolderExpression.accept():47
> >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitBooleanAnd():690
> >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitBooleanOperator():172
> >     org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitBooleanOperator():1092
> >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitBooleanOperator():836
> >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitBooleanOperator():796
> >     org.apache.drill.common.expression.BooleanOperator.accept():36
> >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitReturnValueExpression():551
> >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitUnknown():344
> >     org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitUnknown():1328
> >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():1027
> >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():796
> >     org.apache.drill.exec.physical.impl.filter.ReturnValueExpression.accept():56
> >     org.apache.drill.exec.expr.EvaluationVisitor.addExpr():105
> >     org.apache.drill.exec.expr.ClassGenerator.addExpr():227
> >     org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():187
> >     org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():109
> >     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
> >     org.apache.drill.exec.record.AbstractRecordBatch.next():162
> >     org.apache.drill.exec.record.AbstractRecordBatch.next():119
> >     org.apache.drill.exec.record.AbstractRecordBatch.next():109
> >     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> >     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():94
> >     org.apache.drill.exec.record.AbstractRecordBatch.next():162
> >     org.apache.drill.exec.record.AbstractRecordBatch.next():119
> >     org.apache.drill.exec.record.AbstractRecordBatch.next():109
> >     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> >     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
> >     org.apache.drill.exec.record.AbstractRecordBatch.next():162
> >     org.apache.drill.exec.record.AbstractRecordBatch.next():119
> >     org.apache.drill.exec.record.AbstractRecordBatch.next():109
> >     org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema():100
> >     org.apache.drill.exec.record.AbstractRecordBatch.next():142
> >     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> >     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
> >     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> >     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
> >     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
> >     java.security.AccessController.doPrivileged():-2
> >     javax.security.auth.Subject.doAs():415
> >     org.apache.hadoop.security.UserGroupInformation.doAs():1595
> >     org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
> >     org.apache.drill.common.SelfCleaningRunnable.run():38
> >     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> >     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> >     java.lang.Thread.run():745 (state=,code=0)

Re: REGEX search Operator

Posted by Nicolas Paris <ni...@gmail.com>.
Hi John,

There are actually two jars to put in the folder (both generated in /target).
Have you restarted Drill afterwards?
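
The error quoted below complains that the resource
/org/apache/drill/contrib/function/SimpleContains.java was not found, which
suggests Drill looks up the function's .java source on the classpath at run
time. That would explain why the sources jar is needed next to the classes
jar. A minimal sketch of the same kind of lookup in plain Java, for checking a
node's classpath by hand; the class name UdfSourceCheck and both helpers are
hypothetical and are not part of Drill:

```java
class UdfSourceCheck {
    // Builds the classpath resource path of a top-level class's source file,
    // e.g. /org/apache/drill/contrib/function/SimpleContains.java
    public static String sourceResourcePath(String className) {
        return "/" + className.replace('.', '/') + ".java";
    }

    // True if the .java file for the class is visible on its classpath.
    public static boolean hasSourceOnClasspath(Class<?> clazz) {
        return clazz.getResource(sourceResourcePath(clazz.getName())) != null;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass the fully qualified UDF class name as the first argument.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> clazz = Class.forName(name);
        System.out.println(sourceResourcePath(name)
                + " present: " + hasSourceOnClasspath(clazz));
    }
}
```

Run with the same classpath as the drillbit (an assumption about your setup)
and the UDF class name as the argument; "present: false" would mean the
sources jar is missing or not picked up on that node.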





2016-02-09 16:20 GMT+01:00 John Omernik <jo...@omernik.com>:

> Nicolas, not really sure what's happening here. it compiled fine, but when
> I run it I get this error. The jar is distributed to my bits, I validated
> that... it's in the DRILL_HOME/jars/3rdparty folder on every bit... do I
> need to do something more than that?
>
>
>
> select count(1) from view_myview where srcday = '2016-02-05' and
> contains(domain_name, 'com');
> Error: SYSTEM ERROR: IllegalArgumentException: resource
> /org/apache/drill/contrib/function/SimpleContains.java relative to
> org.apache.drill.contrib.function.SimpleContains not found.
>
> Fragment 1:44
>
> [Error Id: 30c11047-9896-4e16-a207-e3cce79c9db5 on node1:31010]
>
>   (java.lang.IllegalArgumentException) resource
> /org/apache/drill/contrib/function/SimpleContains.java relative to
> org.apache.drill.contrib.function.SimpleContains not found.
>     com.google.common.base.Preconditions.checkArgument():119
>     com.google.common.io.Resources.getResource():203
>     org.apache.drill.exec.expr.fn.FunctionInitializer.get():127
>     org.apache.drill.exec.expr.fn.FunctionInitializer.checkInit():99
>     org.apache.drill.exec.expr.fn.FunctionInitializer.getMethod():81
>     org.apache.drill.exec.expr.fn.DrillFuncHolder.meth():94
>     org.apache.drill.exec.expr.fn.DrillSimpleFuncHolder.setupBody():50
>     org.apache.drill.exec.expr.fn.DrillSimpleFuncHolder.renderEnd():80
>
>
> org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression():203
>
>
> org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression():1078
>
>
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():816
>
>
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():796
>     org.apache.drill.common.expression.FunctionHolderExpression.accept():47
>
>
> org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitBooleanAnd():690
>
>
> org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitBooleanOperator():172
>
>
> org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitBooleanOperator():1092
>
>
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitBooleanOperator():836
>
>
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitBooleanOperator():796
>     org.apache.drill.common.expression.BooleanOperator.accept():36
>
>
> org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitReturnValueExpression():551
>
> org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitUnknown():344
>
>
> org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitUnknown():1328
>
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():1027
>
> org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():796
>
>
> org.apache.drill.exec.physical.impl.filter.ReturnValueExpression.accept():56
>     org.apache.drill.exec.expr.EvaluationVisitor.addExpr():105
>     org.apache.drill.exec.expr.ClassGenerator.addExpr():227
>
>
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():187
>
>
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>
>
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():94
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>
>
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>
>
> org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema():100
>     org.apache.drill.exec.record.AbstractRecordBatch.next():142
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>
>
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745 (state=,code=0)
>
> On Fri, Feb 5, 2016 at 2:39 AM, Nicolas Paris <ni...@gmail.com> wrote:
>
> > John,
> >
> > Sorry for that, this already work as expected.
> > Give it a try, this is so easy to deploy
> >
> > SELECT first_name FROM cp.`employee.json` WHERE
> contains(first_name,'\w+')
> > LIMIT 5;
> > first_name |
> > -----------|
> > Sheri      |
> > Derrick    |
> > Michael    |
> > Maya       |
> > Roberta    |
> >
> >
> > 2016-02-04 20:41 GMT+01:00 John Omernik <jo...@omernik.com>:
> >
> > > Ya, do you see where I am coming from here? Let's let the users submit
> > > regex in the pure form if possible, and code the nuances of java regex
> > > behind the scenes. I think it would be a great way to make Drill very
> > > accessible and desirable.  I think what happened in Hive is the regex
> > > commands started with the users having the escape and now there are
> just
> > to
> > > many things that using the escaped regex and the project doesn't want
> to
> > > adjust.
> > >
> > >
> > >
> > >
> > > On Thu, Feb 4, 2016 at 1:38 PM, Nicolas Paris <ni...@gmail.com>
> > wrote:
> > >
> > > > You mean:
> > > > userRegex=>javaRegex
> > > > "\d" => "\\d"
> > > > "\w" => "\\w"
> > > > "\n" => "\n"
> > > > I can do that thanks to regex I guess.
> > > > I will give a try
> > > >
> > > >
> > > > 2016-02-04 19:37 GMT+01:00 John Omernik <jo...@omernik.com>:
> > > >
> > > > > So my question on the double escape, is there no way to handle that
> > so
> > > > the
> > > > > user can use single escaped regex? I know many folks who use big
> data
> > > > > platform to test large complex regexes for things like security
> > > > appliances,
> > > > > and having to convert the regex seems like a lot of work if you
> > > consider
> > > > > every user has to do that.  If there was a way to do it in Drill,
> > that
> > > > > would save countless people hours and save many mistakes.
> > > > >
> > > > > On Thu, Feb 4, 2016 at 12:03 PM, Nicolas Paris <
> niparisco@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > John, Jason,
> > > > > >
> > > > > > 2016-02-04 18:47 GMT+01:00 John Omernik <jo...@omernik.com>:
> > > > > >
> > > > > > > I'd be curios on how you are implemeting the regex... using
> > Java's
> > > > > regex
> > > > > > > libraries? etc.
> > > > > > >
> > > > > > ​Yeah, I use
> > > > > > java.util.regex
> > > > > > ​
> > > > > >
> > > > > >
> > > > > > > I know one thing with Hive that always bothered me was the need
> > to
> > > > > double
> > > > > > > escape things.
> > > > > > >
> > > > > > > '\d\d\d\d-\d\d-\d\d'  needed to be '\\d\\d\\d\\d-\\d\\d-\\d\\d'
> > of
> > > we
> > > > > can
> > > > > > > avoid that it would be AWESOME.
> > > > > > >
> > > > > > ​My guess is this comes from java way to handle strings. All
> > > langages I
> > > > > > have used need to double escape.​
> > > > > >
> > > > > >
> > > > > > > On Thu, Feb 4, 2016 at 11:37 AM, Jason Altekruse <
> > > > > > altekrusejason@gmail.com
> > > > > > > >
> > > > > > > wrote:
> > > > > >
> > > > > > ​code is here: https://github.com/parisni/drill-simple-contains
> > > > > > It's disturbing how it is simple...
> > > > > > ​
> > > > > >
> > > > > >
> > > > > > > > I think you should actually just put the function in
> > > > > > > ​​
> > > > > > > Drill itself. System
> > > > > > > > native functions are implemented in the same interface as
> UDFs,
> > > > > because
> > > > > > > our
> > > > > > > > mechanism for evaluating them is very efficient (we code
> > generate
> > > > > code
> > > > > > > > blocks by linking together the bodies of the individual
> > functions
> > > > to
> > > > > > > > evaluate a complete expression).
> > > > > > >
> > > > > > ​well the folder tree is quite impressive (
> > > > > https://github.com/apache/drill
> > > > > > ).
> > > > > > ​
> > > > > >
> > > > > > ​what folder is supposed to be "
> > > > > > ​
> > > > > > Drill itself"
> > > > > > ​ ?​
> > > > > > ​
> > > > > >
> > > > > > > > You can open a JIRA, marking it a feature request. You can
> > open a
> > > > > poll
> > > > > > > > request against the apache github repo, making sure you
> follow
> > > the
> > > > > > > standard
> > > > > > > > format for your commit message, prefixing with the JIRA
> number
> > in
> > > > the
> > > > > > > > format
> > > > > > > > Example:
> > > > > > > > DRILL-XXXX: Feature description
> > > > > > > >
> > > > > > > > This will automatically link the PR to your JIRA.
> > > > > > >
> > > > > > ​Ok I will try thanks​
> > > > > >
> > > > > > ​a lot​
> > > > > >
> > > > > > > > - Jason
> > > > > > > >
> > > > > > > > On Thu, Feb 4, 2016 at 8:44 AM, Nicolas Paris <
> > > niparisco@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Jason, I have it working,
> > > > > > > > >
> > > > > > > > > Just tell me the way to proceed to PR.
> > > > > > > > > 1. where do I put my maven project ? Witch folder in my
> drill
> > > > > github
> > > > > > > > fork?
> > > > > > > > > 2. do I need a jira ? how proceed ?
> > > > > > > > >
> > > > > > > > > For now, I only published it on my github account in a
> > separate
> > > > > > project
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > 2016-02-04 16:52 GMT+01:00 Jason Altekruse <
> > > > > altekrusejason@gmail.com
> > > > > > >:
> > > > > > > > >
> > > > > > > > > > Awesome, thanks!
> > > > > > > > > >
> > > > > > > > > > On Thu, Feb 4, 2016 at 7:44 AM, Nicolas Paris <
> > > > > niparisco@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Well I am creating a udf
> > > > > > > > > > > good exercise
> > > > > > > > > > > I hope a PR soon
> > > > > > > > > > >
> > > > > > > > > > > 2016-02-04 16:37 GMT+01:00 Jason Altekruse <
> > > > > > > altekrusejason@gmail.com
> > > > > > > > >:
> > > > > > > > > > >
> > > > > > > > > > > > I didn't realize that we were lacking this
> > functionality.
> > > > As
> > > > > > the
> > > > > > > > > > > > repeated_contains operator handles wildcards it makes
> > > sense
> > > > > to
> > > > > > > add
> > > > > > > > > > such a
> > > > > > > > > > > > function to drill.
> > > > > > > > > > > >
> > > > > > > > > > > > It should be simple to implement, would someone like
> to
> > > > open
> > > > > a
> > > > > > > JIRA
> > > > > > > > > and
> > > > > > > > > > > > submit a PR for this?
> > > > > > > > > > > >
> > > > > > > > > > > > - Jason
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Feb 2, 2016 at 8:56 AM, John Omernik <
> > > > > john@omernik.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I would like to see something like this as well,
> even
> > > if
> > > > > it's
> > > > > > > an
> > > > > > > > > > > included
> > > > > > > > > > > > > UDF like REGEX(field, pattern) using Java's library
> > for
> > > > > regex
> > > > > > > > like
> > > > > > > > > > Hive
> > > > > > > > > > > > > does.  That would be EXTREMELY helpful.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Feb 2, 2016 at 6:55 AM, Nicolas Paris <
> > > > > > > > niparisco@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > ANSI SQL doesn't define regex operator.
> > > > > > > > > > > > > > > Drill neither.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > ​Drill has SQL functions extension like
> > > > > > "REPEATED_CONTAINS"​
> > > > > > > > that
> > > > > > > > > > > looks
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > handle regex. regex operator could be replaced
> with
> > > one
> > > > > new
> > > > > > > SQL
> > > > > > > > > > > > > extension ?
> > > > > > > > > > > > > > I guess I could create my own functions in java,
> > > right
> > > > ?
> > > > > > > Maybe
> > > > > > > > > push
> > > > > > > > > > > it
> > > > > > > > > > > > > into
> > > > > > > > > > > > > > github then ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Doesn't it enough 'LIKE' operator?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ​Sadly not, I'am looking for complex pattern
> > > matching.
> > > > ​
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Miura, Masahide
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > > > > From: Nicolas Paris [mailto:
> niparisco@gmail.com]
> > > > > > > > > > > > > > > Sent: Tuesday, February 02, 2016 9:04 PM
> > > > > > > > > > > > > > > To: user@drill.apache.org
> > > > > > > > > > > > > > > Subject: REGEX search Operator
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I can't find any reference in the documentation
> > > > about a
> > > > > > > regex
> > > > > > > > > > > > operator.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I would like to be able to query this way :
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > SELECT *
> > > > > > > > > > > > > > > FROM xxx
> > > > > > > > > > > > > > > WHERE  text_field   regexOperator
> > > 'regex_pattern';
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for helping,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: REGEX search Operator

Posted by John Omernik <jo...@omernik.com>.
Nicolas, I'm not really sure what's happening here. It compiled fine, but when
I run it I get this error. The jar is distributed to all my drillbits (I
validated that); it's in the DRILL_HOME/jars/3rdparty folder on every drillbit.
Do I need to do something more than that?



select count(1) from view_myview where srcday = '2016-02-05' and
contains(domain_name, 'com');
Error: SYSTEM ERROR: IllegalArgumentException: resource
/org/apache/drill/contrib/function/SimpleContains.java relative to
org.apache.drill.contrib.function.SimpleContains not found.

Fragment 1:44

[Error Id: 30c11047-9896-4e16-a207-e3cce79c9db5 on node1:31010]

  (java.lang.IllegalArgumentException) resource
/org/apache/drill/contrib/function/SimpleContains.java relative to
org.apache.drill.contrib.function.SimpleContains not found.
    com.google.common.base.Preconditions.checkArgument():119
    com.google.common.io.Resources.getResource():203
    org.apache.drill.exec.expr.fn.FunctionInitializer.get():127
    org.apache.drill.exec.expr.fn.FunctionInitializer.checkInit():99
    org.apache.drill.exec.expr.fn.FunctionInitializer.getMethod():81
    org.apache.drill.exec.expr.fn.DrillFuncHolder.meth():94
    org.apache.drill.exec.expr.fn.DrillSimpleFuncHolder.setupBody():50
    org.apache.drill.exec.expr.fn.DrillSimpleFuncHolder.renderEnd():80
    org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression():203
    org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression():1078
    org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():816
    org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():796
    org.apache.drill.common.expression.FunctionHolderExpression.accept():47
    org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitBooleanAnd():690
    org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitBooleanOperator():172
    org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitBooleanOperator():1092
    org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitBooleanOperator():836
    org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitBooleanOperator():796
    org.apache.drill.common.expression.BooleanOperator.accept():36
    org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitReturnValueExpression():551
    org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitUnknown():344
    org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitUnknown():1328
    org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():1027
    org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():796
    org.apache.drill.exec.physical.impl.filter.ReturnValueExpression.accept():56
    org.apache.drill.exec.expr.EvaluationVisitor.addExpr():105
    org.apache.drill.exec.expr.ClassGenerator.addExpr():227
    org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():187
    org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():94
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema():100
    org.apache.drill.exec.record.AbstractRecordBatch.next():142
    org.apache.drill.exec.physical.impl.BaseRootExec.next():104
    org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
    org.apache.drill.exec.physical.impl.BaseRootExec.next():94
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():415
    org.apache.hadoop.security.UserGroupInformation.doAs():1595
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1145
    java.util.concurrent.ThreadPoolExecutor$Worker.run():615
    java.lang.Thread.run():745 (state=,code=0)

On Fri, Feb 5, 2016 at 2:39 AM, Nicolas Paris <ni...@gmail.com> wrote:

> John,
>
> Sorry for that, this already work as expected.
> Give it a try, this is so easy to deploy
>
> SELECT first_name FROM cp.`employee.json` WHERE contains(first_name,'\w+')
> LIMIT 5;
> first_name |
> -----------|
> Sheri      |
> Derrick    |
> Michael    |
> Maya       |
> Roberta    |
>
>
> 2016-02-04 20:41 GMT+01:00 John Omernik <jo...@omernik.com>:
>
> > Ya, do you see where I am coming from here? Let's let the users submit
> > regex in the pure form if possible, and code the nuances of java regex
> > behind the scenes. I think it would be a great way to make Drill very
> > accessible and desirable.  I think what happened in Hive is the regex
> > commands started with the users having the escape and now there are just
> to
> > many things that using the escaped regex and the project doesn't want to
> > adjust.
> >
> >
> >
> >
> > On Thu, Feb 4, 2016 at 1:38 PM, Nicolas Paris <ni...@gmail.com>
> wrote:
> >
> > > You mean:
> > > userRegex=>javaRegex
> > > "\d" => "\\d"
> > > "\w" => "\\w"
> > > "\n" => "\n"
> > > I can do that thanks to regex I guess.
> > > I will give a try
> > >
> > >
> > > 2016-02-04 19:37 GMT+01:00 John Omernik <jo...@omernik.com>:
> > >
> > > > So my question on the double escape, is there no way to handle that
> so
> > > the
> > > > user can use single escaped regex? I know many folks who use big data
> > > > platform to test large complex regexes for things like security
> > > appliances,
> > > > and having to convert the regex seems like a lot of work if you
> > consider
> > > > every user has to do that.  If there was a way to do it in Drill,
> that
> > > > would save countless people hours and save many mistakes.
> > > >
> > > > On Thu, Feb 4, 2016 at 12:03 PM, Nicolas Paris <ni...@gmail.com>
> > > > wrote:
> > > >
> > > > > John, Jason,
> > > > >
> > > > > 2016-02-04 18:47 GMT+01:00 John Omernik <jo...@omernik.com>:
> > > > >
> > > > > > I'd be curios on how you are implemeting the regex... using
> Java's
> > > > regex
> > > > > > libraries? etc.
> > > > > >
> > > > > ​Yeah, I use
> > > > > java.util.regex
> > > > > ​
> > > > >
> > > > >
> > > > > > I know one thing with Hive that always bothered me was the need
> to
> > > > double
> > > > > > escape things.
> > > > > >
> > > > > > '\d\d\d\d-\d\d-\d\d'  needed to be '\\d\\d\\d\\d-\\d\\d-\\d\\d'
> of
> > we
> > > > can
> > > > > > avoid that it would be AWESOME.
> > > > > >
> > > > > ​My guess is this comes from java way to handle strings. All
> > langages I
> > > > > have used need to double escape.​
> > > > >
> > > > >
> > > > > > On Thu, Feb 4, 2016 at 11:37 AM, Jason Altekruse <
> > > > > altekrusejason@gmail.com
> > > > > > >
> > > > > > wrote:
> > > > >
> > > > > ​code is here: https://github.com/parisni/drill-simple-contains
> > > > > It's disturbing how it is simple...
> > > > > ​
> > > > >
> > > > >
> > > > > > > I think you should actually just put the function in
> > > > > > ​​
> > > > > > Drill itself. System
> > > > > > > native functions are implemented in the same interface as UDFs,
> > > > because
> > > > > > our
> > > > > > > mechanism for evaluating them is very efficient (we code
> generate
> > > > code
> > > > > > > blocks by linking together the bodies of the individual
> functions
> > > to
> > > > > > > evaluate a complete expression).
> > > > > >
> > > > > ​well the folder tree is quite impressive (
> > > > https://github.com/apache/drill
> > > > > ).
> > > > > ​
> > > > >
> > > > > ​what folder is supposed to be "
> > > > > ​
> > > > > Drill itself"
> > > > > ​ ?​
> > > > > ​
> > > > >
> > > > > > > You can open a JIRA, marking it a feature request. You can
> open a
> > > > poll
> > > > > > > request against the apache github repo, making sure you follow
> > the
> > > > > > standard
> > > > > > > format for your commit message, prefixing with the JIRA number
> in
> > > the
> > > > > > > format
> > > > > > > Example:
> > > > > > > DRILL-XXXX: Feature description
> > > > > > >
> > > > > > > This will automatically link the PR to your JIRA.
> > > > > >
> > > > > ​Ok I will try thanks​
> > > > >
> > > > > ​a lot​
> > > > >
> > > > > > > - Jason
> > > > > > >
> > > > > > > On Thu, Feb 4, 2016 at 8:44 AM, Nicolas Paris <
> > niparisco@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Jason, I have it working,
> > > > > > > >
> > > > > > > > Just tell me the way to proceed to PR.
> > > > > > > > 1. where do I put my maven project ? Witch folder in my drill
> > > > github
> > > > > > > fork?
> > > > > > > > 2. do I need a jira ? how proceed ?
> > > > > > > >
> > > > > > > > For now, I only published it on my github account in a
> separate
> > > > > project
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > 2016-02-04 16:52 GMT+01:00 Jason Altekruse <
> > > > altekrusejason@gmail.com
> > > > > >:
> > > > > > > >
> > > > > > > > > Awesome, thanks!
> > > > > > > > >
> > > > > > > > > On Thu, Feb 4, 2016 at 7:44 AM, Nicolas Paris <
> > > > niparisco@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Well I am creating a udf
> > > > > > > > > > good exercise
> > > > > > > > > > I hope a PR soon
> > > > > > > > > >
> > > > > > > > > > 2016-02-04 16:37 GMT+01:00 Jason Altekruse <
> > > > > > altekrusejason@gmail.com
> > > > > > > >:
> > > > > > > > > >
> > > > > > > > > > > I didn't realize that we were lacking this
> functionality.
> > > As
> > > > > the
> > > > > > > > > > > repeated_contains operator handles wildcards it makes
> > sense
> > > > to
> > > > > > add
> > > > > > > > > such a
> > > > > > > > > > > function to drill.
> > > > > > > > > > >
> > > > > > > > > > > It should be simple to implement, would someone like to
> > > open
> > > > a
> > > > > > JIRA
> > > > > > > > and
> > > > > > > > > > > submit a PR for this?
> > > > > > > > > > >
> > > > > > > > > > > - Jason
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Feb 2, 2016 at 8:56 AM, John Omernik <
> > > > john@omernik.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I would like to see something like this as well, even
> > if
> > > > it's
> > > > > > an
> > > > > > > > > > included
> > > > > > > > > > > > UDF like REGEX(field, pattern) using Java's library
> for
> > > > regex
> > > > > > > like
> > > > > > > > > Hive
> > > > > > > > > > > > does.  That would be EXTREMELY helpful.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Feb 2, 2016 at 6:55 AM, Nicolas Paris <
> > > > > > > niparisco@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > > ANSI SQL doesn't define regex operator.
> > > > > > > > > > > > > > Drill neither.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > ​Drill has SQL functions extension like
> > > > > "REPEATED_CONTAINS"​
> > > > > > > that
> > > > > > > > > > looks
> > > > > > > > > > > > to
> > > > > > > > > > > > > handle regex. regex operator could be replaced with
> > one
> > > > new
> > > > > > SQL
> > > > > > > > > > > > extension ?
> > > > > > > > > > > > > I guess I could create my own functions in java,
> > right
> > > ?
> > > > > > Maybe
> > > > > > > > push
> > > > > > > > > > it
> > > > > > > > > > > > into
> > > > > > > > > > > > > github then ?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Doesn't it enough 'LIKE' operator?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > ​Sadly not, I'am looking for complex pattern
> > matching.
> > > ​
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > > Miura, Masahide
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > > > From: Nicolas Paris [mailto:niparisco@gmail.com]
> > > > > > > > > > > > > > Sent: Tuesday, February 02, 2016 9:04 PM
> > > > > > > > > > > > > > To: user@drill.apache.org
> > > > > > > > > > > > > > Subject: REGEX search Operator
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I can't find any reference in the documentation
> > > about a
> > > > > > regex
> > > > > > > > > > > operator.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I would like to be able to query this way :
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > SELECT *
> > > > > > > > > > > > > > FROM xxx
> > > > > > > > > > > > > > WHERE  text_field   regexOperator
> > 'regex_pattern';
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for helping,
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: REGEX search Operator

Posted by Nicolas Paris <ni...@gmail.com>.
John,

Sorry about that; this already works as expected.
Give it a try, it is very easy to deploy.

SELECT first_name FROM cp.`employee.json` WHERE contains(first_name,'\w+')
LIMIT 5;
first_name |
-----------|
Sheri      |
Derrick    |
Michael    |
Maya       |
Roberta    |
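
For what it's worth, the double escaping seems to be only a matter of Java
source literals: once the string holds a single backslash, java.util.regex takes
it as is. Below is a minimal sketch in plain Java (ContainsSketch and its
contains helper are hypothetical names, not the code of the UDF itself) of the
"find anywhere in the input" behaviour:

```java
import java.util.regex.Pattern;

class ContainsSketch {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean contains(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // In Java source the backslash is doubled, but the string itself
        // holds only three characters: backslash, w, plus.
        String pattern = "\\w+";
        System.out.println(pattern.length());
        System.out.println(contains("Sheri", pattern));
        System.out.println(contains("2016-02-05", "\\d\\d\\d\\d-\\d\\d-\\d\\d"));
    }
}
```

A pattern that arrives at runtime, for example from a SQL string literal as in
the query above, should therefore not need the doubled backslashes.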


2016-02-04 20:41 GMT+01:00 John Omernik <jo...@omernik.com>:

> Ya, do you see where I am coming from here? Let's let the users submit
> regex in the pure form if possible, and code the nuances of java regex
> behind the scenes. I think it would be a great way to make Drill very
> accessible and desirable.  I think what happened in Hive is the regex
> commands started with the users having the escape and now there are just to
> many things that using the escaped regex and the project doesn't want to
> adjust.
>
>
>
>
> On Thu, Feb 4, 2016 at 1:38 PM, Nicolas Paris <ni...@gmail.com> wrote:
>
> > You mean:
> > userRegex=>javaRegex
> > "\d" => "\\d"
> > "\w" => "\\w"
> > "\n" => "\n"
> > I can do that thanks to regex I guess.
> > I will give a try
> >
> >
> > 2016-02-04 19:37 GMT+01:00 John Omernik <jo...@omernik.com>:
> >
> > > So my question on the double escape, is there no way to handle that so
> > the
> > > user can use single escaped regex? I know many folks who use big data
> > > platform to test large complex regexes for things like security
> > appliances,
> > > and having to convert the regex seems like a lot of work if you
> consider
> > > every user has to do that.  If there was a way to do it in Drill, that
> > > would save countless people hours and save many mistakes.
> > >
> > > On Thu, Feb 4, 2016 at 12:03 PM, Nicolas Paris <ni...@gmail.com>
> > > wrote:
> > >
> > > > John, Jason,
> > > >
> > > > 2016-02-04 18:47 GMT+01:00 John Omernik <jo...@omernik.com>:
> > > >
> > > > > I'd be curios on how you are implemeting the regex... using Java's
> > > regex
> > > > > libraries? etc.
> > > > >
> > > > ​Yeah, I use
> > > > java.util.regex
> > > > ​
> > > >
> > > >
> > > > > I know one thing with Hive that always bothered me was the need to
> > > double
> > > > > escape things.
> > > > >
> > > > > '\d\d\d\d-\d\d-\d\d'  needed to be '\\d\\d\\d\\d-\\d\\d-\\d\\d' of
> we
> > > can
> > > > > avoid that it would be AWESOME.
> > > > >
> > > > ​My guess is this comes from java way to handle strings. All
> langages I
> > > > have used need to double escape.​
> > > >
> > > >
> > > > > On Thu, Feb 4, 2016 at 11:37 AM, Jason Altekruse <
> > > > altekrusejason@gmail.com
> > > > > >
> > > > > wrote:
> > > >
> > > > ​code is here: https://github.com/parisni/drill-simple-contains
> > > > It's disturbing how it is simple...
> > > > ​
> > > >
> > > >
> > > > > > I think you should actually just put the function in
> > > > > ​​
> > > > > Drill itself. System
> > > > > > native functions are implemented in the same interface as UDFs,
> > > because
> > > > > our
> > > > > > mechanism for evaluating them is very efficient (we code generate
> > > code
> > > > > > blocks by linking together the bodies of the individual functions
> > to
> > > > > > evaluate a complete expression).
> > > > >
> > > > ​well the folder tree is quite impressive (
> > > https://github.com/apache/drill
> > > > ).
> > > > ​
> > > >
> > > > ​what folder is supposed to be "
> > > > ​
> > > > Drill itself"
> > > > ​ ?​
> > > > ​
> > > >
> > > > > > You can open a JIRA, marking it a feature request. You can open a
> > > poll
> > > > > > request against the apache github repo, making sure you follow
> the
> > > > > standard
> > > > > > format for your commit message, prefixing with the JIRA number in
> > the
> > > > > > format
> > > > > > Example:
> > > > > > DRILL-XXXX: Feature description
> > > > > >
> > > > > > This will automatically link the PR to your JIRA.
> > > > >
> > > > ​Ok I will try thanks​
> > > >
> > > > ​a lot​
> > > >

Re: REGEX search Operator

Posted by John Omernik <jo...@omernik.com>.
Ya, do you see where I am coming from here? Let's let users submit
regex in its pure form if possible, and handle the nuances of Java regex
behind the scenes. I think it would be a great way to make Drill very
accessible and desirable. I think what happened in Hive is that the regex
commands started out requiring users to do the escaping, and now there are
just too many things using the escaped regex, so the project doesn't want
to adjust.




On Thu, Feb 4, 2016 at 1:38 PM, Nicolas Paris <ni...@gmail.com> wrote:

> You mean:
> userRegex=>javaRegex
> "\d" => "\\d"
> "\w" => "\\w"
> "\n" => "\n"
> I can do that with a regex, I guess.
> I will give it a try.

Re: REGEX search Operator

Posted by Nicolas Paris <ni...@gmail.com>.
You mean:
userRegex=>javaRegex
"\d" => "\\d"
"\w" => "\\w"
"\n" => "\n"
I can do that with a regex, I guess.
I will give it a try.
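
A minimal Java sketch of the mapping above. It assumes the pattern reaches the function as ordinary runtime text, with no string-literal parser in between; whether that holds for a Drill query is not established in this thread. Under that assumption no conversion step is needed, because the doubled backslash in "\\d" exists only in Java source code and the resulting string already holds a single backslash. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class UserRegexDemo {

    // Hypothetical helper: compiles the user's text as-is, with no extra
    // escaping pass, and reports whether it occurs anywhere in the input.
    public static boolean contains(String input, String userRegex) {
        return Pattern.compile(userRegex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Built from chars to stand in for text that arrives at runtime:
        // exactly two characters, a backslash followed by 'd'.
        String runtimeDigit = new String(new char[] {'\\', 'd'});

        // The Java source literal "\\d" denotes those same two characters.
        System.out.println(runtimeDigit.equals("\\d"));        // true
        System.out.println(contains("room 42", runtimeDigit)); // true

        // For "\n" both spellings work: java.util.regex accepts a real
        // newline character as well as the two-character escape \n.
        String realNewline = "\n";
        String escapedNewline = new String(new char[] {'\\', 'n'});
        System.out.println(contains("a\nb", realNewline));     // true
        System.out.println(contains("a\nb", escapedNewline));  // true
    }
}
```

If a parser in front of the function does consume backslashes, the conversion would have to happen before that parser runs, which a UDF cannot do on its own.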


2016-02-04 19:37 GMT+01:00 John Omernik <jo...@omernik.com>:

> So my question on the double escape: is there no way to handle that so the
> user can use single-escaped regex? I know many folks who use big data
> platforms to test large, complex regexes for things like security
> appliances, and having to convert the regex seems like a lot of work if you
> consider that every user has to do it. If there were a way to do it in
> Drill, that would save countless people hours and prevent many mistakes.

Re: REGEX search Operator

Posted by John Omernik <jo...@omernik.com>.
So my question on the double escape: is there no way to handle that so the
user can use single-escaped regex? I know many folks who use big data
platforms to test large, complex regexes for things like security
appliances, and having to convert the regex seems like a lot of work if you
consider that every user has to do it. If there were a way to do it in
Drill, that would save countless people hours and prevent many mistakes.

On Thu, Feb 4, 2016 at 12:03 PM, Nicolas Paris <ni...@gmail.com> wrote:

> John, Jason,
>
> 2016-02-04 18:47 GMT+01:00 John Omernik <jo...@omernik.com>:
>
> > I'd be curious how you are implementing the regex... using Java's regex
> > libraries? etc.
> >
> Yeah, I use java.util.regex
>
> > I know one thing with Hive that always bothered me was the need to double
> > escape things.
> >
> > '\d\d\d\d-\d\d-\d\d' needed to be '\\d\\d\\d\\d-\\d\\d-\\d\\d'. If we can
> > avoid that it would be AWESOME.
> >
> My guess is this comes from the way Java handles strings. All languages I
> have used need to double escape.
>
> > On Thu, Feb 4, 2016 at 11:37 AM, Jason Altekruse
> > <altekrusejason@gmail.com> wrote:
>
> The code is here: https://github.com/parisni/drill-simple-contains
> It's disturbing how simple it is...
>
> > > I think you should actually just put the function in Drill itself.
> > > System native functions are implemented in the same interface as UDFs,
> > > because our mechanism for evaluating them is very efficient (we code
> > > generate code blocks by linking together the bodies of the individual
> > > functions to evaluate a complete expression).
> >
> Well, the folder tree is quite impressive (https://github.com/apache/drill).
>
> Which folder is supposed to be "Drill itself"?
>
> > > You can open a JIRA, marking it a feature request. You can open a pull
> > > request against the apache github repo, making sure you follow the
> > > standard format for your commit message, prefixing with the JIRA number
> > > in the format
> > > Example:
> > > DRILL-XXXX: Feature description
> > >
> > > This will automatically link the PR to your JIRA.
> >
> Ok, I will try. Thanks a lot.

Re: REGEX search Operator

Posted by Nicolas Paris <ni...@gmail.com>.
John, Jason,

2016-02-04 18:47 GMT+01:00 John Omernik <jo...@omernik.com>:

> I'd be curious how you are implementing the regex... using Java's regex
> libraries? etc.
>
Yeah, I use java.util.regex
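
For readers unfamiliar with java.util.regex, a minimal sketch of the two calls such a function would most likely build on. This is not the code from the repository linked in this thread; the class name, method names and sample rows are made up for illustration. Matcher.find() gives "contains" semantics, while Matcher.matches() requires the whole value to match.

```java
import java.util.regex.Pattern;

public class RegexBasics {

    // True if the pattern occurs anywhere in the text ("contains" semantics).
    public static boolean containsMatch(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    // True only if the pattern matches the entire text.
    public static boolean fullMatch(Pattern pattern, String text) {
        return pattern.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Compile once and reuse the Pattern: compilation is the costly
        // step, so it should not be repeated for every row.
        Pattern date = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

        String[] rows = {"2016-02-04", "posted 2016-02-04 18:47", "no date here"};
        for (String row : rows) {
            System.out.println(row
                    + " -> contains=" + containsMatch(date, row)
                    + ", full=" + fullMatch(date, row));
        }
    }
}
```

Which of the two a SQL regex function should expose is a design choice; the thread's later discussion does not settle it.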
​


> I know one thing with Hive that always bothered me was the need to double
> escape things.
>
> '\d\d\d\d-\d\d-\d\d' needed to be '\\d\\d\\d\\d-\\d\\d-\\d\\d'. If we can
> avoid that it would be AWESOME.
>
My guess is this comes from the way Java handles strings. All languages I
have used need to double escape.

> On Thu, Feb 4, 2016 at 11:37 AM, Jason Altekruse <altekrusejason@gmail.com>
> wrote:

The code is here: https://github.com/parisni/drill-simple-contains
It's disturbing how simple it is...

> > I think you should actually just put the function in Drill itself. System
> > native functions are implemented in the same interface as UDFs, because
> > our mechanism for evaluating them is very efficient (we code generate code
> > blocks by linking together the bodies of the individual functions to
> > evaluate a complete expression).
>
Well, the folder tree is quite impressive (https://github.com/apache/drill).

Which folder is supposed to be "Drill itself"?

> > You can open a JIRA, marking it a feature request. You can open a pull
> > request against the apache github repo, making sure you follow the
> > standard format for your commit message, prefixing with the JIRA number in
> > the format
> > Example:
> > DRILL-XXXX: Feature description
> >
> > This will automatically link the PR to your JIRA.
>
Ok, I will try. Thanks a lot.


Re: REGEX search Operator

Posted by John Omernik <jo...@omernik.com>.
I'd be curious how you are implementing the regex... using Java's regex
libraries? etc.

One thing with Hive that always bothered me was the need to double
escape things.

'\d\d\d\d-\d\d-\d\d' needed to be '\\d\\d\\d\\d-\\d\\d-\\d\\d'. If we can
avoid that it would be AWESOME.
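
To make the double-escape complaint concrete, here is a small Java sketch. The unescapeOnce method is a hypothetical stand-in for any layer that consumes one level of backslashes before the regex engine sees the text; it is not Hive's or Drill's actual parser. With such a layer in the path the user has to double every backslash; without it, the single-escaped form works as typed.

```java
import java.util.regex.Pattern;

public class DoubleEscapeDemo {

    // Hypothetical stand-in for a string-literal parser that consumes one
    // level of backslashes: \\ becomes \ and \x becomes x.
    public static String unescapeOnce(String literal) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (c == '\\' && i + 1 < literal.length()) {
                out.append(literal.charAt(++i)); // keep the escaped char, drop the backslash
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // True if the regex matches the whole sample date.
    public static boolean matchesDate(String regex) {
        return Pattern.matches(regex, "2016-02-04");
    }

    public static void main(String[] args) {
        // The characters \d\d\d\d-\d\d-\d\d (Java source needs "\\" per backslash).
        String single = "\\d\\d\\d\\d-\\d\\d-\\d\\d";
        // The characters \\d\\d\\d\\d-\\d\\d-\\d\\d, as a user would have to type them.
        String doubled = single.replace("\\", "\\\\");

        // No unescaping layer: the single-escaped form matches.
        System.out.println(matchesDate(single));                // true
        // One unescaping layer: \d collapses to a plain 'd', so the match fails.
        System.out.println(matchesDate(unescapeOnce(single)));  // false
        // One unescaping layer plus doubled input: back to \d, so it matches.
        System.out.println(matchesDate(unescapeOnce(doubled))); // true
    }
}
```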

On Thu, Feb 4, 2016 at 11:37 AM, Jason Altekruse <al...@gmail.com>
wrote:

> I think you should actually just put the function in Drill itself. System
> native functions are implemented in the same interface as UDFs, because our
> mechanism for evaluating them is very efficient (we code generate code
> blocks by linking together the bodies of the individual functions to
> evaluate a complete expression).
>
> You can open a JIRA, marking it a feature request. You can open a pull
> request against the apache github repo, making sure you follow the standard
> format for your commit message, prefixing with the JIRA number in the
> format
> Example:
> DRILL-XXXX: Feature description
>
> This will automatically link the PR to your JIRA.
>
> - Jason

Re: REGEX search Operator

Posted by Jason Altekruse <al...@gmail.com>.
I think you should actually just put the function in Drill itself. System
native functions are implemented with the same interface as UDFs, because our
mechanism for evaluating them is very efficient (we generate code blocks by
linking together the bodies of the individual functions to evaluate a
complete expression).

You can open a JIRA, marking it as a feature request. You can then open a pull
request against the apache github repo, making sure you follow the standard
format for your commit message, prefixing it with the JIRA number.
Example:
DRILL-XXXX: Feature description

This will automatically link the PR to your JIRA.
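
As a rough illustration of the commit message format above, here is a minimal
Java sketch that checks whether the first line of a commit message starts with
a JIRA key followed by a description. The class name, the method name and the
issue number DRILL-1234 are made up for this example, and the exact pattern is
an assumption, not a rule enforced by the Drill project.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Drill: checks the "DRILL-XXXX: description" shape.
final class CommitMessageCheck {
    // Assumed shape: "DRILL-", digits, a colon, a space, then a non-empty description.
    private static final Pattern TITLE = Pattern.compile("DRILL-\\d+: \\S.*");

    static boolean hasJiraPrefix(String firstLine) {
        // matches() requires the whole line to fit the pattern.
        return TITLE.matcher(firstLine).matches();
    }

    public static void main(String[] args) {
        // DRILL-1234 is a placeholder issue number.
        System.out.println(hasJiraPrefix("DRILL-1234: Add regex match function")); // true
        System.out.println(hasJiraPrefix("Add regex match function"));             // false
    }
}
```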

- Jason

On Thu, Feb 4, 2016 at 8:44 AM, Nicolas Paris <ni...@gmail.com> wrote:

> Jason, I have it working.
>
> Just tell me how to proceed with the PR.
> 1. Where do I put my Maven project? Which folder in my Drill GitHub fork?
> 2. Do I need a JIRA? How do I proceed?
>
> For now, I have only published it on my GitHub account as a separate project.
>
> Thanks

Re: REGEX search Operator

Posted by Nicolas Paris <ni...@gmail.com>.
Jason, I have it working.

Just tell me how to proceed with the PR.
1. Where do I put my Maven project? Which folder in my Drill GitHub fork?
2. Do I need a JIRA? How do I proceed?

For now, I have only published it on my GitHub account as a separate project.

Thanks

2016-02-04 16:52 GMT+01:00 Jason Altekruse <al...@gmail.com>:

> Awesome, thanks!

Re: REGEX search Operator

Posted by Jason Altekruse <al...@gmail.com>.
Awesome, thanks!

On Thu, Feb 4, 2016 at 7:44 AM, Nicolas Paris <ni...@gmail.com> wrote:

> Well, I am creating a UDF.
> It's a good exercise.
> I hope to have a PR soon.

Re: REGEX search Operator

Posted by Nicolas Paris <ni...@gmail.com>.
Well, I am creating a UDF.
It's a good exercise.
I hope to have a PR soon.

2016-02-04 16:37 GMT+01:00 Jason Altekruse <al...@gmail.com>:

> I didn't realize that we were lacking this functionality. As the
> repeated_contains operator handles wildcards, it makes sense to add such a
> function to Drill.
>
> It should be simple to implement. Would someone like to open a JIRA and
> submit a PR for this?
>
> - Jason

Re: REGEX search Operator

Posted by Jason Altekruse <al...@gmail.com>.
I didn't realize that we were lacking this functionality. As the
repeated_contains operator handles wildcards, it makes sense to add such a
function to Drill.

It should be simple to implement. Would someone like to open a JIRA and
submit a PR for this?

- Jason

On Tue, Feb 2, 2016 at 8:56 AM, John Omernik <jo...@omernik.com> wrote:

> I would like to see something like this as well, even if it's an included
> UDF like REGEX(field, pattern) that uses Java's regex library, like Hive
> does.  That would be EXTREMELY helpful.

Re: REGEX search Operator

Posted by John Omernik <jo...@omernik.com>.
I would like to see something like this as well, even if it's an included
UDF like REGEX(field, pattern) that uses Java's regex library, like Hive
does.  That would be EXTREMELY helpful.
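
To make the idea concrete, here is a minimal plain-Java sketch of the matching
logic such a REGEX(field, pattern) function could wrap, using java.util.regex.
It is not a Drill UDF: the class and method names are invented for this
example, and returning false for NULL input is a simplification (a real SQL
function would more likely return NULL).

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the core of a REGEX(field, pattern) function; not Drill code.
final class RegexMatchSketch {
    // Returns true when the pattern is found anywhere in the field.
    // Assumption: null input yields false, which simplifies SQL NULL handling.
    static boolean regexFind(String field, String pattern) {
        if (field == null || pattern == null) {
            return false;
        }
        // Compiled on every call for brevity; a real function would cache the Pattern.
        return Pattern.compile(pattern).matcher(field).find();
    }

    public static void main(String[] args) {
        // Backslashes are doubled only because this is a Java string literal.
        String datePattern = "\\d{4}-\\d{2}-\\d{2}";
        System.out.println(regexFind("released on 2016-02-04", datePattern)); // true
        System.out.println(regexFind("no date here", datePattern));           // false
    }
}
```

Using find() gives "contains a match" semantics; switching to matches() would
require the whole field to match the pattern instead.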





On Tue, Feb 2, 2016 at 6:55 AM, Nicolas Paris <ni...@gmail.com> wrote:

> > ANSI SQL doesn't define a regex operator.
> > Neither does Drill.
> >
> Drill has SQL function extensions like "REPEATED_CONTAINS" that appear to
> handle regex. Could the regex operator be provided as a new SQL extension?
> I guess I could create my own functions in Java, right? Maybe push them to
> GitHub then?
>
>
> > Isn't the 'LIKE' operator enough?
> >
>
> Sadly not, I'm looking for complex pattern matching.

Re: REGEX search Operator

Posted by Nicolas Paris <ni...@gmail.com>.
> ANSI SQL doesn't define a regex operator.
> Neither does Drill.
>
Drill has SQL function extensions like "REPEATED_CONTAINS" that appear to
handle regex. Could the regex operator be provided as a new SQL extension?
I guess I could create my own functions in Java, right? Maybe push them to
GitHub then?


> Isn't the 'LIKE' operator enough?
>

Sadly not, I'm looking for complex pattern matching.
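
To illustrate the gap, here is a small Java sketch that translates a LIKE
pattern ('%' for any run of characters, '_' for exactly one character) into
an equivalent regex and compares it with a real regular expression. The class
and method names are invented for this example, the ESCAPE clause is not
handled, and this is not how Drill implements LIKE.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of what LIKE can and cannot express; not Drill code.
final class LikeVersusRegex {
    // Translates a LIKE pattern into a regex: '%' -> ".*", '_' -> ".",
    // everything else is matched literally. ESCAPE is not supported here.
    static Pattern likeToRegex(String like) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%' || c == '_') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    static boolean like(String value, String likePattern) {
        return likeToRegex(likePattern).matcher(value).matches();
    }

    public static void main(String[] args) {
        // LIKE can only say "any character" at each position, so letters pass too.
        System.out.println(like("2016-02-04", "____-__-__")); // true
        System.out.println(like("abcd-ef-gh", "____-__-__")); // true
        // A regex can restrict each position to digits.
        System.out.println(Pattern.matches("\\d{4}-\\d{2}-\\d{2}", "abcd-ef-gh")); // false
    }
}
```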

RE: REGEX search Operator

Posted by ma...@brother.co.jp.
Hi,

ANSI SQL doesn't define a regex operator.
Neither does Drill.

Isn't the 'LIKE' operator enough?
Otherwise, the REGEXP_REPLACE/SUBSTR functions may help you.
https://drill.apache.org/docs/string-manipulation/

-- 
Miura, Masahide

-----Original Message-----
From: Nicolas Paris [mailto:niparisco@gmail.com] 
Sent: Tuesday, February 02, 2016 9:04 PM
To: user@drill.apache.org
Subject: REGEX search Operator

Hello,

I can't find any reference in the documentation about a regex operator.

I would like to be able to query this way :

SELECT *
FROM xxx
WHERE  text_field   regexOperator    'regex_pattern';

Thanks for helping,