Posted to user@drill.apache.org by Steven Phillips <st...@dremio.com> on 2015/10/05 00:15:45 UTC

Re: repeated_contains - intended behaviour?

repeated_contains originally worked as Jason describes, with exact matching. At
some point, someone thought that it should allow wildcards and do substring
matching. There was never any real discussion on what this function should
do, though. It would probably be a good idea for someone to come up with a
more thorough proposal that includes a more comprehensive list of repeated
functions and what they will do.

On Wed, Sep 23, 2015 at 10:16 AM, Jason Altekruse <al...@gmail.com>
wrote:

> I think it is reasonable to consider that a bug. We should implement the
> function both as it works today and as you were originally expecting it.
> Any ideas about a good naming scheme for the two?
>
> Unfortunately the regular contains() method does substring matching, but I
> think the name repeated_contains() should be used for exact matching. I'm
> inclined to suggest something like repeated_contains_regex_matching() for
> the other, but that is a bit long.
>
> On Mon, Sep 21, 2015 at 2:41 AM, Stefán Baxter <st...@activitystream.com>
> wrote:
>
> > Hi
> >
> > repeated_contains seems to have a strange default behavior as it behaves
> > like a startsWith, rather than an equalTo.
> >
> > With this data:
> >
> > {"alist":["cat","dog"]}
> > {"alist":["catastrophic"]}
> >
> >
> > and this query:
> >
> > select alist from table where repeated_contains(`alist`,'cat');
> >
> >
> > both records are returned.
> >
> > I do realize that repeated_contains accepts regular expressions, but I
> > wonder if this behavior is by design or a bug. (I also know that I can end
> > the query string with a $, but that just does not seem right.)
> >
> > Regards,
> >  -Stefan
> >
>
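
To make the difference in the report above concrete, here is a minimal
sketch in Python rather than Drill. It is only a model: the helper names
contains_partial and contains_exact are hypothetical, and it assumes the
reported behaviour is equivalent to an unanchored regular-expression search,
which has not been checked against the Drill source.

```python
import re

# The two records from the original report.
records = [
    {"alist": ["cat", "dog"]},
    {"alist": ["catastrophic"]},
]

def contains_partial(values, pattern):
    # Hypothetical model of the reported behaviour: the pattern may match
    # anywhere inside an element (unanchored search).
    return any(re.search(pattern, v) for v in values)

def contains_exact(values, pattern):
    # Hypothetical model of the expected behaviour: the pattern must match
    # the whole element.
    return any(re.fullmatch(pattern, v) for v in values)

partial_hits = [r for r in records if contains_partial(r["alist"], "cat")]
exact_hits = [r for r in records if contains_exact(r["alist"], "cat")]
# Ending the pattern with $ (the workaround mentioned in the report).
anchored_hits = [r for r in records if contains_partial(r["alist"], "cat$")]
```

Under this model, partial_hits holds both records, while exact_hits and
anchored_hits hold only the ["cat", "dog"] record.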

Re: repeated_contains - intended behaviour?

Posted by Steven Phillips <st...@dremio.com>.
I think we should get rid of repeated_contains, and instead have a set of
boolean functions that operate on repeated types. For every boolean
function, there would be 2 corresponding boolean functions which operate on
repeated types.

The "any" version would return true if the corresponding non-repeated
function returns true for any of the elements in the array. The "all"
version would return true if and only if all of the elements return true.
For example,

{ a : 1, b : [ 1, 2 ] }

equals_any(a, b) would return true
equals_all(a, b) would return false

{ a : [ "cat", "dog" ] }

like_any(a, 'cat') would return true
like_all(a, 'cat') would return false

We could also take a cue from MongoDB and map functions that don't specify
"any" or "all" to the "any" version. So:

From the first example, b = 1 would return true.
From the second example, a like 'cat' would return true.

I think this approach would make it easier to understand what the functions
do, as the functionality would be directly related to the functionality of
the regular SQL function of the same name.
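
The "any"/"all" semantics proposed above can be sketched in Python as a
rough illustration. None of these functions exist in Drill; the names follow
the proposal, and sql_like is a hypothetical helper that assumes standard
SQL LIKE wildcards (% and _) with no escape character.

```python
import re

def equals_any(value, values):
    # True if the scalar equals at least one array element.
    return any(value == v for v in values)

def equals_all(value, values):
    # True only if the scalar equals every array element.
    return all(value == v for v in values)

def sql_like(value, pattern):
    # Hypothetical helper: translate LIKE wildcards into a regex and
    # require the whole value to match.
    regex = "".join(
        ".*" if c == "%" else "." if c == "_" else re.escape(c)
        for c in pattern
    )
    return re.fullmatch(regex, value, re.DOTALL) is not None

def like_any(values, pattern):
    return any(sql_like(v, pattern) for v in values)

def like_all(values, pattern):
    return all(sql_like(v, pattern) for v in values)

row1 = {"a": 1, "b": [1, 2]}
row2 = {"a": ["cat", "dog"]}

r1_any = equals_any(row1["a"], row1["b"])   # True
r1_all = equals_all(row1["a"], row1["b"])   # False
r2_any = like_any(row2["a"], "cat")         # True
r2_all = like_all(row2["a"], "cat")         # False
```

With these definitions, like_any(["catastrophic"], 'cat') is false and
like_any(["catastrophic"], 'cat%') is true, which matches the exact-match
expectation from the original report. One open question the sketch exposes:
Python's all() is true for an empty array, and the proposal does not say
what the "all" versions should return in that case.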

On Sun, Oct 4, 2015 at 3:34 PM, Stefán Baxter <st...@activitystream.com>
wrote:

> Hi,
>
> For me the wild card functionality is fine and functions as expected.
> It's partly because of it that I expected an exact match when no operator
> was in play.
>
> Regards,
>  -Stefan

Re: repeated_contains - intended behaviour?

Posted by Stefán Baxter <st...@activitystream.com>.
Hi,

For me the wild card functionality is fine and functions as expected.
It's partly because of it that I expected an exact match when no operator
was in play.

Regards,
 -Stefan
