Posted to users@solr.apache.org by Damiano Albani <da...@gmail.com> on 2021/12/29 11:39:39 UTC

Non equi-joins with streaming expressions

Hello,

I'm new to streaming expressions, so I'm trying to understand their
features and limitations.
In particular, the so-called "stream operators" that implement join
operations, such as "innerJoin", "leftOuterJoin", etc.

I see that they support an "on" parameter, defining the *equality* check to
be performed.
But, coming from the SQL world, I'm used to being able to use a variety of
comparison operators in join predicates. That is, not only equality, as in
"equi-joins".

Is there a reason why the current implementation of Solr supports
equi-joins only? Would it be technically possible (and desired) to support
other comparison operators with joins?
And maybe somehow allow the use of the available stream evaluators
<https://solr.apache.org/guide/8_11/stream-evaluator-reference.html>?

To give the context of my question: I'm trying to join 2 sets of documents
with a hierarchical relationship.
My goal is to join them using a "path" field on one side and
"descendent_path" field on the other side.
But it looks like only doc values (and not analyzed values) are accessible
in streams, so I suppose I'd be left with a join criterion like this
pseudo-code:

>   on="starts_with(right.path, left.path)"

Where, in this hypothetical example:

>   left.path="/categories/category1"
>   right.path="/categories/category1/sub-categories/sub-category-a"


Or do I completely misunderstand how Solr (streams) work? ;-)
Thanks for your help!

Regards,

-- 
Damiano Albani
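
For illustration, the prefix join asked about above can be sketched in plain
Java. This is a minimal sketch, not Solr code: the class PrefixJoinSketch, its
join() and startsWith() methods, and the use of maps as tuples are all
hypothetical names chosen for the example. It uses a nested loop, which works
for any predicate but compares every left tuple with every right tuple.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical illustration only: this is not Solr code and uses no Solr API.
class PrefixJoinSketch {

    // Pairs every left tuple with every right tuple for which the predicate holds.
    // A nested loop is the simplest strategy that works for arbitrary predicates.
    static List<Map<String, String>> join(
            List<Map<String, String>> left,
            List<Map<String, String>> right,
            BiPredicate<Map<String, String>, Map<String, String>> on) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                if (on.test(l, r)) {
                    out.add(Map.of("left.path", l.get("path"), "right.path", r.get("path")));
                }
            }
        }
        return out;
    }

    // The predicate from the pseudo-code: starts_with(right.path, left.path).
    static boolean startsWith(Map<String, String> l, Map<String, String> r) {
        return r.get("path").startsWith(l.get("path"));
    }

    public static void main(String[] args) {
        List<Map<String, String>> categories = List.of(
                Map.of("path", "/categories/category1"),
                Map.of("path", "/categories/category2"));
        List<Map<String, String>> subCategories = List.of(
                Map.of("path", "/categories/category1/sub-categories/sub-category-a"));
        // Only category1 is a prefix of the sub-category path, so this prints 1.
        System.out.println(join(categories, subCategories, PrefixJoinSketch::startsWith).size());
    }
}
```

Note that a bare string prefix test would also match "/categories/category10"
against "/categories/category1", so a real predicate would probably need to
compare on path-segment boundaries.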

Re: Non equi-joins with streaming expressions

Posted by Damiano Albani <da...@gmail.com>.
OK, if I can find some time, I'll give it a try.
In the long run, it would be better (for me) to have the functionality in
Solr by default than to maintain my own piece of code 😀

On Fri, Jan 7, 2022 at 1:42 AM Dennis Gove <dp...@gmail.com> wrote:

> Hi Damiano,
>
> Yup, that's what I meant. I'd be happy to collaborate with you on this.
>
> Cheers!


-- 
Damiano Albani

Re: Non equi-joins with streaming expressions

Posted by Dennis Gove <dp...@gmail.com>.
Hi Damiano,

Yup, that's what I meant. I'd be happy to collaborate with you on this.

Cheers!

On Thu, Jan 6, 2022 at 5:09 PM Damiano Albani <da...@gmail.com>
wrote:

> Hi Dennis,
>
> Do you (implicitly) mean by your message that it would be a good idea to
> get the changes you mentioned into the official Solr code base?
> In other words, that a PR implementing this enhancement would be considered
> by the Solr team?
>
> Regards,

Re: Non equi-joins with streaming expressions

Posted by Damiano Albani <da...@gmail.com>.
Hi Dennis,

Do you (implicitly) mean by your message that it would be a good idea to
get the changes you mentioned into the official Solr code base?
In other words, that a PR implementing this enhancement would be considered
by the Solr team?

Regards,

On Wed, Jan 5, 2022 at 1:58 AM Dennis Gove <dp...@gmail.com> wrote:

> My recollection from working on this code years ago is that other
> definitions of "equal" can be supported by creating new implementations of
> the Equalitor class (
>
> https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/client/solrj/io/eq/Equalitor.java#L27-L30
> ).
> The purpose of the Equalitor class is not so much to say "these two values
> are the same" but instead "these values can be joined on". Joins were one
> of the first streaming expressions created and as such existed before
> evaluators. The Equalitor class is a bit of an unfortunate holdover from
> that initial implementation. Were I doing it again now I'd use evaluators
> instead.
>
> That said, it may be possible to refactor the Equalitor class as a type of
> Evaluator. An approach like that would, I think, clean up what's become a
> confusing holdover of that original implementation and simultaneously make
> it possible to use any evaluator within a join clause.
>
> Alternatively, it'd be possible to enhance the join classes to support
> either Equalitors or Evaluators. Equalitors are constructed with this
> method -
>
> https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamFactory.java#L352
> - so you could enhance any place that's called from to also support
> Evaluators.
>
> Cheers,
> Dennis


-- 
Damiano Albani

Re: Non equi-joins with streaming expressions

Posted by Dennis Gove <dp...@gmail.com>.
My recollection from working on this code years ago is that other
definitions of "equal" can be supported by creating new implementations of
the Equalitor class (
https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/client/solrj/io/eq/Equalitor.java#L27-L30).
The purpose of the Equalitor class is not so much to say "these two values
are the same" but instead "these values can be joined on". Joins were one
of the first streaming expressions created and as such existed before
evaluators. The Equalitor class is a bit of an unfortunate holdover from
that initial implementation. Were I doing it again now I'd use evaluators
instead.
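
To make the "these values can be joined on" idea concrete, here is a minimal
sketch. It assumes the concept boils down to a single boolean test over two
tuples; JoinTest, FieldEqualsTest and FieldPrefixTest are hypothetical
stand-ins written for this example, not the real Solr Equalitor interface or
any of its implementations.

```java
import java.util.Map;

// Hypothetical stand-in for the idea behind an Equalitor: a yes/no answer to
// "can these two values be joined on?". It is NOT the real Solr interface.
interface JoinTest<T> {
    boolean test(T left, T right);
}

// Plain equality on one field, i.e. what an equi-join needs.
class FieldEqualsTest implements JoinTest<Map<String, Object>> {
    private final String field;

    FieldEqualsTest(String field) {
        this.field = field;
    }

    @Override
    public boolean test(Map<String, Object> left, Map<String, Object> right) {
        Object l = left.get(field);
        return l != null && l.equals(right.get(field));
    }
}

// A different definition of "joinable": the right value starts with the left one.
class FieldPrefixTest implements JoinTest<Map<String, Object>> {
    private final String field;

    FieldPrefixTest(String field) {
        this.field = field;
    }

    @Override
    public boolean test(Map<String, Object> left, Map<String, Object> right) {
        Object l = left.get(field);
        Object r = right.get(field);
        return l instanceof String && r instanceof String
                && ((String) r).startsWith((String) l);
    }
}

class JoinTestDemo {
    public static void main(String[] args) {
        Map<String, Object> parent = Map.of("path", "/categories/category1");
        Map<String, Object> child =
                Map.of("path", "/categories/category1/sub-categories/sub-category-a");
        System.out.println(new FieldEqualsTest("path").test(parent, child)); // false
        System.out.println(new FieldPrefixTest("path").test(parent, child)); // true
    }
}
```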

That said, it may be possible to refactor the Equalitor class as a type of
Evaluator. An approach like that would, I think, clean up what's become a
confusing holdover of that original implementation and simultaneously make
it possible to use any evaluator within a join clause.

Alternatively, it'd be possible to enhance the join classes to support
either Equalitors or Evaluators. Equalitors are constructed with this
method -
https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamFactory.java#L352
- so you could enhance any place that's called from to also support
Evaluators.
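
One way to picture the "either Equalitors or Evaluators" option is a small
adapter, so that the join code only ever sees one kind of condition. The
sketch below is an assumption-laden illustration: PairTest, PairEvaluator,
fromEvaluator() and equalOn() are hypothetical names and do not correspond to
the real Solr classes or to the StreamFactory method linked above.

```java
import java.util.Map;

class EitherJoinConditionSketch {

    // Hypothetical stand-ins; neither is the real Solr Equalitor or evaluator type.
    interface PairTest {
        boolean test(Map<String, Object> left, Map<String, Object> right);
    }

    interface PairEvaluator {
        Object evaluate(Map<String, Object> left, Map<String, Object> right);
    }

    // Wraps an evaluator so the join only has to deal with a PairTest.
    // Only a Boolean TRUE result counts as a match; null or any other value does not.
    static PairTest fromEvaluator(PairEvaluator evaluator) {
        return (left, right) -> Boolean.TRUE.equals(evaluator.evaluate(left, right));
    }

    // The existing equi-join behaviour expressed as a PairTest.
    static PairTest equalOn(String field) {
        return (left, right) ->
                left.get(field) != null && left.get(field).equals(right.get(field));
    }

    public static void main(String[] args) {
        Map<String, Object> parent = Map.of("path", "/categories/category1");
        Map<String, Object> child =
                Map.of("path", "/categories/category1/sub-categories/sub-category-a");

        // An evaluator-style condition: does the right path start with the left path?
        PairTest prefix = fromEvaluator((l, r) ->
                String.valueOf(r.get("path")).startsWith(String.valueOf(l.get("path"))));

        System.out.println(equalOn("path").test(parent, child)); // false
        System.out.println(prefix.test(parent, child));          // true
    }
}
```

Treating every non-Boolean result as "no match" is a design choice made for
this sketch; a real implementation might prefer to reject such evaluators
when the expression is parsed.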

Cheers,
Dennis

On Tue, Jan 4, 2022 at 5:00 PM Damiano Albani <da...@gmail.com>
wrote:

> Hello,
>
> It's the first time I've heard about those Lucene expressions written in
> JavaScript. Good to learn about them!
> I suppose you're referring to
> https://lucene.apache.org/core/9_0_0/expressions/org/apache/lucene/expressions/js/package-summary.html
> ?
> I couldn't find much information about how to use them, especially in
> combination with Solr. If someone knowledgeable could chime in, that would
> be great.
> My first impression from the API documentation page, though, is that the
> list of supported functions is pretty limited.
> Actually, I think that Solr's evaluators provide similar coverage of
> functions out of the box:
> https://solr.apache.org/guide/8_11/stream-evaluator-reference.html.
> If I can find some time, I will play with my java() evaluator idea and see
> if it is any good, especially in terms of performance, where
> JavaScript-in-Lucene could indeed have the upper hand.
>
> Regards,
>
> On Tue, Jan 4, 2022 at 6:41 PM David Smiley <ds...@apache.org> wrote:
>
> > I'd prefer to use Lucene's "expressions" module and thus do JavaScript.
> > This is more accessible to a wider audience, and I believe makes
> > safety/security easier (though I have not checked).
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Tue, Jan 4, 2022 at 12:30 PM Eric Pugh <epugh@opensourceconnections.com>
> > wrote:
> >
> > > That looks great!  I love how (relatively) simple it all is to write
> > > your own logic.
> > >
> > > One of the reasons that we added packages (bin/solr package) to Solr is
> > > so that if someone wants to add something like a java() evaluator, they
> > > can!
> > >
> > > > On Jan 4, 2022, at 11:40 AM, Damiano Albani <damiano.albani@gmail.com>
> > > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > Just a quick note to mention that I've managed to implement what I
> > > > wanted in terms of non equi-joins.
> > > > Should someone be interested, I've put my code on
> > > > https://github.com/dalbani/solr-streaming-expressions.
> > > >
> > > > By the way, I happened to need a startsWith function and I
> > > > implemented it quite easily.
> > > > But I'm wondering if a very generic -- if possibly not very safe --
> > > > java() evaluator could be built.
> > > > That would open streaming expressions to the whole Java API instead
> > > > of having to write individual evaluators.
> > > > For the example of startsWith, it could look something like:
> > > >
> > > >> java(val(Hello), val(World), "arg0.startsWith(arg1)")
> > > >
> > > > Using, say, https://www.javassist.org/ to turn the code argument
> > > > into bytecode.
> > > > What do you think?
> > > >
> > > > Regards,
> > > >
> > > >
> > > >
> > > > --
> > > > Damiano Albani
> > >
> > > _______________________
> > > Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467
> |
> > > http://www.opensourceconnections.com <
> > > http://www.opensourceconnections.com/> | My Free/Busy <
> > > http://tinyurl.com/eric-cal>
> > > Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> > >
> >
> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw
> > >
> > >
> > > This e-mail and all contents, including attachments, is considered to
> be
> > > Company Confidential unless explicitly stated otherwise, regardless of
> > > whether attachments are marked as such.
> > >
> > >
> >
>
>
> --
> Damiano Albani
>

Re: Non equi-joins with streaming expressions

Posted by Damiano Albani <da...@gmail.com>.
Hello,

This is the first time I've heard of those Lucene expressions written in
JavaScript. Good to learn about them!
I suppose you're referring to
https://lucene.apache.org/core/9_0_0/expressions/org/apache/lucene/expressions/js/package-summary.html
?
I couldn't find much information about how to use them, especially in
combination with Solr. If someone knowledgeable could chime in, that would
be great.
My first impression from the API documentation page, though, is that the
list of supported functions is pretty limited.
Actually, I think that Solr's evaluators provide similar coverage of
functions out of the box:
https://solr.apache.org/guide/8_11/stream-evaluator-reference.html.
If I can find some time, I will play with my java() evaluator idea and see
if it is any good.
Especially in terms of performance, where JavaScript-in-Lucene could
indeed have the upper hand.

Regards,

On Tue, Jan 4, 2022 at 6:41 PM David Smiley <ds...@apache.org> wrote:

> I'd prefer to use Lucene's "expressions" module and thus do JavaScript.
> This is more accessible to a wider audience, and I believe makes
> safety/security easier (though I have not checked).
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>


-- 
Damiano Albani

Re: Non equi-joins with streaming expressions

Posted by David Smiley <ds...@apache.org>.
I'd prefer to use Lucene's "expressions" module and thus do JavaScript.
This is more accessible to a wider audience, and I believe makes
safety/security easier (though I have not checked).

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Jan 4, 2022 at 12:30 PM Eric Pugh <ep...@opensourceconnections.com>
wrote:

> That looks great!  I love how (relatively) simple it all is to write your
> own logic.
>
> One of the reasons that we added packages (bin/solr package) to Solr is so
> that if someone wants to add something like a java() evaluator, they can!
>

Re: Non equi-joins with streaming expressions

Posted by Eric Pugh <ep...@opensourceconnections.com>.
That looks great!  I love how (relatively) simple it all is to write your own logic.

One of the reasons that we added packages (bin/solr package) to Solr is so that if someone wants to add something like a java() evaluator, they can!

> On Jan 4, 2022, at 11:40 AM, Damiano Albani <da...@gmail.com> wrote:
> 
> Hi,
> 
> Just a quick note to mention that I've managed to implement what I wanted
> in terms of non equi-joins.
> Should someone be interested, I've put my code on
> https://github.com/dalbani/solr-streaming-expressions.
> 
> By the way, I happened to need a startsWith function and I implemented it
> quite easily.
> But I'm wondering if a very generic -- if not possibly not very safe --
> java() evaluator could be built.
> That would open streaming expressions to the whole Java API instead of
> having to write individual evaluators.
> For the example of startsWith, it could look like something in the range of:
> 
>> java(val(Hello), val(World), "arg0.startsWith(arg1)")
> 
> Using say, https://www.javassist.org/, to turn the code argument into
> bytecode.
> What do you think?
> 
> Regards,
> 
> -- 
> Damiano Albani

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>	
This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.


Re: Non equi-joins with streaming expressions

Posted by Damiano Albani <da...@gmail.com>.
Hi,

Just a quick note to mention that I've managed to implement what I wanted
in terms of non equi-joins.
Should someone be interested, I've put my code on
https://github.com/dalbani/solr-streaming-expressions.

By the way, I happened to need a startsWith function, and I implemented it
quite easily.
But I'm wondering whether a very generic (though possibly not very safe)
java() evaluator could be built.
That would open streaming expressions to the whole Java API instead of
having to write individual evaluators.
For the example of startsWith, it could look something like:

> java(val(Hello), val(World), "arg0.startsWith(arg1)")

Using, say, https://www.javassist.org/ to turn the code argument into
bytecode.
What do you think?
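
To make the trade-off concrete, here is a minimal plain-Java sketch of the
"individual evaluators" side of it: a fixed registry of named two-argument
functions, with startsWith as the only entry. It is not Solr code and does not
use Solr's evaluator classes; EvaluatorRegistrySketch, register and evaluate
are hypothetical names made up for this illustration. Unlike a generic java()
evaluator, nothing outside the registered names can be invoked.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical illustration only: a whitelist of named two-argument functions.
class EvaluatorRegistrySketch {

    private final Map<String, BiFunction<Object, Object, Object>> functions = new HashMap<>();

    // Make a function callable under the given name.
    void register(String name, BiFunction<Object, Object, Object> fn) {
        functions.put(name, fn);
    }

    // Look the function up by name and apply it; unknown names are rejected.
    Object evaluate(String name, Object arg0, Object arg1) {
        BiFunction<Object, Object, Object> fn = functions.get(name);
        if (fn == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        return fn.apply(arg0, arg1);
    }

    // Null-safe prefix test; non-string values are compared by their string form.
    static Object startsWith(Object value, Object prefix) {
        if (value == null || prefix == null) {
            return Boolean.FALSE;
        }
        return String.valueOf(value).startsWith(String.valueOf(prefix));
    }

    public static void main(String[] args) {
        EvaluatorRegistrySketch registry = new EvaluatorRegistrySketch();
        registry.register("startsWith", EvaluatorRegistrySketch::startsWith);

        // Mirrors the java(val(Hello), val(World), ...) example from this message.
        System.out.println(registry.evaluate("startsWith", "Hello", "World")); // false
        System.out.println(registry.evaluate("startsWith", "Hello", "He"));    // true
    }
}
```

A generic java() evaluator would replace the lookup in evaluate with compiling
the supplied code string, which is where the safety concern comes from.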

Regards,

On Wed, Dec 29, 2021 at 12:39 PM Damiano Albani <da...@gmail.com>
wrote:

> Hello,
>
> I'm new to streaming expressions, so I'm trying to understand their
> features and limitations.
> In particular the so-called "stream operators" implementing join
> operations.
> Like "innerJoin", "leftOuterJoin", etc.
>
> I see that they support a "on" parameter, defining the *equality* check
> to be performed.
> But, coming from the SQL world, I'm used to being able to use a variety of
> comparison operators in join predicates. That is, not only equality, as in
> "equi-joins".
>
> Is there a reason why the current implementation of Solr supports
> equi-joins only? Would it be technically possible (and desired) to support
> other comparison operators with joins?
> And maybe somehow allow the use of the available stream evaluators
> <https://solr.apache.org/guide/8_11/stream-evaluator-reference.html>?
>
> To give the context of my question: I'm trying to join 2 sets of documents
> with a hierarchical relationship.
> My goal is to join them using a "path" field on one side and
> "descendent_path" field on the other side.
> But it looks like that only doc values are accessible (and not analyzed
> ones) in streams, so I suppose I'd be left with a join criteria like this
> pseudo-code:
>
>>   on="starts_with(right.path, left.path)"
>
> Where, in this hypothetical example:
>
>>   left.path=/categories/category1"
>>   right.path=/categories/category1/sub-categories/sub-category-a"
>
>
> Or do I completely misunderstand how Solr (streams) work? ;-)
> Thanks for your help!
>
> Regards,
>
> --
> Damiano Albani
>


-- 
Damiano Albani

Re: Non equi-joins with streaming expressions

Posted by Joel Bernstein <jo...@gmail.com>.
As you mentioned, only equi-joins are currently supported. But you could
pretty quickly adapt an existing join to do what you want.

https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/LeftOuterJoinStream.java
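
As a rough idea of what a join on a non-equality predicate involves, here is a
minimal plain-Java sketch. It is not Solr code and is not taken from
LeftOuterJoinStream: it assumes tuples are simple maps held in memory and uses
a nested loop, which is the simplest strategy that works for an arbitrary
predicate. PrefixJoinSketch, tuple, join, isDescendant and the "left." /
"right." field prefixes are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical illustration only: an in-memory inner join on an arbitrary predicate.
class PrefixJoinSketch {

    // Build a one-field tuple holding a "path" value.
    static Map<String, Object> tuple(String path) {
        Map<String, Object> t = new LinkedHashMap<>();
        t.put("path", path);
        return t;
    }

    // Nested-loop inner join: emit one merged tuple per (left, right) pair the predicate accepts.
    static List<Map<String, Object>> join(
            List<Map<String, Object>> left,
            List<Map<String, Object>> right,
            BiPredicate<Map<String, Object>, Map<String, Object>> on) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> l : left) {
            for (Map<String, Object> r : right) {
                if (on.test(l, r)) {
                    Map<String, Object> merged = new LinkedHashMap<>();
                    l.forEach((k, v) -> merged.put("left." + k, v));
                    r.forEach((k, v) -> merged.put("right." + k, v));
                    out.add(merged);
                }
            }
        }
        return out;
    }

    // True when the right path lies strictly below the left path.
    // The trailing slash keeps "/categories/category10" from matching "/categories/category1".
    static boolean isDescendant(Map<String, Object> l, Map<String, Object> r) {
        Object lp = l.get("path");
        Object rp = r.get("path");
        if (!(lp instanceof String) || !(rp instanceof String)) {
            return false;
        }
        String parent = (String) lp;
        String prefix = parent.endsWith("/") ? parent : parent + "/";
        return ((String) rp).startsWith(prefix);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> categories = List.of(
                tuple("/categories/category1"),
                tuple("/categories/category2"));
        List<Map<String, Object>> subCategories = List.of(
                tuple("/categories/category1/sub-categories/sub-category-a"),
                tuple("/categories/category10/sub-categories/sub-category-b"));

        // Only the category1 / sub-category-a pair satisfies the predicate.
        for (Map<String, Object> t : join(categories, subCategories, PrefixJoinSketch::isDescendant)) {
            System.out.println(t);
        }
    }
}
```

The nested loop compares every left tuple with every right tuple, so this
approach is only reasonable when at least one side is small.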

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Dec 29, 2021 at 10:23 AM Eric Pugh <ep...@opensourceconnections.com>
wrote:

>
> https://github.com/epugh/playing-with-solr-streaming-expressions/tree/master/streaming_expressions/src/main/java/com/o19s/solr/streaming
> has an example of parsing JSONL formatted docs and an example of using
> atomic updates ;-)
>
>
> https://github.com/epugh/playing-with-solr-streaming-expressions/blob/interact_with_tika_server/streaming_expressions/src/main/java/com/o19s/solr/streaming/SpaCyStream.java
> is an example of interacting with SpaCy ;-)
>
>
>

Re: Non equi-joins with streaming expressions

Posted by Eric Pugh <ep...@opensourceconnections.com>.
https://github.com/epugh/playing-with-solr-streaming-expressions/tree/master/streaming_expressions/src/main/java/com/o19s/solr/streaming has an example of parsing JSONL formatted docs and an example of using atomic updates ;-)

https://github.com/epugh/playing-with-solr-streaming-expressions/blob/interact_with_tika_server/streaming_expressions/src/main/java/com/o19s/solr/streaming/SpaCyStream.java is an example of interacting with SpaCy ;-)



> On Dec 29, 2021, at 10:01 AM, Damiano Albani <da...@gmail.com> wrote:
> 
> Hi Eric,
> 
> Thanks for your feedback, I highly appreciate it.
> I don't mind going the route of implementing something myself. I will have
> a try.
> By any chance, apart from looking at the official codebase, do you know of
> any examples out there I could draw my inspiration from?
> 
> Regards,
> 
> -- 
> Damiano Albani



Re: Non equi-joins with streaming expressions

Posted by Damiano Albani <da...@gmail.com>.
Hi Eric,

Thanks for your feedback, I highly appreciate it.
I don't mind going the route of implementing something myself. I will have
a try.
By any chance, apart from looking at the official codebase, do you know of
any examples out there I could draw my inspiration from?

Regards,

On Wed, Dec 29, 2021 at 3:08 PM Eric Pugh <ep...@opensourceconnections.com>
wrote:

> Damiano,  I don’t really have a direct answer for you.   However, one of
> the aspects of Streaming that I really like is that it’s relatively easy to
> create your own operators and add them to Solr.   I find that I often just
> create my own operator to fill in the gap of what is available.
>
> I do think joining disparate datasets to make new datasets is one of the
> most interesting uses of Streaming, so would love to see what you cook up.
>
> Eric
>

-- 
Damiano Albani

Re: Non equi-joins with streaming expressions

Posted by Eric Pugh <ep...@opensourceconnections.com>.
Damiano, I don’t really have a direct answer for you. However, one of the aspects of Streaming that I really like is that it’s relatively easy to create your own operators and add them to Solr. I find that I often just create my own operator to fill a gap in what is available.

I do think joining disparate datasets to make new datasets is one of the most interesting uses of Streaming, so I would love to see what you cook up.

Eric

> On Dec 29, 2021, at 6:39 AM, Damiano Albani <da...@gmail.com> wrote:
> 
> Hello,
> 
> I'm new to streaming expressions, so I'm trying to understand their
> features and limitations.
> In particular the so-called "stream operators" implementing join operations.
> Like "innerJoin", "leftOuterJoin", etc.
> 
> I see that they support a "on" parameter, defining the *equality* check to
> be performed.
> But, coming from the SQL world, I'm used to being able to use a variety of
> comparison operators in join predicates. That is, not only equality, as in
> "equi-joins".
> 
> Is there a reason why the current implementation of Solr supports
> equi-joins only? Would it be technically possible (and desired) to support
> other comparison operators with joins?
> And maybe somehow allow the use of the available stream evaluators
> <https://solr.apache.org/guide/8_11/stream-evaluator-reference.html>?
> 
> To give the context of my question: I'm trying to join 2 sets of documents
> with a hierarchical relationship.
> My goal is to join them using a "path" field on one side and
> "descendent_path" field on the other side.
> But it looks like that only doc values are accessible (and not analyzed
> ones) in streams, so I suppose I'd be left with a join criteria like this
> pseudo-code:
> 
>>  on="starts_with(right.path, left.path)"
> 
> Where, in this hypothetical example:
> 
>>  left.path=/categories/category1"
>>  right.path=/categories/category1/sub-categories/sub-category-a"
> 
> 
> Or do I completely misunderstand how Solr (streams) work? ;-)
> Thanks for your help!
> 
> Regards,
> 
> -- 
> Damiano Albani
