Posted to dev@lucene.apache.org by Gus Heck <gu...@gmail.com> on 2018/10/20 14:18:09 UTC

Curiosity Q about placement of expressions classes

In prep for my talk at the conference I wound up having to make one quick
and dirty adjustment to the MaxMetric to handle dates (though I still need
to think about whether I "fixed" the right thing there). I'll probably open
an issue around that in a bit.

However, considering this post
https://lucidworks.com/2017/12/06/streaming-expressions-in-solrj/ and my
initial gut reaction of "why is THIS in solrj", I'm wondering what the
rationale is for having all of the streaming expressions classes (including
metrics) in the SolrJ area. It seems more like something that happens on
the server side, so I would have expected it to be in core.

-Gus

-- 
http://www.the111shift.com

Re: Curiosity Q about placement of expressions classes

Posted by Gus Heck <gu...@gmail.com>.
After digging some more, I'm feeling like the SolrJ API is slightly
trappy WRT streaming. There's a positive example to follow in the ref guide,
but unless you have a handy dandy Gandalf to tell you "stay on the path"
you could easily get yourself into murky territory :). Neither the
availability of classes, nor the package names, nor the javadoc, nor the
(official) documentation warns you against going down the "Hard way" in Erick
Erickson's blog post
<https://lucidworks.com/2017/12/06/streaming-expressions-in-solrj/>. In
past projects I have typically used the recommended way, more or less
because it is nice to be able to test stuff in the Solr admin UI,
but his blog post made me say "Yikes!" because I could totally see myself
blundering into the hard way while looking for ways to avoid 80 lines of
string concatenation in my Java class, or trying to reuse repetitive stuff
from a 300 line expression that is mostly repeated patterns... both are pain
points I have had but have not yet had time to refactor.
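
To make the string-concatenation pain point concrete, here is a minimal
sketch of composing expression text from small reusable pieces while still
handing Solr a plain string (the recommended way). The Expr class and its
methods are hypothetical helpers invented for this illustration and are not
part of SolrJ; the sketch also assumes values never contain double quotes,
so it rejects them instead of guessing at escaping rules.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of SolrJ) for composing expression strings. */
final class Expr {
    private Expr() {}

    /** Renders name(arg1, arg2, key="value", ...) from positional and named arguments. */
    static String call(String name, List<String> positional, Map<String, String> named) {
        StringJoiner args = new StringJoiner(", ", name + "(", ")");
        for (String p : positional) {
            args.add(p);
        }
        for (Map.Entry<String, String> e : named.entrySet()) {
            // Assumption: escaping is out of scope here, so quoted values are refused.
            if (e.getValue().indexOf('"') >= 0) {
                throw new IllegalArgumentException("value contains a double quote: " + e.getKey());
            }
            args.add(e.getKey() + "=\"" + e.getValue() + "\"");
        }
        return args.toString();
    }

    /** A repeated pattern written once: a sorted search over one collection. */
    static String search(String collection, String query, String fields, String sort) {
        Map<String, String> named = new LinkedHashMap<>(); // keeps parameter order stable
        named.put("q", query);
        named.put("fl", fields);
        named.put("sort", sort);
        return call("search", List.of(collection), named);
    }
}
```

Calling Expr.search("collection1", "*:*", "id,a_s", "a_s asc") yields
search(collection1, q="*:*", fl="id,a_s", sort="a_s asc"), and the result can
be nested as a positional argument of another call. The output is still just
text, so it can be pasted into the admin UI for testing.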

Also, streaming has grown to an enormous number of classes, adding a lot
of weight to the solrj jar (343 out of 501 classes, about 68%, and 39,000
out of 71,000 total lines of Java, about 55%, under o.a.s.c.solrj are in
o.a.s.c.solrj.io).

I spent my "code holiday" yesterday digging into the feasibility of making
things more intuitive by hiding the majority of the "not recommended for
direct use" classes in solr-core, so here are my thoughts/findings:

   1. Back compatibility is a key issue that might kill the whole idea.
   Early implementations may have been based on the pattern shown in the 7.2
   ref guide and so reference the classes directly.
   2. StreamFactory probably should be (or implement) an interface.
   3. To keep all the low level classes out of the way we would need to not
   parse the expression on the client.
   4. To do that we might create a special expression that can be serialized
   to carry the string, and evaluated on the server.
   5. That makes explain functionality need its own round trip, which is
   where things get hard/irritating.
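
Points 3 to 5 can be sketched roughly as follows. Everything here is a
hypothetical stand-in written for illustration: RawExpression, FakeServer and
the use of Java serialization as the wire format are assumptions, not Solr
classes or Solr's actual transport. The point is only that the client ships
unparsed text, so an explain request costs a separate trip to the server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical carrier: the client never parses the expression text. */
final class RawExpression implements Serializable {
    private static final long serialVersionUID = 1L;

    enum Action { EXECUTE, EXPLAIN }

    final String text;
    final Action action;

    RawExpression(String text, Action action) {
        this.text = text;
        this.action = action;
    }

    /** Serializes this request; Java serialization is only a placeholder format. */
    byte[] toWire() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}

/** Stand-in for the server side, where parsing would actually happen. */
final class FakeServer {
    private int roundTrips;

    /** Deserializes one request and reports the action plus the outermost function name. */
    String handle(byte[] wire) {
        roundTrips++;
        RawExpression request;
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            request = (RawExpression) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        // A real server would fully parse the text; this stub only peeks at the function name.
        int paren = request.text.indexOf('(');
        String function = paren < 0 ? request.text : request.text.substring(0, paren);
        return request.action + ":" + function;
    }

    int roundTrips() {
        return roundTrips;
    }
}
```

Asking for an explanation and then executing the same text means calling
handle twice, once per action, which is the extra round trip point 5 warns
about.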

Such a change is a fairly major undertaking, and maybe not worth it unless
lots of other folks care.

Looking back at Erick's post I think it's slightly biased toward the case
where zk isn't available. Looking at the ref guide example, it seems like it
could be simplified further by letting CloudSolrClient manage the
StreamFactory... then the ref guide example
<https://lucene.apache.org/solr/guide/7_5/streaming-expressions.html#streaming-requests-and-responses> could
look like:

TupleStream stream = client.constructStream("....");
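
The one-liner above is a proposal, not a method that this thread shows
CloudSolrClient to have. A self-contained sketch of the shape it suggests
might look like the following; StreamingClient, ResultStream and
FakeCloudClient are hypothetical names invented here, with ResultStream
standing in for the small TupleStream/Tuple surface a caller would still see.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Stand-in for the small result surface a caller would still see. */
interface ResultStream extends AutoCloseable {
    /** Returns the next row, or null once the stream is exhausted. */
    String read();

    @Override
    void close();
}

/** Hypothetical "on the path" entry point: the caller supplies only the expression text. */
interface StreamingClient {
    ResultStream constructStream(String expression);
}

/** Toy client that already knows its zkHost and default collection. */
final class FakeCloudClient implements StreamingClient {
    private final String zkHost;
    private final String defaultCollection;

    FakeCloudClient(String zkHost, String defaultCollection) {
        this.zkHost = zkHost;
        this.defaultCollection = defaultCollection;
    }

    @Override
    public ResultStream constructStream(String expression) {
        if (expression == null || expression.trim().isEmpty()) {
            throw new IllegalArgumentException("expression must not be empty");
        }
        // A real client would send the text to a node for parsing and execution;
        // this stub returns canned rows that echo what it would have sent.
        List<String> rows = Arrays.asList(
                "zkHost=" + zkHost,
                "collection=" + defaultCollection,
                "expr=" + expression);
        Iterator<String> it = rows.iterator();
        return new ResultStream() {
            @Override
            public String read() {
                return it.hasNext() ? it.next() : null;
            }

            @Override
            public void close() {
                // Nothing to release in the stub.
            }
        };
    }
}
```

The caller never touches a factory or any parsing class: it builds the
client once, passes a string, and reads rows until read() returns null.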

And maybe the easy to remember general guideline we should work towards in
solrj is that the Clients listed in the ref guide
<https://lucene.apache.org/solr/guide/7_2/using-solrj.html#types-of-solrclients>
are "the path"... If you get it from the client it's safe/supported to play
with (it's "on the path" so to speak). Anything else is "off the path,
bring your elven sword and be prepared to hire a hobbit to rescue you". I
also have ideas about substitution/templating but that's for another
thread/ticket.


On Sat, Oct 20, 2018 at 4:32 PM Gus Heck <gu...@gmail.com> wrote:

> Hi Shawn,
>
> Yeah, I understand that's the general logic but the recommendation in the
> above link is to avoid using *most* of the classes and supply a string
> representing the expression.  A user seeing all these classes in the
> javadoc (or ide) would easily think that they should be using them.
>
> After looking back at the 7.2 refguide I see that there used to be an
> example that explicitly set up names for classes, but that's now done
> automatically via o.a.s.streaming.Lang (yay for standardization! :) ), so
> there is no need to reference the vast majority of the streaming classes
> directly (unless ignoring Erick's advice in the linked post).
>
> This morning for fun I tested what it takes to move everything in the
> o.a.s.c.solrj.io package to core as a package named o.a.s.streaming, and
> only a few unit test configs got sticky. There were no cross refs to
> constants or uses by outside classes. I think things may have progressed to
> the point where stuff users shouldn't be using could be pulled down to core
> before they do get entangled in dependencies.
>
> Of course one can't just move everything; one has to leave behind
> something to facilitate the execution of streaming... but from the looks of
> the example here:
> https://lucene.apache.org/solr/guide/7_5/streaming-expressions.html this
> process needs only a zkhost, a collection and a string containing the
> expression, so it seems like the string should pass directly to a cloud
> solr client (which knows about a zkhost and collection already) and not
> require any special classes beyond the TupleStream return value and Tuple
> (produced by TupleStream)... plus Explanation and StreamComparator which
> are returned by other methods on TupleStream.
>
> -Gus
>
> On Sat, Oct 20, 2018, 2:39 PM Shawn Heisey <ap...@elyograg.org> wrote:
>
>> On 10/20/2018 8:34 AM, Gus Heck wrote:
>> > To put it another way, I'm not sure why this statement from that
>> > article must be true: "SolrJ is what’s used for the communication
>> > between the Solr node, so this level must be exposed."
>>
>> All communication between Solr nodes uses SolrJ.  SolrJ is an integral
>> part of the server as well as a jar providing a standalone client.
>>
>> Many of the string constants that Solr provides are actually located in
>> SolrJ, because they are useful for both client and server operations.
>> Take a look at the CommonParams class.
>>
>> Streaming expressions are something that users want to do with the
>> client, so it makes sense for some significant parts of it to live in
>> the client code.
>>
>> Thanks,
>> Shawn
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>

-- 
http://www.the111shift.com

Re: Curiosity Q about placement of expressions classes

Posted by Gus Heck <gu...@gmail.com>.
Hi Shawn,

Yeah, I understand that's the general logic but the recommendation in the
above link is to avoid using *most* of the classes and supply a string
representing the expression.  A user seeing all these classes in the
javadoc (or ide) would easily think that they should be using them.

After looking back at the 7.2 refguide I see that there used to be an
example that explicitly set up names for classes, but that's now done
automatically via o.a.s.streaming.Lang (yay for standardization! :) ), so
there is no need to reference the vast majority of the streaming classes
directly (unless ignoring Erick's advice in the linked post).

This morning for fun I tested what it takes to move everything in the
o.a.s.c.solrj.io package to core as a package named o.a.s.streaming, and only
a few unit test configs got sticky. There were no cross refs to constants
or uses by outside classes. I think things may have progressed to the point
where stuff users shouldn't be using could be pulled down to core before
they do get entangled in dependencies.

Of course one can't just move everything; one has to leave behind something
to facilitate the execution of streaming... but from the looks of the
example here:
https://lucene.apache.org/solr/guide/7_5/streaming-expressions.html this
process needs only a zkhost, a collection and a string containing the
expression, so it seems like the string should pass directly to a cloud
solr client (which knows about a zkhost and collection already) and not
require any special classes beyond the TupleStream return value and Tuple
(produced by TupleStream)... plus Explanation and StreamComparator which
are returned by other methods on TupleStream.

-Gus

On Sat, Oct 20, 2018, 2:39 PM Shawn Heisey <ap...@elyograg.org> wrote:

> On 10/20/2018 8:34 AM, Gus Heck wrote:
> > To put it another way, I'm not sure why this statement from that
> > article must be true: "SolrJ is what’s used for the communication
> > between the Solr node, so this level must be exposed."
>
> All communication between Solr nodes uses SolrJ.  SolrJ is an integral
> part of the server as well as a jar providing a standalone client.
>
> Many of the string constants that Solr provides are actually located in
> SolrJ, because they are useful for both client and server operations.
> Take a look at the CommonParams class.
>
> Streaming expressions are something that users want to do with the
> client, so it makes sense for some significant parts of it to live in
> the client code.
>
> Thanks,
> Shawn

Re: Curiosity Q about placement of expressions classes

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/20/2018 8:34 AM, Gus Heck wrote:
> To put it another way, I'm not sure why this statement from that 
> article must be true: "SolrJ is what’s used for the communication 
> between the Solr node, so this level must be exposed."

All communication between Solr nodes uses SolrJ.  SolrJ is an integral 
part of the server as well as a jar providing a standalone client.

Many of the string constants that Solr provides are actually located in 
SolrJ, because they are useful for both client and server operations.  
Take a look at the CommonParams class.

Streaming expressions are something that users want to do with the 
client, so it makes sense for some significant parts of it to live in 
the client code.

Thanks,
Shawn



Re: Curiosity Q about placement of expressions classes

Posted by Gus Heck <gu...@gmail.com>.
To put it another way, I'm not sure why this statement from that article
must be true: "SolrJ is what’s used for the communication between the Solr
node, so this level must be exposed."



On Sat, Oct 20, 2018 at 10:18 AM Gus Heck <gu...@gmail.com> wrote:

> In prep for my talk at the conference I wound up having to make one quick
> and dirty adjustment to the MaxMetric to handle dates (though I still need
> to think about whether I "fixed" the right thing there). I'll probably open
> an issue around that in a bit.
>
> However, considering this post
> https://lucidworks.com/2017/12/06/streaming-expressions-in-solrj/ and my
> initial gut reaction of "why is THIS in solrj", I'm wondering what the
> rationale is for having all of the streaming expressions classes (including
> metrics) in the SolrJ area. It seems more like something that happens on
> the server side, so I would have expected it to be in core.
>
> -Gus
>
> --
> http://www.the111shift.com
>


-- 
http://www.the111shift.com