Posted to solr-user@lucene.apache.org by Zheng Lin Edwin Yeo <ed...@gmail.com> on 2017/05/02 01:42:20 UTC

Joining more than 2 collections

Hi,

Is it possible to join more than 2 collections using one of the streaming
expressions (e.g. innerJoin)? If not, are there other ways we can do it?

Currently, I may need to join 3 or 4 collections together, and to output
selected fields from all these collections together.

I'm using Solr 6.4.2.

Regards,
Edwin

Re: Joining more than 2 collections

Posted by Joel Bernstein <jo...@gmail.com>.
Yeah, the newest configurations are in implicitPlugins.json. So in the
standard release now there is nothing about the /export handler in the
solrconfig.
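A quick way to confirm which /export configuration a core is actually using is
the Config API, which merges solrconfig.xml with the implicit definitions. A
sketch, assuming a core named collection1 on the default port:

    curl "http://localhost:8983/solr/collection1/config/requestHandler"

If an explicit /export entry from an older install shows up in the response,
deleting it from solrconfig.xml lets the implicit 6.4.2 definition take over.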

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 4, 2017 at 11:38 AM, Zheng Lin Edwin Yeo <ed...@gmail.com>
wrote:

> Hi Joel,
>
> I think that might be one of the reasons.
> This is what I have for the /export handler in my solrconfig.xml
>
> <requestHandler name="/export" class="solr.SearchHandler">
>   <lst name="invariants">
>     <str name="rq">{!xport}</str>
>     <str name="wt">xsort</str>
>     <str name="distrib">false</str>
>   </lst>
>   <arr name="components">
>     <str>query</str>
>   </arr>
> </requestHandler>
>
> This is the error message that I get when I use the /export handler.
>
> java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: -->
> http://localhost:8983/solr/collection1_shard1_replica1/: An exception has occurred on the server, refer to server log for details.
> at org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:451)
> at org.apache.solr.client.solrj.io.stream.CloudSolrStream.open(CloudSolrStream.java:308)
> at org.apache.solr.client.solrj.io.stream.PushBackStream.open(PushBackStream.java:70)
> at org.apache.solr.client.solrj.io.stream.JoinStream.open(JoinStream.java:147)
> at org.apache.solr.client.solrj.io.stream.ExceptionStream.open(ExceptionStream.java:51)
> at org.apache.solr.handler.StreamHandler$TimerStream.open(StreamHandler.java:457)
> at org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:63)
> at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
> at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)
> at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)
> at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)
> at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)
> at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)
> at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
> at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:732)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> at org.eclipse.jetty.server.Server.handle(Server.java:534)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.ExecutionException: java.io.IOException: -->
> http://localhost:8983/solr/collection1_shard1_replica1/: An exception has occurred on the server, refer to server log for details.
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:445)
> ... 42 more
> Caused by: java.io.IOException: -->
> http://localhost:8983/solr/collection1_shard1_replica1/: An exception has occurred on the server, refer to server log for details.
> at org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:238)
> at org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.next(CloudSolrStream.java:541)
> at org.apache.solr.client.solrj.io.stream.CloudSolrStream$StreamOpener.call(CloudSolrStream.java:564)
> at org.apache.solr.client.solrj.io.stream.CloudSolrStream$StreamOpener.call(CloudSolrStream.java:551)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> ... 1 more
> Caused by: org.noggit.JSONParser$ParseException: JSON Parse Error:
> char=<,position=0 BEFORE='<' AFTER='?xml version="1.0" encoding="UTF-8"?> <'
> at org.noggit.JSONParser.err(JSONParser.java:356)
> at org.noggit.JSONParser.handleNonDoubleQuoteString(JSONParser.java:712)
> at org.noggit.JSONParser.next(JSONParser.java:886)
> at org.noggit.JSONParser.nextEvent(JSONParser.java:930)
> at org.apache.solr.client.solrj.io.stream.JSONTupleStream.expect(JSONTupleStream.java:97)
> at org.apache.solr.client.solrj.io.stream.JSONTupleStream.advanceToDocs(JSONTupleStream.java:179)
> at org.apache.solr.client.solrj.io.stream.JSONTupleStream.next(JSONTupleStream.java:77)
> at org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:207)
> ... 8 more
>
>
> Regards,
> Edwin
>
>
> On 4 May 2017 at 22:54, Joel Bernstein <jo...@gmail.com> wrote:
>
> > I suspect that there is something not quite right about how the /export
> > handler is configured. Straight out of the box in Solr 6.4.2, /export will
> > be automatically configured. Are you using a Solr instance that has been
> > upgraded in the past and doesn't have standard 6.4.2 configs?
> >
> > To really do joins properly you'll have to use the /export handler, because
> > /select will not stream entire result sets (unless they are pretty small).
> > So your results may be missing data.
> >
> > I would take a close look at the logs and see what all the exceptions are
> > when you run a search using qt=/export. If you can post all the stack
> > traces that get generated when you run the search, we'll probably be able
> > to spot the issue.
> >
> > About the field ordering: there is support for field ordering in the
> > Streaming classes, but only a few places actually enforce the order. The
> > 6.5 SQL interface does keep the fields in order, as does the new Tuple
> > expression in Solr 6.6. But the expressions you are working with currently
> > don't enforce field ordering.
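Where field order matters, the 6.5 SQL interface mentioned above can be driven
over HTTP. A minimal sketch against a single collection, using the field names
from this thread (treat the exact SQL support as version-dependent):

    curl --data-urlencode 'stmt=SELECT a_s, b_s, c_s FROM collection2 ORDER BY a_s asc LIMIT 100' \
         "http://localhost:8983/solr/collection2/sql"

Fields come back in the order they are projected in the SELECT clause.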
> >
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Thu, May 4, 2017 at 2:41 AM, Zheng Lin Edwin Yeo <
> edwinyeozl@gmail.com>
> > wrote:
> >
> > > Hi Joel,
> > >
> > > I have managed to get the join to work, but so far it is only working
> > > when I use qt="/select". It is not working when I use qt="/export".
> > >
> > > For the display of the fields, is there a way to list them in the order
> > > which I want?
> > > Currently, the display is quite random, and I can get a field from
> > > collection1, followed by a field from collection3, then collection1
> > > again, and then collection2.
> > >
> > > It would be good if we could arrange the fields to display in the order
> > > that we want.
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > >
> > > On 4 May 2017 at 09:56, Zheng Lin Edwin Yeo <ed...@gmail.com>
> > wrote:
> > >
> > > > Hi Joel,
> > > >
> > > > It works when I started off with just one expression.
> > > >
> > > > Could it be that the data size is too big for export after the join,
> > > > which causes the error?
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > > On 4 May 2017 at 02:53, Joel Bernstein <jo...@gmail.com> wrote:
> > > >
> > > >> I was just testing with the query below and it worked for me. Some of
> > > >> the error messages I was getting with the syntax were not what I was
> > > >> expecting though, so I'll look into the error handling. But the joins
> > > >> do work when the syntax is correct. The query below is joining to the
> > > >> same collection three times, but the mechanics are exactly the same as
> > > >> joining three different tables. In this example each join narrows down
> > > >> the result set.
> > > >>
> > > >> hashJoin(parallel(collection2,
> > > >>                   workers=3,
> > > >>                   sort="id asc",
> > > >>                   innerJoin(search(collection2, q="*:*", fl="id", sort="id asc", qt="/export", partitionKeys="id"),
> > > >>                             search(collection2, q="year_i:42", fl="id, year_i", sort="id asc", qt="/export", partitionKeys="id"),
> > > >>                             on="id")),
> > > >>          hashed=search(collection2, q="day_i:7", fl="id, day_i", sort="id asc", qt="/export"),
> > > >>          on="id")
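An expression like the one above is submitted as the expr parameter of the
/stream handler. A sketch of invoking it with curl (any collection's /stream
endpoint can serve as the entry point; the URL is an assumption based on the
defaults used elsewhere in this thread):

    curl --data-urlencode 'expr=hashJoin(parallel(collection2, workers=3, sort="id asc",
                   innerJoin(search(collection2, q="*:*", fl="id", sort="id asc", qt="/export", partitionKeys="id"),
                             search(collection2, q="year_i:42", fl="id, year_i", sort="id asc", qt="/export", partitionKeys="id"),
                             on="id")),
              hashed=search(collection2, q="day_i:7", fl="id, day_i", sort="id asc", qt="/export"),
              on="id")' \
         "http://localhost:8983/solr/collection2/stream"

Tuples stream back as JSON, with an EOF tuple marking the end of the result set.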
> > > >>
> > > >> Joel Bernstein
> > > >> http://joelsolr.blogspot.com/
> > > >>
> > > >> On Wed, May 3, 2017 at 1:29 PM, Joel Bernstein <jo...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Start off with just this expression:
> > > >> >
> > > >> > search(collection2,
> > > >> >             q=*:*,
> > > >> >             fl="a_s,b_s,c_s,d_s,e_s",
> > > >> >             sort="a_s asc",
> > > >> >             qt="/export")
> > > >> >
> > > >> > And then check the logs for exceptions.
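On a default install the node log is typically server/logs/solr.log, so a
sketch of watching for the underlying exception while re-running the
expression (the path is an assumption for a stock 6.x layout):

    tail -f server/logs/solr.log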
> > > >> >
> > > >> > Joel Bernstein
> > > >> > http://joelsolr.blogspot.com/
> > > >> >
> > > >> > On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <
> > > >> edwinyeozl@gmail.com
> > > >> > > wrote:
> > > >> >
> > > >> >> Hi Joel,
> > > >> >>
> > > >> >> I am getting this error after I added qt=/export and removed the
> > > >> >> rows param. Do you know what could be the reason?
> > > >> >>
> > > >> >> {
> > > >> >>   "error":{
> > > >> >>     "metadata":[
> > > >> >>       "error-class","org.apache.solr.common.SolrException",
> > > >> >>       "root-error-class","org.apache.http.MalformedChunkCodingException"],
> > > >> >>     "msg":"org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk",
> > > >> >>     "trace":"org.apache.solr.common.SolrException: org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk
> > > >> >>       at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:79)
> > > >> >>       at org.apache.solr.response.JSONWriter.writeIterator(JSONResponseWriter.java:523)
> > > >> >>       at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:175)
> > > >> >>       at org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559)
> > > >> >>       at org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:64)
> > > >> >>       at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
> > > >> >>       at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)
> > > >> >>       at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)
> > > >> >>       at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)
> > > >> >>       at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)
> > > >> >>       at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)
> > > >> >>       at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
> > > >> >>       at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:732)
> > > >> >>       at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
> > > >> >>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> > > >> >>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
> > > >> >>       at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
> > > >> >>       at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> > > >> >>       at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> > > >> >>       at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> > > >> >>       at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> > > >> >>       at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> > > >> >>       at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> > > >> >>       at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> > > >> >>       at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> > > >> >>       at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> > > >> >>       at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> > > >> >>       at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> > > >> >>       at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> > > >> >>       at org.eclipse.jetty.server.Server.handle(Server.java:534)
> > > >> >>       at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> > > >> >>       at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
> > > >> >>       at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
> > > >> >>       at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> > > >> >>       at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> > > >> >>       at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> > > >> >>       at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
> > > >> >>       at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
> > > >> >>       at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> > > >> >>       at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> > > >> >>       at java.lang.Thread.run(Thread.java:745)
> > > >> >>     Caused by: org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk
> > > >> >>       at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:255)
> > > >> >>       at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)
> > > >> >>       at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)
> > > >> >>       at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:215)
> > > >> >>       at org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInputStream.java:316)
> > > >> >>       at org.apache.http.conn.BasicManagedEntity.streamClosed(BasicManagedEntity.java:164)
> > > >> >>       at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
> > > >> >>       at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:174)
> > > >> >>       at sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)
> > > >> >>       at sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)
> > > >> >>       at java.io.InputStreamReader.close(InputStreamReader.java:199)
> > > >> >>       at org.apache.solr.client.solrj.io.stream.JSONTupleStream.close(JSONTupleStream.java:92)
> > > >> >>       at org.apache.solr.client.solrj.io.stream.SolrStream.close(SolrStream.java:193)
> > > >> >>       at org.apache.solr.client.solrj.io.stream.CloudSolrStream.close(CloudSolrStream.java:464)
> > > >> >>       at org.apache.solr.client.solrj.io.stream.HashJoinStream.close(HashJoinStream.java:231)
> > > >> >>       at org.apache.solr.client.solrj.io.stream.ExceptionStream.close(ExceptionStream.java:93)
> > > >> >>       at org.apache.solr.handler.StreamHandler$TimerStream.close(StreamHandler.java:452)
> > > >> >>       at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:71)
> > > >> >>       ... 40 more",
> > > >> >>     "code":500}}
> > > >> >>
> > > >> >>
> > > >> >> Regards,
> > > >> >> Edwin
> > > >> >>
> > > >> >>
> > > >> >> On 4 May 2017 at 00:00, Joel Bernstein <jo...@gmail.com>
> wrote:
> > > >> >>
> > > >> >> > I've reformatted the expression below and made a few changes. You
> > > >> >> > have put things together properly. But these are MapReduce joins
> > > >> >> > that require exporting the entire result sets. So you will need to
> > > >> >> > add qt=/export to all the searches and remove the rows param. In
> > > >> >> > Solr 6.6 there is a new "shuffle" expression that does this
> > > >> >> > automatically.
> > > >> >> >
> > > >> >> > To test things you'll want to break down each expression and make
> > > >> >> > sure it's behaving as expected.
> > > >> >> >
> > > >> >> > For example, first run each search. Then run the innerJoin, not in
> > > >> >> > parallel mode. Then run it in parallel mode. Then try the whole
> > > >> >> > thing.
> > > >> >> >
> > > >> >> > hashJoin(parallel(collection2,
> > > >> >> >                   innerJoin(search(collection2,
> > > >> >> >                                    q=*:*,
> > > >> >> >                                    fl="a_s,b_s,c_s,d_s,e_s",
> > > >> >> >                                    sort="a_s asc",
> > > >> >> >                                    partitionKeys="a_s",
> > > >> >> >                                    qt="/export"),
> > > >> >> >                             search(collection1,
> > > >> >> >                                    q=*:*,
> > > >> >> >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
> > > >> >> >                                    sort="a_s asc",
> > > >> >> >                                    partitionKeys="a_s",
> > > >> >> >                                    qt="/export"),
> > > >> >> >                             on="a_s"),
> > > >> >> >                   workers="2",
> > > >> >> >                   sort="a_s asc"),
> > > >> >> >          hashed=search(collection3,
> > > >> >> >                        q=*:*,
> > > >> >> >                        fl="a_s,k_s,l_s",
> > > >> >> >                        sort="a_s asc",
> > > >> >> >                        qt="/export"),
> > > >> >> >          on="a_s")
> > > >> >> >
> > > >> >> > Joel Bernstein
> > > >> >> > http://joelsolr.blogspot.com/
> > > >> >> >
> > > >> >> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <
> > > >> >> edwinyeozl@gmail.com
> > > >> >> > >
> > > >> >> > wrote:
> > > >> >> >
> > > >> >> > > Hi Joel,
> > > >> >> > >
> > > >> >> > > Thanks for the clarification.
> > > >> >> > >
> > > >> >> > > Would like to check, is this the correct way to do the join?
> > > >> >> > > Currently, I could not get any results after putting in the
> > > >> >> > > hashJoin for the 3rd, smallerStream collection (collection3).
> > > >> >> > >
> > > >> >> > > http://localhost:8983/solr/collection1/stream?expr=
> > > >> >> > > hashJoin(parallel(collection2,
> > > >> >> > >                   innerJoin(search(collection2,
> > > >> >> > >                                    q=*:*,
> > > >> >> > >                                    fl="a_s,b_s,c_s,d_s,e_s",
> > > >> >> > >                                    sort="a_s asc",
> > > >> >> > >                                    partitionKeys="a_s",
> > > >> >> > >                                    rows=200),
> > > >> >> > >                             search(collection1,
> > > >> >> > >                                    q=*:*,
> > > >> >> > >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
> > > >> >> > >                                    sort="a_s asc",
> > > >> >> > >                                    partitionKeys="a_s",
> > > >> >> > >                                    rows=200),
> > > >> >> > >                             on="a_s"),
> > > >> >> > >                   workers="2",
> > > >> >> > >                   sort="a_s asc"),
> > > >> >> > >          hashed=search(collection3,
> > > >> >> > >                        q=*:*,
> > > >> >> > >                        fl="a_s,k_s,l_s",
> > > >> >> > >                        sort="a_s asc",
> > > >> >> > >                        rows=200),
> > > >> >> > >          on="a_s")
> > > >> >> > > &indent=true
> > > >> >> > >
> > > >> >> > >
> > > >> >> > > Regards,
> > > >> >> > > Edwin
> > > >> >> > >
> > > >> >> > >
> > > >> >> > > On 3 May 2017 at 20:59, Joel Bernstein <jo...@gmail.com>
> > > wrote:
> > > >> >> > >
> > > >> >> > > > Sorry, it's just called hashJoin
> > > >> >> > > >
> > > >> >> > > > Joel Bernstein
> > > >> >> > > > http://joelsolr.blogspot.com/
> > > >> >> > > >
> > > >> >> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <
> > > >> >> > > edwinyeozl@gmail.com>
> > > >> >> > > > wrote:
> > > >> >> > > >
> > > >> >> > > > > Hi Joel,
> > > >> >> > > > >
> > > >> >> > > > > I am getting this error when I used the innerHashJoin.
> > > >> >> > > > >
> > > >> >> > > > >  "EXCEPTION":"Invalid stream expression
> > > innerHashJoin(parallel(
> > > >> >> > > innerJoin
> > > >> >> > > > >
> > > >> >> > > > > I also can't find the documentation on innerHashJoin for
> > the
> > > >> >> > Streaming
> > > >> >> > > > > Expressions.
> > > >> >> > > > >
> > > >> >> > > > > Are you referring to hashJoin?
> > > >> >> > > > >
> > > >> >> > > > > Regards,
> > > >> >> > > > > Edwin
> > > >> >> > > > >
> > > >> >> > > > >
> > > >> >> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <
> > > >> edwinyeozl@gmail.com
> > > >> >> >
> > > >> >> > > > wrote:
> > > >> >> > > > >
> > > >> >> > > > > > Hi Joel,
> > > >> >> > > > > >
> > > >> >> > > > > > Thanks for the info.
> > > >> >> > > > > >
> > > >> >> > > > > > Regards,
> > > >> >> > > > > > Edwin
> > > >> >> > > > > >
> > > >> >> > > > > >
> > > >> >> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <
> > joelsolr@gmail.com
> > > >
> > > >> >> wrote:
> > > >> >> > > > > >
> > > >> >> > > > > >> Also take a look at the documentation for the "fetch"
> > > >> streaming
> > > >> >> > > > > >> expression.
> > > >> >> > > > > >>
> > > >> >> > > > > >> Joel Bernstein
> > > >> >> > > > > >> http://joelsolr.blogspot.com/
> > > >> >> > > > > >>
> > > >> >> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <
> > > >> >> > joelsolr@gmail.com>
> > > >> >> > > > > >> wrote:
> > > >> >> > > > > >>
> > > >> >> > > > > >> > Yes, you can join more than one collection with Streaming
> > > >> >> > > > > >> > Expressions. Here are a few things to keep in mind.
> > > >> >> > > > > >> >
> > > >> >> > > > > >> > * You'll likely want to use the parallel function around
> > > >> >> > > > > >> > the largest join. You'll need to use the join keys as the
> > > >> >> > > > > >> > partitionKeys.
> > > >> >> > > > > >> > * innerJoin: requires that the streams be sorted on the
> > > >> >> > > > > >> > join keys.
> > > >> >> > > > > >> > * innerHashJoin: has no sorting requirement.
> > > >> >> > > > > >> >
> > > >> >> > > > > >> > So a strategy for a three collection join might look like
> > > >> >> > > > > >> > this:
> > > >> >> > > > > >> >
> > > >> >> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)),
> > > >> >> > > > > >> > smallerStream)
> > > >> >> > > > > >> >
> > > >> >> > > > > >> > The largest join can be done in parallel using an
> > > >> >> > > > > >> > innerJoin. You can then wrap the stream coming out of the
> > > >> >> > > > > >> > parallel function in an innerHashJoin to join it to
> > > >> >> > > > > >> > another stream.
> > > >> >> > > > > >> >
> > > >> >> > > > > >> > Joel Bernstein
> > > >> >> > > > > >> > http://joelsolr.blogspot.com/
> > > >> >> > > > > >> >

Re: Joining more than 2 collections

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Thanks for the explanation.

Yes, all my join keys are the same, so I think both should be ok too.

All my 3 collections have a lot of records, but for my last collection, I'm
only extracting a few of the fields (about 5) to be shown.

So would this be considered three very large joins?

Regards,
Edwin



On 5 May 2017 at 23:37, Joel Bernstein <jo...@gmail.com> wrote:

> *:* queries will work fine for the innerJoin, which is a merge join that
> never runs out of memory. The hashJoin reads the entire "hashed" query into
> memory though, so there are limitations.
>
> So if you have three very large joins that require *:*, then the hashJoin
> approach will be problematic. In that case you could use fetch() around the
> innerJoin to do the third join.
>
> parallel(fetch(innerJoin(search(), search())))
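A sketch of how that might look with the collections and join key used earlier
in this thread (fetch's fl/on parameters follow the Reference Guide; exact
availability depends on the Solr version, so treat this as illustrative):

    parallel(collection2,
             fetch(collection3,
                   innerJoin(search(collection2, q=*:*, fl="a_s,b_s,c_s,d_s,e_s", sort="a_s asc", partitionKeys="a_s", qt="/export"),
                             search(collection1, q=*:*, fl="a_s,f_s,g_s,h_s,i_s,j_s", sort="a_s asc", partitionKeys="a_s", qt="/export"),
                             on="a_s"),
                   fl="k_s,l_s",
                   on="a_s=a_s"),
             workers="2",
             sort="a_s asc")

Instead of hashing all of collection3 into memory, fetch looks up k_s and l_s
for each joined tuple in batches.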
>
> Or if the hashJoin uses the same key as the innerJoin, you can do the
> hashJoin in parallel as well and partition the "hashed" search across the
> workers:
>
> parallel(hashJoin(innerJoin(search(), search()), hashed=search()))
>
> In this case the "hashed" search partitionKeys would be the same as the
> innerJoin searches. But the join keys must be the same for this scenario to
> work.
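Filled in with this thread's collections, that pattern might look like the
sketch below; note the partitionKeys on the hashed search as well, so each
worker only hashes its own slice:

    parallel(collection2,
             hashJoin(innerJoin(search(collection2, q=*:*, fl="a_s,b_s,c_s,d_s,e_s", sort="a_s asc", partitionKeys="a_s", qt="/export"),
                                search(collection1, q=*:*, fl="a_s,f_s,g_s,h_s,i_s,j_s", sort="a_s asc", partitionKeys="a_s", qt="/export"),
                                on="a_s"),
                      hashed=search(collection3, q=*:*, fl="a_s,k_s,l_s", sort="a_s asc", partitionKeys="a_s", qt="/export"),
                      on="a_s"),
             workers="2",
             sort="a_s asc")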
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, May 5, 2017 at 11:17 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> wrote:
>
> > I found that using *:* will return the entire resultset, and cause the
> > result from the join query to blow up.
> >
> > For example, if the query matches 2 results in collection1 and 3 results
> > in collection2, there could be 6 results returned in the join query
> > (using hashJoin or innerJoin).
> >
> > Is that correct?
> >
> > Regards,
> > Edwin
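That multiplication is ordinary inner-join behavior rather than a bug: each
left tuple pairs with every matching right tuple, so duplicate join keys
multiply. A small worked sketch with made-up values:

    collection1, a_s=x:  {a_s:x, f_s:1}  {a_s:x, f_s:2}
    collection2, a_s=x:  {a_s:x, b_s:A}  {a_s:x, b_s:B}  {a_s:x, b_s:C}

    innerJoin(..., on="a_s") emits 2 x 3 = 6 tuples:
    (1,A) (1,B) (1,C) (2,A) (2,B) (2,C)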
> >
> >
> > On 5 May 2017 at 07:17, Zheng Lin Edwin Yeo <ed...@gmail.com>
> wrote:
> >
> > > Hi Joel,
> > >
> > > Yes, /export works after I removed the /export handler from
> > > solrconfig.xml. Thanks for the advice.
> > >
> > > For *:*, there will be results returned when using /export.
> > > But if one of the queries is *:*, does this mean the entire resultset
> > > will contain all the records from the query which has *:*?
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 5 May 2017 at 01:46, Joel Bernstein <jo...@gmail.com> wrote:
> > >
> > >> No, *:* will simply return all the results from one of the queries. It
> > >> should still join properly. If you are using the /select handler, joins
> > >> will not work properly.
> > >>
> > >>
> > >> This example worked properly for me:
> > >>
> > >> hashJoin(parallel(collection2,
> > >>                   workers=3,
> > >>                   sort="id asc",
> > >>                   innerJoin(search(collection2, q="*:*", fl="id", sort="id asc", qt="/export", partitionKeys="id"),
> > >>                             search(collection2, q="year_i:42", fl="id, year_i", sort="id asc", qt="/export", partitionKeys="id"),
> > >>                             on="id")),
> > >>          hashed=search(collection2, q="day_i:7", fl="id, day_i", sort="id asc", qt="/export"),
> > >>          on="id")
> > >>
> > >>
> > >>
> > >>
> > >> Joel Bernstein
> > >> http://joelsolr.blogspot.com/
> > >>
> > >> On Thu, May 4, 2017 at 12:28 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Joel,
> > >> >
> > >> > For the join queries, is it true that if we use q=*:* for the query
> > >> > for one of the joins, there will not be any results returned?
> > >> >
> > >> > Currently I found this is the case, if I just put q=*:*.
> > >> >
> > >> > Regards,
> > >> > Edwin
> > >> > >> > >> >> > > > > Hi Joel,
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > > > I am getting this error when I used the
> > >> innerHashJoin.
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > > >  "EXCEPTION":"Invalid stream expression
> > >> > >> > innerHashJoin(parallel(
> > >> > >> > >> >> > > innerJoin
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > > > I also can't find the documentation on
> > innerHashJoin
> > >> for
> > >> > >> the
> > >> > >> > >> >> > Streaming
> > >> > >> > >> >> > > > > Expressions.
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > > > Are you referring to hashJoin?
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > > > Regards,
> > >> > >> > >> >> > > > > Edwin
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <
> > >> > >> > >> edwinyeozl@gmail.com
> > >> > >> > >> >> >
> > >> > >> > >> >> > > > wrote:
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > > > > Hi Joel,
> > >> > >> > >> >> > > > > >
> > >> > >> > >> >> > > > > > Thanks for the info.
> > >> > >> > >> >> > > > > >
> > >> > >> > >> >> > > > > > Regards,
> > >> > >> > >> >> > > > > > Edwin
> > >> > >> > >> >> > > > > >
> > >> > >> > >> >> > > > > >
> > >> > >> > >> >> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <
> > >> > >> joelsolr@gmail.com
> > >> > >> > >
> > >> > >> > >> >> wrote:
> > >> > >> > >> >> > > > > >
> > >> > >> > >> >> > > > > >> Also take a look at the documentation for the
> > >> "fetch"
> > >> > >> > >> streaming
> > >> > >> > >> >> > > > > >> expression.
> > >> > >> > >> >> > > > > >>
> > >> > >> > >> >> > > > > >> Joel Bernstein
> > >> > >> > >> >> > > > > >> http://joelsolr.blogspot.com/
> > >> > >> > >> >> > > > > >>
> > >> > >> > >> >> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel
> Bernstein <
> > >> > >> > >> >> > joelsolr@gmail.com>
> > >> > >> > >> >> > > > > >> wrote:
> > >> > >> > >> >> > > > > >>
> > >> > >> > >> >> > > > > >> > Yes you join more then one collection with
> > >> > Streaming
> > >> > >> > >> >> > Expressions.
> > >> > >> > >> >> > > > Here
> > >> > >> > >> >> > > > > >> are
> > >> > >> > >> >> > > > > >> > a few things to keep in mind.
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> > * You'll likely want to use the parallel
> > >> function
> > >> > >> around
> > >> > >> > >> the
> > >> > >> > >> >> > > largest
> > >> > >> > >> >> > > > > >> join.
> > >> > >> > >> >> > > > > >> > You'll need to use the join keys as the
> > >> > >> partitionKeys.
> > >> > >> > >> >> > > > > >> > * innerJoin: requires that the streams be
> > >> sorted on
> > >> > >> the
> > >> > >> > >> join
> > >> > >> > >> >> > keys.
> > >> > >> > >> >> > > > > >> > * innerHashJoin: has no sorting requirement.
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> > So a strategy for a three collection join
> > might
> > >> > look
> > >> > >> > like
> > >> > >> > >> >> this:
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream,
> > >> > >> > bigStream)),
> > >> > >> > >> >> > > > > smallerStream)
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> > The largest join can be done in parallel
> using
> > >> an
> > >> > >> > >> innerJoin.
> > >> > >> > >> >> You
> > >> > >> > >> >> > > can
> > >> > >> > >> >> > > > > >> then
> > >> > >> > >> >> > > > > >> > wrap the stream coming out of the parallel
> > >> function
> > >> > >> in
> > >> > >> > an
> > >> > >> > >> >> > > > > innerHashJoin
> > >> > >> > >> >> > > > > >> to
> > >> > >> > >> >> > > > > >> > join it to another stream.
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> > Joel Bernstein
> > >> > >> > >> >> > > > > >> > http://joelsolr.blogspot.com/
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin
> > Edwin
> > >> > Yeo <
> > >> > >> > >> >> > > > > >> edwinyeozl@gmail.com>
> > >> > >> > >> >> > > > > >> > wrote:
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >> Hi,
> > >> > >> > >> >> > > > > >> >>
> > >> > >> > >> >> > > > > >> >> Is it possible to join more than 2
> > collections
> > >> > using
> > >> > >> > one
> > >> > >> > >> of
> > >> > >> > >> >> the
> > >> > >> > >> >> > > > > >> streaming
> > >> > >> > >> >> > > > > >> >> expressions (Eg: innerJoin)? If not, is
> there
> > >> > other
> > >> > >> > ways
> > >> > >> > >> we
> > >> > >> > >> >> can
> > >> > >> > >> >> > > do
> > >> > >> > >> >> > > > > it?
> > >> > >> > >> >> > > > > >> >>
> > >> > >> > >> >> > > > > >> >> Currently, I may need to join 3 or 4
> > >> collections
> > >> > >> > >> together,
> > >> > >> > >> >> and
> > >> > >> > >> >> > to
> > >> > >> > >> >> > > > > >> output
> > >> > >> > >> >> > > > > >> >> selected fields from all these collections
> > >> > together.
> > >> > >> > >> >> > > > > >> >>
> > >> > >> > >> >> > > > > >> >> I'm using Solr 6.4.2.
> > >> > >> > >> >> > > > > >> >>
> > >> > >> > >> >> > > > > >> >> Regards,
> > >> > >> > >> >> > > > > >> >> Edwin
> > >> > >> > >> >> > > > > >> >>
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >> >
> > >> > >> > >> >> > > > > >>
> > >> > >> > >> >> > > > > >
> > >> > >> > >> >> > > > > >
> > >> > >> > >> >> > > > >
> > >> > >> > >> >> > > >
> > >> > >> > >> >> > >
> > >> > >> > >> >> >
> > >> > >> > >> >>
> > >> > >> > >> >
> > >> > >> > >> >
> > >> > >> > >>
> > >> > >> > >
> > >> > >> > >
> > >> > >> >
> > >> > >>
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: Joining more than 2 collections

Posted by Joel Bernstein <jo...@gmail.com>.
*:* queries will work fine for the innerJoin, which is a merge join that
never runs out of memory. The hashJoin reads the entire "hashed" query
result into memory though, so there are limitations.
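
In streaming expression form the difference looks roughly like this (a
sketch only, with hypothetical collections A, B, C and join key "key"):

innerJoin(search(A, q="*:*", fl="key,f1", sort="key asc", qt="/export"),
          search(B, q="*:*", fl="key,f2", sort="key asc", qt="/export"),
          on="key")

merges two streams that are both sorted on the join key, one tuple at a
time, while

hashJoin(bigStream, hashed=search(C, q="*:*", fl="key,f3", sort="key asc", qt="/export"), on="key")

first reads the whole hashed=... result into an in-memory hash table.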

So if you have three very large result sets that require *:*, then the
hashJoin approach will be problematic. In that case you could use fetch()
around the innerJoin to do the third join.

parallel(fetch(innerJoin(search(), search())))
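
As a rough sketch (untested, with hypothetical collection names, fields
and the a_s join key borrowed from examples earlier in this thread), that
could look like:

parallel(collection2,
         workers=2,
         sort="a_s asc",
         fetch(collection3,
               innerJoin(search(collection2, q="*:*", fl="a_s,b_s,c_s", sort="a_s asc", qt="/export", partitionKeys="a_s"),
                         search(collection1, q="*:*", fl="a_s,f_s,g_s", sort="a_s asc", qt="/export", partitionKeys="a_s"),
                         on="a_s"),
               fl="k_s,l_s",
               on="a_s=a_s"))

Here fetch() looks up k_s and l_s from collection3 for each joined tuple;
check the fetch() documentation for the exact on="..." syntax in your
version.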

Or, if the hashJoin uses the same key as the innerJoin, you can do the
hashJoin in parallel as well and partition the "hashed" search across the
workers:

parallel(hashJoin(innerJoin(search(), search()), hashed=search()))

In this case the "hashed" search partitionKeys would be the same as the
innerJoin searches. But the join keys must be same for this scenario to
work.
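
A sketch of that variation (again untested, using the same hypothetical
collections and a_s key as above):

parallel(collection2,
         workers=2,
         sort="a_s asc",
         hashJoin(innerJoin(search(collection2, q="*:*", fl="a_s,b_s,c_s", sort="a_s asc", qt="/export", partitionKeys="a_s"),
                            search(collection1, q="*:*", fl="a_s,f_s,g_s", sort="a_s asc", qt="/export", partitionKeys="a_s"),
                            on="a_s"),
                  hashed=search(collection3, q="*:*", fl="a_s,k_s,l_s", sort="a_s asc", qt="/export", partitionKeys="a_s"),
                  on="a_s"))

Because the hashed search is partitioned on the same key, each worker only
builds a hash table over its own slice of collection3 rather than the
whole result set.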




Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, May 5, 2017 at 11:17 AM, Zheng Lin Edwin Yeo <ed...@gmail.com> wrote:

> I found that using *:* will return the entire result set, and cause the
> result from the join query to blow up.
>
> For example, if a query matches 2 results in collection1 and 3 results in
> collection2, I found that there could be 6 results returned by the join
> query (using hashJoin or innerJoin).
>
> Is that correct?
>
> Regards,
> Edwin

Re: Joining more than 2 collections

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
I found that using *:* will return the entire result set, and cause the
result from the join query to blow up.

For example, if a query matches 2 results in collection1 and 3 results in
collection2, I found that there could be 6 results returned by the join
query (using hashJoin or innerJoin).
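
For instance, with hypothetical documents that all share the same join key
value:

collection1:  {a_s:"x", f_s:"f1"}, {a_s:"x", f_s:"f2"}
collection2:  {a_s:"x", b_s:"b1"}, {a_s:"x", b_s:"b2"}, {a_s:"x", b_s:"b3"}

Joining on="a_s" pairs every collection1 tuple with every matching
collection2 tuple, so the join emits 2 x 3 = 6 tuples.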

Is that correct?

Regards,
Edwin


On 5 May 2017 at 07:17, Zheng Lin Edwin Yeo <ed...@gmail.com> wrote:

> Hi Joel,
>
> Yes, the /export works after I removed the /export handler from
> solrconfig.xml. Thanks for the advice.
>
> For *:*, there will be results returned when using /export.
> But if one of the queries is *:*, does this mean the entire result set
> will contain all the records from the query which has *:*?
>
> Regards,
> Edwin
>
>
> On 5 May 2017 at 01:46, Joel Bernstein <jo...@gmail.com> wrote:
>
>> No, *:* will simply return all the results from one of the queries. It
>> should still join properly. If you are using the /select handler, joins
>> will not work properly.
>>
>>
>> This example worked properly for me:
>>
>> hashJoin(parallel(collection2,
>>                   workers=3,
>>                   sort="id asc",
>>                   innerJoin(search(collection2, q="*:*", fl="id", sort="id asc", qt="/export", partitionKeys="id"),
>>                             search(collection2, q="year_i:42", fl="id, year_i", sort="id asc", qt="/export", partitionKeys="id"),
>>                             on="id")),
>>          hashed=search(collection2, q="day_i:7", fl="id, day_i", sort="id asc", qt="/export"),
>>          on="id")
>>
>>
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Thu, May 4, 2017 at 12:28 PM, Zheng Lin Edwin Yeo <
>> edwinyeozl@gmail.com>
>> wrote:
>>
>> > Hi Joel,
>> >
>> > For the join queries, is it true that if we use q=*:* for the query for
>> one
>> > of the join, there will not be any results return?
>> >
>> > Currently I found this is the case, if I just put q=*:*.
>> >
>> > Regards,
>> > Edwin
>> >
>> >
>> > On 4 May 2017 at 23:38, Zheng Lin Edwin Yeo <ed...@gmail.com>
>> wrote:
>> >
>> > > Hi Joel,
>> > >
>> > > I think that might be one of the reason.
>> > > This is what I have for the /export handler in my solrconfig.xml
>> > >
>> > > <requestHandler name="/export" class="solr.SearchHandler"> <lst name=
>> > > "invariants"> <str name="rq">{!xport}</str> <str
>> name="wt">xsort</str> <
>> > > str name="distrib">false</str> </lst> <arr name="components">
>> > <str>query</
>> > > str> </arr> </requestHandler>
>> > >
>> > > This is the error message that I get when I use the /export handler.
>> > >
>> > > java.io.IOException: java.util.concurrent.ExecutionException:
>> > > java.io.IOException: --> http://localhost:8983/solr/
>> > > collection1_shard1_replica1/: An exception has occurred on the server,
>> > > refer to server log for details.
>> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream.
>> > > openStreams(CloudSolrStream.java:451)
>> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream.
>> > > open(CloudSolrStream.java:308)
>> > > at org.apache.solr.client.solrj.io.stream.PushBackStream.open(
>> > > PushBackStream.java:70)
>> > > at org.apache.solr.client.solrj.io.stream.JoinStream.open(
>> > > JoinStream.java:147)
>> > > at org.apache.solr.client.solrj.io.stream.ExceptionStream.
>> > > open(ExceptionStream.java:51)
>> > > at org.apache.solr.handler.StreamHandler$TimerStream.
>> > > open(StreamHandler.java:457)
>> > > at org.apache.solr.client.solrj.io.stream.TupleStream.
>> > > writeMap(TupleStream.java:63)
>> > > at org.apache.solr.response.JSONWriter.writeMap(
>> > > JSONResponseWriter.java:547)
>> > > at org.apache.solr.response.TextResponseWriter.writeVal(
>> > > TextResponseWriter.java:193)
>> > > at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(
>> > > JSONResponseWriter.java:209)
>> > > at org.apache.solr.response.JSONWriter.writeNamedList(
>> > > JSONResponseWriter.java:325)
>> > > at org.apache.solr.response.JSONWriter.writeResponse(
>> > > JSONResponseWriter.java:120)
>> > > at org.apache.solr.response.JSONResponseWriter.write(
>> > > JSONResponseWriter.java:71)
>> > > at org.apache.solr.response.QueryResponseWriterUtil.writeQueryR
>> esponse(
>> > > QueryResponseWriterUtil.java:65)
>> > > at org.apache.solr.servlet.HttpSolrCall.writeResponse(
>> > > HttpSolrCall.java:732)
>> > > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
>> > > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>> > > SolrDispatchFilter.java:345)
>> > > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>> > > SolrDispatchFilter.java:296)
>> > > at org.eclipse.jetty.servlet.ServletHandler$CachedChain.
>> > > doFilter(ServletHandler.java:1691)
>> > > at org.eclipse.jetty.servlet.ServletHandler.doHandle(
>> > > ServletHandler.java:582)
>> > > at org.eclipse.jetty.server.handler.ScopedHandler.handle(
>> > > ScopedHandler.java:143)
>> > > at org.eclipse.jetty.security.SecurityHandler.handle(
>> > > SecurityHandler.java:548)
>> > > at org.eclipse.jetty.server.session.SessionHandler.
>> > > doHandle(SessionHandler.java:226)
>> > > at org.eclipse.jetty.server.handler.ContextHandler.
>> > > doHandle(ContextHandler.java:1180)
>> > > at org.eclipse.jetty.servlet.ServletHandler.doScope(
>> > > ServletHandler.java:512)
>> > > at org.eclipse.jetty.server.session.SessionHandler.
>> > > doScope(SessionHandler.java:185)
>> > > at org.eclipse.jetty.server.handler.ContextHandler.
>> > > doScope(ContextHandler.java:1112)
>> > > at org.eclipse.jetty.server.handler.ScopedHandler.handle(
>> > > ScopedHandler.java:141)
>> > > at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
>> > > ContextHandlerCollection.java:213)
>> > > at org.eclipse.jetty.server.handler.HandlerCollection.
>> > > handle(HandlerCollection.java:119)
>> > > at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
>> > > HandlerWrapper.java:134)
>> > > at org.eclipse.jetty.server.Server.handle(Server.java:534)
>> > > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>> > > at org.eclipse.jetty.server.HttpConnection.onFillable(
>> > > HttpConnection.java:251)
>> > > at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
>> > > AbstractConnection.java:273)
>> > > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>> > > at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
>> > > SelectChannelEndPoint.java:93)
>> > > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
>> > > executeProduceConsume(ExecuteProduceConsume.java:303)
>> > > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
>> > > produceConsume(ExecuteProduceConsume.java:148)
>> > > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(
>> > > ExecuteProduceConsume.java:136)
>> > > at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
>> > > QueuedThreadPool.java:671)
>> > > at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(
>> > > QueuedThreadPool.java:589)
>> > > at java.lang.Thread.run(Thread.java:745)
>> > > Caused by: java.util.concurrent.ExecutionException:
>> java.io.IOException:
>> > > --> http://localhost:8983/solr/collection1_shard1_replica1/: An
>> > exception
>> > > has occurred on the server, refer to server log for details.
>> > > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>> > > at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream.
>> > > openStreams(CloudSolrStream.java:445)
>> > > ... 42 more
>> > > Caused by: java.io.IOException: --> http://localhost:8983/solr/
>> > > collection1_shard1_replica1/: An exception has occurred on the server,
>> > > refer to server log for details.
>> > > at org.apache.solr.client.solrj.io.stream.SolrStream.read(
>> > > SolrStream.java:238)
>> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream$
>> > > TupleWrapper.next(CloudSolrStream.java:541)
>> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream$
>> > > StreamOpener.call(CloudSolrStream.java:564)
>> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream$
>> > > StreamOpener.call(CloudSolrStream.java:551)
>> > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> > > at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolE
>> xecutor.
>> > > lambda$execute$0(ExecutorUtil.java:229)
>> > > at java.util.concurrent.ThreadPoolExecutor.runWorker(
>> > > ThreadPoolExecutor.java:1142)
>> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(
>> > > ThreadPoolExecutor.java:617)
>> > > ... 1 more
>> > > Caused by: org.noggit.JSONParser$ParseException: JSON Parse Error:
>> > > char=<,position=0 BEFORE='<' AFTER='?xml version="1.0"
>> > encoding="UTF-8"?> <'
>> > > at org.noggit.JSONParser.err(JSONParser.java:356)
>> > > at org.noggit.JSONParser.handleNonDoubleQuoteString(JSONParser.
>> java:712)
>> > > at org.noggit.JSONParser.next(JSONParser.java:886)
>> > > at org.noggit.JSONParser.nextEvent(JSONParser.java:930)
>> > > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.
>> > > expect(JSONTupleStream.java:97)
>> > > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.
>> > > advanceToDocs(JSONTupleStream.java:179)
>> > > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.
>> > > next(JSONTupleStream.java:77)
>> > > at org.apache.solr.client.solrj.io.stream.SolrStream.read(
>> > > SolrStream.java:207)
>> > > ... 8 more
>> > >
>> > >
>> > > Regards,
>> > > Edwin
>> > >
>> > >
>> > > On 4 May 2017 at 22:54, Joel Bernstein <jo...@gmail.com> wrote:
>> > >
>> > >> I suspect that there is something not quite right about the how the
>> > >> /export
>> > >> handler is configured. Straight out of the box in solr 6.4.2  /export
>> > will
>> > >> be automatically configured. Are you using a Solr instance that has
>> been
>> > >> upgraded in the past and doesn't have standard 6.4.2 configs?
>> > >>
>> > >> To really do joins properly you'll have to use the /export handler
>> > because
>> > >> /select will not stream entire result sets (unless they are pretty
>> > small).
>> > >> So your results will be missing data possibly.
>> > >>
>> > >> I would take a close look at the logs and see what all the exceptions
>> > are
>> > >> when you run the a search using qt=/export. If you can post all the
>> > stack
>> > >> traces that get generated when you run the search we'll probably be
>> able
>> > >> to
>> > >> spot the issue.
>> > >>
>> > >> About the field ordering. There is support for field ordering in the
>> > >> Streaming classes but only a few places actually enforce the order.
>> The
>> > >> 6.5
>> > >> SQL interface does keep the fields in order as does the new Tuple
>> > >> expression in Solr 6.6. But the expressions you are working with
>> > currently
>> > >> don't enforce field ordering.
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> Joel Bernstein
>> > >> http://joelsolr.blogspot.com/
>> > >>
>> > >> On Thu, May 4, 2017 at 2:41 AM, Zheng Lin Edwin Yeo <
>> > edwinyeozl@gmail.com
>> > >> >
>> > >> wrote:
>> > >>
>> > >> > Hi Joel,
>> > >> >
>> > >> > I have managed to get the Join to work, but so far it is only
>> working
>> > >> when
>> > >> > I use qt="/select". It is not working when I use qt="/export".
>> > >> >
>> > >> > For the display of the field, is there a way to allow it to list
>> them
>> > in
>> > >> > the order which I want?
>> > >> > Currently, the display is quite random, and I can get a field in
>> > >> > collection1, followed by a field in collection3, then collection1
>> > again,
>> > >> > and then collection2.
>> > >> >
>> > >> > It will be good if we can arrange the field to display in the order
>> > >> that we
>> > >> > want.
>> > >> >
>> > >> > Regards,
>> > >> > Edwin
>> > >> >
>> > >> >
>> > >> >
>> > >> > On 4 May 2017 at 09:56, Zheng Lin Edwin Yeo <ed...@gmail.com>
>> > >> wrote:
>> > >> >
>> > >> > > Hi Joel,
>> > >> > >
>> > >> > > It works when I started off with just one expression.
>> > >> > >
>> > >> > > Could it be that the data size is too big for export after the
>> join,
>> > >> > which
>> > >> > > causes the error?
>> > >> > >
>> > >> > > Regards,
>> > >> > > Edwin
>> > >> > >
>> > >> > > On 4 May 2017 at 02:53, Joel Bernstein <jo...@gmail.com>
>> wrote:
>> > >> > >
>> > >> > >> I was just testing with the query below and it worked for me.
>> Some
>> > of
>> > >> > the
>> > >> > >> error messages I was getting with the syntax were not what I was
>> > >> > expecting
>> > >> > >> though, so I'll look into the error handling. But the joins do
>> work
>> > >> when
>> > >> > >> the syntax is correct. The query below is joining to the same
>> > collection
>> > >> > >> three
>> > >> > >> times, but the mechanics are exactly the same as joining three
>> > different
>> > >> > >> tables. In this example each join narrows down the result set.
>> > >> > >>
>> > >> > >> hashJoin(parallel(collection2,
>> > >> > >>                             workers=3,
>> > >> > >>                             sort="id asc",
>> > >> > >>                             innerJoin(search(collection2,
>> q="*:*",
>> > >> > >> fl="id",
>> > >> > >> sort="id asc", qt="/export", partitionKeys="id"),
>> > >> > >>                                             search(collection2,
>> > >> > >> q="year_i:42", fl="id, year_i", sort="id asc", qt="/export",
>> > >> > >> partitionKeys="id"),
>> > >> > >>                                             on="id")),
>> > >> > >>                 hashed=search(collection2, q="day_i:7", fl="id,
>> > >> day_i",
>> > >> > >> sort="id asc", qt="/export"),
>> > >> > >>                 on="id")
>> > >> > >>
>> > >> > >> Joel Bernstein
>> > >> > >> http://joelsolr.blogspot.com/
>> > >> > >>
>> > >> > >> On Wed, May 3, 2017 at 1:29 PM, Joel Bernstein <
>> joelsolr@gmail.com
>> > >
>> > >> > >> wrote:
>> > >> > >>
>> > >> > >> > Start off with just this expression:
>> > >> > >> >
>> > >> > >> > search(collection2,
>> > >> > >> >             q=*:*,
>> > >> > >> >             fl="a_s,b_s,c_s,d_s,e_s",
>> > >> > >> >             sort="a_s asc",
>> > >> > >> >             qt="/export")
>> > >> > >> >
>> > >> > >> > And then check the logs for exceptions.
>> > >> > >> >
>> > >> > >> > Joel Bernstein
>> > >> > >> > http://joelsolr.blogspot.com/
>> > >> > >> >
>> > >> > >> > On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <
>> > >> > >> edwinyeozl@gmail.com
>> > >> > >> > > wrote:
>> > >> > >> >
>> > >> > >> >> Hi Joel,
>> > >> > >> >>
>> > >> >> I am getting this error after I added qt=/export and
>> > removed
>> > >> the
>> > >> > >> rows
>> > >> > >> >> param. Do you know what could be the reason?
>> > >> > >> >>
>> > >> > >> >> {
>> > >> > >> >>   "error":{
>> > >> > >> >>     "metadata":[
>> > >> > >> >>       "error-class","org.apache.solr.common.SolrException",
>> > >> > >> >>       "root-error-class","org.apache.http.
>> > MalformedChunkCodingExc
>> > >> e
>> > >> > >> >> ption"],
>> > >> > >> >>     "msg":"org.apache.http.MalformedChunkCodingException:
>> CRLF
>> > >> > >> expected
>> > >> > >> >> at
>> > >> > >> >> end of chunk",
>> > >> > >> >>     "trace":"org.apache.solr.common.SolrException:
>> > >> > >> >> org.apache.http.MalformedChunkCodingException: CRLF
>> expected at
>> > >> end
>> > >> > of
>> > >> > >> >> chunk\r\n\tat
>> > >> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
>> > >> > >> >> iteMap$0(TupleStream.java:79)\r\n\tat
>> > >> > >> >> org.apache.solr.response.JSONWriter.writeIterator(JSONRespon
>> > >> > >> >> seWriter.java:523)\r\n\tat
>> > >> > >> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
>> > >> > >> >> ponseWriter.java:175)\r\n\tat
>> > >> > >> >> org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter
>> > >> > >> >> .java:559)\r\n\tat
>> > >> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(
>> > >> > >> >> TupleStream.java:64)\r\n\tat
>> > >> > >> >> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWri
>> > >> > >> >> ter.java:547)\r\n\tat
>> > >> > >> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
>> > >> > >> >> ponseWriter.java:193)\r\n\tat
>> > >> > >> >> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithD
>> > >> > >> >> ups(JSONResponseWriter.java:209)\r\n\tat
>> > >> > >> >> org.apache.solr.response.JSONWriter.writeNamedList(JSONRespo
>> > >> > >> >> nseWriter.java:325)\r\n\tat
>> > >> > >> >> org.apache.solr.response.JSONWriter.writeResponse(JSONRespon
>> > >> > >> >> seWriter.java:120)\r\n\tat
>> > >> > >> >> org.apache.solr.response.JSONResponseWriter.write(JSONRespon
>> > >> > >> >> seWriter.java:71)\r\n\tat
>> > >> > >> >> org.apache.solr.response.QueryResponseWriterUtil.writeQueryR
>> > >> > >> >> esponse(QueryResponseWriterUtil.java:65)\r\n\tat
>> > >> > >> >> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrC
>> > >> > >> >> all.java:732)\r\n\tat
>> > >> > >> >> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:
>> > >> > >> 473)\r\n\tat
>> > >> > >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
>> > >> > >> >> atchFilter.java:345)\r\n\tat
>> > >> > >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
>> > >> > >> >> atchFilter.java:296)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte
>> > >> > >> >> r(ServletHandler.java:1691)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHan
>> > >> > >> >> dler.java:582)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
>> > >> > >> >> Handler.java:143)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHa
>> > >> > >> >> ndler.java:548)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.server.session.SessionHandler.doHandle(
>> > >> > >> >> SessionHandler.java:226)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.server.handler.ContextHandler.doHandle(
>> > >> > >> >> ContextHandler.java:1180)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHand
>> > >> > >> >> ler.java:512)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.server.session.SessionHandler.doScope(
>> > >> > >> >> SessionHandler.java:185)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.server.handler.ContextHandler.doScope(
>> > >> > >> >> ContextHandler.java:1112)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
>> > >> > >> >> Handler.java:141)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.server.handler.ContextHandlerCollection.ha
>> > >> > >> >> ndle(ContextHandlerCollection.java:213)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.server.handler.HandlerCollection.handle(
>> > >> > >> >> HandlerCollection.java:119)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.server.handler.HandlerWrapper.handle(Handl
>> > >> > >> >> erWrapper.java:134)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\
>> tat
>> > >> > >> >> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.
>> > >> > >> java:320)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConne
>> > >> > >> >> ction.java:251)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.io.AbstractConnection$ReadCallback.
>> > >> > >> >> succeeded(AbstractConnection.java:273)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.
>> > >> > >> java:95)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChann
>> > >> > >> >> elEndPoint.java:93)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
>> > >> > >> >> .executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\
>> tat
>> > >> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
>> > >> > >> >> .produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
>> > >> > >> >> .run(ExecuteProduceConsume.java:136)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(Queued
>> > >> > >> >> ThreadPool.java:671)\r\n\tat
>> > >> > >> >> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedT
>> > >> > >> >> hreadPool.java:589)\r\n\tat
>> > >> > >> >> java.lang.Thread.run(Thread.java:745)\r\nCaused by:
>> > >> > >> >> org.apache.http.MalformedChunkCodingException: CRLF
>> expected at
>> > >> end
>> > >> > of
>> > >> > >> >> chunk\r\n\tat
>> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.getChunkSize(Chun
>> > >> > >> >> kedInputStream.java:255)\r\n\tat
>> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.nextChunk(Chunked
>> > >> > >> >> InputStream.java:227)\r\n\tat
>> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
>> > >> > >> >> Stream.java:186)\r\n\tat
>> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
>> > >> > >> >> Stream.java:215)\r\n\tat
>> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInpu
>> > >> > >> >> tStream.java:316)\r\n\tat
>> > >> > >> >> org.apache.http.conn.BasicManagedEntity.streamClosed(BasicMa
>> > >> > >> >> nagedEntity.java:164)\r\n\tat
>> > >> > >> >> org.apache.http.conn.EofSensorInputStream.checkClose(EofSens
>> > >> > >> >> orInputStream.java:228)\r\n\tat
>> > >> > >> >> org.apache.http.conn.EofSensorInputStream.close(EofSensorInp
>> > >> > >> >> utStream.java:174)\r\n\tat
>> > >> > >> >> sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)\
>> > >> r\n\tat
>> > >> > >> >> sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)\r\n\
>> tat
>> > >> > >> >> java.io.InputStreamReader.close(InputStreamReader.java:199)\
>> > >> r\n\tat
>> > >> > >> >> org.apache.solr.client.solrj.io.stream.JSONTupleStream.close
>> > >> > >> >> (JSONTupleStream.java:92)\r\n\tat
>> > >> > >> >> org.apache.solr.client.solrj.io.stream.SolrStream.close(Solr
>> > >> > >> >> Stream.java:193)\r\n\tat
>> > >> > >> >> org.apache.solr.client.solrj.io.stream.CloudSolrStream.close
>> > >> > >> >> (CloudSolrStream.java:464)\r\n\tat
>> > >> > >> >> org.apache.solr.client.solrj.io.stream.HashJoinStream.close(
>> > >> > >> >> HashJoinStream.java:231)\r\n\tat
>> > >> > >> >> org.apache.solr.client.solrj.io.stream.ExceptionStream.close
>> > >> > >> >> (ExceptionStream.java:93)\r\n\tat
>> > >> > >> >> org.apache.solr.handler.StreamHandler$TimerStream.close(
>> > >> > >> >> StreamHandler.java:452)\r\n\tat
>> > >> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
>> > >> > >> >> iteMap$0(TupleStream.java:71)\r\n\t...
>> > >> > >> >> 40 more\r\n",
>> > >> > >> >>     "code":500}}
>> > >> > >> >>
>> > >> > >> >>
>> > >> > >> >> Regards,
>> > >> > >> >> Edwin
>> > >> > >> >>
>> > >> > >> >>
>> > >> > >> >> On 4 May 2017 at 00:00, Joel Bernstein <jo...@gmail.com>
>> > >> wrote:
>> > >> > >> >>
>> > >> > >> >> > I've reformatted the expression below and made a few
>> changes.
>> > >> You
>> > >> > >> have
>> > >> > >> >> put
>> > >> > >> >> > things together properly. But these are MapReduce joins
>> that
>> > >> > require
>> > >> > >> >> > exporting the entire result sets. So you will need to add
>> > >> > qt=/export
>> > >> > >> to
>> > >> > >> >> all
>> > >> >> > the searches and remove the rows param. In Solr 6.6 there
>> is
>> > a
>> > >> new
>> > >> > >> >> > "shuffle" expression that does this automatically.
>> > >> > >> >> >
>> > >> > >> >> > To test things you'll want to break down each expression
>> and
>> > >> make
>> > >> > >> sure
>> > >> > >> >> it's
>> > >> > >> >> > behaving as expected.
>> > >> > >> >> >
>> > >> > >> >> > For example first run each search. Then run the innerJoin,
>> not
>> > >> in
>> > >> > >> >> parallel
>> > >> > >> >> > mode. Then run it in parallel mode. Then try the whole
>> thing.
>> > >> > >> >> >
>> > >> > >> >> > hashJoin(parallel(collection2,
>> > >> > >> >> >                             innerJoin(search(collection2,
>> > >> > >> >> >
>> q=*:*,
>> > >> > >> >> >
>> > >> > >> >> >  fl="a_s,b_s,c_s,d_s,e_s",
>> > >> > >> >> >
>> > sort="a_s
>> > >> > >> asc",
>> > >> > >> >> >
>> > >> > >> >> partitionKeys="a_s",
>> > >> > >> >> >
>> > >> > qt="/export"),
>> > >> > >> >> >
>> search(collection1,
>> > >> > >> >> >
>> q=*:*,
>> > >> > >> >> >
>> > >> > >> >> >  fl="a_s,f_s,g_s,h_s,i_s,j_s",
>> > >> > >> >> >
>> > sort="a_s
>> > >> > >> asc",
>> > >> > >> >> >
>> > >> > >> >>  partitionKeys="a_s",
>> > >> > >> >> >
>> > >> >  qt="/export"),
>> > >> > >> >> >                                            on="a_s"),
>> > >> > >> >> >                              workers="2",
>> > >> > >> >> >                              sort="a_s asc"),
>> > >> > >> >> >                hashed=search(collection3,
>> > >> > >> >> >                                          q=*:*,
>> > >> > >> >> >                                          fl="a_s,k_s,l_s",
>> > >> > >> >> >                                          sort="a_s asc",
>> > >> > >> >> >                                          qt="/export"),
>> > >> > >> >> >               on="a_s")
>> > >> > >> >> >
>> > >> > >> >> > Joel Bernstein
>> > >> > >> >> > http://joelsolr.blogspot.com/
>> > >> > >> >> >
>> > >> > >> >> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <
>> > >> > >> >> edwinyeozl@gmail.com
>> > >> > >> >> > >
>> > >> > >> >> > wrote:
>> > >> > >> >> >
>> > >> > >> >> > > Hi Joel,
>> > >> > >> >> > >
>> > >> > >> >> > > Thanks for the clarification.
>> > >> > >> >> > >
>> > >> > >> >> > > Would like to check, is this the correct way to do the
>> join?
>> > >> > >> >> Currently, I
>> > >> > >> >> > > could not get any results after putting in the hashJoin
>> for
>> > >> the
>> > >> > >> 3rd,
>> > >> > >> >> > > smallerStream collection (collection3).
>> > >> > >> >> > >
>> > >> > >> >> > > http://localhost:8983/solr/collection1/stream?expr=
>> > >> > >> >> > > hashJoin(parallel(collection2
>> > >> > >> >> > > ,
>> > >> > >> >> > > innerJoin(
>> > >> > >> >> > >  search(collection2,
>> > >> > >> >> > > q=*:*,
>> > >> > >> >> > > fl="a_s,b_s,c_s,d_s,e_s",
>> > >> > >> >> > >              sort="a_s asc",
>> > >> > >> >> > > partitionKeys="a_s",
>> > >> > >> >> > > rows=200),
>> > >> > >> >> > >  search(collection1,
>> > >> > >> >> > > q=*:*,
>> > >> > >> >> > > fl="a_s,f_s,g_s,h_s,i_s,j_s",
>> > >> > >> >> > >              sort="a_s asc",
>> > >> > >> >> > > partitionKeys="a_s",
>> > >> > >> >> > > rows=200),
>> > >> > >> >> > >          on="a_s"),
>> > >> > >> >> > > workers="2",
>> > >> > >> >> > >                  sort="a_s asc"),
>> > >> > >> >> > >          hashed=search(collection3,
>> > >> > >> >> > > q=*:*,
>> > >> > >> >> > > fl="a_s,k_s,l_s",
>> > >> > >> >> > > sort="a_s asc",
>> > >> > >> >> > > rows=200),
>> > >> > >> >> > > on="a_s")
>> > >> > >> >> > > &indent=true
>> > >> > >> >> > >
>> > >> > >> >> > >
>> > >> > >> >> > > Regards,
>> > >> > >> >> > > Edwin
>> > >> > >> >> > >
>> > >> > >> >> > >
>> > >> > >> >> > > On 3 May 2017 at 20:59, Joel Bernstein <
>> joelsolr@gmail.com>
>> > >> > wrote:
>> > >> > >> >> > >
>> > >> > >> >> > > > Sorry, it's just called hashJoin
>> > >> > >> >> > > >
>> > >> > >> >> > > > Joel Bernstein
>> > >> > >> >> > > > http://joelsolr.blogspot.com/
>> > >> > >> >> > > >
>> > >> > >> >> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <
>> > >> > >> >> > > edwinyeozl@gmail.com>
>> > >> > >> >> > > > wrote:
>> > >> > >> >> > > >
>> > >> > >> >> > > > > Hi Joel,
>> > >> > >> >> > > > >
>> > >> > >> >> > > > > I am getting this error when I used the
>> innerHashJoin.
>> > >> > >> >> > > > >
>> > >> > >> >> > > > >  "EXCEPTION":"Invalid stream expression
>> > >> > innerHashJoin(parallel(
>> > >> > >> >> > > innerJoin
>> > >> > >> >> > > > >
>> > >> > >> >> > > > > I also can't find the documentation on innerHashJoin
>> for
>> > >> the
>> > >> > >> >> > Streaming
>> > >> > >> >> > > > > Expressions.
>> > >> > >> >> > > > >
>> > >> > >> >> > > > > Are you referring to hashJoin?
>> > >> > >> >> > > > >
>> > >> > >> >> > > > > Regards,
>> > >> > >> >> > > > > Edwin
>> > >> > >> >> > > > >
>> > >> > >> >> > > > >
>> > >> > >> >> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <
>> > >> > >> edwinyeozl@gmail.com
>> > >> > >> >> >
>> > >> > >> >> > > > wrote:
>> > >> > >> >> > > > >
>> > >> > >> >> > > > > > Hi Joel,
>> > >> > >> >> > > > > >
>> > >> > >> >> > > > > > Thanks for the info.
>> > >> > >> >> > > > > >
>> > >> > >> >> > > > > > Regards,
>> > >> > >> >> > > > > > Edwin
>> > >> > >> >> > > > > >
>> > >> > >> >> > > > > >
>> > >> > >> >> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <
>> > >> joelsolr@gmail.com
>> > >> > >
>> > >> > >> >> wrote:
>> > >> > >> >> > > > > >
>> > >> > >> >> > > > > >> Also take a look at the documentation for the
>> "fetch"
>> > >> > >> streaming
>> > >> > >> >> > > > > >> expression.
>> > >> > >> >> > > > > >>
>> > >> > >> >> > > > > >> Joel Bernstein
>> > >> > >> >> > > > > >> http://joelsolr.blogspot.com/
>> > >> > >> >> > > > > >>
>> > >> > >> >> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <
>> > >> > >> >> > joelsolr@gmail.com>
>> > >> > >> >> > > > > >> wrote:
>> > >> > >> >> > > > > >>
>> > >> >> > > > > >> > Yes, you can join more than one collection with
>> > Streaming
>> > >> > >> >> > Expressions.
>> > >> > >> >> > > > Here
>> > >> > >> >> > > > > >> are
>> > >> > >> >> > > > > >> > a few things to keep in mind.
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> > * You'll likely want to use the parallel
>> function
>> > >> around
>> > >> > >> the
>> > >> > >> >> > > largest
>> > >> > >> >> > > > > >> join.
>> > >> > >> >> > > > > >> > You'll need to use the join keys as the
>> > >> partitionKeys.
>> > >> > >> >> > > > > >> > * innerJoin: requires that the streams be
>> sorted on
>> > >> the
>> > >> > >> join
>> > >> > >> >> > keys.
>> > >> > >> >> > > > > >> > * innerHashJoin: has no sorting requirement.
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> > So a strategy for a three collection join might
>> > look
>> > >> > like
>> > >> > >> >> this:
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream,
>> > >> > bigStream)),
>> > >> > >> >> > > > > smallerStream)
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> > The largest join can be done in parallel using
>> an
>> > >> > >> innerJoin.
>> > >> > >> >> You
>> > >> > >> >> > > can
>> > >> > >> >> > > > > >> then
>> > >> > >> >> > > > > >> > wrap the stream coming out of the parallel
>> function
>> > >> in
>> > >> > an
>> > >> > >> >> > > > > innerHashJoin
>> > >> > >> >> > > > > >> to
>> > >> > >> >> > > > > >> > join it to another stream.
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> > Joel Bernstein
>> > >> > >> >> > > > > >> > http://joelsolr.blogspot.com/
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin
>> > Yeo <
>> > >> > >> >> > > > > >> edwinyeozl@gmail.com>
>> > >> > >> >> > > > > >> > wrote:
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> >> Hi,
>> > >> > >> >> > > > > >> >>
>> > >> > >> >> > > > > >> >> Is it possible to join more than 2 collections
>> > using
>> > >> > one
>> > >> > >> of
>> > >> > >> >> the
>> > >> > >> >> > > > > >> streaming
>> > >> >> > > > > >> >> expressions (Eg: innerJoin)? If not, are there
>> > other
>> > >> > ways
>> > >> > >> we
>> > >> > >> >> can
>> > >> > >> >> > > do
>> > >> > >> >> > > > > it?
>> > >> > >> >> > > > > >> >>
>> > >> > >> >> > > > > >> >> Currently, I may need to join 3 or 4
>> collections
>> > >> > >> together,
>> > >> > >> >> and
>> > >> > >> >> > to
>> > >> > >> >> > > > > >> output
>> > >> > >> >> > > > > >> >> selected fields from all these collections
>> > together.
>> > >> > >> >> > > > > >> >>
>> > >> > >> >> > > > > >> >> I'm using Solr 6.4.2.
>> > >> > >> >> > > > > >> >>
>> > >> > >> >> > > > > >> >> Regards,
>> > >> > >> >> > > > > >> >> Edwin
>> > >> > >> >> > > > > >> >>
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >> >
>> > >> > >> >> > > > > >>
>> > >> > >> >> > > > > >
>> > >> > >> >> > > > > >
>> > >> > >> >> > > > >
>> > >> > >> >> > > >
>> > >> > >> >> > >
>> > >> > >> >> >
>> > >> > >> >>
>> > >> > >> >
>> > >> > >> >
>> > >> > >>
>> > >> > >
>> > >> > >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>

Re: Joining more than 2 collections

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Joel,

Yes, /export works after I removed the /export handler definition from
solrconfig.xml. Thanks for the advice.
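
In case it helps anyone else hitting this: assuming default ports and
collection names, the request handler config that is actually in effect
can be checked with the Config API, e.g.

curl "http://localhost:8983/solr/collection1/config/requestHandler"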

For *:*, there are results returned when using /export.
But if one of the queries is *:*, does this mean the entire result set
will contain all the records from the query which has *:*?
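
For example, in a sketch like this (hypothetical collections and fields),
would the *:* search stream every record from collectionA, with only the
join on "id" narrowing the result down?

innerJoin(search(collectionA, q="*:*", fl="id,f_s", sort="id asc",
qt="/export"),
          search(collectionB, q="year_i:42", fl="id,year_i", sort="id asc",
qt="/export"),
          on="id")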

Regards,
Edwin


On 5 May 2017 at 01:46, Joel Bernstein <jo...@gmail.com> wrote:

> No, *:* will simply return all the results from one of the queries. It
> should still join properly. If you are using the /select handler, joins will
> not work properly.
>
>
> This example worked properly for me:
>
> hashJoin(parallel(collection2,
>                             workers=3,
>                             sort="id asc",
>                             innerJoin(search(collection2, q="*:*", fl="id",
> sort="id asc", qt="/export", partitionKeys="id"),
>                                             search(collection2,
> q="year_i:42", fl="id, year_i", sort="id asc", qt="/export",
> partitionKeys="id"),
>                                             on="id")),
>                 hashed=search(collection2, q="day_i:7", fl="id, day_i",
> sort="id asc", qt="/export"),
>                 on="id")
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, May 4, 2017 at 12:28 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com
> >
> wrote:
>
> > Hi Joel,
> >
> > For the join queries, is it true that if we use q=*:* for the query for
> one
> > of the joins, there will not be any results returned?
> >
> > Currently, I found this to be the case if I just put q=*:*.
> >
> > Regards,
> > Edwin
> >
> >
> > On 4 May 2017 at 23:38, Zheng Lin Edwin Yeo <ed...@gmail.com>
> wrote:
> >
> > > Hi Joel,
> > >
> > > I think that might be one of the reason.
> > > This is what I have for the /export handler in my solrconfig.xml
> > >
> > > <requestHandler name="/export" class="solr.SearchHandler"> <lst name=
> > > "invariants"> <str name="rq">{!xport}</str> <str name="wt">xsort</str>
> <
> > > str name="distrib">false</str> </lst> <arr name="components">
> > <str>query</
> > > str> </arr> </requestHandler>
> > >
> > > This is the error message that I get when I use the /export handler.
> > >
> > > java.io.IOException: java.util.concurrent.ExecutionException:
> > > java.io.IOException: --> http://localhost:8983/solr/
> > > collection1_shard1_replica1/: An exception has occurred on the server,
> > > refer to server log for details.
> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> > > openStreams(CloudSolrStream.java:451)
> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> > > open(CloudSolrStream.java:308)
> > > at org.apache.solr.client.solrj.io.stream.PushBackStream.open(
> > > PushBackStream.java:70)
> > > at org.apache.solr.client.solrj.io.stream.JoinStream.open(
> > > JoinStream.java:147)
> > > at org.apache.solr.client.solrj.io.stream.ExceptionStream.
> > > open(ExceptionStream.java:51)
> > > at org.apache.solr.handler.StreamHandler$TimerStream.
> > > open(StreamHandler.java:457)
> > > at org.apache.solr.client.solrj.io.stream.TupleStream.
> > > writeMap(TupleStream.java:63)
> > > at org.apache.solr.response.JSONWriter.writeMap(
> > > JSONResponseWriter.java:547)
> > > at org.apache.solr.response.TextResponseWriter.writeVal(
> > > TextResponseWriter.java:193)
> > > at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(
> > > JSONResponseWriter.java:209)
> > > at org.apache.solr.response.JSONWriter.writeNamedList(
> > > JSONResponseWriter.java:325)
> > > at org.apache.solr.response.JSONWriter.writeResponse(
> > > JSONResponseWriter.java:120)
> > > at org.apache.solr.response.JSONResponseWriter.write(
> > > JSONResponseWriter.java:71)
> > > at org.apache.solr.response.QueryResponseWriterUtil.
> writeQueryResponse(
> > > QueryResponseWriterUtil.java:65)
> > > at org.apache.solr.servlet.HttpSolrCall.writeResponse(
> > > HttpSolrCall.java:732)
> > > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
> > > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > > SolrDispatchFilter.java:345)
> > > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > > SolrDispatchFilter.java:296)
> > > at org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> > > doFilter(ServletHandler.java:1691)
> > > at org.eclipse.jetty.servlet.ServletHandler.doHandle(
> > > ServletHandler.java:582)
> > > at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > > ScopedHandler.java:143)
> > > at org.eclipse.jetty.security.SecurityHandler.handle(
> > > SecurityHandler.java:548)
> > > at org.eclipse.jetty.server.session.SessionHandler.
> > > doHandle(SessionHandler.java:226)
> > > at org.eclipse.jetty.server.handler.ContextHandler.
> > > doHandle(ContextHandler.java:1180)
> > > at org.eclipse.jetty.servlet.ServletHandler.doScope(
> > > ServletHandler.java:512)
> > > at org.eclipse.jetty.server.session.SessionHandler.
> > > doScope(SessionHandler.java:185)
> > > at org.eclipse.jetty.server.handler.ContextHandler.
> > > doScope(ContextHandler.java:1112)
> > > at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > > ScopedHandler.java:141)
> > > at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> > > ContextHandlerCollection.java:213)
> > > at org.eclipse.jetty.server.handler.HandlerCollection.
> > > handle(HandlerCollection.java:119)
> > > at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> > > HandlerWrapper.java:134)
> > > at org.eclipse.jetty.server.Server.handle(Server.java:534)
> > > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> > > at org.eclipse.jetty.server.HttpConnection.onFillable(
> > > HttpConnection.java:251)
> > > at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
> > > AbstractConnection.java:273)
> > > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> > > at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> > > SelectChannelEndPoint.java:93)
> > > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> > > executeProduceConsume(ExecuteProduceConsume.java:303)
> > > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> > > produceConsume(ExecuteProduceConsume.java:148)
> > > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(
> > > ExecuteProduceConsume.java:136)
> > > at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> > > QueuedThreadPool.java:671)
> > > at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(
> > > QueuedThreadPool.java:589)
> > > at java.lang.Thread.run(Thread.java:745)
> > > Caused by: java.util.concurrent.ExecutionException:
> java.io.IOException:
> > > --> http://localhost:8983/solr/collection1_shard1_replica1/: An
> > exception
> > > has occurred on the server, refer to server log for details.
> > > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> > > at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> > > openStreams(CloudSolrStream.java:445)
> > > ... 42 more
> > > Caused by: java.io.IOException: --> http://localhost:8983/solr/
> > > collection1_shard1_replica1/: An exception has occurred on the server,
> > > refer to server log for details.
> > > at org.apache.solr.client.solrj.io.stream.SolrStream.read(
> > > SolrStream.java:238)
> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream$
> > > TupleWrapper.next(CloudSolrStream.java:541)
> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream$
> > > StreamOpener.call(CloudSolrStream.java:564)
> > > at org.apache.solr.client.solrj.io.stream.CloudSolrStream$
> > > StreamOpener.call(CloudSolrStream.java:551)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > at org.apache.solr.common.util.ExecutorUtil$
> MDCAwareThreadPoolExecutor.
> > > lambda$execute$0(ExecutorUtil.java:229)
> > > at java.util.concurrent.ThreadPoolExecutor.runWorker(
> > > ThreadPoolExecutor.java:1142)
> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> > > ThreadPoolExecutor.java:617)
> > > ... 1 more
> > > Caused by: org.noggit.JSONParser$ParseException: JSON Parse Error:
> > > char=<,position=0 BEFORE='<' AFTER='?xml version="1.0"
> > encoding="UTF-8"?> <'
> > > at org.noggit.JSONParser.err(JSONParser.java:356)
> > > at org.noggit.JSONParser.handleNonDoubleQuoteString(
> JSONParser.java:712)
> > > at org.noggit.JSONParser.next(JSONParser.java:886)
> > > at org.noggit.JSONParser.nextEvent(JSONParser.java:930)
> > > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.
> > > expect(JSONTupleStream.java:97)
> > > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.
> > > advanceToDocs(JSONTupleStream.java:179)
> > > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.
> > > next(JSONTupleStream.java:77)
> > > at org.apache.solr.client.solrj.io.stream.SolrStream.read(
> > > SolrStream.java:207)
> > > ... 8 more
> > >
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 4 May 2017 at 22:54, Joel Bernstein <jo...@gmail.com> wrote:
> > >
> > >> I suspect that there is something not quite right about how the
> > >> /export
> > >> handler is configured. Straight out of the box in Solr 6.4.2, /export
> > will
> > >> be automatically configured. Are you using a Solr instance that has
> been
> > >> upgraded in the past and doesn't have standard 6.4.2 configs?
> > >>
> > >> To really do joins properly you'll have to use the /export handler
> > because
> > >> /select will not stream entire result sets (unless they are pretty
> > small).
> > >> So your results may be missing data.
> > >>
> > >> I would take a close look at the logs and see what all the exceptions
> > are
> > >> when you run a search using qt=/export. If you can post all the
> > stack
> > >> traces that get generated when you run the search we'll probably be
> able
> > >> to
> > >> spot the issue.
> > >>
> > >> About the field ordering. There is support for field ordering in the
> > >> Streaming classes but only a few places actually enforce the order.
> The
> > >> 6.5
> > >> SQL interface does keep the fields in order as does the new Tuple
> > >> expression in Solr 6.6. But the expressions you are working with
> > currently
> > >> don't enforce field ordering.
> > >>
> > >>
> > >>
> > >>
> > >> Joel Bernstein
> > >> http://joelsolr.blogspot.com/
> > >>
> > >> On Thu, May 4, 2017 at 2:41 AM, Zheng Lin Edwin Yeo <
> > edwinyeozl@gmail.com
> > >> >
> > >> wrote:
> > >>
> > >> > Hi Joel,
> > >> >
> > >> > I have managed to get the Join to work, but so far it is only
> working
> > >> when
> > >> > I use qt="/select". It is not working when I use qt="/export".
> > >> >
> > >> > For the display of the fields, is there a way to allow it to list
> them
> > in
> > >> > the order which I want?
> > >> > Currently, the display is quite random, and I can get a field in
> > >> > collection1, followed by a field in collection3, then collection1
> > again,
> > >> > and then collection2.
> > >> >
> > >> > It will be good if we can arrange the field to display in the order
> > >> that we
> > >> > want.
> > >> >
> > >> > Regards,
> > >> > Edwin
> > >> >
> > >> >
> > >> >
> > >> > On 4 May 2017 at 09:56, Zheng Lin Edwin Yeo <ed...@gmail.com>
> > >> wrote:
> > >> >
> > >> > > Hi Joel,
> > >> > >
> > >> > > It works when I started off with just one expression.
> > >> > >
> > >> > > Could it be that the data size is too big for export after the
> join,
> > >> > which
> > >> > > causes the error?
> > >> > >
> > >> > > Regards,
> > >> > > Edwin
> > >> > >
> > >> > > On 4 May 2017 at 02:53, Joel Bernstein <jo...@gmail.com>
> wrote:
> > >> > >
> > >> > >> I was just testing with the query below and it worked for me.
> Some
> > of
> > >> > the
> > >> > >> error messages I was getting with the syntax were not what I was
> > >> > expecting
> > >> > >> though, so I'll look into the error handling. But the joins do
> work
> > >> when
> > >> > >> the syntax is correct. The query below is joining to the same
> > collection
> > >> > >> three
> > >> > >> times, but the mechanics are exactly the same as joining three
> > different
> > >> > >> tables. In this example each join narrows down the result set.
> > >> > >>
> > >> > >> hashJoin(parallel(collection2,
> > >> > >>                             workers=3,
> > >> > >>                             sort="id asc",
> > >> > >>                             innerJoin(search(collection2,
> q="*:*",
> > >> > >> fl="id",
> > >> > >> sort="id asc", qt="/export", partitionKeys="id"),
> > >> > >>                                             search(collection2,
> > >> > >> q="year_i:42", fl="id, year_i", sort="id asc", qt="/export",
> > >> > >> partitionKeys="id"),
> > >> > >>                                             on="id")),
> > >> > >>                 hashed=search(collection2, q="day_i:7", fl="id,
> > >> day_i",
> > >> > >> sort="id asc", qt="/export"),
> > >> > >>                 on="id")
> > >> > >>
> > >> > >> Joel Bernstein
> > >> > >> http://joelsolr.blogspot.com/
> > >> > >>
> > >> > >> On Wed, May 3, 2017 at 1:29 PM, Joel Bernstein <
> joelsolr@gmail.com
> > >
> > >> > >> wrote:
> > >> > >>
> > >> > >> > Start off with just this expression:
> > >> > >> >
> > >> > >> > search(collection2,
> > >> > >> >             q=*:*,
> > >> > >> >             fl="a_s,b_s,c_s,d_s,e_s",
> > >> > >> >             sort="a_s asc",
> > >> > >> >             qt="/export")
> > >> > >> >
> > >> > >> > And then check the logs for exceptions.
> > >> > >> >
> > >> > >> > Joel Bernstein
> > >> > >> > http://joelsolr.blogspot.com/
> > >> > >> >
> > >> > >> > On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <
> > >> > >> edwinyeozl@gmail.com
> > >> > >> > > wrote:
> > >> > >> >
> > >> > >> >> Hi Joel,
> > >> > >> >>
> > >> > >> >> I am getting this error after I added qt=/export and
> > removed
> > >> the
> > >> > >> rows
> > >> > >> >> param. Do you know what could be the reason?
> > >> > >> >>
> > >> > >> >> {
> > >> > >> >>   "error":{
> > >> > >> >>     "metadata":[
> > >> > >> >>       "error-class","org.apache.solr.common.SolrException",
> > >> > >> >>       "root-error-class","org.apache.http.
> > MalformedChunkCodingExc
> > >> e
> > >> > >> >> ption"],
> > >> > >> >>     "msg":"org.apache.http.MalformedChunkCodingException:
> CRLF
> > >> > >> expected
> > >> > >> >> at
> > >> > >> >> end of chunk",
> > >> > >> >>     "trace":"org.apache.solr.common.SolrException:
> > >> > >> >> org.apache.http.MalformedChunkCodingException: CRLF expected
> at
> > >> end
> > >> > of
> > >> > >> >> chunk\r\n\tat
> > >> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
> > >> > >> >> iteMap$0(TupleStream.java:79)\r\n\tat
> > >> > >> >> org.apache.solr.response.JSONWriter.writeIterator(JSONRespon
> > >> > >> >> seWriter.java:523)\r\n\tat
> > >> > >> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
> > >> > >> >> ponseWriter.java:175)\r\n\tat
> > >> > >> >> org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter
> > >> > >> >> .java:559)\r\n\tat
> > >> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(
> > >> > >> >> TupleStream.java:64)\r\n\tat
> > >> > >> >> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWri
> > >> > >> >> ter.java:547)\r\n\tat
> > >> > >> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
> > >> > >> >> ponseWriter.java:193)\r\n\tat
> > >> > >> >> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithD
> > >> > >> >> ups(JSONResponseWriter.java:209)\r\n\tat
> > >> > >> >> org.apache.solr.response.JSONWriter.writeNamedList(JSONRespo
> > >> > >> >> nseWriter.java:325)\r\n\tat
> > >> > >> >> org.apache.solr.response.JSONWriter.writeResponse(JSONRespon
> > >> > >> >> seWriter.java:120)\r\n\tat
> > >> > >> >> org.apache.solr.response.JSONResponseWriter.write(JSONRespon
> > >> > >> >> seWriter.java:71)\r\n\tat
> > >> > >> >> org.apache.solr.response.QueryResponseWriterUtil.writeQueryR
> > >> > >> >> esponse(QueryResponseWriterUtil.java:65)\r\n\tat
> > >> > >> >> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrC
> > >> > >> >> all.java:732)\r\n\tat
> > >> > >> >> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:
> > >> > >> 473)\r\n\tat
> > >> > >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
> > >> > >> >> atchFilter.java:345)\r\n\tat
> > >> > >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
> > >> > >> >> atchFilter.java:296)\r\n\tat
> > >> > >> >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte
> > >> > >> >> r(ServletHandler.java:1691)\r\n\tat
> > >> > >> >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHan
> > >> > >> >> dler.java:582)\r\n\tat
> > >> > >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
> > >> > >> >> Handler.java:143)\r\n\tat
> > >> > >> >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHa
> > >> > >> >> ndler.java:548)\r\n\tat
> > >> > >> >> org.eclipse.jetty.server.session.SessionHandler.doHandle(
> > >> > >> >> SessionHandler.java:226)\r\n\tat
> > >> > >> >> org.eclipse.jetty.server.handler.ContextHandler.doHandle(
> > >> > >> >> ContextHandler.java:1180)\r\n\tat
> > >> > >> >> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHand
> > >> > >> >> ler.java:512)\r\n\tat
> > >> > >> >> org.eclipse.jetty.server.session.SessionHandler.doScope(
> > >> > >> >> SessionHandler.java:185)\r\n\tat
> > >> > >> >> org.eclipse.jetty.server.handler.ContextHandler.doScope(
> > >> > >> >> ContextHandler.java:1112)\r\n\tat
> > >> > >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
> > >> > >> >> Handler.java:141)\r\n\tat
> > >> > >> >> org.eclipse.jetty.server.handler.ContextHandlerCollection.ha
> > >> > >> >> ndle(ContextHandlerCollection.java:213)\r\n\tat
> > >> > >> >> org.eclipse.jetty.server.handler.HandlerCollection.handle(
> > >> > >> >> HandlerCollection.java:119)\r\n\tat
> > >> > >> >> org.eclipse.jetty.server.handler.HandlerWrapper.handle(Handl
> > >> > >> >> erWrapper.java:134)\r\n\tat
> > >> > >> >> org.eclipse.jetty.server.Server.handle(Server.java:534)
> \r\n\tat
> > >> > >> >> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.
> > >> > >> java:320)\r\n\tat
> > >> > >> >> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConne
> > >> > >> >> ction.java:251)\r\n\tat
> > >> > >> >> org.eclipse.jetty.io.AbstractConnection$ReadCallback.
> > >> > >> >> succeeded(AbstractConnection.java:273)\r\n\tat
> > >> > >> >> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.
> > >> > >> java:95)\r\n\tat
> > >> > >> >> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChann
> > >> > >> >> elEndPoint.java:93)\r\n\tat
> > >> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> > >> > >> >> .executeProduceConsume(ExecuteProduceConsume.java:
> 303)\r\n\tat
> > >> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> > >> > >> >> .produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
> > >> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> > >> > >> >> .run(ExecuteProduceConsume.java:136)\r\n\tat
> > >> > >> >> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(Queued
> > >> > >> >> ThreadPool.java:671)\r\n\tat
> > >> > >> >> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedT
> > >> > >> >> hreadPool.java:589)\r\n\tat
> > >> > >> >> java.lang.Thread.run(Thread.java:745)\r\nCaused by:
> > >> > >> >> org.apache.http.MalformedChunkCodingException: CRLF expected
> at
> > >> end
> > >> > of
> > >> > >> >> chunk\r\n\tat
> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.getChunkSize(Chun
> > >> > >> >> kedInputStream.java:255)\r\n\tat
> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.nextChunk(Chunked
> > >> > >> >> InputStream.java:227)\r\n\tat
> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
> > >> > >> >> Stream.java:186)\r\n\tat
> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
> > >> > >> >> Stream.java:215)\r\n\tat
> > >> > >> >> org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInpu
> > >> > >> >> tStream.java:316)\r\n\tat
> > >> > >> >> org.apache.http.conn.BasicManagedEntity.streamClosed(BasicMa
> > >> > >> >> nagedEntity.java:164)\r\n\tat
> > >> > >> >> org.apache.http.conn.EofSensorInputStream.checkClose(EofSens
> > >> > >> >> orInputStream.java:228)\r\n\tat
> > >> > >> >> org.apache.http.conn.EofSensorInputStream.close(EofSensorInp
> > >> > >> >> utStream.java:174)\r\n\tat
> > >> > >> >> sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)\
> > >> r\n\tat
> > >> > >> >> sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)\
> r\n\tat
> > >> > >> >> java.io.InputStreamReader.close(InputStreamReader.java:199)\
> > >> r\n\tat
> > >> > >> >> org.apache.solr.client.solrj.io.stream.JSONTupleStream.close
> > >> > >> >> (JSONTupleStream.java:92)\r\n\tat
> > >> > >> >> org.apache.solr.client.solrj.io.stream.SolrStream.close(Solr
> > >> > >> >> Stream.java:193)\r\n\tat
> > >> > >> >> org.apache.solr.client.solrj.io.stream.CloudSolrStream.close
> > >> > >> >> (CloudSolrStream.java:464)\r\n\tat
> > >> > >> >> org.apache.solr.client.solrj.io.stream.HashJoinStream.close(
> > >> > >> >> HashJoinStream.java:231)\r\n\tat
> > >> > >> >> org.apache.solr.client.solrj.io.stream.ExceptionStream.close
> > >> > >> >> (ExceptionStream.java:93)\r\n\tat
> > >> > >> >> org.apache.solr.handler.StreamHandler$TimerStream.close(
> > >> > >> >> StreamHandler.java:452)\r\n\tat
> > >> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
> > >> > >> >> iteMap$0(TupleStream.java:71)\r\n\t...
> > >> > >> >> 40 more\r\n",
> > >> > >> >>     "code":500}}
> > >> > >> >>
> > >> > >> >>
> > >> > >> >> Regards,
> > >> > >> >> Edwin
> > >> > >> >>
> > >> > >> >>
> > >> > >> >> On 4 May 2017 at 00:00, Joel Bernstein <jo...@gmail.com>
> > >> wrote:
> > >> > >> >>
> > >> > >> >> > I've reformatted the expression below and made a few
> changes.
> > >> You
> > >> > >> have
> > >> > >> >> put
> > >> > >> >> > things together properly. But these are MapReduce joins that
> > >> > require
> > >> > >> >> > exporting the entire result sets. So you will need to add
> > >> > qt=/export
> > >> > >> to
> > >> > >> >> all
> > >> > >> >> > the searches and remove the rows param. In Solr 6.6 there
> is
> > a
> > >> new
> > >> > >> >> > "shuffle" expression that does this automatically.
> > >> > >> >> >
> > >> > >> >> > To test things you'll want to break down each expression and
> > >> make
> > >> > >> sure
> > >> > >> >> it's
> > >> > >> >> > behaving as expected.
> > >> > >> >> >
> > >> > >> >> > For example first run each search. Then run the innerJoin,
> not
> > >> in
> > >> > >> >> parallel
> > >> > >> >> > mode. Then run it in parallel mode. Then try the whole
> thing.
> > >> > >> >> >
> > >> > >> >> > hashJoin(parallel(collection2,
> > >> > >> >> >                             innerJoin(search(collection2,
> > >> > >> >> >
> q=*:*,
> > >> > >> >> >
> > >> > >> >> >  fl="a_s,b_s,c_s,d_s,e_s",
> > >> > >> >> >
> > sort="a_s
> > >> > >> asc",
> > >> > >> >> >
> > >> > >> >> partitionKeys="a_s",
> > >> > >> >> >
> > >> > qt="/export"),
> > >> > >> >> >
> search(collection1,
> > >> > >> >> >
> q=*:*,
> > >> > >> >> >
> > >> > >> >> >  fl="a_s,f_s,g_s,h_s,i_s,j_s",
> > >> > >> >> >
> > sort="a_s
> > >> > >> asc",
> > >> > >> >> >
> > >> > >> >>  partitionKeys="a_s",
> > >> > >> >> >
> > >> >  qt="/export"),
> > >> > >> >> >                                            on="a_s"),
> > >> > >> >> >                              workers="2",
> > >> > >> >> >                              sort="a_s asc"),
> > >> > >> >> >                hashed=search(collection3,
> > >> > >> >> >                                          q=*:*,
> > >> > >> >> >                                          fl="a_s,k_s,l_s",
> > >> > >> >> >                                          sort="a_s asc",
> > >> > >> >> >                                          qt="/export"),
> > >> > >> >> >               on="a_s")
> > >> > >> >> >
> > >> > >> >> > Joel Bernstein
> > >> > >> >> > http://joelsolr.blogspot.com/
> > >> > >> >> >
> > >> > >> >> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <
> > >> > >> >> edwinyeozl@gmail.com
> > >> > >> >> > >
> > >> > >> >> > wrote:
> > >> > >> >> >
> > >> > >> >> > > Hi Joel,
> > >> > >> >> > >
> > >> > >> >> > > Thanks for the clarification.
> > >> > >> >> > >
> > >> > >> >> > > Would like to check, is this the correct way to do the
> join?
> > >> > >> >> Currently, I
> > >> > >> >> > > could not get any results after putting in the hashJoin
> for
> > >> the
> > >> > >> 3rd,
> > >> > >> >> > > smallerStream collection (collection3).
> > >> > >> >> > >
> > >> > >> >> > > http://localhost:8983/solr/collection1/stream?expr=
> > >> > >> >> > > hashJoin(parallel(collection2
> > >> > >> >> > > ,
> > >> > >> >> > > innerJoin(
> > >> > >> >> > >  search(collection2,
> > >> > >> >> > > q=*:*,
> > >> > >> >> > > fl="a_s,b_s,c_s,d_s,e_s",
> > >> > >> >> > >              sort="a_s asc",
> > >> > >> >> > > partitionKeys="a_s",
> > >> > >> >> > > rows=200),
> > >> > >> >> > >  search(collection1,
> > >> > >> >> > > q=*:*,
> > >> > >> >> > > fl="a_s,f_s,g_s,h_s,i_s,j_s",
> > >> > >> >> > >              sort="a_s asc",
> > >> > >> >> > > partitionKeys="a_s",
> > >> > >> >> > > rows=200),
> > >> > >> >> > >          on="a_s"),
> > >> > >> >> > > workers="2",
> > >> > >> >> > >                  sort="a_s asc"),
> > >> > >> >> > >          hashed=search(collection3,
> > >> > >> >> > > q=*:*,
> > >> > >> >> > > fl="a_s,k_s,l_s",
> > >> > >> >> > > sort="a_s asc",
> > >> > >> >> > > rows=200),
> > >> > >> >> > > on="a_s")
> > >> > >> >> > > &indent=true
> > >> > >> >> > >
> > >> > >> >> > >
> > >> > >> >> > > Regards,
> > >> > >> >> > > Edwin
> > >> > >> >> > >
> > >> > >> >> > >
> > >> > >> >> > > On 3 May 2017 at 20:59, Joel Bernstein <
> joelsolr@gmail.com>
> > >> > wrote:
> > >> > >> >> > >
> > >> > >> >> > > > Sorry, it's just called hashJoin
> > >> > >> >> > > >
> > >> > >> >> > > > Joel Bernstein
> > >> > >> >> > > > http://joelsolr.blogspot.com/
> > >> > >> >> > > >
> > >> > >> >> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <
> > >> > >> >> > > edwinyeozl@gmail.com>
> > >> > >> >> > > > wrote:
> > >> > >> >> > > >
> > >> > >> >> > > > > Hi Joel,
> > >> > >> >> > > > >
> > >> > >> >> > > > > I am getting this error when I used the innerHashJoin.
> > >> > >> >> > > > >
> > >> > >> >> > > > >  "EXCEPTION":"Invalid stream expression
> > >> > innerHashJoin(parallel(
> > >> > >> >> > > innerJoin
> > >> > >> >> > > > >
> > >> > >> >> > > > > I also can't find the documentation on innerHashJoin
> for
> > >> the
> > >> > >> >> > Streaming
> > >> > >> >> > > > > Expressions.
> > >> > >> >> > > > >
> > >> > >> >> > > > > Are you referring to hashJoin?
> > >> > >> >> > > > >
> > >> > >> >> > > > > Regards,
> > >> > >> >> > > > > Edwin
> > >> > >> >> > > > >
> > >> > >> >> > > > >
> > >> > >> >> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <
> > >> > >> edwinyeozl@gmail.com
> > >> > >> >> >
> > >> > >> >> > > > wrote:
> > >> > >> >> > > > >
> > >> > >> >> > > > > > Hi Joel,
> > >> > >> >> > > > > >
> > >> > >> >> > > > > > Thanks for the info.
> > >> > >> >> > > > > >
> > >> > >> >> > > > > > Regards,
> > >> > >> >> > > > > > Edwin
> > >> > >> >> > > > > >
> > >> > >> >> > > > > >
> > >> > >> >> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <
> > >> joelsolr@gmail.com
> > >> > >
> > >> > >> >> wrote:
> > >> > >> >> > > > > >
> > >> > >> >> > > > > >> Also take a look at the documentation for the
> "fetch"
> > >> > >> streaming
> > >> > >> >> > > > > >> expression.
> > >> > >> >> > > > > >>
> > >> > >> >> > > > > >> Joel Bernstein
> > >> > >> >> > > > > >> http://joelsolr.blogspot.com/
> > >> > >> >> > > > > >>
> > >> > >> >> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <
> > >> > >> >> > joelsolr@gmail.com>
> > >> > >> >> > > > > >> wrote:
> > >> > >> >> > > > > >>
> > >> > >> >> > > > > >> > Yes, you can join more than one collection with
> > Streaming
> > >> > >> >> > Expressions.
> > >> > >> >> > > > Here
> > >> > >> >> > > > > >> are
> > >> > >> >> > > > > >> > a few things to keep in mind.
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> > * You'll likely want to use the parallel function
> > >> around
> > >> > >> the
> > >> > >> >> > > largest
> > >> > >> >> > > > > >> join.
> > >> > >> >> > > > > >> > You'll need to use the join keys as the
> > >> partitionKeys.
> > >> > >> >> > > > > >> > * innerJoin: requires that the streams be sorted
> on
> > >> the
> > >> > >> join
> > >> > >> >> > keys.
> > >> > >> >> > > > > >> > * innerHashJoin: has no sorting requirement.
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> > So a strategy for a three collection join might
> > look
> > >> > like
> > >> > >> >> this:
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream,
> > >> > bigStream)),
> > >> > >> >> > > > > smallerStream)
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> > The largest join can be done in parallel using an
> > >> > >> innerJoin.
> > >> > >> >> You
> > >> > >> >> > > can
> > >> > >> >> > > > > >> then
> > >> > >> >> > > > > >> > wrap the stream coming out of the parallel
> function
> > >> in
> > >> > an
> > >> > >> >> > > > > innerHashJoin
> > >> > >> >> > > > > >> to
> > >> > >> >> > > > > >> > join it to another stream.
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> > Joel Bernstein
> > >> > >> >> > > > > >> > http://joelsolr.blogspot.com/
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin
> > Yeo <
> > >> > >> >> > > > > >> edwinyeozl@gmail.com>
> > >> > >> >> > > > > >> > wrote:
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> >> Hi,
> > >> > >> >> > > > > >> >>
> > >> > >> >> > > > > >> >> Is it possible to join more than 2 collections
> > using
> > >> > one
> > >> > >> of
> > >> > >> >> the
> > >> > >> >> > > > > >> streaming
> > >> > >> >> > > > > >> >> expressions (Eg: innerJoin)? If not, are there
> > other
> > >> > ways
> > >> > >> we
> > >> > >> >> can
> > >> > >> >> > > do
> > >> > >> >> > > > > it?
> > >> > >> >> > > > > >> >>
> > >> > >> >> > > > > >> >> Currently, I may need to join 3 or 4 collections
> > >> > >> together,
> > >> > >> >> and
> > >> > >> >> > to
> > >> > >> >> > > > > >> output
> > >> > >> >> > > > > >> >> selected fields from all these collections
> > together.
> > >> > >> >> > > > > >> >>
> > >> > >> >> > > > > >> >> I'm using Solr 6.4.2.
> > >> > >> >> > > > > >> >>
> > >> > >> >> > > > > >> >> Regards,
> > >> > >> >> > > > > >> >> Edwin
> > >> > >> >> > > > > >> >>
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >> >
> > >> > >> >> > > > > >>
> > >> > >> >> > > > > >
> > >> > >> >> > > > > >
> > >> > >> >> > > > >
> > >> > >> >> > > >
> > >> > >> >> > >
> > >> > >> >> >
> > >> > >> >>
> > >> > >> >
> > >> > >> >
> > >> > >>
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: Joining more than 2 collections

Posted by Joel Bernstein <jo...@gmail.com>.
No, *:* will simply return all the results from one of the queries. It
should still join properly. If you are using the /select handler, joins will
not work properly.


This example worked properly for me:

hashJoin(parallel(collection2,
                            workers=3,
                            sort="id asc",
                            innerJoin(search(collection2, q="*:*", fl="id",
sort="id asc", qt="/export", partitionKeys="id"),
                                            search(collection2,
q="year_i:42", fl="id, year_i", sort="id asc", qt="/export",
partitionKeys="id"),
                                            on="id")),
                hashed=search(collection2, q="day_i:7", fl="id, day_i",
sort="id asc", qt="/export"),
                on="id")




Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 4, 2017 at 12:28 PM, Zheng Lin Edwin Yeo <ed...@gmail.com>
wrote:

> Hi Joel,
>
> For the join queries, is it true that if we use q=*:* for the query for one
> of the joins, there will not be any results returned?
>
> Currently, I found this to be the case if I just put q=*:*.
>
> Regards,
> Edwin
>
>
> On 4 May 2017 at 23:38, Zheng Lin Edwin Yeo <ed...@gmail.com> wrote:
>
> > Hi Joel,
> >
> > I think that might be one of the reason.
> > This is what I have for the /export handler in my solrconfig.xml
> >
> > <requestHandler name="/export" class="solr.SearchHandler"> <lst name=
> > "invariants"> <str name="rq">{!xport}</str> <str name="wt">xsort</str> <
> > str name="distrib">false</str> </lst> <arr name="components">
> <str>query</
> > str> </arr> </requestHandler>
> >
> > This is the error message that I get when I use the /export handler.
> >
> > java.io.IOException: java.util.concurrent.ExecutionException:
> > java.io.IOException: --> http://localhost:8983/solr/
> > collection1_shard1_replica1/: An exception has occurred on the server,
> > refer to server log for details.
> > at org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> > openStreams(CloudSolrStream.java:451)
> > at org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> > open(CloudSolrStream.java:308)
> > at org.apache.solr.client.solrj.io.stream.PushBackStream.open(
> > PushBackStream.java:70)
> > at org.apache.solr.client.solrj.io.stream.JoinStream.open(
> > JoinStream.java:147)
> > at org.apache.solr.client.solrj.io.stream.ExceptionStream.
> > open(ExceptionStream.java:51)
> > at org.apache.solr.handler.StreamHandler$TimerStream.
> > open(StreamHandler.java:457)
> > at org.apache.solr.client.solrj.io.stream.TupleStream.
> > writeMap(TupleStream.java:63)
> > at org.apache.solr.response.JSONWriter.writeMap(
> > JSONResponseWriter.java:547)
> > at org.apache.solr.response.TextResponseWriter.writeVal(
> > TextResponseWriter.java:193)
> > at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(
> > JSONResponseWriter.java:209)
> > at org.apache.solr.response.JSONWriter.writeNamedList(
> > JSONResponseWriter.java:325)
> > at org.apache.solr.response.JSONWriter.writeResponse(
> > JSONResponseWriter.java:120)
> > at org.apache.solr.response.JSONResponseWriter.write(
> > JSONResponseWriter.java:71)
> > at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(
> > QueryResponseWriterUtil.java:65)
> > at org.apache.solr.servlet.HttpSolrCall.writeResponse(
> > HttpSolrCall.java:732)
> > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > SolrDispatchFilter.java:345)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > SolrDispatchFilter.java:296)
> > at org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> > doFilter(ServletHandler.java:1691)
> > at org.eclipse.jetty.servlet.ServletHandler.doHandle(
> > ServletHandler.java:582)
> > at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > ScopedHandler.java:143)
> > at org.eclipse.jetty.security.SecurityHandler.handle(
> > SecurityHandler.java:548)
> > at org.eclipse.jetty.server.session.SessionHandler.
> > doHandle(SessionHandler.java:226)
> > at org.eclipse.jetty.server.handler.ContextHandler.
> > doHandle(ContextHandler.java:1180)
> > at org.eclipse.jetty.servlet.ServletHandler.doScope(
> > ServletHandler.java:512)
> > at org.eclipse.jetty.server.session.SessionHandler.
> > doScope(SessionHandler.java:185)
> > at org.eclipse.jetty.server.handler.ContextHandler.
> > doScope(ContextHandler.java:1112)
> > at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> > ScopedHandler.java:141)
> > at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> > ContextHandlerCollection.java:213)
> > at org.eclipse.jetty.server.handler.HandlerCollection.
> > handle(HandlerCollection.java:119)
> > at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> > HandlerWrapper.java:134)
> > at org.eclipse.jetty.server.Server.handle(Server.java:534)
> > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> > at org.eclipse.jetty.server.HttpConnection.onFillable(
> > HttpConnection.java:251)
> > at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
> > AbstractConnection.java:273)
> > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> > at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> > SelectChannelEndPoint.java:93)
> > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> > executeProduceConsume(ExecuteProduceConsume.java:303)
> > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> > produceConsume(ExecuteProduceConsume.java:148)
> > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(
> > ExecuteProduceConsume.java:136)
> > at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> > QueuedThreadPool.java:671)
> > at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(
> > QueuedThreadPool.java:589)
> > at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.util.concurrent.ExecutionException: java.io.IOException:
> > --> http://localhost:8983/solr/collection1_shard1_replica1/: An
> exception
> > has occurred on the server, refer to server log for details.
> > at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> > at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> > at org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> > openStreams(CloudSolrStream.java:445)
> > ... 42 more
> > Caused by: java.io.IOException: --> http://localhost:8983/solr/
> > collection1_shard1_replica1/: An exception has occurred on the server,
> > refer to server log for details.
> > at org.apache.solr.client.solrj.io.stream.SolrStream.read(
> > SolrStream.java:238)
> > at org.apache.solr.client.solrj.io.stream.CloudSolrStream$
> > TupleWrapper.next(CloudSolrStream.java:541)
> > at org.apache.solr.client.solrj.io.stream.CloudSolrStream$
> > StreamOpener.call(CloudSolrStream.java:564)
> > at org.apache.solr.client.solrj.io.stream.CloudSolrStream$
> > StreamOpener.call(CloudSolrStream.java:551)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.
> > lambda$execute$0(ExecutorUtil.java:229)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(
> > ThreadPoolExecutor.java:1142)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> > ThreadPoolExecutor.java:617)
> > ... 1 more
> > Caused by: org.noggit.JSONParser$ParseException: JSON Parse Error:
> > char=<,position=0 BEFORE='<' AFTER='?xml version="1.0"
> encoding="UTF-8"?> <'
> > at org.noggit.JSONParser.err(JSONParser.java:356)
> > at org.noggit.JSONParser.handleNonDoubleQuoteString(JSONParser.java:712)
> > at org.noggit.JSONParser.next(JSONParser.java:886)
> > at org.noggit.JSONParser.nextEvent(JSONParser.java:930)
> > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.
> > expect(JSONTupleStream.java:97)
> > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.
> > advanceToDocs(JSONTupleStream.java:179)
> > at org.apache.solr.client.solrj.io.stream.JSONTupleStream.
> > next(JSONTupleStream.java:77)
> > at org.apache.solr.client.solrj.io.stream.SolrStream.read(
> > SolrStream.java:207)
> > ... 8 more
> >
> >
> > Regards,
> > Edwin
> >
> >
> > On 4 May 2017 at 22:54, Joel Bernstein <jo...@gmail.com> wrote:
> >
> >> I suspect that there is something not quite right about how the /export
> >> handler is configured. Straight out of the box in Solr 6.4.2 /export will
> >> be automatically configured. Are you using a Solr instance that has been
> >> upgraded in the past and doesn't have standard 6.4.2 configs?
> >>
> >> To really do joins properly you'll have to use the /export handler because
> >> /select will not stream entire result sets (unless they are pretty small).
> >> So your results may be missing data.
> >>
> >> I would take a close look at the logs and see what all the exceptions are
> >> when you run a search using qt=/export. If you can post all the stack
> >> traces that get generated when you run the search we'll probably be able
> >> to spot the issue.
> >>
> >> About the field ordering: there is support for field ordering in the
> >> Streaming classes, but only a few places actually enforce the order. The
> >> 6.5 SQL interface does keep the fields in order, as does the new Tuple
> >> expression in Solr 6.6. But the expressions you are working with currently
> >> don't enforce field ordering.
> >>
> >>
> >>
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Thu, May 4, 2017 at 2:41 AM, Zheng Lin Edwin Yeo <
> edwinyeozl@gmail.com
> >> >
> >> wrote:
> >>
> >> > Hi Joel,
> >> >
> >> > I have managed to get the Join to work, but so far it is only working
> >> > when I use qt="/select". It is not working when I use qt="/export".
> >> >
> >> > For the display of the fields, is there a way to allow it to list them
> >> > in the order that I want?
> >> > Currently, the display is quite random, and I can get a field in
> >> > collection1, followed by a field in collection3, then collection1 again,
> >> > and then collection2.
> >> >
> >> > It will be good if we can arrange the fields to display in the order
> >> > that we want.
> >> >
> >> > Regards,
> >> > Edwin
> >> >
> >> >
> >> >
> >> > On 4 May 2017 at 09:56, Zheng Lin Edwin Yeo <ed...@gmail.com>
> >> wrote:
> >> >
> >> > > Hi Joel,
> >> > >
> >> > > It works when I started off with just one expression.
> >> > >
> >> > > Could it be that the data size is too big for export after the join,
> >> > > which causes the error?
> >> > >
> >> > > Regards,
> >> > > Edwin
> >> > >
> >> > > On 4 May 2017 at 02:53, Joel Bernstein <jo...@gmail.com> wrote:
> >> > >
> >> > >> I was just testing with the query below and it worked for me. Some
> >> > >> of the error messages I was getting with the syntax were not what I
> >> > >> was expecting though, so I'll look into the error handling. But the
> >> > >> joins do work when the syntax is correct. The query below is joining
> >> > >> to the same collection three times, but the mechanics are exactly
> >> > >> the same as joining three different tables. In this example each
> >> > >> join narrows down the result set.
> >> > >>
> >> > >> hashJoin(parallel(collection2,
> >> > >>                   workers=3,
> >> > >>                   sort="id asc",
> >> > >>                   innerJoin(search(collection2, q="*:*", fl="id",
> >> > >>                                    sort="id asc", qt="/export",
> >> > >>                                    partitionKeys="id"),
> >> > >>                             search(collection2, q="year_i:42",
> >> > >>                                    fl="id, year_i", sort="id asc",
> >> > >>                                    qt="/export", partitionKeys="id"),
> >> > >>                             on="id")),
> >> > >>           hashed=search(collection2, q="day_i:7", fl="id, day_i",
> >> > >>                         sort="id asc", qt="/export"),
> >> > >>           on="id")
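> >> > >>
> >> > >> (A note on running it: the whole expression goes in the expr
> >> > >> parameter of the /stream endpoint, URL-encoded, in the same way as
> >> > >> the URL example further down the thread.)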
> >> > >>
> >> > >> Joel Bernstein
> >> > >> http://joelsolr.blogspot.com/
> >> > >>
> >> > >> On Wed, May 3, 2017 at 1:29 PM, Joel Bernstein <joelsolr@gmail.com
> >
> >> > >> wrote:
> >> > >>
> >> > >> > Start off with just this expression:
> >> > >> >
> >> > >> > search(collection2,
> >> > >> >             q=*:*,
> >> > >> >             fl="a_s,b_s,c_s,d_s,e_s",
> >> > >> >             sort="a_s asc",
> >> > >> >             qt="/export")
> >> > >> >
> >> > >> > And then check the logs for exceptions.
> >> > >> >
> >> > >> > Joel Bernstein
> >> > >> > http://joelsolr.blogspot.com/
> >> > >> >
> >> > >> > On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <
> >> > >> edwinyeozl@gmail.com
> >> > >> > > wrote:
> >> > >> >
> >> > >> >> Hi Joel,
> >> > >> >>
> >> > >> >> I am getting this error after I changed to add qt=/export and
> >> > >> >> removed the rows param. Do you know what could be the reason?
> >> > >> >>
> >> > >> >> {
> >> > >> >>   "error":{
> >> > >> >>     "metadata":[
> >> > >> >>       "error-class","org.apache.solr.common.SolrException",
> >> > >> >>       "root-error-class","org.apache.http.
> MalformedChunkCodingExc
> >> e
> >> > >> >> ption"],
> >> > >> >>     "msg":"org.apache.http.MalformedChunkCodingException: CRLF
> >> > >> expected
> >> > >> >> at
> >> > >> >> end of chunk",
> >> > >> >>     "trace":"org.apache.solr.common.SolrException:
> >> > >> >> org.apache.http.MalformedChunkCodingException: CRLF expected at
> >> end
> >> > of
> >> > >> >> chunk\r\n\tat
> >> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
> >> > >> >> iteMap$0(TupleStream.java:79)\r\n\tat
> >> > >> >> org.apache.solr.response.JSONWriter.writeIterator(JSONRespon
> >> > >> >> seWriter.java:523)\r\n\tat
> >> > >> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
> >> > >> >> ponseWriter.java:175)\r\n\tat
> >> > >> >> org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter
> >> > >> >> .java:559)\r\n\tat
> >> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(
> >> > >> >> TupleStream.java:64)\r\n\tat
> >> > >> >> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWri
> >> > >> >> ter.java:547)\r\n\tat
> >> > >> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
> >> > >> >> ponseWriter.java:193)\r\n\tat
> >> > >> >> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithD
> >> > >> >> ups(JSONResponseWriter.java:209)\r\n\tat
> >> > >> >> org.apache.solr.response.JSONWriter.writeNamedList(JSONRespo
> >> > >> >> nseWriter.java:325)\r\n\tat
> >> > >> >> org.apache.solr.response.JSONWriter.writeResponse(JSONRespon
> >> > >> >> seWriter.java:120)\r\n\tat
> >> > >> >> org.apache.solr.response.JSONResponseWriter.write(JSONRespon
> >> > >> >> seWriter.java:71)\r\n\tat
> >> > >> >> org.apache.solr.response.QueryResponseWriterUtil.writeQueryR
> >> > >> >> esponse(QueryResponseWriterUtil.java:65)\r\n\tat
> >> > >> >> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrC
> >> > >> >> all.java:732)\r\n\tat
> >> > >> >> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:
> >> > >> 473)\r\n\tat
> >> > >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
> >> > >> >> atchFilter.java:345)\r\n\tat
> >> > >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
> >> > >> >> atchFilter.java:296)\r\n\tat
> >> > >> >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte
> >> > >> >> r(ServletHandler.java:1691)\r\n\tat
> >> > >> >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHan
> >> > >> >> dler.java:582)\r\n\tat
> >> > >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
> >> > >> >> Handler.java:143)\r\n\tat
> >> > >> >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHa
> >> > >> >> ndler.java:548)\r\n\tat
> >> > >> >> org.eclipse.jetty.server.session.SessionHandler.doHandle(
> >> > >> >> SessionHandler.java:226)\r\n\tat
> >> > >> >> org.eclipse.jetty.server.handler.ContextHandler.doHandle(
> >> > >> >> ContextHandler.java:1180)\r\n\tat
> >> > >> >> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHand
> >> > >> >> ler.java:512)\r\n\tat
> >> > >> >> org.eclipse.jetty.server.session.SessionHandler.doScope(
> >> > >> >> SessionHandler.java:185)\r\n\tat
> >> > >> >> org.eclipse.jetty.server.handler.ContextHandler.doScope(
> >> > >> >> ContextHandler.java:1112)\r\n\tat
> >> > >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
> >> > >> >> Handler.java:141)\r\n\tat
> >> > >> >> org.eclipse.jetty.server.handler.ContextHandlerCollection.ha
> >> > >> >> ndle(ContextHandlerCollection.java:213)\r\n\tat
> >> > >> >> org.eclipse.jetty.server.handler.HandlerCollection.handle(
> >> > >> >> HandlerCollection.java:119)\r\n\tat
> >> > >> >> org.eclipse.jetty.server.handler.HandlerWrapper.handle(Handl
> >> > >> >> erWrapper.java:134)\r\n\tat
> >> > >> >> org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat
> >> > >> >> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.
> >> > >> java:320)\r\n\tat
> >> > >> >> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConne
> >> > >> >> ction.java:251)\r\n\tat
> >> > >> >> org.eclipse.jetty.io.AbstractConnection$ReadCallback.
> >> > >> >> succeeded(AbstractConnection.java:273)\r\n\tat
> >> > >> >> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.
> >> > >> java:95)\r\n\tat
> >> > >> >> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChann
> >> > >> >> elEndPoint.java:93)\r\n\tat
> >> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> >> > >> >> .executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat
> >> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> >> > >> >> .produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
> >> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> >> > >> >> .run(ExecuteProduceConsume.java:136)\r\n\tat
> >> > >> >> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(Queued
> >> > >> >> ThreadPool.java:671)\r\n\tat
> >> > >> >> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedT
> >> > >> >> hreadPool.java:589)\r\n\tat
> >> > >> >> java.lang.Thread.run(Thread.java:745)\r\nCaused by:
> >> > >> >> org.apache.http.MalformedChunkCodingException: CRLF expected at
> >> end
> >> > of
> >> > >> >> chunk\r\n\tat
> >> > >> >> org.apache.http.impl.io.ChunkedInputStream.getChunkSize(Chun
> >> > >> >> kedInputStream.java:255)\r\n\tat
> >> > >> >> org.apache.http.impl.io.ChunkedInputStream.nextChunk(Chunked
> >> > >> >> InputStream.java:227)\r\n\tat
> >> > >> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
> >> > >> >> Stream.java:186)\r\n\tat
> >> > >> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
> >> > >> >> Stream.java:215)\r\n\tat
> >> > >> >> org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInpu
> >> > >> >> tStream.java:316)\r\n\tat
> >> > >> >> org.apache.http.conn.BasicManagedEntity.streamClosed(BasicMa
> >> > >> >> nagedEntity.java:164)\r\n\tat
> >> > >> >> org.apache.http.conn.EofSensorInputStream.checkClose(EofSens
> >> > >> >> orInputStream.java:228)\r\n\tat
> >> > >> >> org.apache.http.conn.EofSensorInputStream.close(EofSensorInp
> >> > >> >> utStream.java:174)\r\n\tat
> >> > >> >> sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)\
> >> r\n\tat
> >> > >> >> sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)\r\n\tat
> >> > >> >> java.io.InputStreamReader.close(InputStreamReader.java:199)\
> >> r\n\tat
> >> > >> >> org.apache.solr.client.solrj.io.stream.JSONTupleStream.close
> >> > >> >> (JSONTupleStream.java:92)\r\n\tat
> >> > >> >> org.apache.solr.client.solrj.io.stream.SolrStream.close(Solr
> >> > >> >> Stream.java:193)\r\n\tat
> >> > >> >> org.apache.solr.client.solrj.io.stream.CloudSolrStream.close
> >> > >> >> (CloudSolrStream.java:464)\r\n\tat
> >> > >> >> org.apache.solr.client.solrj.io.stream.HashJoinStream.close(
> >> > >> >> HashJoinStream.java:231)\r\n\tat
> >> > >> >> org.apache.solr.client.solrj.io.stream.ExceptionStream.close
> >> > >> >> (ExceptionStream.java:93)\r\n\tat
> >> > >> >> org.apache.solr.handler.StreamHandler$TimerStream.close(
> >> > >> >> StreamHandler.java:452)\r\n\tat
> >> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
> >> > >> >> iteMap$0(TupleStream.java:71)\r\n\t...
> >> > >> >> 40 more\r\n",
> >> > >> >>     "code":500}}
> >> > >> >>
> >> > >> >>
> >> > >> >> Regards,
> >> > >> >> Edwin
> >> > >> >>
> >> > >> >>
> >> > >> >> On 4 May 2017 at 00:00, Joel Bernstein <jo...@gmail.com>
> >> wrote:
> >> > >> >>
> >> > >> >> > I've reformatted the expression below and made a few changes. You
> >> > >> >> > have put things together properly. But these are MapReduce joins
> >> > >> >> > that require exporting the entire result sets. So you will need to
> >> > >> >> > add qt=/export to all the searches and remove the rows param. In
> >> > >> >> > Solr 6.6 there is a new "shuffle" expression that does this
> >> > >> >> > automatically.
> >> > >> >> >
> >> > >> >> > To test things you'll want to break down each expression and
> >> > >> >> > make sure it's behaving as expected.
> >> > >> >> >
> >> > >> >> > For example first run each search. Then run the innerJoin, not in
> >> > >> >> > parallel mode. Then run it in parallel mode. Then try the whole
> >> > >> >> > thing.
> >> > >> >> >
> >> > >> >> > hashJoin(parallel(collection2,
> >> > >> >> >                   innerJoin(search(collection2,
> >> > >> >> >                                    q=*:*,
> >> > >> >> >                                    fl="a_s,b_s,c_s,d_s,e_s",
> >> > >> >> >                                    sort="a_s asc",
> >> > >> >> >                                    partitionKeys="a_s",
> >> > >> >> >                                    qt="/export"),
> >> > >> >> >                             search(collection1,
> >> > >> >> >                                    q=*:*,
> >> > >> >> >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
> >> > >> >> >                                    sort="a_s asc",
> >> > >> >> >                                    partitionKeys="a_s",
> >> > >> >> >                                    qt="/export"),
> >> > >> >> >                             on="a_s"),
> >> > >> >> >                   workers="2",
> >> > >> >> >                   sort="a_s asc"),
> >> > >> >> >          hashed=search(collection3,
> >> > >> >> >                        q=*:*,
> >> > >> >> >                        fl="a_s,k_s,l_s",
> >> > >> >> >                        sort="a_s asc",
> >> > >> >> >                        qt="/export"),
> >> > >> >> >          on="a_s")
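> >> > >> >> >
> >> > >> >> > For the middle step, a sketch of the innerJoin without the
> >> > >> >> > parallel wrapper (the partitionKeys can be dropped there):
> >> > >> >> >
> >> > >> >> > innerJoin(search(collection2, q=*:*, fl="a_s,b_s,c_s,d_s,e_s",
> >> > >> >> >                  sort="a_s asc", qt="/export"),
> >> > >> >> >           search(collection1, q=*:*, fl="a_s,f_s,g_s,h_s,i_s,j_s",
> >> > >> >> >                  sort="a_s asc", qt="/export"),
> >> > >> >> >           on="a_s")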
> >> > >> >> >
> >> > >> >> > Joel Bernstein
> >> > >> >> > http://joelsolr.blogspot.com/
> >> > >> >> >
> >> > >> >> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <
> >> > >> >> edwinyeozl@gmail.com
> >> > >> >> > >
> >> > >> >> > wrote:
> >> > >> >> >
> >> > >> >> > > Hi Joel,
> >> > >> >> > >
> >> > >> >> > > Thanks for the clarification.
> >> > >> >> > >
> >> > >> >> > > Would like to check, is this the correct way to do the join?
> >> > >> >> > > Currently, I could not get any results after putting in the
> >> > >> >> > > hashJoin for the 3rd, smallerStream collection (collection3).
> >> > >> >> > >
> >> > >> >> > > http://localhost:8983/solr/collection1/stream?expr=
> >> > >> >> > > hashJoin(parallel(collection2,
> >> > >> >> > >                   innerJoin(search(collection2,
> >> > >> >> > >                                    q=*:*,
> >> > >> >> > >                                    fl="a_s,b_s,c_s,d_s,e_s",
> >> > >> >> > >                                    sort="a_s asc",
> >> > >> >> > >                                    partitionKeys="a_s",
> >> > >> >> > >                                    rows=200),
> >> > >> >> > >                             search(collection1,
> >> > >> >> > >                                    q=*:*,
> >> > >> >> > >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
> >> > >> >> > >                                    sort="a_s asc",
> >> > >> >> > >                                    partitionKeys="a_s",
> >> > >> >> > >                                    rows=200),
> >> > >> >> > >                             on="a_s"),
> >> > >> >> > >                   workers="2",
> >> > >> >> > >                   sort="a_s asc"),
> >> > >> >> > >          hashed=search(collection3,
> >> > >> >> > >                        q=*:*,
> >> > >> >> > >                        fl="a_s,k_s,l_s",
> >> > >> >> > >                        sort="a_s asc",
> >> > >> >> > >                        rows=200),
> >> > >> >> > >          on="a_s")
> >> > >> >> > > &indent=true
> >> > >> >> > >
> >> > >> >> > >
> >> > >> >> > > Regards,
> >> > >> >> > > Edwin
> >> > >> >> > >
> >> > >> >> > >
> >> > >> >> > > On 3 May 2017 at 20:59, Joel Bernstein <jo...@gmail.com>
> >> > wrote:
> >> > >> >> > >
> >> > >> >> > > > Sorry, it's just called hashJoin
> >> > >> >> > > >
> >> > >> >> > > > Joel Bernstein
> >> > >> >> > > > http://joelsolr.blogspot.com/
> >> > >> >> > > >
> >> > >> >> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <
> >> > >> >> > > edwinyeozl@gmail.com>
> >> > >> >> > > > wrote:
> >> > >> >> > > >
> >> > >> >> > > > > Hi Joel,
> >> > >> >> > > > >
> >> > >> >> > > > > I am getting this error when I used the innerHashJoin.
> >> > >> >> > > > >
> >> > >> >> > > > >  "EXCEPTION":"Invalid stream expression
> >> > innerHashJoin(parallel(
> >> > >> >> > > innerJoin
> >> > >> >> > > > >
> >> > >> >> > > > > I also can't find the documentation on innerHashJoin for
> >> the
> >> > >> >> > Streaming
> >> > >> >> > > > > Expressions.
> >> > >> >> > > > >
> >> > >> >> > > > > Are you referring to hashJoin?
> >> > >> >> > > > >
> >> > >> >> > > > > Regards,
> >> > >> >> > > > > Edwin
> >> > >> >> > > > >
> >> > >> >> > > > >
> >> > >> >> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <
> >> > >> edwinyeozl@gmail.com
> >> > >> >> >
> >> > >> >> > > > wrote:
> >> > >> >> > > > >
> >> > >> >> > > > > > Hi Joel,
> >> > >> >> > > > > >
> >> > >> >> > > > > > Thanks for the info.
> >> > >> >> > > > > >
> >> > >> >> > > > > > Regards,
> >> > >> >> > > > > > Edwin
> >> > >> >> > > > > >
> >> > >> >> > > > > >
> >> > >> >> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <
> >> joelsolr@gmail.com
> >> > >
> >> > >> >> wrote:
> >> > >> >> > > > > >
> >> > >> >> > > > > >> Also take a look at the documentation for the "fetch"
> >> > >> streaming
> >> > >> >> > > > > >> expression.
> >> > >> >> > > > > >>
> >> > >> >> > > > > >> Joel Bernstein
> >> > >> >> > > > > >> http://joelsolr.blogspot.com/
> >> > >> >> > > > > >>
> >> > >> >> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <
> >> > >> >> > joelsolr@gmail.com>
> >> > >> >> > > > > >> wrote:
> >> > >> >> > > > > >>
> >> > >> >> > > > > >> > Yes, you can join more than one collection with
> >> > >> >> > > > > >> > Streaming Expressions. Here are a few things to
> >> > >> >> > > > > >> > keep in mind.
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> > * You'll likely want to use the parallel function
> >> around
> >> > >> the
> >> > >> >> > > largest
> >> > >> >> > > > > >> join.
> >> > >> >> > > > > >> > You'll need to use the join keys as the
> >> partitionKeys.
> >> > >> >> > > > > >> > * innerJoin: requires that the streams be sorted on
> >> the
> >> > >> join
> >> > >> >> > keys.
> >> > >> >> > > > > >> > * innerHashJoin: has no sorting requirement.
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> > So a strategy for a three collection join might
> look
> >> > like
> >> > >> >> this:
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream,
> >> > bigStream)),
> >> > >> >> > > > > smallerStream)
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> > The largest join can be done in parallel using an
> >> > >> innerJoin.
> >> > >> >> You
> >> > >> >> > > can
> >> > >> >> > > > > >> then
> >> > >> >> > > > > >> > wrap the stream coming out of the parallel function
> >> in
> >> > an
> >> > >> >> > > > > innerHashJoin
> >> > >> >> > > > > >> to
> >> > >> >> > > > > >> > join it to another stream.
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> > Joel Bernstein
> >> > >> >> > > > > >> > http://joelsolr.blogspot.com/
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin
> Yeo <
> >> > >> >> > > > > >> edwinyeozl@gmail.com>
> >> > >> >> > > > > >> > wrote:
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> >> Hi,
> >> > >> >> > > > > >> >>
> >> > >> >> > > > > >> >> Is it possible to join more than 2 collections
> using
> >> > one
> >> > >> of
> >> > >> >> the
> >> > >> >> > > > > >> streaming
> >> > >> >> > > > > >> >> expressions (Eg: innerJoin)? If not, is there
> other
> >> > ways
> >> > >> we
> >> > >> >> can
> >> > >> >> > > do
> >> > >> >> > > > > it?
> >> > >> >> > > > > >> >>
> >> > >> >> > > > > >> >> Currently, I may need to join 3 or 4 collections
> >> > >> together,
> >> > >> >> and
> >> > >> >> > to
> >> > >> >> > > > > >> output
> >> > >> >> > > > > >> >> selected fields from all these collections
> together.
> >> > >> >> > > > > >> >>
> >> > >> >> > > > > >> >> I'm using Solr 6.4.2.
> >> > >> >> > > > > >> >>
> >> > >> >> > > > > >> >> Regards,
> >> > >> >> > > > > >> >> Edwin
> >> > >> >> > > > > >> >>
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >> >
> >> > >> >> > > > > >>
> >> > >> >> > > > > >
> >> > >> >> > > > > >
> >> > >> >> > > > >
> >> > >> >> > > >
> >> > >> >> > >
> >> > >> >> >
> >> > >> >>
> >> > >> >
> >> > >> >
> >> > >>
> >> > >
> >> > >
> >> >
> >>
> >
> >
>

Re: Joining more than 2 collections

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Joel,

For the join queries, is it true that if we use q=*:* for the query for one
of the joins, there will not be any results returned?

Currently, I find this is the case if I just put q=*:*.
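
(To isolate this, the inner search can be run on its own through the /stream
endpoint, with the match-all query quoted the way Joel's examples write it; a
minimal sketch, assuming the collection and field names from earlier in the
thread:

http://localhost:8983/solr/collection1/stream?expr=search(collection1,
q="*:*", fl="a_s", sort="a_s asc", qt="/export")

If the quoted form returns tuples and the unquoted one does not, the quoting
is the likely cause.)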

Regards,
Edwin


On 4 May 2017 at 23:38, Zheng Lin Edwin Yeo <ed...@gmail.com> wrote:

> Hi Joel,
>
> I think that might be one of the reasons.
> This is what I have for the /export handler in my solrconfig.xml:
>
> <requestHandler name="/export" class="solr.SearchHandler">
>   <lst name="invariants">
>     <str name="rq">{!xport}</str>
>     <str name="wt">xsort</str>
>     <str name="distrib">false</str>
>   </lst>
>   <arr name="components">
>     <str>query</str>
>   </arr>
> </requestHandler>
>
> This is the error message that I get when I use the /export handler.
>
> java.io.IOException: java.util.concurrent.ExecutionException:
> java.io.IOException: --> http://localhost:8983/solr/
> collection1_shard1_replica1/: An exception has occurred on the server,
> refer to server log for details.
> at org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> openStreams(CloudSolrStream.java:451)
> at org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> open(CloudSolrStream.java:308)
> at org.apache.solr.client.solrj.io.stream.PushBackStream.open(
> PushBackStream.java:70)
> at org.apache.solr.client.solrj.io.stream.JoinStream.open(
> JoinStream.java:147)
> at org.apache.solr.client.solrj.io.stream.ExceptionStream.
> open(ExceptionStream.java:51)
> at org.apache.solr.handler.StreamHandler$TimerStream.
> open(StreamHandler.java:457)
> at org.apache.solr.client.solrj.io.stream.TupleStream.
> writeMap(TupleStream.java:63)
> at org.apache.solr.response.JSONWriter.writeMap(
> JSONResponseWriter.java:547)
> at org.apache.solr.response.TextResponseWriter.writeVal(
> TextResponseWriter.java:193)
> at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(
> JSONResponseWriter.java:209)
> at org.apache.solr.response.JSONWriter.writeNamedList(
> JSONResponseWriter.java:325)
> at org.apache.solr.response.JSONWriter.writeResponse(
> JSONResponseWriter.java:120)
> at org.apache.solr.response.JSONResponseWriter.write(
> JSONResponseWriter.java:71)
> at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(
> QueryResponseWriterUtil.java:65)
> at org.apache.solr.servlet.HttpSolrCall.writeResponse(
> HttpSolrCall.java:732)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:345)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:296)
> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> doFilter(ServletHandler.java:1691)
> at org.eclipse.jetty.servlet.ServletHandler.doHandle(
> ServletHandler.java:582)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:143)
> at org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHandler.java:548)
> at org.eclipse.jetty.server.session.SessionHandler.
> doHandle(SessionHandler.java:226)
> at org.eclipse.jetty.server.handler.ContextHandler.
> doHandle(ContextHandler.java:1180)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(
> ServletHandler.java:512)
> at org.eclipse.jetty.server.session.SessionHandler.
> doScope(SessionHandler.java:185)
> at org.eclipse.jetty.server.handler.ContextHandler.
> doScope(ContextHandler.java:1112)
> at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:141)
> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> ContextHandlerCollection.java:213)
> at org.eclipse.jetty.server.handler.HandlerCollection.
> handle(HandlerCollection.java:119)
> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper.java:134)
> at org.eclipse.jetty.server.Server.handle(Server.java:534)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> at org.eclipse.jetty.server.HttpConnection.onFillable(
> HttpConnection.java:251)
> at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
> AbstractConnection.java:273)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> SelectChannelEndPoint.java:93)
> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> executeProduceConsume(ExecuteProduceConsume.java:303)
> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> produceConsume(ExecuteProduceConsume.java:148)
> at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(
> ExecuteProduceConsume.java:136)
> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> QueuedThreadPool.java:671)
> at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(
> QueuedThreadPool.java:589)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.ExecutionException: java.io.IOException:
> --> http://localhost:8983/solr/collection1_shard1_replica1/: An exception
> has occurred on the server, refer to server log for details.
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> openStreams(CloudSolrStream.java:445)
> ... 42 more
> Caused by: java.io.IOException: --> http://localhost:8983/solr/
> collection1_shard1_replica1/: An exception has occurred on the server,
> refer to server log for details.
> at org.apache.solr.client.solrj.io.stream.SolrStream.read(
> SolrStream.java:238)
> at org.apache.solr.client.solrj.io.stream.CloudSolrStream$
> TupleWrapper.next(CloudSolrStream.java:541)
> at org.apache.solr.client.solrj.io.stream.CloudSolrStream$
> StreamOpener.call(CloudSolrStream.java:564)
> at org.apache.solr.client.solrj.io.stream.CloudSolrStream$
> StreamOpener.call(CloudSolrStream.java:551)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.
> lambda$execute$0(ExecutorUtil.java:229)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
> ... 1 more
> Caused by: org.noggit.JSONParser$ParseException: JSON Parse Error:
> char=<,position=0 BEFORE='<' AFTER='?xml version="1.0" encoding="UTF-8"?> <'
> at org.noggit.JSONParser.err(JSONParser.java:356)
> at org.noggit.JSONParser.handleNonDoubleQuoteString(JSONParser.java:712)
> at org.noggit.JSONParser.next(JSONParser.java:886)
> at org.noggit.JSONParser.nextEvent(JSONParser.java:930)
> at org.apache.solr.client.solrj.io.stream.JSONTupleStream.
> expect(JSONTupleStream.java:97)
> at org.apache.solr.client.solrj.io.stream.JSONTupleStream.
> advanceToDocs(JSONTupleStream.java:179)
> at org.apache.solr.client.solrj.io.stream.JSONTupleStream.
> next(JSONTupleStream.java:77)
> at org.apache.solr.client.solrj.io.stream.SolrStream.read(
> SolrStream.java:207)
> ... 8 more
>
>
> Regards,
> Edwin
>
>
> On 4 May 2017 at 22:54, Joel Bernstein <jo...@gmail.com> wrote:
>
>> I suspect that there is something not quite right about how the /export
>> handler is configured. Straight out of the box in Solr 6.4.2 /export will
>> be automatically configured. Are you using a Solr instance that has been
>> upgraded in the past and doesn't have standard 6.4.2 configs?
>>
>> To really do joins properly you'll have to use the /export handler because
>> /select will not stream entire result sets (unless they are pretty small).
>> So your results may be missing data.
>>
>> I would take a close look at the logs and see what all the exceptions are
>> when you run a search using qt=/export. If you can post all the stack
>> traces that get generated when you run the search we'll probably be able
>> to spot the issue.
>>
>> About the field ordering: there is support for field ordering in the
>> Streaming classes, but only a few places actually enforce the order. The
>> 6.5 SQL interface does keep the fields in order, as does the new Tuple
>> expression in Solr 6.6. But the expressions you are working with currently
>> don't enforce field ordering.
>>
>>
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Thu, May 4, 2017 at 2:41 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com
>> >
>> wrote:
>>
>> > Hi Joel,
>> >
>> > I have managed to get the Join to work, but so far it is only working
>> > when I use qt="/select". It is not working when I use qt="/export".
>> >
>> > For the display of the fields, is there a way to allow it to list them in
>> > the order that I want?
>> > Currently, the display is quite random, and I can get a field in
>> > collection1, followed by a field in collection3, then collection1 again,
>> > and then collection2.
>> >
>> > It will be good if we can arrange the fields to display in the order
>> > that we want.
>> >
>> > Regards,
>> > Edwin
>> >
>> >
>> >
>> > On 4 May 2017 at 09:56, Zheng Lin Edwin Yeo <ed...@gmail.com>
>> wrote:
>> >
>> > > Hi Joel,
>> > >
>> > > It works when I started off with just one expression.
>> > >
>> > > Could it be that the data size is too big for export after the join,
>> > > which causes the error?
>> > >
>> > > Regards,
>> > > Edwin
>> > >
>> > > On 4 May 2017 at 02:53, Joel Bernstein <jo...@gmail.com> wrote:
>> > >
>> > >> I was just testing with the query below and it worked for me. Some of
>> > >> the error messages I was getting with the syntax were not what I was
>> > >> expecting though, so I'll look into the error handling. But the joins
>> > >> do work when the syntax is correct. The query below is joining to the
>> > >> same collection three times, but the mechanics are exactly the same as
>> > >> joining three different tables. In this example each join narrows down
>> > >> the result set.
>> > >>
>> > >> hashJoin(parallel(collection2,
>> > >>                   workers=3,
>> > >>                   sort="id asc",
>> > >>                   innerJoin(search(collection2, q="*:*", fl="id",
>> > >>                                    sort="id asc", qt="/export",
>> > >>                                    partitionKeys="id"),
>> > >>                             search(collection2, q="year_i:42",
>> > >>                                    fl="id, year_i", sort="id asc",
>> > >>                                    qt="/export", partitionKeys="id"),
>> > >>                             on="id")),
>> > >>           hashed=search(collection2, q="day_i:7", fl="id, day_i",
>> > >>                         sort="id asc", qt="/export"),
>> > >>           on="id")
>> > >>
>> > >> Joel Bernstein
>> > >> http://joelsolr.blogspot.com/
>> > >>
>> > >> On Wed, May 3, 2017 at 1:29 PM, Joel Bernstein <jo...@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > Start off with just this expression:
>> > >> >
>> > >> > search(collection2,
>> > >> >             q=*:*,
>> > >> >             fl="a_s,b_s,c_s,d_s,e_s",
>> > >> >             sort="a_s asc",
>> > >> >             qt="/export")
>> > >> >
>> > >> > And then check the logs for exceptions.
>> > >> >
>> > >> > Joel Bernstein
>> > >> > http://joelsolr.blogspot.com/
>> > >> >
>> > >> > On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <
>> > >> edwinyeozl@gmail.com
>> > >> > > wrote:
>> > >> >
>> > >> >> Hi Joel,
>> > >> >>
>> > >> >> I am getting this error after I changed to add qt=/export and removed
>> > >> >> the rows param. Do you know what could be the reason?
>> > >> >>
>> > >> >> {
>> > >> >>   "error":{
>> > >> >>     "metadata":[
>> > >> >>       "error-class","org.apache.solr.common.SolrException",
>> > >> >>       "root-error-class","org.apache.http.MalformedChunkCodingExc
>> e
>> > >> >> ption"],
>> > >> >>     "msg":"org.apache.http.MalformedChunkCodingException: CRLF
>> > >> expected
>> > >> >> at
>> > >> >> end of chunk",
>> > >> >>     "trace":"org.apache.solr.common.SolrException:
>> > >> >> org.apache.http.MalformedChunkCodingException: CRLF expected at
>> end
>> > of
>> > >> >> chunk\r\n\tat
>> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
>> > >> >> iteMap$0(TupleStream.java:79)\r\n\tat
>> > >> >> org.apache.solr.response.JSONWriter.writeIterator(JSONRespon
>> > >> >> seWriter.java:523)\r\n\tat
>> > >> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
>> > >> >> ponseWriter.java:175)\r\n\tat
>> > >> >> org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter
>> > >> >> .java:559)\r\n\tat
>> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(
>> > >> >> TupleStream.java:64)\r\n\tat
>> > >> >> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWri
>> > >> >> ter.java:547)\r\n\tat
>> > >> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
>> > >> >> ponseWriter.java:193)\r\n\tat
>> > >> >> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithD
>> > >> >> ups(JSONResponseWriter.java:209)\r\n\tat
>> > >> >> org.apache.solr.response.JSONWriter.writeNamedList(JSONRespo
>> > >> >> nseWriter.java:325)\r\n\tat
>> > >> >> org.apache.solr.response.JSONWriter.writeResponse(JSONRespon
>> > >> >> seWriter.java:120)\r\n\tat
>> > >> >> org.apache.solr.response.JSONResponseWriter.write(JSONRespon
>> > >> >> seWriter.java:71)\r\n\tat
>> > >> >> org.apache.solr.response.QueryResponseWriterUtil.writeQueryR
>> > >> >> esponse(QueryResponseWriterUtil.java:65)\r\n\tat
>> > >> >> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrC
>> > >> >> all.java:732)\r\n\tat
>> > >> >> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:
>> > >> 473)\r\n\tat
>> > >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
>> > >> >> atchFilter.java:345)\r\n\tat
>> > >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
>> > >> >> atchFilter.java:296)\r\n\tat
>> > >> >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte
>> > >> >> r(ServletHandler.java:1691)\r\n\tat
>> > >> >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHan
>> > >> >> dler.java:582)\r\n\tat
>> > >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
>> > >> >> Handler.java:143)\r\n\tat
>> > >> >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHa
>> > >> >> ndler.java:548)\r\n\tat
>> > >> >> org.eclipse.jetty.server.session.SessionHandler.doHandle(
>> > >> >> SessionHandler.java:226)\r\n\tat
>> > >> >> org.eclipse.jetty.server.handler.ContextHandler.doHandle(
>> > >> >> ContextHandler.java:1180)\r\n\tat
>> > >> >> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHand
>> > >> >> ler.java:512)\r\n\tat
>> > >> >> org.eclipse.jetty.server.session.SessionHandler.doScope(
>> > >> >> SessionHandler.java:185)\r\n\tat
>> > >> >> org.eclipse.jetty.server.handler.ContextHandler.doScope(
>> > >> >> ContextHandler.java:1112)\r\n\tat
>> > >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
>> > >> >> Handler.java:141)\r\n\tat
>> > >> >> org.eclipse.jetty.server.handler.ContextHandlerCollection.ha
>> > >> >> ndle(ContextHandlerCollection.java:213)\r\n\tat
>> > >> >> org.eclipse.jetty.server.handler.HandlerCollection.handle(
>> > >> >> HandlerCollection.java:119)\r\n\tat
>> > >> >> org.eclipse.jetty.server.handler.HandlerWrapper.handle(Handl
>> > >> >> erWrapper.java:134)\r\n\tat
>> > >> >> org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat
>> > >> >> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.
>> > >> java:320)\r\n\tat
>> > >> >> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConne
>> > >> >> ction.java:251)\r\n\tat
>> > >> >> org.eclipse.jetty.io.AbstractConnection$ReadCallback.
>> > >> >> succeeded(AbstractConnection.java:273)\r\n\tat
>> > >> >> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.
>> > >> java:95)\r\n\tat
>> > >> >> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChann
>> > >> >> elEndPoint.java:93)\r\n\tat
>> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
>> > >> >> .executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat
>> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
>> > >> >> .produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
>> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
>> > >> >> .run(ExecuteProduceConsume.java:136)\r\n\tat
>> > >> >> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(Queued
>> > >> >> ThreadPool.java:671)\r\n\tat
>> > >> >> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedT
>> > >> >> hreadPool.java:589)\r\n\tat
>> > >> >> java.lang.Thread.run(Thread.java:745)\r\nCaused by:
>> > >> >> org.apache.http.MalformedChunkCodingException: CRLF expected at
>> end
>> > of
>> > >> >> chunk\r\n\tat
>> > >> >> org.apache.http.impl.io.ChunkedInputStream.getChunkSize(Chun
>> > >> >> kedInputStream.java:255)\r\n\tat
>> > >> >> org.apache.http.impl.io.ChunkedInputStream.nextChunk(Chunked
>> > >> >> InputStream.java:227)\r\n\tat
>> > >> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
>> > >> >> Stream.java:186)\r\n\tat
>> > >> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
>> > >> >> Stream.java:215)\r\n\tat
>> > >> >> org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInpu
>> > >> >> tStream.java:316)\r\n\tat
>> > >> >> org.apache.http.conn.BasicManagedEntity.streamClosed(BasicMa
>> > >> >> nagedEntity.java:164)\r\n\tat
>> > >> >> org.apache.http.conn.EofSensorInputStream.checkClose(EofSens
>> > >> >> orInputStream.java:228)\r\n\tat
>> > >> >> org.apache.http.conn.EofSensorInputStream.close(EofSensorInp
>> > >> >> utStream.java:174)\r\n\tat
>> > >> >> sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)\
>> r\n\tat
>> > >> >> sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)\r\n\tat
>> > >> >> java.io.InputStreamReader.close(InputStreamReader.java:199)\
>> r\n\tat
>> > >> >> org.apache.solr.client.solrj.io.stream.JSONTupleStream.close
>> > >> >> (JSONTupleStream.java:92)\r\n\tat
>> > >> >> org.apache.solr.client.solrj.io.stream.SolrStream.close(Solr
>> > >> >> Stream.java:193)\r\n\tat
>> > >> >> org.apache.solr.client.solrj.io.stream.CloudSolrStream.close
>> > >> >> (CloudSolrStream.java:464)\r\n\tat
>> > >> >> org.apache.solr.client.solrj.io.stream.HashJoinStream.close(
>> > >> >> HashJoinStream.java:231)\r\n\tat
>> > >> >> org.apache.solr.client.solrj.io.stream.ExceptionStream.close
>> > >> >> (ExceptionStream.java:93)\r\n\tat
>> > >> >> org.apache.solr.handler.StreamHandler$TimerStream.close(
>> > >> >> StreamHandler.java:452)\r\n\tat
>> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
>> > >> >> iteMap$0(TupleStream.java:71)\r\n\t...
>> > >> >> 40 more\r\n",
>> > >> >>     "code":500}}
>> > >> >>
>> > >> >>
>> > >> >> Regards,
>> > >> >> Edwin
>> > >> >>
>> > >> >>
>> > >> >> On 4 May 2017 at 00:00, Joel Bernstein <jo...@gmail.com>
>> wrote:
>> > >> >>
>> > >> >> > I've reformatted the expression below and made a few changes. You
>> > >> >> > have put things together properly. But these are MapReduce joins
>> > >> >> > that require exporting the entire result sets. So you will need to
>> > >> >> > add qt=/export to all the searches and remove the rows param. In
>> > >> >> > Solr 6.6 there is a new "shuffle" expression that does this
>> > >> >> > automatically.
>> > >> >> >
>> > >> >> > To test things you'll want to break down each expression and
>> > >> >> > make sure it's behaving as expected.
>> > >> >> >
>> > >> >> > For example first run each search. Then run the innerJoin, not in
>> > >> >> > parallel mode. Then run it in parallel mode. Then try the whole
>> > >> >> > thing.
>> > >> >> >
>> > >> >> > hashJoin(parallel(collection2,
>> > >> >> >                   innerJoin(search(collection2,
>> > >> >> >                                    q=*:*,
>> > >> >> >                                    fl="a_s,b_s,c_s,d_s,e_s",
>> > >> >> >                                    sort="a_s asc",
>> > >> >> >                                    partitionKeys="a_s",
>> > >> >> >                                    qt="/export"),
>> > >> >> >                             search(collection1,
>> > >> >> >                                    q=*:*,
>> > >> >> >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
>> > >> >> >                                    sort="a_s asc",
>> > >> >> >                                    partitionKeys="a_s",
>> > >> >> >                                    qt="/export"),
>> > >> >> >                             on="a_s"),
>> > >> >> >                   workers="2",
>> > >> >> >                   sort="a_s asc"),
>> > >> >> >          hashed=search(collection3,
>> > >> >> >                        q=*:*,
>> > >> >> >                        fl="a_s,k_s,l_s",
>> > >> >> >                        sort="a_s asc",
>> > >> >> >                        qt="/export"),
>> > >> >> >          on="a_s")
>> > >> >> >
>> > >> >> > Joel Bernstein
>> > >> >> > http://joelsolr.blogspot.com/
>> > >> >> >
>> > >> >> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <
>> > >> >> edwinyeozl@gmail.com
>> > >> >> > >
>> > >> >> > wrote:
>> > >> >> >
>> > >> >> > > Hi Joel,
>> > >> >> > >
>> > >> >> > > Thanks for the clarification.
>> > >> >> > >
>> > >> >> > > Would like to check, is this the correct way to do the join?
>> > >> >> > > Currently, I could not get any results after putting in the
>> > >> >> > > hashJoin for the 3rd, smallerStream collection (collection3).
>> > >> >> > >
>> > >> >> > > http://localhost:8983/solr/collection1/stream?expr=
>> > >> >> > > hashJoin(parallel(collection2,
>> > >> >> > >                   innerJoin(search(collection2,
>> > >> >> > >                                    q=*:*,
>> > >> >> > >                                    fl="a_s,b_s,c_s,d_s,e_s",
>> > >> >> > >                                    sort="a_s asc",
>> > >> >> > >                                    partitionKeys="a_s",
>> > >> >> > >                                    rows=200),
>> > >> >> > >                             search(collection1,
>> > >> >> > >                                    q=*:*,
>> > >> >> > >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
>> > >> >> > >                                    sort="a_s asc",
>> > >> >> > >                                    partitionKeys="a_s",
>> > >> >> > >                                    rows=200),
>> > >> >> > >                             on="a_s"),
>> > >> >> > >                   workers="2",
>> > >> >> > >                   sort="a_s asc"),
>> > >> >> > >          hashed=search(collection3,
>> > >> >> > >                        q=*:*,
>> > >> >> > >                        fl="a_s,k_s,l_s",
>> > >> >> > >                        sort="a_s asc",
>> > >> >> > >                        rows=200),
>> > >> >> > >          on="a_s")
>> > >> >> > > &indent=true
>> > >> >> > >
>> > >> >> > >
>> > >> >> > > Regards,
>> > >> >> > > Edwin
>> > >> >> > >
>> > >> >> > >
>> > >> >> > > On 3 May 2017 at 20:59, Joel Bernstein <jo...@gmail.com>
>> > wrote:
>> > >> >> > >
>> > >> >> > > > Sorry, it's just called hashJoin
>> > >> >> > > >
>> > >> >> > > > Joel Bernstein
>> > >> >> > > > http://joelsolr.blogspot.com/
>> > >> >> > > >
>> > >> >> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <
>> > >> >> > > edwinyeozl@gmail.com>
>> > >> >> > > > wrote:
>> > >> >> > > >
>> > >> >> > > > > Hi Joel,
>> > >> >> > > > >
>> > >> >> > > > > I am getting this error when I used the innerHashJoin.
>> > >> >> > > > >
>> > >> >> > > > >  "EXCEPTION":"Invalid stream expression
>> > innerHashJoin(parallel(
>> > >> >> > > innerJoin
>> > >> >> > > > >
>> > >> >> > > > > I also can't find the documentation on innerHashJoin for
>> the
>> > >> >> > Streaming
>> > >> >> > > > > Expressions.
>> > >> >> > > > >
>> > >> >> > > > > Are you referring to hashJoin?
>> > >> >> > > > >
>> > >> >> > > > > Regards,
>> > >> >> > > > > Edwin
>> > >> >> > > > >
>> > >> >> > > > >
>> > >> >> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <
>> > >> edwinyeozl@gmail.com
>> > >> >> >
>> > >> >> > > > wrote:
>> > >> >> > > > >
>> > >> >> > > > > > Hi Joel,
>> > >> >> > > > > >
>> > >> >> > > > > > Thanks for the info.
>> > >> >> > > > > >
>> > >> >> > > > > > Regards,
>> > >> >> > > > > > Edwin
>> > >> >> > > > > >
>> > >> >> > > > > >
>> > >> >> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <
>> joelsolr@gmail.com
>> > >
>> > >> >> wrote:
>> > >> >> > > > > >
>> > >> >> > > > > >> Also take a look at the documentation for the "fetch"
>> > >> streaming
>> > >> >> > > > > >> expression.
>> > >> >> > > > > >>
>> > >> >> > > > > >> Joel Bernstein
>> > >> >> > > > > >> http://joelsolr.blogspot.com/
>> > >> >> > > > > >>
>> > >> >> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <
>> > >> >> > joelsolr@gmail.com>
>> > >> >> > > > > >> wrote:
>> > >> >> > > > > >>
>> > >> >> > > > > >> > Yes, you can join more than one collection with Streaming
>> > >> >> > > > > >> > Expressions. Here are a few things to keep in mind.
>> > >> >> > > > > >> >
>> > >> >> > > > > >> > * You'll likely want to use the parallel function
>> around
>> > >> the
>> > >> >> > > largest
>> > >> >> > > > > >> join.
>> > >> >> > > > > >> > You'll need to use the join keys as the
>> partitionKeys.
>> > >> >> > > > > >> > * innerJoin: requires that the streams be sorted on
>> the
>> > >> join
>> > >> >> > keys.
>> > >> >> > > > > >> > * innerHashJoin: has no sorting requirement.
>> > >> >> > > > > >> >
>> > >> >> > > > > >> > So a strategy for a three collection join might look
>> > like
>> > >> >> this:
>> > >> >> > > > > >> >
>> > >> >> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream,
>> > bigStream)),
>> > >> >> > > > > smallerStream)
>> > >> >> > > > > >> >
>> > >> >> > > > > >> > The largest join can be done in parallel using an
>> > >> innerJoin.
>> > >> >> You
>> > >> >> > > can
>> > >> >> > > > > >> then
>> > >> >> > > > > >> > wrap the stream coming out of the parallel function
>> in
>> > an
>> > >> >> > > > > innerHashJoin
>> > >> >> > > > > >> to
>> > >> >> > > > > >> > join it to another stream.
>> > >> >> > > > > >> >
>> > >> >> > > > > >> >
>> > >> >> > > > > >> >
>> > >> >> > > > > >> >
>> > >> >> > > > > >> >
>> > >> >> > > > > >> >
>> > >> >> > > > > >> >
>> > >> >> > > > > >> >
>> > >> >> > > > > >> >
>> > >> >> > > > > >> >
>> > >> >> > > > > >> >
>> > >> >> > > > > >> >
>> > >> >> > > > > >> >
>> > >> >> > > > > >> >
>> > >> >> > > > > >> >
>> > >> >> > > > > >> > Joel Bernstein
>> > >> >> > > > > >> > http://joelsolr.blogspot.com/
>> > >> >> > > > > >> >
>> > >> >> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <
>> > >> >> > > > > >> edwinyeozl@gmail.com>
>> > >> >> > > > > >> > wrote:
>> > >> >> > > > > >> >
>> > >> >> > > > > >> >> Hi,
>> > >> >> > > > > >> >>
>> > >> >> > > > > >> >> Is it possible to join more than 2 collections using
>> > one
>> > >> of
>> > >> >> the
>> > >> >> > > > > >> streaming
>> > >> >> > > > > >> >> expressions (Eg: innerJoin)? If not, is there other
>> > ways
>> > >> we
>> > >> >> can
>> > >> >> > > do
>> > >> >> > > > > it?
>> > >> >> > > > > >> >>
>> > >> >> > > > > >> >> Currently, I may need to join 3 or 4 collections
>> > >> together,
>> > >> >> and
>> > >> >> > to
>> > >> >> > > > > >> output
>> > >> >> > > > > >> >> selected fields from all these collections together.
>> > >> >> > > > > >> >>
>> > >> >> > > > > >> >> I'm using Solr 6.4.2.
>> > >> >> > > > > >> >>
>> > >> >> > > > > >> >> Regards,
>> > >> >> > > > > >> >> Edwin
>> > >> >> > > > > >> >>
>> > >> >> > > > > >> >
>> > >> >> > > > > >> >
>> > >> >> > > > > >>
>> > >> >> > > > > >
>> > >> >> > > > > >
>> > >> >> > > > >
>> > >> >> > > >
>> > >> >> > >
>> > >> >> >
>> > >> >>
>> > >> >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>

Re: Joining more than 2 collections

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Joel,

I think that might be one of the reasons.
This is what I have for the /export handler in my solrconfig.xml:

<requestHandler name="/export" class="solr.SearchHandler"> <lst name=
"invariants"> <str name="rq">{!xport}</str> <str name="wt">xsort</str> <str
name="distrib">false</str> </lst> <arr name="components"> <str>query</str>
</arr> </requestHandler>
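
(For a quick check of the handler on its own, a direct request can be made
against it; a minimal sketch, assuming the host and collection name from the
stack trace below, and that the id field has docValues, which /export
requires:

http://localhost:8983/solr/collection1/export?q=*:*&fl=id&sort=id+asc

If that comes back as XML instead of streaming JSON, the configuration above
is the place to look.)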

This is the error message that I get when I use the /export handler.

java.io.IOException: java.util.concurrent.ExecutionException:
java.io.IOException: -->
http://localhost:8983/solr/collection1_shard1_replica1/: An exception has
occurred on the server, refer to server log for details.
at
org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:451)
at
org.apache.solr.client.solrj.io.stream.CloudSolrStream.open(CloudSolrStream.java:308)
at
org.apache.solr.client.solrj.io.stream.PushBackStream.open(PushBackStream.java:70)
at
org.apache.solr.client.solrj.io.stream.JoinStream.open(JoinStream.java:147)
at
org.apache.solr.client.solrj.io.stream.ExceptionStream.open(ExceptionStream.java:51)
at
org.apache.solr.handler.StreamHandler$TimerStream.open(StreamHandler.java:457)
at
org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:63)
at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
at
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)
at
org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)
at
org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)
at
org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)
at
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)
at
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:732)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException:
--> http://localhost:8983/solr/collection1_shard1_replica1/: An exception
has occurred on the server, refer to server log for details.
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at
org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:445)
... 42 more
Caused by: java.io.IOException: -->
http://localhost:8983/solr/collection1_shard1_replica1/: An exception has
occurred on the server, refer to server log for details.
at
org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:238)
at
org.apache.solr.client.solrj.io.stream.CloudSolrStream$TupleWrapper.next(CloudSolrStream.java:541)
at
org.apache.solr.client.solrj.io.stream.CloudSolrStream$StreamOpener.call(CloudSolrStream.java:564)
at
org.apache.solr.client.solrj.io.stream.CloudSolrStream$StreamOpener.call(CloudSolrStream.java:551)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
Caused by: org.noggit.JSONParser$ParseException: JSON Parse Error:
char=<,position=0 BEFORE='<' AFTER='?xml version="1.0" encoding="UTF-8"?> <'
at org.noggit.JSONParser.err(JSONParser.java:356)
at org.noggit.JSONParser.handleNonDoubleQuoteString(JSONParser.java:712)
at org.noggit.JSONParser.next(JSONParser.java:886)
at org.noggit.JSONParser.nextEvent(JSONParser.java:930)
at
org.apache.solr.client.solrj.io.stream.JSONTupleStream.expect(JSONTupleStream.java:97)
at
org.apache.solr.client.solrj.io.stream.JSONTupleStream.advanceToDocs(JSONTupleStream.java:179)
at
org.apache.solr.client.solrj.io.stream.JSONTupleStream.next(JSONTupleStream.java:77)
at
org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:207)
... 8 more
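
The root cause at the bottom is informative: the streaming client hit an
XML response (BEFORE='<' AFTER='?xml version="1.0"...') where it expects
JSON tuples, which suggests the shard-level /export request itself failed
and came back in the default XML error format. A quick way to see what the
handler actually returns, as a minimal check that assumes collection1 has
an id field, is to query /export directly:

curl "http://localhost:8983/solr/collection1/export?q=*:*&fl=id&sort=id+asc"

If this returns an XML error response rather than JSON tuples, the handler
configuration is the likely culprit.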


Regards,
Edwin


On 4 May 2017 at 22:54, Joel Bernstein <jo...@gmail.com> wrote:

> I suspect that there is something not quite right about how the /export
> handler is configured. Straight out of the box in Solr 6.4.2, /export will
> be automatically configured. Are you using a Solr instance that has been
> upgraded in the past and doesn't have standard 6.4.2 configs?
>
> To really do joins properly you'll have to use the /export handler because
> /select will not stream entire result sets (unless they are pretty small).
> So your results may be missing data.
>
> I would take a close look at the logs and see what all the exceptions are
> when you run a search using qt=/export. If you can post all the stack
> traces that get generated when you run the search, we'll probably be able to
> spot the issue.
>
> About the field ordering: there is support for field ordering in the
> Streaming classes, but only a few places actually enforce the order. The 6.5
> SQL interface does keep the fields in order, as does the new Tuple
> expression in Solr 6.6. But the expressions you are working with currently
> don't enforce field ordering.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, May 4, 2017 at 2:41 AM, Zheng Lin Edwin Yeo <ed...@gmail.com>
> wrote:
>
> > Hi Joel,
> >
> > I have managed to get the Join to work, but so far it is only working
> > when I use qt="/select". It is not working when I use qt="/export".
> >
> > For the display of the fields, is there a way to list them in the order
> > that I want?
> > Currently, the display is quite random, and I can get a field in
> > collection1, followed by a field in collection3, then collection1 again,
> > and then collection2.
> >
> > It would be good if we could arrange the fields to display in the order
> > that we want.
> >
> > Regards,
> > Edwin
> >
> >
> >
> > On 4 May 2017 at 09:56, Zheng Lin Edwin Yeo <ed...@gmail.com>
> wrote:
> >
> > > Hi Joel,
> > >
> > > It works when I started off with just one expression.
> > >
> > > Could it be that the data size is too big for export after the join,
> > > which causes the error?
> > >
> > > Regards,
> > > Edwin
> > >
> > > On 4 May 2017 at 02:53, Joel Bernstein <jo...@gmail.com> wrote:
> > >
> > >> I was just testing with the query below and it worked for me. Some of
> > >> the error messages I was getting with the syntax were not what I was
> > >> expecting, so I'll look into the error handling. But the joins do work
> > >> when the syntax is correct. The query below is joining to the same
> > >> collection three times, but the mechanics are exactly the same as
> > >> joining three different tables. In this example each join narrows down
> > >> the result set.
> > >>
> > >> hashJoin(parallel(collection2,
> > >>                   workers=3,
> > >>                   sort="id asc",
> > >>                   innerJoin(search(collection2, q="*:*", fl="id",
> > >>                                    sort="id asc", qt="/export",
> > >>                                    partitionKeys="id"),
> > >>                             search(collection2, q="year_i:42",
> > >>                                    fl="id, year_i", sort="id asc",
> > >>                                    qt="/export", partitionKeys="id"),
> > >>                             on="id")),
> > >>          hashed=search(collection2, q="day_i:7", fl="id, day_i",
> > >>                        sort="id asc", qt="/export"),
> > >>          on="id")
> > >>
> > >> Joel Bernstein
> > >> http://joelsolr.blogspot.com/
> > >>
> > >> On Wed, May 3, 2017 at 1:29 PM, Joel Bernstein <jo...@gmail.com>
> > >> wrote:
> > >>
> > >> > Start off with just this expression:
> > >> >
> > >> > search(collection2,
> > >> >             q=*:*,
> > >> >             fl="a_s,b_s,c_s,d_s,e_s",
> > >> >             sort="a_s asc",
> > >> >             qt="/export")
> > >> >
> > >> > And then check the logs for exceptions.
> > >> >
> > >> > Joel Bernstein
> > >> > http://joelsolr.blogspot.com/
> > >> >
> > >> > On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <
> > >> edwinyeozl@gmail.com
> > >> > > wrote:
> > >> >
> > >> >> Hi Joel,
> > >> >>
> > >> >> I am getting this error after I changed to add qt=/export and
> > >> >> removed the rows param. Do you know what could be the reason?
> > >> >>
> > >> >> {
> > >> >>   "error":{
> > >> >>     "metadata":[
> > >> >>       "error-class","org.apache.solr.common.SolrException",
> > >> >>       "root-error-class","org.apache.http.MalformedChunkCodingExce
> > >> >> ption"],
> > >> >>     "msg":"org.apache.http.MalformedChunkCodingException: CRLF
> > >> expected
> > >> >> at
> > >> >> end of chunk",
> > >> >>     "trace":"org.apache.solr.common.SolrException:
> > >> >> org.apache.http.MalformedChunkCodingException: CRLF expected at
> end
> > of
> > >> >> chunk\r\n\tat
> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
> > >> >> iteMap$0(TupleStream.java:79)\r\n\tat
> > >> >> org.apache.solr.response.JSONWriter.writeIterator(JSONRespon
> > >> >> seWriter.java:523)\r\n\tat
> > >> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
> > >> >> ponseWriter.java:175)\r\n\tat
> > >> >> org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter
> > >> >> .java:559)\r\n\tat
> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(
> > >> >> TupleStream.java:64)\r\n\tat
> > >> >> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWri
> > >> >> ter.java:547)\r\n\tat
> > >> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
> > >> >> ponseWriter.java:193)\r\n\tat
> > >> >> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithD
> > >> >> ups(JSONResponseWriter.java:209)\r\n\tat
> > >> >> org.apache.solr.response.JSONWriter.writeNamedList(JSONRespo
> > >> >> nseWriter.java:325)\r\n\tat
> > >> >> org.apache.solr.response.JSONWriter.writeResponse(JSONRespon
> > >> >> seWriter.java:120)\r\n\tat
> > >> >> org.apache.solr.response.JSONResponseWriter.write(JSONRespon
> > >> >> seWriter.java:71)\r\n\tat
> > >> >> org.apache.solr.response.QueryResponseWriterUtil.writeQueryR
> > >> >> esponse(QueryResponseWriterUtil.java:65)\r\n\tat
> > >> >> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrC
> > >> >> all.java:732)\r\n\tat
> > >> >> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:
> > >> 473)\r\n\tat
> > >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
> > >> >> atchFilter.java:345)\r\n\tat
> > >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
> > >> >> atchFilter.java:296)\r\n\tat
> > >> >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte
> > >> >> r(ServletHandler.java:1691)\r\n\tat
> > >> >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHan
> > >> >> dler.java:582)\r\n\tat
> > >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
> > >> >> Handler.java:143)\r\n\tat
> > >> >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHa
> > >> >> ndler.java:548)\r\n\tat
> > >> >> org.eclipse.jetty.server.session.SessionHandler.doHandle(
> > >> >> SessionHandler.java:226)\r\n\tat
> > >> >> org.eclipse.jetty.server.handler.ContextHandler.doHandle(
> > >> >> ContextHandler.java:1180)\r\n\tat
> > >> >> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHand
> > >> >> ler.java:512)\r\n\tat
> > >> >> org.eclipse.jetty.server.session.SessionHandler.doScope(
> > >> >> SessionHandler.java:185)\r\n\tat
> > >> >> org.eclipse.jetty.server.handler.ContextHandler.doScope(
> > >> >> ContextHandler.java:1112)\r\n\tat
> > >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
> > >> >> Handler.java:141)\r\n\tat
> > >> >> org.eclipse.jetty.server.handler.ContextHandlerCollection.ha
> > >> >> ndle(ContextHandlerCollection.java:213)\r\n\tat
> > >> >> org.eclipse.jetty.server.handler.HandlerCollection.handle(
> > >> >> HandlerCollection.java:119)\r\n\tat
> > >> >> org.eclipse.jetty.server.handler.HandlerWrapper.handle(Handl
> > >> >> erWrapper.java:134)\r\n\tat
> > >> >> org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat
> > >> >> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.
> > >> java:320)\r\n\tat
> > >> >> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConne
> > >> >> ction.java:251)\r\n\tat
> > >> >> org.eclipse.jetty.io.AbstractConnection$ReadCallback.
> > >> >> succeeded(AbstractConnection.java:273)\r\n\tat
> > >> >> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.
> > >> java:95)\r\n\tat
> > >> >> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChann
> > >> >> elEndPoint.java:93)\r\n\tat
> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> > >> >> .executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat
> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> > >> >> .produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
> > >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> > >> >> .run(ExecuteProduceConsume.java:136)\r\n\tat
> > >> >> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(Queued
> > >> >> ThreadPool.java:671)\r\n\tat
> > >> >> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedT
> > >> >> hreadPool.java:589)\r\n\tat
> > >> >> java.lang.Thread.run(Thread.java:745)\r\nCaused by:
> > >> >> org.apache.http.MalformedChunkCodingException: CRLF expected at
> end
> > of
> > >> >> chunk\r\n\tat
> > >> >> org.apache.http.impl.io.ChunkedInputStream.getChunkSize(Chun
> > >> >> kedInputStream.java:255)\r\n\tat
> > >> >> org.apache.http.impl.io.ChunkedInputStream.nextChunk(Chunked
> > >> >> InputStream.java:227)\r\n\tat
> > >> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
> > >> >> Stream.java:186)\r\n\tat
> > >> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
> > >> >> Stream.java:215)\r\n\tat
> > >> >> org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInpu
> > >> >> tStream.java:316)\r\n\tat
> > >> >> org.apache.http.conn.BasicManagedEntity.streamClosed(BasicMa
> > >> >> nagedEntity.java:164)\r\n\tat
> > >> >> org.apache.http.conn.EofSensorInputStream.checkClose(EofSens
> > >> >> orInputStream.java:228)\r\n\tat
> > >> >> org.apache.http.conn.EofSensorInputStream.close(EofSensorInp
> > >> >> utStream.java:174)\r\n\tat
> > >> >> sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)\r\n\tat
> > >> >> sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)\r\n\tat
> > >> >> java.io.InputStreamReader.close(InputStreamReader.java:
> 199)\r\n\tat
> > >> >> org.apache.solr.client.solrj.io.stream.JSONTupleStream.close
> > >> >> (JSONTupleStream.java:92)\r\n\tat
> > >> >> org.apache.solr.client.solrj.io.stream.SolrStream.close(Solr
> > >> >> Stream.java:193)\r\n\tat
> > >> >> org.apache.solr.client.solrj.io.stream.CloudSolrStream.close
> > >> >> (CloudSolrStream.java:464)\r\n\tat
> > >> >> org.apache.solr.client.solrj.io.stream.HashJoinStream.close(
> > >> >> HashJoinStream.java:231)\r\n\tat
> > >> >> org.apache.solr.client.solrj.io.stream.ExceptionStream.close
> > >> >> (ExceptionStream.java:93)\r\n\tat
> > >> >> org.apache.solr.handler.StreamHandler$TimerStream.close(
> > >> >> StreamHandler.java:452)\r\n\tat
> > >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
> > >> >> iteMap$0(TupleStream.java:71)\r\n\t...
> > >> >> 40 more\r\n",
> > >> >>     "code":500}}
> > >> >>
> > >> >>
> > >> >> Regards,
> > >> >> Edwin
> > >> >>
> > >> >>
> > >> >> On 4 May 2017 at 00:00, Joel Bernstein <jo...@gmail.com> wrote:
> > >> >>
> > >> >> > I've reformatted the expression below and made a few changes. You
> > >> >> > have put things together properly. But these are MapReduce joins
> > >> >> > that require exporting the entire result sets. So you will need
> > >> >> > to add qt=/export to all the searches and remove the rows param.
> > >> >> > In Solr 6.6 there is a new "shuffle" expression that does this
> > >> >> > automatically.
> > >> >> >
> > >> >> > To test things you'll want to break down each expression and make
> > >> >> > sure it's behaving as expected.
> > >> >> >
> > >> >> > For example, first run each search. Then run the innerJoin, not
> > >> >> > in parallel mode. Then run it in parallel mode. Then try the whole
> > >> >> > thing.
> > >> >> >
> > >> >> > hashJoin(parallel(collection2,
> > >> >> >                   innerJoin(search(collection2,
> > >> >> >                                    q=*:*,
> > >> >> >                                    fl="a_s,b_s,c_s,d_s,e_s",
> > >> >> >                                    sort="a_s asc",
> > >> >> >                                    partitionKeys="a_s",
> > >> >> >                                    qt="/export"),
> > >> >> >                             search(collection1,
> > >> >> >                                    q=*:*,
> > >> >> >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
> > >> >> >                                    sort="a_s asc",
> > >> >> >                                    partitionKeys="a_s",
> > >> >> >                                    qt="/export"),
> > >> >> >                             on="a_s"),
> > >> >> >                   workers="2",
> > >> >> >                   sort="a_s asc"),
> > >> >> >          hashed=search(collection3,
> > >> >> >                        q=*:*,
> > >> >> >                        fl="a_s,k_s,l_s",
> > >> >> >                        sort="a_s asc",
> > >> >> >                        qt="/export"),
> > >> >> >          on="a_s")
> > >> >> >
> > >> >> > Joel Bernstein
> > >> >> > http://joelsolr.blogspot.com/
> > >> >> >
> > >> >> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <
> > >> >> edwinyeozl@gmail.com
> > >> >> > >
> > >> >> > wrote:
> > >> >> >
> > >> >> > > Hi Joel,
> > >> >> > >
> > >> >> > > Thanks for the clarification.
> > >> >> > >
> > >> >> > > Would like to check, is this the correct way to do the join?
> > >> >> > > Currently, I could not get any results after putting in the
> > >> >> > > hashJoin for the 3rd, smallerStream collection (collection3).
> > >> >> > >
> > >> >> > > http://localhost:8983/solr/collection1/stream?expr=
> > >> >> > > hashJoin(parallel(collection2,
> > >> >> > >                   innerJoin(search(collection2,
> > >> >> > >                                    q=*:*,
> > >> >> > >                                    fl="a_s,b_s,c_s,d_s,e_s",
> > >> >> > >                                    sort="a_s asc",
> > >> >> > >                                    partitionKeys="a_s",
> > >> >> > >                                    rows=200),
> > >> >> > >                             search(collection1,
> > >> >> > >                                    q=*:*,
> > >> >> > >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
> > >> >> > >                                    sort="a_s asc",
> > >> >> > >                                    partitionKeys="a_s",
> > >> >> > >                                    rows=200),
> > >> >> > >                             on="a_s"),
> > >> >> > >                   workers="2",
> > >> >> > >                   sort="a_s asc"),
> > >> >> > >          hashed=search(collection3,
> > >> >> > >                        q=*:*,
> > >> >> > >                        fl="a_s,k_s,l_s",
> > >> >> > >                        sort="a_s asc",
> > >> >> > >                        rows=200),
> > >> >> > >          on="a_s")
> > >> >> > > &indent=true
> > >> >> > >
> > >> >> > >
> > >> >> > > Regards,
> > >> >> > > Edwin
> > >> >> > >
> > >> >> > >
> > >> >> > > On 3 May 2017 at 20:59, Joel Bernstein <jo...@gmail.com>
> > wrote:
> > >> >> > >
> > >> >> > > > Sorry, it's just called hashJoin
> > >> >> > > >
> > >> >> > > > Joel Bernstein
> > >> >> > > > http://joelsolr.blogspot.com/
> > >> >> > > >
> > >> >> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <
> > >> >> > > edwinyeozl@gmail.com>
> > >> >> > > > wrote:
> > >> >> > > >
> > >> >> > > > > Hi Joel,
> > >> >> > > > >
> > >> >> > > > > I am getting this error when I used the innerHashJoin.
> > >> >> > > > >
> > >> >> > > > >  "EXCEPTION":"Invalid stream expression
> > innerHashJoin(parallel(
> > >> >> > > innerJoin
> > >> >> > > > >
> > >> >> > > > > I also can't find the documentation on innerHashJoin for
> the
> > >> >> > Streaming
> > >> >> > > > > Expressions.
> > >> >> > > > >
> > >> >> > > > > Are you referring to hashJoin?
> > >> >> > > > >
> > >> >> > > > > Regards,
> > >> >> > > > > Edwin
> > >> >> > > > >
> > >> >> > > > >
> > >> >> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <
> > >> edwinyeozl@gmail.com
> > >> >> >
> > >> >> > > > wrote:
> > >> >> > > > >
> > >> >> > > > > > Hi Joel,
> > >> >> > > > > >
> > >> >> > > > > > Thanks for the info.
> > >> >> > > > > >
> > >> >> > > > > > Regards,
> > >> >> > > > > > Edwin
> > >> >> > > > > >
> > >> >> > > > > >
> > >> >> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <
> joelsolr@gmail.com
> > >
> > >> >> wrote:
> > >> >> > > > > >
> > >> >> > > > > >> Also take a look at the documentation for the "fetch"
> > >> streaming
> > >> >> > > > > >> expression.
> > >> >> > > > > >>
> > >> >> > > > > >> Joel Bernstein
> > >> >> > > > > >> http://joelsolr.blogspot.com/
> > >> >> > > > > >>
> > >> >> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <
> > >> >> > joelsolr@gmail.com>
> > >> >> > > > > >> wrote:
> > >> >> > > > > >>
> > >> >> > > > > >> > Yes, you can join more than one collection with
> > >> >> > > > > >> > Streaming Expressions. Here are a few things to keep
> > >> >> > > > > >> > in mind.
> > >> >> > > > > >> >
> > >> >> > > > > >> > * You'll likely want to use the parallel function
> > >> >> > > > > >> > around the largest join. You'll need to use the join
> > >> >> > > > > >> > keys as the partitionKeys.
> > >> >> > > > > >> > * innerJoin: requires that the streams be sorted on
> > >> >> > > > > >> > the join keys.
> > >> >> > > > > >> > * innerHashJoin: has no sorting requirement.
> > >> >> > > > > >> >
> > >> >> > > > > >> > So a strategy for a three-collection join might look
> > >> >> > > > > >> > like this:
> > >> >> > > > > >> >
> > >> >> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)),
> > >> >> > > > > >> >               smallerStream)
> > >> >> > > > > >> >
> > >> >> > > > > >> > The largest join can be done in parallel using an
> > >> >> > > > > >> > innerJoin. You can then wrap the stream coming out of
> > >> >> > > > > >> > the parallel function in an innerHashJoin to join it
> > >> >> > > > > >> > to another stream.
> > >> >> > > > > >> >
> > >> >> > > > > >> >
> > >> >> > > > > >> > Joel Bernstein
> > >> >> > > > > >> > http://joelsolr.blogspot.com/
> > >> >> > > > > >> >
> > >> >> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <
> > >> >> > > > > >> edwinyeozl@gmail.com>
> > >> >> > > > > >> > wrote:
> > >> >> > > > > >> >
> > >> >> > > > > >> >> Hi,
> > >> >> > > > > >> >>
> > >> >> > > > > >> >> Is it possible to join more than 2 collections using
> > one
> > >> of
> > >> >> the
> > >> >> > > > > >> streaming
> > >> >> > > > > >> >> expressions (Eg: innerJoin)? If not, is there other
> > ways
> > >> we
> > >> >> can
> > >> >> > > do
> > >> >> > > > > it?
> > >> >> > > > > >> >>
> > >> >> > > > > >> >> Currently, I may need to join 3 or 4 collections
> > >> together,
> > >> >> and
> > >> >> > to
> > >> >> > > > > >> output
> > >> >> > > > > >> >> selected fields from all these collections together.
> > >> >> > > > > >> >>
> > >> >> > > > > >> >> I'm using Solr 6.4.2.
> > >> >> > > > > >> >>
> > >> >> > > > > >> >> Regards,
> > >> >> > > > > >> >> Edwin
> > >> >> > > > > >> >>
> > >> >> > > > > >> >
> > >> >> > > > > >> >
> > >> >> > > > > >>
> > >> >> > > > > >
> > >> >> > > > > >
> > >> >> > > > >
> > >> >> > > >
> > >> >> > >
> > >> >> >
> > >> >>
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>

Re: Joining more than 2 collections

Posted by Joel Bernstein <jo...@gmail.com>.
I suspect that there is something not quite right about how the /export
handler is configured. Straight out of the box in Solr 6.4.2, /export will
be automatically configured. Are you using a Solr instance that has been
upgraded in the past and doesn't have standard 6.4.2 configs?
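
For reference, the out-of-the-box 6.x definition ships with the implicit
handlers rather than in solrconfig.xml. A rough sketch of what it looks
like, from memory, so verify against the implicitPlugins.json bundled with
your version:

"/export": {
  "class": "solr.SearchHandler",
  "components": ["query"],
  "invariants": {
    "rq": "{!xport}",
    "wt": "xsort",
    "distrib": false
  }
}

If an explicit <requestHandler name="/export"> is present in solrconfig.xml
it takes precedence over this, so it's worth diffing the two.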

To really do joins properly you'll have to use the /export handler because
/select will not stream entire result sets (unless they are pretty small).
So your results may be missing data.
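
As the quoted messages below mention, Solr 6.6 also adds a "shuffle"
expression that behaves like search() but always goes through /export, which
removes the chance of forgetting the qt parameter. A sketch only, reusing
the hypothetical a_s join key from the examples in this thread:

parallel(collection2,
         workers="2",
         sort="a_s asc",
         innerJoin(shuffle(collection2, q=*:*, fl="a_s,b_s,c_s,d_s,e_s",
                           sort="a_s asc", partitionKeys="a_s"),
                   shuffle(collection1, q=*:*, fl="a_s,f_s,g_s,h_s,i_s,j_s",
                           sort="a_s asc", partitionKeys="a_s"),
                   on="a_s"))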

I would take a close look at the logs and see what all the exceptions are
when you run a search using qt=/export. If you can post all the stack
traces that get generated when you run the search, we'll probably be able to
spot the issue.

About the field ordering: there is support for field ordering in the
Streaming classes, but only a few places actually enforce the order. The 6.5
SQL interface does keep the fields in order, as does the new Tuple
expression in Solr 6.6. But the expressions you are working with currently
don't enforce field ordering.
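
In the meantime, a partial workaround is to wrap the join in select(), which
at least trims and renames the output fields, though it does not guarantee
key ordering in the JSON response either. A sketch, again with hypothetical
field names carried over from this thread:

select(hashJoin(search(collection2, q=*:*, fl="a_s,b_s", sort="a_s asc",
                       qt="/export"),
                hashed=search(collection3, q=*:*, fl="a_s,k_s",
                              sort="a_s asc", qt="/export"),
                on="a_s"),
       a_s as id,
       b_s,
       k_s)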




Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 4, 2017 at 2:41 AM, Zheng Lin Edwin Yeo <ed...@gmail.com>
wrote:

> Hi Joel,
>
> I have managed to get the Join to work, but so far it is only working when
> I use qt="/select". It is not working when I use qt="/export".
>
> For the display of the fields, is there a way to list them in the order
> that I want?
> Currently, the display is quite random, and I can get a field in
> collection1, followed by a field in collection3, then collection1 again,
> and then collection2.
>
> It would be good if we could arrange the fields to display in the order
> that we want.
>
> Regards,
> Edwin
>
>
>
> On 4 May 2017 at 09:56, Zheng Lin Edwin Yeo <ed...@gmail.com> wrote:
>
> > Hi Joel,
> >
> > It works when I started off with just one expression.
> >
> > Could it be that the data size is too big for export after the join,
> > which causes the error?
> >
> > Regards,
> > Edwin
> >
> > On 4 May 2017 at 02:53, Joel Bernstein <jo...@gmail.com> wrote:
> >
> >> I was just testing with the query below and it worked for me. Some of
> >> the error messages I was getting with the syntax were not what I was
> >> expecting, so I'll look into the error handling. But the joins do work
> >> when the syntax is correct. The query below is joining to the same
> >> collection three times, but the mechanics are exactly the same as
> >> joining three different tables. In this example each join narrows down
> >> the result set.
> >>
> >> hashJoin(parallel(collection2,
> >>                   workers=3,
> >>                   sort="id asc",
> >>                   innerJoin(search(collection2, q="*:*", fl="id",
> >>                                    sort="id asc", qt="/export",
> >>                                    partitionKeys="id"),
> >>                             search(collection2, q="year_i:42",
> >>                                    fl="id, year_i", sort="id asc",
> >>                                    qt="/export", partitionKeys="id"),
> >>                             on="id")),
> >>          hashed=search(collection2, q="day_i:7", fl="id, day_i",
> >>                        sort="id asc", qt="/export"),
> >>          on="id")
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Wed, May 3, 2017 at 1:29 PM, Joel Bernstein <jo...@gmail.com>
> >> wrote:
> >>
> >> > Start off with just this expression:
> >> >
> >> > search(collection2,
> >> >             q=*:*,
> >> >             fl="a_s,b_s,c_s,d_s,e_s",
> >> >             sort="a_s asc",
> >> >             qt="/export")
> >> >
> >> > And then check the logs for exceptions.
> >> >
> >> > Joel Bernstein
> >> > http://joelsolr.blogspot.com/
> >> >
> >> > On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <
> >> edwinyeozl@gmail.com
> >> > > wrote:
> >> >
> >> >> Hi Joel,
> >> >>
> >> >> I am getting this error after I changed to add qt=/export and
> >> >> removed the rows param. Do you know what could be the reason?
> >> >>
> >> >> {
> >> >>   "error":{
> >> >>     "metadata":[
> >> >>       "error-class","org.apache.solr.common.SolrException",
> >> >>       "root-error-class","org.apache.http.MalformedChunkCodingExce
> >> >> ption"],
> >> >>     "msg":"org.apache.http.MalformedChunkCodingException: CRLF
> >> expected
> >> >> at
> >> >> end of chunk",
> >> >>     "trace":"org.apache.solr.common.SolrException:
> >> >> org.apache.http.MalformedChunkCodingException: CRLF expected at end
> of
> >> >> chunk\r\n\tat
> >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
> >> >> iteMap$0(TupleStream.java:79)\r\n\tat
> >> >> org.apache.solr.response.JSONWriter.writeIterator(JSONRespon
> >> >> seWriter.java:523)\r\n\tat
> >> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
> >> >> ponseWriter.java:175)\r\n\tat
> >> >> org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter
> >> >> .java:559)\r\n\tat
> >> >> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(
> >> >> TupleStream.java:64)\r\n\tat
> >> >> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWri
> >> >> ter.java:547)\r\n\tat
> >> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
> >> >> ponseWriter.java:193)\r\n\tat
> >> >> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithD
> >> >> ups(JSONResponseWriter.java:209)\r\n\tat
> >> >> org.apache.solr.response.JSONWriter.writeNamedList(JSONRespo
> >> >> nseWriter.java:325)\r\n\tat
> >> >> org.apache.solr.response.JSONWriter.writeResponse(JSONRespon
> >> >> seWriter.java:120)\r\n\tat
> >> >> org.apache.solr.response.JSONResponseWriter.write(JSONRespon
> >> >> seWriter.java:71)\r\n\tat
> >> >> org.apache.solr.response.QueryResponseWriterUtil.writeQueryR
> >> >> esponse(QueryResponseWriterUtil.java:65)\r\n\tat
> >> >> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrC
> >> >> all.java:732)\r\n\tat
> >> >> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:
> >> 473)\r\n\tat
> >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
> >> >> atchFilter.java:345)\r\n\tat
> >> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
> >> >> atchFilter.java:296)\r\n\tat
> >> >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte
> >> >> r(ServletHandler.java:1691)\r\n\tat
> >> >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHan
> >> >> dler.java:582)\r\n\tat
> >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
> >> >> Handler.java:143)\r\n\tat
> >> >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHa
> >> >> ndler.java:548)\r\n\tat
> >> >> org.eclipse.jetty.server.session.SessionHandler.doHandle(
> >> >> SessionHandler.java:226)\r\n\tat
> >> >> org.eclipse.jetty.server.handler.ContextHandler.doHandle(
> >> >> ContextHandler.java:1180)\r\n\tat
> >> >> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHand
> >> >> ler.java:512)\r\n\tat
> >> >> org.eclipse.jetty.server.session.SessionHandler.doScope(
> >> >> SessionHandler.java:185)\r\n\tat
> >> >> org.eclipse.jetty.server.handler.ContextHandler.doScope(
> >> >> ContextHandler.java:1112)\r\n\tat
> >> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
> >> >> Handler.java:141)\r\n\tat
> >> >> org.eclipse.jetty.server.handler.ContextHandlerCollection.ha
> >> >> ndle(ContextHandlerCollection.java:213)\r\n\tat
> >> >> org.eclipse.jetty.server.handler.HandlerCollection.handle(
> >> >> HandlerCollection.java:119)\r\n\tat
> >> >> org.eclipse.jetty.server.handler.HandlerWrapper.handle(Handl
> >> >> erWrapper.java:134)\r\n\tat
> >> >> org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat
> >> >> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.
> >> java:320)\r\n\tat
> >> >> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConne
> >> >> ction.java:251)\r\n\tat
> >> >> org.eclipse.jetty.io.AbstractConnection$ReadCallback.
> >> >> succeeded(AbstractConnection.java:273)\r\n\tat
> >> >> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.
> >> java:95)\r\n\tat
> >> >> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChann
> >> >> elEndPoint.java:93)\r\n\tat
> >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> >> >> .executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat
> >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> >> >> .produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
> >> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> >> >> .run(ExecuteProduceConsume.java:136)\r\n\tat
> >> >> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(Queued
> >> >> ThreadPool.java:671)\r\n\tat
> >> >> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedT
> >> >> hreadPool.java:589)\r\n\tat
> >> >> java.lang.Thread.run(Thread.java:745)\r\nCaused by:
> >> >> org.apache.http.MalformedChunkCodingException: CRLF expected at end
> of
> >> >> chunk\r\n\tat
> >> >> org.apache.http.impl.io.ChunkedInputStream.getChunkSize(Chun
> >> >> kedInputStream.java:255)\r\n\tat
> >> >> org.apache.http.impl.io.ChunkedInputStream.nextChunk(Chunked
> >> >> InputStream.java:227)\r\n\tat
> >> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
> >> >> Stream.java:186)\r\n\tat
> >> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
> >> >> Stream.java:215)\r\n\tat
> >> >> org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInpu
> >> >> tStream.java:316)\r\n\tat
> >> >> org.apache.http.conn.BasicManagedEntity.streamClosed(BasicMa
> >> >> nagedEntity.java:164)\r\n\tat
> >> >> org.apache.http.conn.EofSensorInputStream.checkClose(EofSens
> >> >> orInputStream.java:228)\r\n\tat
> >> >> org.apache.http.conn.EofSensorInputStream.close(EofSensorInp
> >> >> utStream.java:174)\r\n\tat
> >> >> sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)\r\n\tat
> >> >> sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)\r\n\tat
> >> >> java.io.InputStreamReader.close(InputStreamReader.java:199)\r\n\tat
> >> >> org.apache.solr.client.solrj.io.stream.JSONTupleStream.close
> >> >> (JSONTupleStream.java:92)\r\n\tat
> >> >> org.apache.solr.client.solrj.io.stream.SolrStream.close(Solr
> >> >> Stream.java:193)\r\n\tat
> >> >> org.apache.solr.client.solrj.io.stream.CloudSolrStream.close
> >> >> (CloudSolrStream.java:464)\r\n\tat
> >> >> org.apache.solr.client.solrj.io.stream.HashJoinStream.close(
> >> >> HashJoinStream.java:231)\r\n\tat
> >> >> org.apache.solr.client.solrj.io.stream.ExceptionStream.close
> >> >> (ExceptionStream.java:93)\r\n\tat
> >> >> org.apache.solr.handler.StreamHandler$TimerStream.close(
> >> >> StreamHandler.java:452)\r\n\tat
> >> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
> >> >> iteMap$0(TupleStream.java:71)\r\n\t...
> >> >> 40 more\r\n",
> >> >>     "code":500}}
> >> >>
> >> >>
> >> >> Regards,
> >> >> Edwin
> >> >>
> >> >>
> >> >> On 4 May 2017 at 00:00, Joel Bernstein <jo...@gmail.com> wrote:
> >> >>
> >> >> > I've reformatted the expression below and made a few changes. You
> >> >> > have put things together properly. But these are MapReduce joins
> >> >> > that require exporting the entire result sets. So you will need
> >> >> > to add qt=/export to all the searches and remove the rows param.
> >> >> > In Solr 6.6 there is a new "shuffle" expression that does this
> >> >> > automatically.
> >> >> >
> >> >> > To test things you'll want to break down each expression and make
> >> >> > sure it's behaving as expected.
> >> >> >
> >> >> > For example, first run each search. Then run the innerJoin, not
> >> >> > in parallel mode. Then run it in parallel mode. Then try the whole
> >> >> > thing.
> >> >> >
> >> >> > hashJoin(parallel(collection2,
> >> >> >                   innerJoin(search(collection2,
> >> >> >                                    q=*:*,
> >> >> >                                    fl="a_s,b_s,c_s,d_s,e_s",
> >> >> >                                    sort="a_s asc",
> >> >> >                                    partitionKeys="a_s",
> >> >> >                                    qt="/export"),
> >> >> >                             search(collection1,
> >> >> >                                    q=*:*,
> >> >> >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
> >> >> >                                    sort="a_s asc",
> >> >> >                                    partitionKeys="a_s",
> >> >> >                                    qt="/export"),
> >> >> >                             on="a_s"),
> >> >> >                   workers="2",
> >> >> >                   sort="a_s asc"),
> >> >> >          hashed=search(collection3,
> >> >> >                        q=*:*,
> >> >> >                        fl="a_s,k_s,l_s",
> >> >> >                        sort="a_s asc",
> >> >> >                        qt="/export"),
> >> >> >          on="a_s")
> >> >> >
> >> >> > Joel Bernstein
> >> >> > http://joelsolr.blogspot.com/
> >> >> >
> >> >> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <
> >> >> edwinyeozl@gmail.com
> >> >> > >
> >> >> > wrote:
> >> >> >
> >> >> > > Hi Joel,
> >> >> > >
> >> >> > > Thanks for the clarification.
> >> >> > >
> >> >> > > Would like to check, is this the correct way to do the join?
> >> >> > > Currently, I could not get any results after putting in the
> >> >> > > hashJoin for the 3rd, smallerStream collection (collection3).
> >> >> > >
> >> >> > > http://localhost:8983/solr/collection1/stream?expr=
> >> >> > > hashJoin(parallel(collection2,
> >> >> > >                   innerJoin(search(collection2,
> >> >> > >                                    q=*:*,
> >> >> > >                                    fl="a_s,b_s,c_s,d_s,e_s",
> >> >> > >                                    sort="a_s asc",
> >> >> > >                                    partitionKeys="a_s",
> >> >> > >                                    rows=200),
> >> >> > >                             search(collection1,
> >> >> > >                                    q=*:*,
> >> >> > >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
> >> >> > >                                    sort="a_s asc",
> >> >> > >                                    partitionKeys="a_s",
> >> >> > >                                    rows=200),
> >> >> > >                             on="a_s"),
> >> >> > >                   workers="2",
> >> >> > >                   sort="a_s asc"),
> >> >> > >          hashed=search(collection3,
> >> >> > >                        q=*:*,
> >> >> > >                        fl="a_s,k_s,l_s",
> >> >> > >                        sort="a_s asc",
> >> >> > >                        rows=200),
> >> >> > >          on="a_s")
> >> >> > > &indent=true
> >> >> > >
> >> >> > >
> >> >> > > Regards,
> >> >> > > Edwin
> >> >> > >
> >> >> > >
> >> >> > > On 3 May 2017 at 20:59, Joel Bernstein <jo...@gmail.com>
> wrote:
> >> >> > >
> >> >> > > > Sorry, it's just called hashJoin
> >> >> > > >
> >> >> > > > Joel Bernstein
> >> >> > > > http://joelsolr.blogspot.com/
> >> >> > > >
> >> >> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <
> >> >> > > edwinyeozl@gmail.com>
> >> >> > > > wrote:
> >> >> > > >
> >> >> > > > > Hi Joel,
> >> >> > > > >
> >> >> > > > > I am getting this error when I used the innerHashJoin.
> >> >> > > > >
> >> >> > > > >  "EXCEPTION":"Invalid stream expression
> innerHashJoin(parallel(
> >> >> > > innerJoin
> >> >> > > > >
> >> >> > > > > I also can't find the documentation on innerHashJoin for the
> >> >> > Streaming
> >> >> > > > > Expressions.
> >> >> > > > >
> >> >> > > > > Are you referring to hashJoin?
> >> >> > > > >
> >> >> > > > > Regards,
> >> >> > > > > Edwin
> >> >> > > > >
> >> >> > > > >
> >> >> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <
> >> edwinyeozl@gmail.com
> >> >> >
> >> >> > > > wrote:
> >> >> > > > >
> >> >> > > > > > Hi Joel,
> >> >> > > > > >
> >> >> > > > > > Thanks for the info.
> >> >> > > > > >
> >> >> > > > > > Regards,
> >> >> > > > > > Edwin
> >> >> > > > > >
> >> >> > > > > >
> >> >> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <joelsolr@gmail.com
> >
> >> >> wrote:
> >> >> > > > > >
> >> >> > > > > >> Also take a look at the documentation for the "fetch"
> >> streaming
> >> >> > > > > >> expression.
> >> >> > > > > >>
> >> >> > > > > >> Joel Bernstein
> >> >> > > > > >> http://joelsolr.blogspot.com/
> >> >> > > > > >>
> >> >> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <
> >> >> > joelsolr@gmail.com>
> >> >> > > > > >> wrote:
> >> >> > > > > >>
> >> >> > > > > >> > Yes, you can join more than one collection with
> >> >> > > > > >> > Streaming Expressions. Here are a few things to keep
> >> >> > > > > >> > in mind.
> >> >> > > > > >> >
> >> >> > > > > >> > * You'll likely want to use the parallel function
> >> >> > > > > >> > around the largest join. You'll need to use the join
> >> >> > > > > >> > keys as the partitionKeys.
> >> >> > > > > >> > * innerJoin: requires that the streams be sorted on
> >> >> > > > > >> > the join keys.
> >> >> > > > > >> > * innerHashJoin: has no sorting requirement.
> >> >> > > > > >> >
> >> >> > > > > >> > So a strategy for a three-collection join might look
> >> >> > > > > >> > like this:
> >> >> > > > > >> >
> >> >> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)),
> >> >> > > > > >> >               smallerStream)
> >> >> > > > > >> >
> >> >> > > > > >> > The largest join can be done in parallel using an
> >> >> > > > > >> > innerJoin. You can then wrap the stream coming out of
> >> >> > > > > >> > the parallel function in an innerHashJoin to join it
> >> >> > > > > >> > to another stream.
> >> >> > > > > >> >
> >> >> > > > > >> > Joel Bernstein
> >> >> > > > > >> > http://joelsolr.blogspot.com/
> >> >> > > > > >> >
> >> >> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <
> >> >> > > > > >> edwinyeozl@gmail.com>
> >> >> > > > > >> > wrote:
> >> >> > > > > >> >
> >> >> > > > > >> >> Hi,
> >> >> > > > > >> >>
> >> >> > > > > >> >> Is it possible to join more than 2 collections using
> one
> >> of
> >> >> the
> >> >> > > > > >> streaming
> >> >> > > > > >> >> expressions (Eg: innerJoin)? If not, is there other
> ways
> >> we
> >> >> can
> >> >> > > do
> >> >> > > > > it?
> >> >> > > > > >> >>
> >> >> > > > > >> >> Currently, I may need to join 3 or 4 collections
> >> together,
> >> >> and
> >> >> > to
> >> >> > > > > >> output
> >> >> > > > > >> >> selected fields from all these collections together.
> >> >> > > > > >> >>
> >> >> > > > > >> >> I'm using Solr 6.4.2.
> >> >> > > > > >> >>
> >> >> > > > > >> >> Regards,
> >> >> > > > > >> >> Edwin
> >> >> > > > > >> >>
> >> >> > > > > >> >
> >> >> > > > > >> >
> >> >> > > > > >>
> >> >> > > > > >
> >> >> > > > > >
> >> >> > > > >
> >> >> > > >
> >> >> > >
> >> >> >
> >> >>
> >> >
> >> >
> >>
> >
> >
>

Re: Joining more than 2 collections

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Joel,

I have managed to get the Join to work, but so far it is only working when
I use qt="/select". It is not working when I use qt="/export".

For the display of the fields, is there a way to list them in the order
that I want?
Currently, the display is quite random, and I can get a field in
collection1, followed by a field in collection3, then collection1 again,
and then collection2.

It would be good if we could arrange the fields to display in the order
that we want.

Regards,
Edwin



On 4 May 2017 at 09:56, Zheng Lin Edwin Yeo <ed...@gmail.com> wrote:

> Hi Joel,
>
> It works when I started off with just one expression.
>
> Could it be that the data size is too big for export after the join, which
> causes the error?
>
> Regards,
> Edwin
>
> On 4 May 2017 at 02:53, Joel Bernstein <jo...@gmail.com> wrote:
>
>> I was just testing with the query below and it worked for me. Some of
>> the error messages I was getting with the syntax were not what I was
>> expecting, so I'll look into the error handling. But the joins do work
>> when the syntax is correct. The query below is joining to the same
>> collection three times, but the mechanics are exactly the same as
>> joining three different tables. In this example each join narrows down
>> the result set.
>>
>> hashJoin(parallel(collection2,
>>                   workers=3,
>>                   sort="id asc",
>>                   innerJoin(search(collection2, q="*:*", fl="id",
>>                                    sort="id asc", qt="/export",
>>                                    partitionKeys="id"),
>>                             search(collection2, q="year_i:42",
>>                                    fl="id, year_i", sort="id asc",
>>                                    qt="/export", partitionKeys="id"),
>>                             on="id")),
>>          hashed=search(collection2, q="day_i:7", fl="id, day_i",
>>                        sort="id asc", qt="/export"),
>>          on="id")
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Wed, May 3, 2017 at 1:29 PM, Joel Bernstein <jo...@gmail.com>
>> wrote:
>>
>> > Start off with just this expression:
>> >
>> > search(collection2,
>> >             q=*:*,
>> >             fl="a_s,b_s,c_s,d_s,e_s",
>> >             sort="a_s asc",
>> >             qt="/export")
>> >
>> > And then check the logs for exceptions.
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> > On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <
>> edwinyeozl@gmail.com
>> > > wrote:
>> >
>> >> Hi Joel,
>> >>
>> >> I am getting this error after I changed to add qt=/export and removed
>> >> the rows param. Do you know what could be the reason?
>> >>
>> >> {
>> >>   "error":{
>> >>     "metadata":[
>> >>       "error-class","org.apache.solr.common.SolrException",
>> >>       "root-error-class","org.apache.http.MalformedChunkCodingExce
>> >> ption"],
>> >>     "msg":"org.apache.http.MalformedChunkCodingException: CRLF
>> expected
>> >> at
>> >> end of chunk",
>> >>     "trace":"org.apache.solr.common.SolrException:
>> >> org.apache.http.MalformedChunkCodingException: CRLF expected at end of
>> >> chunk\r\n\tat
>> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
>> >> iteMap$0(TupleStream.java:79)\r\n\tat
>> >> org.apache.solr.response.JSONWriter.writeIterator(JSONRespon
>> >> seWriter.java:523)\r\n\tat
>> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
>> >> ponseWriter.java:175)\r\n\tat
>> >> org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter
>> >> .java:559)\r\n\tat
>> >> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(
>> >> TupleStream.java:64)\r\n\tat
>> >> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWri
>> >> ter.java:547)\r\n\tat
>> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
>> >> ponseWriter.java:193)\r\n\tat
>> >> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithD
>> >> ups(JSONResponseWriter.java:209)\r\n\tat
>> >> org.apache.solr.response.JSONWriter.writeNamedList(JSONRespo
>> >> nseWriter.java:325)\r\n\tat
>> >> org.apache.solr.response.JSONWriter.writeResponse(JSONRespon
>> >> seWriter.java:120)\r\n\tat
>> >> org.apache.solr.response.JSONResponseWriter.write(JSONRespon
>> >> seWriter.java:71)\r\n\tat
>> >> org.apache.solr.response.QueryResponseWriterUtil.writeQueryR
>> >> esponse(QueryResponseWriterUtil.java:65)\r\n\tat
>> >> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrC
>> >> all.java:732)\r\n\tat
>> >> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:
>> 473)\r\n\tat
>> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
>> >> atchFilter.java:345)\r\n\tat
>> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
>> >> atchFilter.java:296)\r\n\tat
>> >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte
>> >> r(ServletHandler.java:1691)\r\n\tat
>> >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHan
>> >> dler.java:582)\r\n\tat
>> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
>> >> Handler.java:143)\r\n\tat
>> >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHa
>> >> ndler.java:548)\r\n\tat
>> >> org.eclipse.jetty.server.session.SessionHandler.doHandle(
>> >> SessionHandler.java:226)\r\n\tat
>> >> org.eclipse.jetty.server.handler.ContextHandler.doHandle(
>> >> ContextHandler.java:1180)\r\n\tat
>> >> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHand
>> >> ler.java:512)\r\n\tat
>> >> org.eclipse.jetty.server.session.SessionHandler.doScope(
>> >> SessionHandler.java:185)\r\n\tat
>> >> org.eclipse.jetty.server.handler.ContextHandler.doScope(
>> >> ContextHandler.java:1112)\r\n\tat
>> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
>> >> Handler.java:141)\r\n\tat
>> >> org.eclipse.jetty.server.handler.ContextHandlerCollection.ha
>> >> ndle(ContextHandlerCollection.java:213)\r\n\tat
>> >> org.eclipse.jetty.server.handler.HandlerCollection.handle(
>> >> HandlerCollection.java:119)\r\n\tat
>> >> org.eclipse.jetty.server.handler.HandlerWrapper.handle(Handl
>> >> erWrapper.java:134)\r\n\tat
>> >> org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat
>> >> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.
>> java:320)\r\n\tat
>> >> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConne
>> >> ction.java:251)\r\n\tat
>> >> org.eclipse.jetty.io.AbstractConnection$ReadCallback.
>> >> succeeded(AbstractConnection.java:273)\r\n\tat
>> >> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.
>> java:95)\r\n\tat
>> >> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChann
>> >> elEndPoint.java:93)\r\n\tat
>> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
>> >> .executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat
>> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
>> >> .produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
>> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
>> >> .run(ExecuteProduceConsume.java:136)\r\n\tat
>> >> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(Queued
>> >> ThreadPool.java:671)\r\n\tat
>> >> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedT
>> >> hreadPool.java:589)\r\n\tat
>> >> java.lang.Thread.run(Thread.java:745)\r\nCaused by:
>> >> org.apache.http.MalformedChunkCodingException: CRLF expected at end of
>> >> chunk\r\n\tat
>> >> org.apache.http.impl.io.ChunkedInputStream.getChunkSize(Chun
>> >> kedInputStream.java:255)\r\n\tat
>> >> org.apache.http.impl.io.ChunkedInputStream.nextChunk(Chunked
>> >> InputStream.java:227)\r\n\tat
>> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
>> >> Stream.java:186)\r\n\tat
>> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
>> >> Stream.java:215)\r\n\tat
>> >> org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInpu
>> >> tStream.java:316)\r\n\tat
>> >> org.apache.http.conn.BasicManagedEntity.streamClosed(BasicMa
>> >> nagedEntity.java:164)\r\n\tat
>> >> org.apache.http.conn.EofSensorInputStream.checkClose(EofSens
>> >> orInputStream.java:228)\r\n\tat
>> >> org.apache.http.conn.EofSensorInputStream.close(EofSensorInp
>> >> utStream.java:174)\r\n\tat
>> >> sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)\r\n\tat
>> >> sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)\r\n\tat
>> >> java.io.InputStreamReader.close(InputStreamReader.java:199)\r\n\tat
>> >> org.apache.solr.client.solrj.io.stream.JSONTupleStream.close
>> >> (JSONTupleStream.java:92)\r\n\tat
>> >> org.apache.solr.client.solrj.io.stream.SolrStream.close(Solr
>> >> Stream.java:193)\r\n\tat
>> >> org.apache.solr.client.solrj.io.stream.CloudSolrStream.close
>> >> (CloudSolrStream.java:464)\r\n\tat
>> >> org.apache.solr.client.solrj.io.stream.HashJoinStream.close(
>> >> HashJoinStream.java:231)\r\n\tat
>> >> org.apache.solr.client.solrj.io.stream.ExceptionStream.close
>> >> (ExceptionStream.java:93)\r\n\tat
>> >> org.apache.solr.handler.StreamHandler$TimerStream.close(
>> >> StreamHandler.java:452)\r\n\tat
>> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
>> >> iteMap$0(TupleStream.java:71)\r\n\t...
>> >> 40 more\r\n",
>> >>     "code":500}}
>> >>
>> >>
>> >> Regards,
>> >> Edwin
>> >>
>> >>
>> >> On 4 May 2017 at 00:00, Joel Bernstein <jo...@gmail.com> wrote:
>> >>
>> >> > I've reformatted the expression below and made a few changes. You
>> >> > have put things together properly. But these are MapReduce joins
>> >> > that require exporting the entire result sets. So you will need
>> >> > to add qt=/export to all the searches and remove the rows param.
>> >> > In Solr 6.6 there is a new "shuffle" expression that does this
>> >> > automatically.
>> >> >
>> >> > To test things you'll want to break down each expression and make
>> >> > sure it's behaving as expected.
>> >> >
>> >> > For example, first run each search. Then run the innerJoin, not
>> >> > in parallel mode. Then run it in parallel mode. Then try the whole
>> >> > thing.
>> >> >
>> >> > hashJoin(parallel(collection2,
>> >> >                   innerJoin(search(collection2,
>> >> >                                    q=*:*,
>> >> >                                    fl="a_s,b_s,c_s,d_s,e_s",
>> >> >                                    sort="a_s asc",
>> >> >                                    partitionKeys="a_s",
>> >> >                                    qt="/export"),
>> >> >                             search(collection1,
>> >> >                                    q=*:*,
>> >> >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
>> >> >                                    sort="a_s asc",
>> >> >                                    partitionKeys="a_s",
>> >> >                                    qt="/export"),
>> >> >                             on="a_s"),
>> >> >                   workers="2",
>> >> >                   sort="a_s asc"),
>> >> >          hashed=search(collection3,
>> >> >                        q=*:*,
>> >> >                        fl="a_s,k_s,l_s",
>> >> >                        sort="a_s asc",
>> >> >                        qt="/export"),
>> >> >          on="a_s")
>> >> >
>> >> > Joel Bernstein
>> >> > http://joelsolr.blogspot.com/
>> >> >
>> >> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <
>> >> edwinyeozl@gmail.com
>> >> > >
>> >> > wrote:
>> >> >
>> >> > > Hi Joel,
>> >> > >
>> >> > > Thanks for the clarification.
>> >> > >
>> >> > > Would like to check, is this the correct way to do the join?
>> >> > > Currently, I could not get any results after putting in the
>> >> > > hashJoin for the 3rd, smallerStream collection (collection3).
>> >> > >
>> >> > > http://localhost:8983/solr/collection1/stream?expr=
>> >> > > hashJoin(parallel(collection2,
>> >> > >                   innerJoin(search(collection2,
>> >> > >                                    q=*:*,
>> >> > >                                    fl="a_s,b_s,c_s,d_s,e_s",
>> >> > >                                    sort="a_s asc",
>> >> > >                                    partitionKeys="a_s",
>> >> > >                                    rows=200),
>> >> > >                             search(collection1,
>> >> > >                                    q=*:*,
>> >> > >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
>> >> > >                                    sort="a_s asc",
>> >> > >                                    partitionKeys="a_s",
>> >> > >                                    rows=200),
>> >> > >                             on="a_s"),
>> >> > >                   workers="2",
>> >> > >                   sort="a_s asc"),
>> >> > >          hashed=search(collection3,
>> >> > >                        q=*:*,
>> >> > >                        fl="a_s,k_s,l_s",
>> >> > >                        sort="a_s asc",
>> >> > >                        rows=200),
>> >> > >          on="a_s")
>> >> > > &indent=true
>> >> > >
>> >> > >
>> >> > > Regards,
>> >> > > Edwin
>> >> > >
>> >> > >
>> >> > > On 3 May 2017 at 20:59, Joel Bernstein <jo...@gmail.com> wrote:
>> >> > >
>> >> > > > Sorry, it's just called hashJoin
>> >> > > >
>> >> > > > Joel Bernstein
>> >> > > > http://joelsolr.blogspot.com/
>> >> > > >
>> >> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <
>> >> > > edwinyeozl@gmail.com>
>> >> > > > wrote:
>> >> > > >
>> >> > > > > Hi Joel,
>> >> > > > >
>> >> > > > > I am getting this error when I used the innerHashJoin.
>> >> > > > >
>> >> > > > >  "EXCEPTION":"Invalid stream expression innerHashJoin(parallel(
>> >> > > innerJoin
>> >> > > > >
>> >> > > > > I also can't find the documentation on innerHashJoin for the
>> >> > Streaming
>> >> > > > > Expressions.
>> >> > > > >
>> >> > > > > Are you referring to hashJoin?
>> >> > > > >
>> >> > > > > Regards,
>> >> > > > > Edwin
>> >> > > > >
>> >> > > > >
>> >> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <
>> edwinyeozl@gmail.com
>> >> >
>> >> > > > wrote:
>> >> > > > >
>> >> > > > > > Hi Joel,
>> >> > > > > >
>> >> > > > > > Thanks for the info.
>> >> > > > > >
>> >> > > > > > Regards,
>> >> > > > > > Edwin
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <jo...@gmail.com>
>> >> wrote:
>> >> > > > > >
>> >> > > > > >> Also take a look at the documentation for the "fetch"
>> streaming
>> >> > > > > >> expression.
>> >> > > > > >>
>> >> > > > > >> Joel Bernstein
>> >> > > > > >> http://joelsolr.blogspot.com/
>> >> > > > > >>
>> >> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <
>> >> > joelsolr@gmail.com>
>> >> > > > > >> wrote:
>> >> > > > > >>
>> >> > > > > >> > Yes, you can join more than one collection with
>> >> > > > > >> > Streaming Expressions. Here are a few things to keep
>> >> > > > > >> > in mind.
>> >> > > > > >> >
>> >> > > > > >> > * You'll likely want to use the parallel function
>> >> > > > > >> > around the largest join. You'll need to use the join
>> >> > > > > >> > keys as the partitionKeys.
>> >> > > > > >> > * innerJoin: requires that the streams be sorted on
>> >> > > > > >> > the join keys.
>> >> > > > > >> > * innerHashJoin: has no sorting requirement.
>> >> > > > > >> >
>> >> > > > > >> > So a strategy for a three-collection join might look
>> >> > > > > >> > like this:
>> >> > > > > >> >
>> >> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)),
>> >> > > > > >> >               smallerStream)
>> >> > > > > >> >
>> >> > > > > >> > The largest join can be done in parallel using an
>> >> > > > > >> > innerJoin. You can then wrap the stream coming out of
>> >> > > > > >> > the parallel function in an innerHashJoin to join it
>> >> > > > > >> > to another stream.
>> >> > > > > >> >
>> >> > > > > >> > Joel Bernstein
>> >> > > > > >> > http://joelsolr.blogspot.com/
>> >> > > > > >> >
>> >> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <
>> >> > > > > >> edwinyeozl@gmail.com>
>> >> > > > > >> > wrote:
>> >> > > > > >> >
>> >> > > > > >> >> Hi,
>> >> > > > > >> >>
>> >> > > > > >> >> Is it possible to join more than 2 collections using one
>> of
>> >> the
>> >> > > > > >> streaming
>> >> > > > > >> >> expressions (Eg: innerJoin)? If not, is there other ways
>> we
>> >> can
>> >> > > do
>> >> > > > > it?
>> >> > > > > >> >>
>> >> > > > > >> >> Currently, I may need to join 3 or 4 collections
>> together,
>> >> and
>> >> > to
>> >> > > > > >> output
>> >> > > > > >> >> selected fields from all these collections together.
>> >> > > > > >> >>
>> >> > > > > >> >> I'm using Solr 6.4.2.
>> >> > > > > >> >>
>> >> > > > > >> >> Regards,
>> >> > > > > >> >> Edwin
>> >> > > > > >> >>
>> >> > > > > >> >
>> >> > > > > >> >
>> >> > > > > >>
>> >> > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >
>> >
>>
>
>

Re: Joining more than 2 collections

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Joel,

It works when I start off with just the one expression.

Could it be that the data size is too big to export after the join, and that
is what causes the error?

Regards,
Edwin

On 4 May 2017 at 02:53, Joel Bernstein <jo...@gmail.com> wrote:

> I was just testing with the query below and it worked for me. Some of the
> error messages I was getting with the syntax were not what I was expecting,
> though, so I'll look into the error handling. But the joins do work when
> the syntax is correct. The query below is joining to the same collection three
> times, but the mechanics are exactly the same as joining three different
> tables. In this example each join narrows down the result set.
>
> hashJoin(parallel(collection2,
>                             workers=3,
>                             sort="id asc",
>                             innerJoin(search(collection2, q="*:*", fl="id",
> sort="id asc", qt="/export", partitionKeys="id"),
>                                             search(collection2,
> q="year_i:42", fl="id, year_i", sort="id asc", qt="/export",
> partitionKeys="id"),
>                                             on="id")),
>                 hashed=search(collection2, q="day_i:7", fl="id, day_i",
> sort="id asc", qt="/export"),
>                 on="id")
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, May 3, 2017 at 1:29 PM, Joel Bernstein <jo...@gmail.com> wrote:
>
> > Start off with just this expression:
> >
> > search(collection2,
> >             q=*:*,
> >             fl="a_s,b_s,c_s,d_s,e_s",
> >             sort="a_s asc",
> >             qt="/export")
> >
> > And then check the logs for exceptions.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <
> edwinyeozl@gmail.com
> > > wrote:
> >
> >> Hi Joel,
> >>
> >> I am getting this error after I added qt=/export and removed the
> rows
> >> param. Do you know what could be the reason?
> >>
> >> {
> >>   "error":{
> >>     "metadata":[
> >>       "error-class","org.apache.solr.common.SolrException",
> >>       "root-error-class","org.apache.http.MalformedChunkCodingExce
> >> ption"],
> >>     "msg":"org.apache.http.MalformedChunkCodingException: CRLF expected
> >> at
> >> end of chunk",
> >>     "trace":"org.apache.solr.common.SolrException:
> >> org.apache.http.MalformedChunkCodingException: CRLF expected at end of
> >> chunk\r\n\tat
> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
> >> iteMap$0(TupleStream.java:79)\r\n\tat
> >> org.apache.solr.response.JSONWriter.writeIterator(JSONRespon
> >> seWriter.java:523)\r\n\tat
> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
> >> ponseWriter.java:175)\r\n\tat
> >> org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter
> >> .java:559)\r\n\tat
> >> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(
> >> TupleStream.java:64)\r\n\tat
> >> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWri
> >> ter.java:547)\r\n\tat
> >> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
> >> ponseWriter.java:193)\r\n\tat
> >> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithD
> >> ups(JSONResponseWriter.java:209)\r\n\tat
> >> org.apache.solr.response.JSONWriter.writeNamedList(JSONRespo
> >> nseWriter.java:325)\r\n\tat
> >> org.apache.solr.response.JSONWriter.writeResponse(JSONRespon
> >> seWriter.java:120)\r\n\tat
> >> org.apache.solr.response.JSONResponseWriter.write(JSONRespon
> >> seWriter.java:71)\r\n\tat
> >> org.apache.solr.response.QueryResponseWriterUtil.writeQueryR
> >> esponse(QueryResponseWriterUtil.java:65)\r\n\tat
> >> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrC
> >> all.java:732)\r\n\tat
> >> org.apache.solr.servlet.HttpSolrCall.call(
> HttpSolrCall.java:473)\r\n\tat
> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
> >> atchFilter.java:345)\r\n\tat
> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
> >> atchFilter.java:296)\r\n\tat
> >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte
> >> r(ServletHandler.java:1691)\r\n\tat
> >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHan
> >> dler.java:582)\r\n\tat
> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
> >> Handler.java:143)\r\n\tat
> >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHa
> >> ndler.java:548)\r\n\tat
> >> org.eclipse.jetty.server.session.SessionHandler.doHandle(
> >> SessionHandler.java:226)\r\n\tat
> >> org.eclipse.jetty.server.handler.ContextHandler.doHandle(
> >> ContextHandler.java:1180)\r\n\tat
> >> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHand
> >> ler.java:512)\r\n\tat
> >> org.eclipse.jetty.server.session.SessionHandler.doScope(
> >> SessionHandler.java:185)\r\n\tat
> >> org.eclipse.jetty.server.handler.ContextHandler.doScope(
> >> ContextHandler.java:1112)\r\n\tat
> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
> >> Handler.java:141)\r\n\tat
> >> org.eclipse.jetty.server.handler.ContextHandlerCollection.ha
> >> ndle(ContextHandlerCollection.java:213)\r\n\tat
> >> org.eclipse.jetty.server.handler.HandlerCollection.handle(
> >> HandlerCollection.java:119)\r\n\tat
> >> org.eclipse.jetty.server.handler.HandlerWrapper.handle(Handl
> >> erWrapper.java:134)\r\n\tat
> >> org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat
> >> org.eclipse.jetty.server.HttpChannel.handle(
> HttpChannel.java:320)\r\n\tat
> >> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConne
> >> ction.java:251)\r\n\tat
> >> org.eclipse.jetty.io.AbstractConnection$ReadCallback.
> >> succeeded(AbstractConnection.java:273)\r\n\tat
> >> org.eclipse.jetty.io.FillInterest.fillable(
> FillInterest.java:95)\r\n\tat
> >> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChann
> >> elEndPoint.java:93)\r\n\tat
> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> >> .executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat
> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> >> .produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
> >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
> >> .run(ExecuteProduceConsume.java:136)\r\n\tat
> >> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(Queued
> >> ThreadPool.java:671)\r\n\tat
> >> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedT
> >> hreadPool.java:589)\r\n\tat
> >> java.lang.Thread.run(Thread.java:745)\r\nCaused by:
> >> org.apache.http.MalformedChunkCodingException: CRLF expected at end of
> >> chunk\r\n\tat
> >> org.apache.http.impl.io.ChunkedInputStream.getChunkSize(Chun
> >> kedInputStream.java:255)\r\n\tat
> >> org.apache.http.impl.io.ChunkedInputStream.nextChunk(Chunked
> >> InputStream.java:227)\r\n\tat
> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
> >> Stream.java:186)\r\n\tat
> >> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
> >> Stream.java:215)\r\n\tat
> >> org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInpu
> >> tStream.java:316)\r\n\tat
> >> org.apache.http.conn.BasicManagedEntity.streamClosed(BasicMa
> >> nagedEntity.java:164)\r\n\tat
> >> org.apache.http.conn.EofSensorInputStream.checkClose(EofSens
> >> orInputStream.java:228)\r\n\tat
> >> org.apache.http.conn.EofSensorInputStream.close(EofSensorInp
> >> utStream.java:174)\r\n\tat
> >> sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)\r\n\tat
> >> sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)\r\n\tat
> >> java.io.InputStreamReader.close(InputStreamReader.java:199)\r\n\tat
> >> org.apache.solr.client.solrj.io.stream.JSONTupleStream.close
> >> (JSONTupleStream.java:92)\r\n\tat
> >> org.apache.solr.client.solrj.io.stream.SolrStream.close(Solr
> >> Stream.java:193)\r\n\tat
> >> org.apache.solr.client.solrj.io.stream.CloudSolrStream.close
> >> (CloudSolrStream.java:464)\r\n\tat
> >> org.apache.solr.client.solrj.io.stream.HashJoinStream.close(
> >> HashJoinStream.java:231)\r\n\tat
> >> org.apache.solr.client.solrj.io.stream.ExceptionStream.close
> >> (ExceptionStream.java:93)\r\n\tat
> >> org.apache.solr.handler.StreamHandler$TimerStream.close(
> >> StreamHandler.java:452)\r\n\tat
> >> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
> >> iteMap$0(TupleStream.java:71)\r\n\t...
> >> 40 more\r\n",
> >>     "code":500}}
> >>
> >>
> >> Regards,
> >> Edwin
> >>
> >>
> >> On 4 May 2017 at 00:00, Joel Bernstein <jo...@gmail.com> wrote:
> >>
> >> > I've reformatted the expression below and made a few changes. You have
> >> put
> >> > things together properly. But these are MapReduce joins that require
> >> > exporting the entire result sets. So you will need to add qt=/export
> to
> >> all
>> > the searches and remove the rows param. In Solr 6.6 there is a new
> >> > "shuffle" expression that does this automatically.
> >> >
> >> > To test things you'll want to break down each expression and make sure
> >> it's
> >> > behaving as expected.
> >> >
>> > For example, first run each search. Then run the innerJoin, not in
> >> parallel
> >> > mode. Then run it in parallel mode. Then try the whole thing.
> >> >
> >> > hashJoin(parallel(collection2,
> >> >                             innerJoin(search(collection2,
> >> >                                                        q=*:*,
> >> >
> >> >  fl="a_s,b_s,c_s,d_s,e_s",
> >> >                                                        sort="a_s asc",
> >> >
> >> partitionKeys="a_s",
> >> >                                                        qt="/export"),
> >> >                                            search(collection1,
> >> >                                                        q=*:*,
> >> >
> >> >  fl="a_s,f_s,g_s,h_s,i_s,j_s",
> >> >                                                        sort="a_s asc",
> >> >
> >>  partitionKeys="a_s",
> >> >                                                       qt="/export"),
> >> >                                            on="a_s"),
> >> >                              workers="2",
> >> >                              sort="a_s asc"),
> >> >                hashed=search(collection3,
> >> >                                          q=*:*,
> >> >                                          fl="a_s,k_s,l_s",
> >> >                                          sort="a_s asc",
> >> >                                          qt="/export"),
> >> >               on="a_s")
> >> >
> >> > Joel Bernstein
> >> > http://joelsolr.blogspot.com/
> >> >
> >> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <
> >> edwinyeozl@gmail.com
> >> > >
> >> > wrote:
> >> >
> >> > > Hi Joel,
> >> > >
> >> > > Thanks for the clarification.
> >> > >
>> > > I would like to check: is this the correct way to do the join?
>> Currently, I
>> > > cannot get any results after putting in the hashJoin for the 3rd,
> >> > > smallerStream collection (collection3).
> >> > >
> >> > > http://localhost:8983/solr/collection1/stream?expr=
> >> > > hashJoin(parallel(collection2
> >> > > ,
> >> > > innerJoin(
> >> > >  search(collection2,
> >> > > q=*:*,
> >> > > fl="a_s,b_s,c_s,d_s,e_s",
> >> > >              sort="a_s asc",
> >> > > partitionKeys="a_s",
> >> > > rows=200),
> >> > >  search(collection1,
> >> > > q=*:*,
> >> > > fl="a_s,f_s,g_s,h_s,i_s,j_s",
> >> > >              sort="a_s asc",
> >> > > partitionKeys="a_s",
> >> > > rows=200),
> >> > >          on="a_s"),
> >> > > workers="2",
> >> > >                  sort="a_s asc"),
> >> > >          hashed=search(collection3,
> >> > > q=*:*,
> >> > > fl="a_s,k_s,l_s",
> >> > > sort="a_s asc",
> >> > > rows=200),
> >> > > on="a_s")
> >> > > &indent=true
> >> > >
> >> > >
> >> > > Regards,
> >> > > Edwin
> >> > >
> >> > >
> >> > > On 3 May 2017 at 20:59, Joel Bernstein <jo...@gmail.com> wrote:
> >> > >
> >> > > > Sorry, it's just called hashJoin
> >> > > >
> >> > > > Joel Bernstein
> >> > > > http://joelsolr.blogspot.com/
> >> > > >
> >> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <
> >> > > edwinyeozl@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > Hi Joel,
> >> > > > >
> >> > > > > I am getting this error when I used the innerHashJoin.
> >> > > > >
> >> > > > >  "EXCEPTION":"Invalid stream expression innerHashJoin(parallel(
> >> > > innerJoin
> >> > > > >
>> > > > > I also can't find any documentation on innerHashJoin for
> >> > Streaming
> >> > > > > Expressions.
> >> > > > >
> >> > > > > Are you referring to hashJoin?
> >> > > > >
> >> > > > > Regards,
> >> > > > > Edwin
> >> > > > >
> >> > > > >
> >> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <
> edwinyeozl@gmail.com
> >> >
> >> > > > wrote:
> >> > > > >
> >> > > > > > Hi Joel,
> >> > > > > >
> >> > > > > > Thanks for the info.
> >> > > > > >
> >> > > > > > Regards,
> >> > > > > > Edwin
> >> > > > > >
> >> > > > > >
> >> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <jo...@gmail.com>
> >> wrote:
> >> > > > > >
> >> > > > > >> Also take a look at the documentation for the "fetch"
> streaming
> >> > > > > >> expression.
> >> > > > > >>
> >> > > > > >> Joel Bernstein
> >> > > > > >> http://joelsolr.blogspot.com/
> >> > > > > >>
> >> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <
> >> > joelsolr@gmail.com>
> >> > > > > >> wrote:
> >> > > > > >>
>> > > > > >> > Yes, you can join more than one collection with Streaming
> >> > Expressions.
> >> > > > Here
> >> > > > > >> are
> >> > > > > >> > a few things to keep in mind.
> >> > > > > >> >
> >> > > > > >> > * You'll likely want to use the parallel function around
> the
> >> > > largest
> >> > > > > >> join.
> >> > > > > >> > You'll need to use the join keys as the partitionKeys.
> >> > > > > >> > * innerJoin: requires that the streams be sorted on the
> join
> >> > keys.
> >> > > > > >> > * innerHashJoin: has no sorting requirement.
> >> > > > > >> >
> >> > > > > >> > So a strategy for a three collection join might look like
> >> this:
> >> > > > > >> >
> >> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)),
> >> > > > > smallerStream)
> >> > > > > >> >
> >> > > > > >> > The largest join can be done in parallel using an
> innerJoin.
> >> You
> >> > > can
> >> > > > > >> then
> >> > > > > >> > wrap the stream coming out of the parallel function in an
> >> > > > > innerHashJoin
> >> > > > > >> to
> >> > > > > >> > join it to another stream.
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> > Joel Bernstein
> >> > > > > >> > http://joelsolr.blogspot.com/
> >> > > > > >> >
> >> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <
> >> > > > > >> edwinyeozl@gmail.com>
> >> > > > > >> > wrote:
> >> > > > > >> >
> >> > > > > >> >> Hi,
> >> > > > > >> >>
> >> > > > > >> >> Is it possible to join more than 2 collections using one
> of
> >> the
> >> > > > > >> streaming
> >> > > > > >> >> expressions (Eg: innerJoin)? If not, is there other ways
> we
> >> can
> >> > > do
> >> > > > > it?
> >> > > > > >> >>
> >> > > > > >> >> Currently, I may need to join 3 or 4 collections together,
> >> and
> >> > to
> >> > > > > >> output
> >> > > > > >> >> selected fields from all these collections together.
> >> > > > > >> >>
> >> > > > > >> >> I'm using Solr 6.4.2.
> >> > > > > >> >>
> >> > > > > >> >> Regards,
> >> > > > > >> >> Edwin
> >> > > > > >> >>
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >>
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: Joining more than 2 collections

Posted by Joel Bernstein <jo...@gmail.com>.
I was just testing with the query below and it worked for me. Some of the
error messages I was getting with the syntax were not what I was expecting,
though, so I'll look into the error handling. But the joins do work when
the syntax is correct. The query below is joining to the same collection three
times, but the mechanics are exactly the same as joining three different
tables. In this example each join narrows down the result set.

hashJoin(parallel(collection2,
                            workers=3,
                            sort="id asc",
                            innerJoin(search(collection2, q="*:*", fl="id",
sort="id asc", qt="/export", partitionKeys="id"),
                                            search(collection2,
q="year_i:42", fl="id, year_i", sort="id asc", qt="/export",
partitionKeys="id"),
                                            on="id")),
                hashed=search(collection2, q="day_i:7", fl="id, day_i",
sort="id asc", qt="/export"),
                on="id")

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, May 3, 2017 at 1:29 PM, Joel Bernstein <jo...@gmail.com> wrote:

> Start off with just this expression:
>
> search(collection2,
>             q=*:*,
>             fl="a_s,b_s,c_s,d_s,e_s",
>             sort="a_s asc",
>             qt="/export")
>
> And then check the logs for exceptions.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com
> > wrote:
>
>> Hi Joel,
>>
>> I am getting this error after I added qt=/export and removed the rows
>> param. Do you know what could be the reason?
>>
>> {
>>   "error":{
>>     "metadata":[
>>       "error-class","org.apache.solr.common.SolrException",
>>       "root-error-class","org.apache.http.MalformedChunkCodingExce
>> ption"],
>>     "msg":"org.apache.http.MalformedChunkCodingException: CRLF expected
>> at
>> end of chunk",
>>     "trace":"org.apache.solr.common.SolrException:
>> org.apache.http.MalformedChunkCodingException: CRLF expected at end of
>> chunk\r\n\tat
>> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
>> iteMap$0(TupleStream.java:79)\r\n\tat
>> org.apache.solr.response.JSONWriter.writeIterator(JSONRespon
>> seWriter.java:523)\r\n\tat
>> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
>> ponseWriter.java:175)\r\n\tat
>> org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter
>> .java:559)\r\n\tat
>> org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(
>> TupleStream.java:64)\r\n\tat
>> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWri
>> ter.java:547)\r\n\tat
>> org.apache.solr.response.TextResponseWriter.writeVal(TextRes
>> ponseWriter.java:193)\r\n\tat
>> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithD
>> ups(JSONResponseWriter.java:209)\r\n\tat
>> org.apache.solr.response.JSONWriter.writeNamedList(JSONRespo
>> nseWriter.java:325)\r\n\tat
>> org.apache.solr.response.JSONWriter.writeResponse(JSONRespon
>> seWriter.java:120)\r\n\tat
>> org.apache.solr.response.JSONResponseWriter.write(JSONRespon
>> seWriter.java:71)\r\n\tat
>> org.apache.solr.response.QueryResponseWriterUtil.writeQueryR
>> esponse(QueryResponseWriterUtil.java:65)\r\n\tat
>> org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrC
>> all.java:732)\r\n\tat
>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)\r\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
>> atchFilter.java:345)\r\n\tat
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDisp
>> atchFilter.java:296)\r\n\tat
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte
>> r(ServletHandler.java:1691)\r\n\tat
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHan
>> dler.java:582)\r\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
>> Handler.java:143)\r\n\tat
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHa
>> ndler.java:548)\r\n\tat
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(
>> SessionHandler.java:226)\r\n\tat
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(
>> ContextHandler.java:1180)\r\n\tat
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHand
>> ler.java:512)\r\n\tat
>> org.eclipse.jetty.server.session.SessionHandler.doScope(
>> SessionHandler.java:185)\r\n\tat
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(
>> ContextHandler.java:1112)\r\n\tat
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(Scoped
>> Handler.java:141)\r\n\tat
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.ha
>> ndle(ContextHandlerCollection.java:213)\r\n\tat
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(
>> HandlerCollection.java:119)\r\n\tat
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(Handl
>> erWrapper.java:134)\r\n\tat
>> org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat
>> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\r\n\tat
>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConne
>> ction.java:251)\r\n\tat
>> org.eclipse.jetty.io.AbstractConnection$ReadCallback.
>> succeeded(AbstractConnection.java:273)\r\n\tat
>> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\r\n\tat
>> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChann
>> elEndPoint.java:93)\r\n\tat
>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
>> .executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat
>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
>> .produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume
>> .run(ExecuteProduceConsume.java:136)\r\n\tat
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(Queued
>> ThreadPool.java:671)\r\n\tat
>> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedT
>> hreadPool.java:589)\r\n\tat
>> java.lang.Thread.run(Thread.java:745)\r\nCaused by:
>> org.apache.http.MalformedChunkCodingException: CRLF expected at end of
>> chunk\r\n\tat
>> org.apache.http.impl.io.ChunkedInputStream.getChunkSize(Chun
>> kedInputStream.java:255)\r\n\tat
>> org.apache.http.impl.io.ChunkedInputStream.nextChunk(Chunked
>> InputStream.java:227)\r\n\tat
>> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
>> Stream.java:186)\r\n\tat
>> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInput
>> Stream.java:215)\r\n\tat
>> org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInpu
>> tStream.java:316)\r\n\tat
>> org.apache.http.conn.BasicManagedEntity.streamClosed(BasicMa
>> nagedEntity.java:164)\r\n\tat
>> org.apache.http.conn.EofSensorInputStream.checkClose(EofSens
>> orInputStream.java:228)\r\n\tat
>> org.apache.http.conn.EofSensorInputStream.close(EofSensorInp
>> utStream.java:174)\r\n\tat
>> sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)\r\n\tat
>> sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)\r\n\tat
>> java.io.InputStreamReader.close(InputStreamReader.java:199)\r\n\tat
>> org.apache.solr.client.solrj.io.stream.JSONTupleStream.close
>> (JSONTupleStream.java:92)\r\n\tat
>> org.apache.solr.client.solrj.io.stream.SolrStream.close(Solr
>> Stream.java:193)\r\n\tat
>> org.apache.solr.client.solrj.io.stream.CloudSolrStream.close
>> (CloudSolrStream.java:464)\r\n\tat
>> org.apache.solr.client.solrj.io.stream.HashJoinStream.close(
>> HashJoinStream.java:231)\r\n\tat
>> org.apache.solr.client.solrj.io.stream.ExceptionStream.close
>> (ExceptionStream.java:93)\r\n\tat
>> org.apache.solr.handler.StreamHandler$TimerStream.close(
>> StreamHandler.java:452)\r\n\tat
>> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$wr
>> iteMap$0(TupleStream.java:71)\r\n\t...
>> 40 more\r\n",
>>     "code":500}}
>>
>>
>> Regards,
>> Edwin
>>
>>
>> On 4 May 2017 at 00:00, Joel Bernstein <jo...@gmail.com> wrote:
>>
>> > I've reformatted the expression below and made a few changes. You have
>> put
>> > things together properly. But these are MapReduce joins that require
>> > exporting the entire result sets. So you will need to add qt=/export to
>> all
>> > the searches and remove the rows param. In Solr 6.6 there is a new
>> > "shuffle" expression that does this automatically.
>> >
>> > To test things you'll want to break down each expression and make sure
>> it's
>> > behaving as expected.
>> >
>> > For example, first run each search. Then run the innerJoin, not in
>> parallel
>> > mode. Then run it in parallel mode. Then try the whole thing.
>> >
>> > hashJoin(parallel(collection2,
>> >                             innerJoin(search(collection2,
>> >                                                        q=*:*,
>> >
>> >  fl="a_s,b_s,c_s,d_s,e_s",
>> >                                                        sort="a_s asc",
>> >
>> partitionKeys="a_s",
>> >                                                        qt="/export"),
>> >                                            search(collection1,
>> >                                                        q=*:*,
>> >
>> >  fl="a_s,f_s,g_s,h_s,i_s,j_s",
>> >                                                        sort="a_s asc",
>> >
>>  partitionKeys="a_s",
>> >                                                       qt="/export"),
>> >                                            on="a_s"),
>> >                              workers="2",
>> >                              sort="a_s asc"),
>> >                hashed=search(collection3,
>> >                                          q=*:*,
>> >                                          fl="a_s,k_s,l_s",
>> >                                          sort="a_s asc",
>> >                                          qt="/export"),
>> >               on="a_s")
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <
>> edwinyeozl@gmail.com
>> > >
>> > wrote:
>> >
>> > > Hi Joel,
>> > >
>> > > Thanks for the clarification.
>> > >
>> > > I would like to check: is this the correct way to do the join?
>> Currently, I
>> > > cannot get any results after putting in the hashJoin for the 3rd,
>> > > smallerStream collection (collection3).
>> > >
>> > > http://localhost:8983/solr/collection1/stream?expr=
>> > > hashJoin(parallel(collection2
>> > > ,
>> > > innerJoin(
>> > >  search(collection2,
>> > > q=*:*,
>> > > fl="a_s,b_s,c_s,d_s,e_s",
>> > >              sort="a_s asc",
>> > > partitionKeys="a_s",
>> > > rows=200),
>> > >  search(collection1,
>> > > q=*:*,
>> > > fl="a_s,f_s,g_s,h_s,i_s,j_s",
>> > >              sort="a_s asc",
>> > > partitionKeys="a_s",
>> > > rows=200),
>> > >          on="a_s"),
>> > > workers="2",
>> > >                  sort="a_s asc"),
>> > >          hashed=search(collection3,
>> > > q=*:*,
>> > > fl="a_s,k_s,l_s",
>> > > sort="a_s asc",
>> > > rows=200),
>> > > on="a_s")
>> > > &indent=true
>> > >
>> > >
>> > > Regards,
>> > > Edwin
>> > >
>> > >
>> > > On 3 May 2017 at 20:59, Joel Bernstein <jo...@gmail.com> wrote:
>> > >
>> > > > Sorry, it's just called hashJoin
>> > > >
>> > > > Joel Bernstein
>> > > > http://joelsolr.blogspot.com/
>> > > >
>> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <
>> > > edwinyeozl@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Hi Joel,
>> > > > >
>> > > > > I am getting this error when I used the innerHashJoin.
>> > > > >
>> > > > >  "EXCEPTION":"Invalid stream expression innerHashJoin(parallel(
>> > > innerJoin
>> > > > >
>> > > > > I also can't find any documentation on innerHashJoin for
>> > Streaming
>> > > > > Expressions.
>> > > > >
>> > > > > Are you referring to hashJoin?
>> > > > >
>> > > > > Regards,
>> > > > > Edwin
>> > > > >
>> > > > >
>> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com
>> >
>> > > > wrote:
>> > > > >
>> > > > > > Hi Joel,
>> > > > > >
>> > > > > > Thanks for the info.
>> > > > > >
>> > > > > > Regards,
>> > > > > > Edwin
>> > > > > >
>> > > > > >
>> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <jo...@gmail.com>
>> wrote:
>> > > > > >
>> > > > > >> Also take a look at the documentation for the "fetch" streaming
>> > > > > >> expression.
>> > > > > >>
>> > > > > >> Joel Bernstein
>> > > > > >> http://joelsolr.blogspot.com/
>> > > > > >>
>> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <
>> > joelsolr@gmail.com>
>> > > > > >> wrote:
>> > > > > >>
>> > > > > >> > Yes, you can join more than one collection with Streaming
>> > Expressions.
>> > > > Here
>> > > > > >> are
>> > > > > >> > a few things to keep in mind.
>> > > > > >> >
>> > > > > >> > * You'll likely want to use the parallel function around the
>> > > largest
>> > > > > >> join.
>> > > > > >> > You'll need to use the join keys as the partitionKeys.
>> > > > > >> > * innerJoin: requires that the streams be sorted on the join
>> > keys.
>> > > > > >> > * innerHashJoin: has no sorting requirement.
>> > > > > >> >
>> > > > > >> > So a strategy for a three collection join might look like
>> this:
>> > > > > >> >
>> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)),
>> > > > > smallerStream)
>> > > > > >> >
>> > > > > >> > The largest join can be done in parallel using an innerJoin.
>> You
>> > > can
>> > > > > >> then
>> > > > > >> > wrap the stream coming out of the parallel function in an
>> > > > > innerHashJoin
>> > > > > >> to
>> > > > > >> > join it to another stream.
>> > > > > >> >
>> > > > > >> >
>> > > > > >> > Joel Bernstein
>> > > > > >> > http://joelsolr.blogspot.com/
>> > > > > >> >
>> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <
>> > > > > >> edwinyeozl@gmail.com>
>> > > > > >> > wrote:
>> > > > > >> >
>> > > > > >> >> Hi,
>> > > > > >> >>
>> > > > > >> >> Is it possible to join more than 2 collections using one of
>> the
>> > > > > >> streaming
>> > > > > >> >> expressions (Eg: innerJoin)? If not, is there other ways we
>> can
>> > > do
>> > > > > it?
>> > > > > >> >>
>> > > > > >> >> Currently, I may need to join 3 or 4 collections together,
>> and
>> > to
>> > > > > >> output
>> > > > > >> >> selected fields from all these collections together.
>> > > > > >> >>
>> > > > > >> >> I'm using Solr 6.4.2.
>> > > > > >> >>
>> > > > > >> >> Regards,
>> > > > > >> >> Edwin
>> > > > > >> >>
>> > > > > >> >
>> > > > > >> >
>> > > > > >>
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: Joining more than 2 collections

Posted by Joel Bernstein <jo...@gmail.com>.
Start off with just this expression:

search(collection2,
            q=*:*,
            fl="a_s,b_s,c_s,d_s,e_s",
            sort="a_s asc",
            qt="/export")

And then check the logs for exceptions.
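
As a sketch, assuming the same localhost setup as the earlier examples, the
bare expression can be run on its own, and the /export handler can also be
hit directly to rule out problems with exporting from the collection itself:

curl --data-urlencode 'expr=search(collection2,
            q=*:*,
            fl="a_s,b_s,c_s,d_s,e_s",
            sort="a_s asc",
            qt="/export")' \
     "http://localhost:8983/solr/collection2/stream"

curl "http://localhost:8983/solr/collection2/export?q=*:*&fl=a_s,b_s,c_s,d_s,e_s&sort=a_s+asc"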

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <ed...@gmail.com>
wrote:

> Hi Joel,
>
> I am getting this error after I added qt=/export and removed the rows
> param. Do you know what could be the reason?
>
> {
>   "error":{
>     "metadata":[
>       "error-class","org.apache.solr.common.SolrException",
>       "root-error-class","org.apache.http.MalformedChunkCodingException"],
>     "msg":"org.apache.http.MalformedChunkCodingException: CRLF expected at
> end of chunk",
>     "trace":"org.apache.solr.common.SolrException:
> org.apache.http.MalformedChunkCodingException: CRLF expected at end of
> chunk\r\n\tat
> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$
> writeMap$0(TupleStream.java:79)\r\n\tat
> org.apache.solr.response.JSONWriter.writeIterator(
> JSONResponseWriter.java:523)\r\n\tat
> org.apache.solr.response.TextResponseWriter.writeVal(
> TextResponseWriter.java:175)\r\n\tat
> org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559)\
> r\n\tat
> org.apache.solr.client.solrj.io.stream.TupleStream.
> writeMap(TupleStream.java:64)\r\n\tat
> org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)\
> r\n\tat
> org.apache.solr.response.TextResponseWriter.writeVal(
> TextResponseWriter.java:193)\r\n\tat
> org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(
> JSONResponseWriter.java:209)\r\n\tat
> org.apache.solr.response.JSONWriter.writeNamedList(
> JSONResponseWriter.java:325)\r\n\tat
> org.apache.solr.response.JSONWriter.writeResponse(
> JSONResponseWriter.java:120)\r\n\tat
> org.apache.solr.response.JSONResponseWriter.write(
> JSONResponseWriter.java:71)\r\n\tat
> org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(
> QueryResponseWriterUtil.java:65)\r\n\tat
> org.apache.solr.servlet.HttpSolrCall.writeResponse(
> HttpSolrCall.java:732)\r\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)\r\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:345)\r\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> SolrDispatchFilter.java:296)\r\n\tat
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.
> doFilter(ServletHandler.java:1691)\r\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doHandle(
> ServletHandler.java:582)\r\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:143)\r\n\tat
> org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHandler.java:548)\r\n\tat
> org.eclipse.jetty.server.session.SessionHandler.
> doHandle(SessionHandler.java:226)\r\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.
> doHandle(ContextHandler.java:1180)\r\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doScope(
> ServletHandler.java:512)\r\n\tat
> org.eclipse.jetty.server.session.SessionHandler.
> doScope(SessionHandler.java:185)\r\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.
> doScope(ContextHandler.java:1112)\r\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:141)\r\n\tat
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
> ContextHandlerCollection.java:213)\r\n\tat
> org.eclipse.jetty.server.handler.HandlerCollection.
> handle(HandlerCollection.java:119)\r\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper.java:134)\r\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\r\n\tat
> org.eclipse.jetty.server.HttpConnection.onFillable(
> HttpConnection.java:251)\r\n\tat
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(
> AbstractConnection.java:273)\r\n\tat
> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\r\n\tat
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(
> SelectChannelEndPoint.java:93)\r\n\tat
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.
> produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(
> ExecuteProduceConsume.java:136)\r\n\tat
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> QueuedThreadPool.java:671)\r\n\tat
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(
> QueuedThreadPool.java:589)\r\n\tat
> java.lang.Thread.run(Thread.java:745)\r\nCaused by:
> org.apache.http.MalformedChunkCodingException: CRLF expected at end of
> chunk\r\n\tat
> org.apache.http.impl.io.ChunkedInputStream.getChunkSize(
> ChunkedInputStream.java:255)\r\n\tat
> org.apache.http.impl.io.ChunkedInputStream.nextChunk(
> ChunkedInputStream.java:227)\r\n\tat
> org.apache.http.impl.io.ChunkedInputStream.read(
> ChunkedInputStream.java:186)\r\n\tat
> org.apache.http.impl.io.ChunkedInputStream.read(
> ChunkedInputStream.java:215)\r\n\tat
> org.apache.http.impl.io.ChunkedInputStream.close(
> ChunkedInputStream.java:316)\r\n\tat
> org.apache.http.conn.BasicManagedEntity.streamClosed(
> BasicManagedEntity.java:164)\r\n\tat
> org.apache.http.conn.EofSensorInputStream.checkClose(
> EofSensorInputStream.java:228)\r\n\tat
> org.apache.http.conn.EofSensorInputStream.close(
> EofSensorInputStream.java:174)\r\n\tat
> sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)\r\n\tat
> sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)\r\n\tat
> java.io.InputStreamReader.close(InputStreamReader.java:199)\r\n\tat
> org.apache.solr.client.solrj.io.stream.JSONTupleStream.
> close(JSONTupleStream.java:92)\r\n\tat
> org.apache.solr.client.solrj.io.stream.SolrStream.close(
> SolrStream.java:193)\r\n\tat
> org.apache.solr.client.solrj.io.stream.CloudSolrStream.
> close(CloudSolrStream.java:464)\r\n\tat
> org.apache.solr.client.solrj.io.stream.HashJoinStream.
> close(HashJoinStream.java:231)\r\n\tat
> org.apache.solr.client.solrj.io.stream.ExceptionStream.
> close(ExceptionStream.java:93)\r\n\tat
> org.apache.solr.handler.StreamHandler$TimerStream.
> close(StreamHandler.java:452)\r\n\tat
> org.apache.solr.client.solrj.io.stream.TupleStream.lambda$
> writeMap$0(TupleStream.java:71)\r\n\t...
> 40 more\r\n",
>     "code":500}}
>
>
> Regards,
> Edwin
>
>
> On 4 May 2017 at 00:00, Joel Bernstein <jo...@gmail.com> wrote:
>
> > I've reformatted the expression below and made a few changes. You have
> put
> > things together properly. But these are MapReduce joins that require
> > exporting the entire result sets. So you will need to add qt=/export to
> all
> > the searches and remove the rows param. In Solr 6.6 there is a new
> > "shuffle" expression that does this automatically.
> >
> > To test things you'll want to break down each expression and make sure
> it's
> > behaving as expected.
> >
> > For example, first run each search. Then run the innerJoin, not in
> parallel
> > mode. Then run it in parallel mode. Then try the whole thing.
> >
> > hashJoin(parallel(collection2,
> >                             innerJoin(search(collection2,
> >                                                        q=*:*,
> >
> >  fl="a_s,b_s,c_s,d_s,e_s",
> >                                                        sort="a_s asc",
> >
> partitionKeys="a_s",
> >                                                        qt="/export"),
> >                                            search(collection1,
> >                                                        q=*:*,
> >
> >  fl="a_s,f_s,g_s,h_s,i_s,j_s",
> >                                                        sort="a_s asc",
> >
>  partitionKeys="a_s",
> >                                                       qt="/export"),
> >                                            on="a_s"),
> >                              workers="2",
> >                              sort="a_s asc"),
> >                hashed=search(collection3,
> >                                          q=*:*,
> >                                          fl="a_s,k_s,l_s",
> >                                          sort="a_s asc",
> >                                          qt="/export"),
> >               on="a_s")
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <
> edwinyeozl@gmail.com
> > >
> > wrote:
> >
> > > Hi Joel,
> > >
> > > Thanks for the clarification.
> > >
> > > I would like to check: is this the correct way to do the join?
> Currently, I
> > > cannot get any results after putting in the hashJoin for the 3rd,
> > > smallerStream collection (collection3).
> > >
> > > http://localhost:8983/solr/collection1/stream?expr=
> > > hashJoin(parallel(collection2
> > > ,
> > > innerJoin(
> > >  search(collection2,
> > > q=*:*,
> > > fl="a_s,b_s,c_s,d_s,e_s",
> > >              sort="a_s asc",
> > > partitionKeys="a_s",
> > > rows=200),
> > >  search(collection1,
> > > q=*:*,
> > > fl="a_s,f_s,g_s,h_s,i_s,j_s",
> > >              sort="a_s asc",
> > > partitionKeys="a_s",
> > > rows=200),
> > >          on="a_s"),
> > > workers="2",
> > >                  sort="a_s asc"),
> > >          hashed=search(collection3,
> > > q=*:*,
> > > fl="a_s,k_s,l_s",
> > > sort="a_s asc",
> > > rows=200),
> > > on="a_s")
> > > &indent=true
> > >
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 3 May 2017 at 20:59, Joel Bernstein <jo...@gmail.com> wrote:
> > >
> > > > Sorry, it's just called hashJoin
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <
> > > edwinyeozl@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Joel,
> > > > >
> > > > > I am getting this error when I used the innerHashJoin.
> > > > >
> > > > >  "EXCEPTION":"Invalid stream expression innerHashJoin(parallel(
> > > innerJoin
> > > > >
> > > > > I also can't find any documentation on innerHashJoin for
> > Streaming
> > > > > Expressions.
> > > > >
> > > > > Are you referring to hashJoin?
> > > > >
> > > > > Regards,
> > > > > Edwin
> > > > >
> > > > >
> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <ed...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi Joel,
> > > > > >
> > > > > > Thanks for the info.
> > > > > >
> > > > > > Regards,
> > > > > > Edwin
> > > > > >
> > > > > >
> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <jo...@gmail.com>
> wrote:
> > > > > >
> > > > > >> Also take a look at the documentation for the "fetch" streaming
> > > > > >> expression.
> > > > > >>
> > > > > >> Joel Bernstein
> > > > > >> http://joelsolr.blogspot.com/
> > > > > >>
> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <
> > joelsolr@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Yes, you can join more than one collection with Streaming
> > Expressions.
> > > > Here
> > > > > >> are
> > > > > >> > a few things to keep in mind.
> > > > > >> >
> > > > > >> > * You'll likely want to use the parallel function around the
> > > largest
> > > > > >> join.
> > > > > >> > You'll need to use the join keys as the partitionKeys.
> > > > > >> > * innerJoin: requires that the streams be sorted on the join
> > keys.
> > > > > >> > * innerHashJoin: has no sorting requirement.
> > > > > >> >
> > > > > >> > So a strategy for a three collection join might look like
> this:
> > > > > >> >
> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)),
> > > > > smallerStream)
> > > > > >> >
> > > > > >> > The largest join can be done in parallel using an innerJoin.
> You
> > > can
> > > > > >> then
> > > > > >> > wrap the stream coming out of the parallel function in an
> > > > > innerHashJoin
> > > > > >> to
> > > > > >> > join it to another stream.
> > > > > >> >
> > > > > >> >
> > > > > >> > Joel Bernstein
> > > > > >> > http://joelsolr.blogspot.com/
> > > > > >> >
> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <
> > > > > >> edwinyeozl@gmail.com>
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> >> Hi,
> > > > > >> >>
> > > > > >> >> Is it possible to join more than 2 collections using one of
> the
> > > > > >> streaming
> > > > > >> >> expressions (Eg: innerJoin)? If not, is there other ways we
> can
> > > do
> > > > > it?
> > > > > >> >>
> > > > > >> >> Currently, I may need to join 3 or 4 collections together,
> and
> > to
> > > > > >> output
> > > > > >> >> selected fields from all these collections together.
> > > > > >> >>
> > > > > >> >> I'm using Solr 6.4.2.
> > > > > >> >>
> > > > > >> >> Regards,
> > > > > >> >> Edwin
> > > > > >> >>
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Joining more than 2 collections

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Joel,

I am getting this error after I added qt=/export and removed the rows
param. Do you know what could be the reason?

{
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.http.MalformedChunkCodingException"],
    "msg":"org.apache.http.MalformedChunkCodingException: CRLF expected at
end of chunk",
    "trace":"org.apache.solr.common.SolrException:
org.apache.http.MalformedChunkCodingException: CRLF expected at end of
chunk\r\n\tat
org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:79)\r\n\tat
org.apache.solr.response.JSONWriter.writeIterator(JSONResponseWriter.java:523)\r\n\tat
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:175)\r\n\tat
org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559)\r\n\tat
org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:64)\r\n\tat
org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)\r\n\tat
org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)\r\n\tat
org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)\r\n\tat
org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)\r\n\tat
org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)\r\n\tat
org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)\r\n\tat
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)\r\n\tat
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:732)\r\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\r\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\r\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\r\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:534)\r\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)\r\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)\r\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\r\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\r\n\tat
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\r\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)\r\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)\r\n\tat
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)\r\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)\r\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)\r\n\tat
java.lang.Thread.run(Thread.java:745)\r\nCaused by:
org.apache.http.MalformedChunkCodingException: CRLF expected at end of
chunk\r\n\tat
org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:255)\r\n\tat
org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)\r\n\tat
org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)\r\n\tat
org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:215)\r\n\tat
org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInputStream.java:316)\r\n\tat
org.apache.http.conn.BasicManagedEntity.streamClosed(BasicManagedEntity.java:164)\r\n\tat
org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)\r\n\tat
org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:174)\r\n\tat
sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)\r\n\tat
sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)\r\n\tat
java.io.InputStreamReader.close(InputStreamReader.java:199)\r\n\tat
org.apache.solr.client.solrj.io.stream.JSONTupleStream.close(JSONTupleStream.java:92)\r\n\tat
org.apache.solr.client.solrj.io.stream.SolrStream.close(SolrStream.java:193)\r\n\tat
org.apache.solr.client.solrj.io.stream.CloudSolrStream.close(CloudSolrStream.java:464)\r\n\tat
org.apache.solr.client.solrj.io.stream.HashJoinStream.close(HashJoinStream.java:231)\r\n\tat
org.apache.solr.client.solrj.io.stream.ExceptionStream.close(ExceptionStream.java:93)\r\n\tat
org.apache.solr.handler.StreamHandler$TimerStream.close(StreamHandler.java:452)\r\n\tat
org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:71)\r\n\t...
40 more\r\n",
    "code":500}}


Regards,
Edwin


On 4 May 2017 at 00:00, Joel Bernstein <jo...@gmail.com> wrote:

> I've reformatted the expression below and made a few changes. You have put
> things together properly. But these are MapReduce joins that require
> exporting the entire result sets. So you will need to add qt=/export to all
> the searches and remove the rows param. In Solr 6.6 there is a new
> "shuffle" expression that does this automatically.
>
> To test things you'll want to break down each expression and make sure it's
> behaving as expected.
>
> > For example, first run each search. Then run the innerJoin, not in parallel
> mode. Then run it in parallel mode. Then try the whole thing.
>
> hashJoin(parallel(collection2,
>                             innerJoin(search(collection2,
>                                                        q=*:*,
>
>  fl="a_s,b_s,c_s,d_s,e_s",
>                                                        sort="a_s asc",
>                                                        partitionKeys="a_s",
>                                                        qt="/export"),
>                                            search(collection1,
>                                                        q=*:*,
>
>  fl="a_s,f_s,g_s,h_s,i_s,j_s",
>                                                        sort="a_s asc",
>                                                       partitionKeys="a_s",
>                                                       qt="/export"),
>                                            on="a_s"),
>                              workers="2",
>                              sort="a_s asc"),
>                hashed=search(collection3,
>                                          q=*:*,
>                                          fl="a_s,k_s,l_s",
>                                          sort="a_s asc",
>                                          qt="/export"),
>               on="a_s")
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com
> >
> wrote:
>
> > Hi Joel,
> >
> > Thanks for the clarification.
> >
> > I would like to check: is this the correct way to do the join? Currently, I
> > cannot get any results after putting in the hashJoin for the 3rd,
> > smallerStream collection (collection3).
> >
> > http://localhost:8983/solr/collection1/stream?expr=
> > hashJoin(parallel(collection2
> > ,
> > innerJoin(
> >  search(collection2,
> > q=*:*,
> > fl="a_s,b_s,c_s,d_s,e_s",
> >              sort="a_s asc",
> > partitionKeys="a_s",
> > rows=200),
> >  search(collection1,
> > q=*:*,
> > fl="a_s,f_s,g_s,h_s,i_s,j_s",
> >              sort="a_s asc",
> > partitionKeys="a_s",
> > rows=200),
> >          on="a_s"),
> > workers="2",
> >                  sort="a_s asc"),
> >          hashed=search(collection3,
> > q=*:*,
> > fl="a_s,k_s,l_s",
> > sort="a_s asc",
> > rows=200),
> > on="a_s")
> > &indent=true
> >
> >
> > Regards,
> > Edwin
> >
> >
> > On 3 May 2017 at 20:59, Joel Bernstein <jo...@gmail.com> wrote:
> >
> > > Sorry, it's just called hashJoin
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <
> > edwinyeozl@gmail.com>
> > > wrote:
> > >
> > > > Hi Joel,
> > > >
> > > > I am getting this error when I used the innerHashJoin.
> > > >
> > > >  "EXCEPTION":"Invalid stream expression innerHashJoin(parallel(
> > innerJoin
> > > >
> > > > I also can't find any documentation on innerHashJoin for
> Streaming
> > > > Expressions.
> > > >
> > > > Are you referring to hashJoin?
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > >
> > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <ed...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Joel,
> > > > >
> > > > > Thanks for the info.
> > > > >
> > > > > Regards,
> > > > > Edwin
> > > > >
> > > > >
> > > > > On 3 May 2017 at 02:04, Joel Bernstein <jo...@gmail.com> wrote:
> > > > >
> > > > >> Also take a look at the documentation for the "fetch" streaming
> > > > >> expression.
> > > > >>
> > > > >> Joel Bernstein
> > > > >> http://joelsolr.blogspot.com/
> > > > >>
> > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <
> joelsolr@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Yes, you can join more than one collection with Streaming
> Expressions.
> > > Here
> > > > >> are
> > > > >> > a few things to keep in mind.
> > > > >> >
> > > > >> > * You'll likely want to use the parallel function around the
> > largest
> > > > >> join.
> > > > >> > You'll need to use the join keys as the partitionKeys.
> > > > >> > * innerJoin: requires that the streams be sorted on the join
> keys.
> > > > >> > * innerHashJoin: has no sorting requirement.
> > > > >> >
> > > > >> > So a strategy for a three collection join might look like this:
> > > > >> >
> > > > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)),
> > > > smallerStream)
> > > > >> >
> > > > >> > The largest join can be done in parallel using an innerJoin. You
> > can
> > > > >> then
> > > > >> > wrap the stream coming out of the parallel function in an
> > > > innerHashJoin
> > > > >> to
> > > > >> > join it to another stream.
> > > > >> >
> > > > >> >
> > > > >> > Joel Bernstein
> > > > >> > http://joelsolr.blogspot.com/
> > > > >> >
> > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <
> > > > >> edwinyeozl@gmail.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> >> Hi,
> > > > >> >>
> > > > >> >> Is it possible to join more than 2 collections using one of the
> > > > >> streaming
> > > > >> >> expressions (Eg: innerJoin)? If not, is there other ways we can
> > do
> > > > it?
> > > > >> >>
> > > > >> >> Currently, I may need to join 3 or 4 collections together, and
> to
> > > > >> output
> > > > >> >> selected fields from all these collections together.
> > > > >> >>
> > > > >> >> I'm using Solr 6.4.2.
> > > > >> >>
> > > > >> >> Regards,
> > > > >> >> Edwin
> > > > >> >>
> > > > >> >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: Joining more than 2 collections

Posted by Joel Bernstein <jo...@gmail.com>.
I've reformatted the expression below and made a few changes. You have put
things together properly, but these are MapReduce joins, which require
exporting the entire result sets. So you will need to add qt=/export to all
the searches and remove the rows param. In Solr 6.6 there is a new
"shuffle" expression that does this automatically.

To test things you'll want to break down each expression and make sure it's
behaving as expected.

For example, first run each search. Then run the innerJoin, not in parallel
mode. Then run it in parallel mode. Then try the whole thing.
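
For instance, the innermost building block of the expression below, run on
its own, would be:

search(collection2,
       q=*:*,
       fl="a_s,b_s,c_s,d_s,e_s",
       sort="a_s asc",
       partitionKeys="a_s",
       qt="/export")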

hashJoin(
  parallel(collection2,
    innerJoin(
      search(collection2,
             q=*:*,
             fl="a_s,b_s,c_s,d_s,e_s",
             sort="a_s asc",
             partitionKeys="a_s",
             qt="/export"),
      search(collection1,
             q=*:*,
             fl="a_s,f_s,g_s,h_s,i_s,j_s",
             sort="a_s asc",
             partitionKeys="a_s",
             qt="/export"),
      on="a_s"),
    workers="2",
    sort="a_s asc"),
  hashed=search(collection3,
                q=*:*,
                fl="a_s,k_s,l_s",
                sort="a_s asc",
                qt="/export"),
  on="a_s")

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <ed...@gmail.com>
wrote:

> Hi Joel,
>
> Thanks for the clarification.
>
> I'd like to check: is this the correct way to do the join? Currently, I
> cannot get any results after adding the hashJoin for the 3rd, smallerStream
> collection (collection3).
>
> http://localhost:8983/solr/collection1/stream?expr=
> hashJoin(
>   parallel(collection2,
>     innerJoin(
>       search(collection2,
>              q=*:*,
>              fl="a_s,b_s,c_s,d_s,e_s",
>              sort="a_s asc",
>              partitionKeys="a_s",
>              rows=200),
>       search(collection1,
>              q=*:*,
>              fl="a_s,f_s,g_s,h_s,i_s,j_s",
>              sort="a_s asc",
>              partitionKeys="a_s",
>              rows=200),
>       on="a_s"),
>     workers="2",
>     sort="a_s asc"),
>   hashed=search(collection3,
>                 q=*:*,
>                 fl="a_s,k_s,l_s",
>                 sort="a_s asc",
>                 rows=200),
>   on="a_s")
> &indent=true
>
>
> Regards,
> Edwin
>
>
> On 3 May 2017 at 20:59, Joel Bernstein <jo...@gmail.com> wrote:
>
> > Sorry, it's just called hashJoin
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <
> edwinyeozl@gmail.com>
> > wrote:
> >
> > > Hi Joel,
> > >
> > > I am getting this error when I used the innerHashJoin.
> > >
> > >  "EXCEPTION":"Invalid stream expression innerHashJoin(parallel(
> innerJoin
> > >
> > > I also can't find the documentation on innerHashJoin for the Streaming
> > > Expressions.
> > >
> > > Are you referring to hashJoin?
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <ed...@gmail.com>
> > wrote:
> > >
> > > > Hi Joel,
> > > >
> > > > Thanks for the info.
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > >
> > > > On 3 May 2017 at 02:04, Joel Bernstein <jo...@gmail.com> wrote:
> > > >
> > > >> Also take a look at the documentation for the "fetch" streaming
> > > >> expression.
> > > >>
> > > >> Joel Bernstein
> > > >> http://joelsolr.blogspot.com/
> > > >>
> > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <jo...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Yes, you can join more than one collection with Streaming
> > > >> > Expressions. Here are a few things to keep in mind.
> > > >> >
> > > >> > * You'll likely want to use the parallel function around the largest
> > > >> > join. You'll need to use the join keys as the partitionKeys.
> > > >> > * innerJoin: requires that the streams be sorted on the join keys.
> > > >> > * innerHashJoin: has no sorting requirement.
> > > >> >
> > > >> > So a strategy for a three collection join might look like this:
> > > >> >
> > > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)
> > > >> >
> > > >> > The largest join can be done in parallel using an innerJoin. You can
> > > >> > then wrap the stream coming out of the parallel function in an
> > > >> > innerHashJoin to join it to another stream.
> > > >> >
> > > >> > Joel Bernstein
> > > >> > http://joelsolr.blogspot.com/
> > > >> >
> > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <
> > > >> edwinyeozl@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> >> Hi,
> > > >> >>
> > > >> >> Is it possible to join more than 2 collections using one of the
> > > >> >> streaming expressions (Eg: innerJoin)? If not, are there other ways
> > > >> >> we can do it?
> > > >> >>
> > > >> >> Currently, I may need to join 3 or 4 collections together, and to
> > > >> >> output selected fields from all these collections together.
> > > >> >>
> > > >> >> I'm using Solr 6.4.2.
> > > >> >>
> > > >> >> Regards,
> > > >> >> Edwin
> > > >> >>
> > > >> >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: Joining more than 2 collections

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Joel,

Thanks for the clarification.

I'd like to check: is this the correct way to do the join? Currently, I
cannot get any results after adding the hashJoin for the 3rd, smallerStream
collection (collection3).

http://localhost:8983/solr/collection1/stream?expr=
hashJoin(
  parallel(collection2,
    innerJoin(
      search(collection2,
             q=*:*,
             fl="a_s,b_s,c_s,d_s,e_s",
             sort="a_s asc",
             partitionKeys="a_s",
             rows=200),
      search(collection1,
             q=*:*,
             fl="a_s,f_s,g_s,h_s,i_s,j_s",
             sort="a_s asc",
             partitionKeys="a_s",
             rows=200),
      on="a_s"),
    workers="2",
    sort="a_s asc"),
  hashed=search(collection3,
                q=*:*,
                fl="a_s,k_s,l_s",
                sort="a_s asc",
                rows=200),
  on="a_s")
&indent=true


Regards,
Edwin


On 3 May 2017 at 20:59, Joel Bernstein <jo...@gmail.com> wrote:

> Sorry, it's just called hashJoin
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <ed...@gmail.com>
> wrote:
>
> > Hi Joel,
> >
> > I am getting this error when I used the innerHashJoin.
> >
> >  "EXCEPTION":"Invalid stream expression innerHashJoin(parallel(innerJoin
> >
> > I also can't find the documentation on innerHashJoin for the Streaming
> > Expressions.
> >
> > Are you referring to hashJoin?
> >
> > Regards,
> > Edwin
> >
> >
> > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <ed...@gmail.com>
> wrote:
> >
> > > Hi Joel,
> > >
> > > Thanks for the info.
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 3 May 2017 at 02:04, Joel Bernstein <jo...@gmail.com> wrote:
> > >
> > >> Also take a look at the documentation for the "fetch" streaming
> > >> expression.
> > >>
> > >> Joel Bernstein
> > >> http://joelsolr.blogspot.com/
> > >>
> > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <jo...@gmail.com>
> > >> wrote:
> > >>
> > >> > Yes, you can join more than one collection with Streaming
> > >> > Expressions. Here are a few things to keep in mind.
> > >> >
> > >> > * You'll likely want to use the parallel function around the largest
> > >> > join. You'll need to use the join keys as the partitionKeys.
> > >> > * innerJoin: requires that the streams be sorted on the join keys.
> > >> > * innerHashJoin: has no sorting requirement.
> > >> >
> > >> > So a strategy for a three collection join might look like this:
> > >> >
> > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)
> > >> >
> > >> > The largest join can be done in parallel using an innerJoin. You can
> > >> > then wrap the stream coming out of the parallel function in an
> > >> > innerHashJoin to join it to another stream.
> > >> >
> > >> > Joel Bernstein
> > >> > http://joelsolr.blogspot.com/
> > >> >
> > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <
> > >> edwinyeozl@gmail.com>
> > >> > wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > >> >> Is it possible to join more than 2 collections using one of the
> > >> >> streaming expressions (Eg: innerJoin)? If not, are there other ways
> > >> >> we can do it?
> > >> >>
> > >> >> Currently, I may need to join 3 or 4 collections together, and to
> > >> >> output selected fields from all these collections together.
> > >> >>
> > >> >> I'm using Solr 6.4.2.
> > >> >>
> > >> >> Regards,
> > >> >> Edwin
> > >> >>
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>

Re: Joining more than 2 collections

Posted by Joel Bernstein <jo...@gmail.com>.
Sorry, it's just called hashJoin
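
For reference, the call shape (matching how it's used elsewhere in this
thread; the stream arguments here are placeholders) is roughly:

hashJoin(leftStream,
         hashed=rightStream,
         on="joinField")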

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <ed...@gmail.com>
wrote:

> Hi Joel,
>
> I am getting this error when I used the innerHashJoin.
>
>  "EXCEPTION":"Invalid stream expression innerHashJoin(parallel(innerJoin
>
> I also can't find the documentation on innerHashJoin for the Streaming
> Expressions.
>
> Are you referring to hashJoin?
>
> Regards,
> Edwin
>
>
> On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <ed...@gmail.com> wrote:
>
> > Hi Joel,
> >
> > Thanks for the info.
> >
> > Regards,
> > Edwin
> >
> >
> > On 3 May 2017 at 02:04, Joel Bernstein <jo...@gmail.com> wrote:
> >
> >> Also take a look at the documentation for the "fetch" streaming
> >> expression.
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <jo...@gmail.com>
> >> wrote:
> >>
> >> > Yes, you can join more than one collection with Streaming Expressions.
> >> > Here are a few things to keep in mind.
> >> >
> >> > * You'll likely want to use the parallel function around the largest
> >> > join. You'll need to use the join keys as the partitionKeys.
> >> > * innerJoin: requires that the streams be sorted on the join keys.
> >> > * innerHashJoin: has no sorting requirement.
> >> >
> >> > So a strategy for a three collection join might look like this:
> >> >
> >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)
> >> >
> >> > The largest join can be done in parallel using an innerJoin. You can
> >> > then wrap the stream coming out of the parallel function in an
> >> > innerHashJoin to join it to another stream.
> >> >
> >> > Joel Bernstein
> >> > http://joelsolr.blogspot.com/
> >> >
> >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <
> >> edwinyeozl@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> Is it possible to join more than 2 collections using one of the
> >> >> streaming expressions (Eg: innerJoin)? If not, are there other ways we
> >> >> can do it?
> >> >>
> >> >> Currently, I may need to join 3 or 4 collections together, and to
> >> >> output selected fields from all these collections together.
> >> >>
> >> >> I'm using Solr 6.4.2.
> >> >>
> >> >> Regards,
> >> >> Edwin
> >> >>
> >> >
> >> >
> >>
> >
> >
>

Re: Joining more than 2 collections

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Joel,

I am getting this error when I used the innerHashJoin.

 "EXCEPTION":"Invalid stream expression innerHashJoin(parallel(innerJoin

I also can't find the documentation on innerHashJoin for the Streaming
Expressions.

Are you referring to hashJoin?

Regards,
Edwin


On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <ed...@gmail.com> wrote:

> Hi Joel,
>
> Thanks for the info.
>
> Regards,
> Edwin
>
>
> On 3 May 2017 at 02:04, Joel Bernstein <jo...@gmail.com> wrote:
>
>> Also take a look at the documentation for the "fetch" streaming
>> expression.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <jo...@gmail.com>
>> wrote:
>>
>> > Yes, you can join more than one collection with Streaming Expressions.
>> > Here are a few things to keep in mind.
>> >
>> > * You'll likely want to use the parallel function around the largest
>> > join. You'll need to use the join keys as the partitionKeys.
>> > * innerJoin: requires that the streams be sorted on the join keys.
>> > * innerHashJoin: has no sorting requirement.
>> >
>> > So a strategy for a three collection join might look like this:
>> >
>> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)
>> >
>> > The largest join can be done in parallel using an innerJoin. You can
>> > then wrap the stream coming out of the parallel function in an
>> > innerHashJoin to join it to another stream.
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <
>> edwinyeozl@gmail.com>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> Is it possible to join more than 2 collections using one of the
>> >> streaming expressions (Eg: innerJoin)? If not, are there other ways we
>> >> can do it?
>> >>
>> >> Currently, I may need to join 3 or 4 collections together, and to
>> >> output selected fields from all these collections together.
>> >>
>> >> I'm using Solr 6.4.2.
>> >>
>> >> Regards,
>> >> Edwin
>> >>
>> >
>> >
>>
>
>

Re: Joining more than 2 collections

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Joel,

Thanks for the info.

Regards,
Edwin


On 3 May 2017 at 02:04, Joel Bernstein <jo...@gmail.com> wrote:

> Also take a look at the documentation for the "fetch" streaming expression.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <jo...@gmail.com> wrote:
>
> > Yes, you can join more than one collection with Streaming Expressions.
> > Here are a few things to keep in mind.
> >
> > * You'll likely want to use the parallel function around the largest
> > join. You'll need to use the join keys as the partitionKeys.
> > * innerJoin: requires that the streams be sorted on the join keys.
> > * innerHashJoin: has no sorting requirement.
> >
> > So a strategy for a three collection join might look like this:
> >
> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)
> >
> > The largest join can be done in parallel using an innerJoin. You can then
> > wrap the stream coming out of the parallel function in an innerHashJoin to
> > join it to another stream.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <
> edwinyeozl@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> Is it possible to join more than 2 collections using one of the streaming
> >> expressions (Eg: innerJoin)? If not, are there other ways we can do it?
> >>
> >> Currently, I may need to join 3 or 4 collections together, and to output
> >> selected fields from all these collections together.
> >>
> >> I'm using Solr 6.4.2.
> >>
> >> Regards,
> >> Edwin
> >>
> >
> >
>

Re: Joining more than 2 collections

Posted by Joel Bernstein <jo...@gmail.com>.
Also take a look at the documentation for the "fetch" streaming expression.
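
As a rough, untested sketch (reusing the collection and field names from
elsewhere in this thread), fetch decorates each tuple from an inner stream
with fields fetched from another collection:

fetch(collection3,
      search(collection1,
             q=*:*,
             fl="a_s,f_s,g_s",
             sort="a_s asc",
             qt="/export"),
      fl="k_s,l_s",
      on="a_s=a_s",
      batchSize="50")

Unlike the joins, fetch doesn't filter: if I remember correctly, tuples
without a match simply come through undecorated.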

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <jo...@gmail.com> wrote:

> Yes, you can join more than one collection with Streaming Expressions. Here
> are a few things to keep in mind.
>
> * You'll likely want to use the parallel function around the largest join.
> You'll need to use the join keys as the partitionKeys.
> * innerJoin: requires that the streams be sorted on the join keys.
> * innerHashJoin: has no sorting requirement.
>
> So a strategy for a three collection join might look like this:
>
> innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)
>
> The largest join can be done in parallel using an innerJoin. You can then
> wrap the stream coming out of the parallel function in an innerHashJoin to
> join it to another stream.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <ed...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Is it possible to join more than 2 collections using one of the streaming
>> expressions (Eg: innerJoin)? If not, are there other ways we can do it?
>>
>> Currently, I may need to join 3 or 4 collections together, and to output
>> selected fields from all these collections together.
>>
>> I'm using Solr 6.4.2.
>>
>> Regards,
>> Edwin
>>
>
>

Re: Joining more than 2 collections

Posted by Joel Bernstein <jo...@gmail.com>.
Yes, you can join more than one collection with Streaming Expressions. Here
are a few things to keep in mind.

* You'll likely want to use the parallel function around the largest join.
You'll need to use the join keys as the partitionKeys.
* innerJoin: requires that the streams be sorted on the join keys.
* innerHashJoin: has no sorting requirement.

So a strategy for a three collection join might look like this:

innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)

The largest join can be done in parallel using an innerJoin. You can then
wrap the stream coming out of the parallel function in an innerHashJoin to
join it to another stream.
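
As a sketch only (collection names, field names, and the worker count here
are placeholders, and the hash join shipped under the name hashJoin, as
corrected elsewhere in this thread), the full strategy might be written as:

hashJoin(
  parallel(workerCollection,
    innerJoin(
      search(bigCollectionA, q=*:*, fl="key,field_a", sort="key asc",
             partitionKeys="key", qt="/export"),
      search(bigCollectionB, q=*:*, fl="key,field_b", sort="key asc",
             partitionKeys="key", qt="/export"),
      on="key"),
    workers="2",
    sort="key asc"),
  hashed=search(smallCollection, q=*:*, fl="key,field_c", sort="key asc",
                qt="/export"),
  on="key")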

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <ed...@gmail.com>
wrote:

> Hi,
>
> Is it possible to join more than 2 collections using one of the streaming
> expressions (Eg: innerJoin)? If not, are there other ways we can do it?
>
> Currently, I may need to join 3 or 4 collections together, and to output
> selected fields from all these collections together.
>
> I'm using Solr 6.4.2.
>
> Regards,
> Edwin
>