Posted to solr-user@lucene.apache.org by Hendrik Haddorp <he...@gmx.net> on 2017/09/28 19:35:20 UTC

streaming with SolrJ

Hi,

I'm trying to use the streaming API via SolrJ but am having trouble with
the documentation and samples. In the reference guide I found the example
below at
http://lucene.apache.org/solr/guide/6_6/streaming-expressions.html.
The problem is that "withStreamFunction" does not seem to exist. There is
"withFunctionName", which would match the arguments, but it is not
documented in the JavaDoc, and the sample does not say why I would need
all those "with" calls when pretty much everything is also in the final
"constructStream" method call. I want to retrieve a few fields
for all documents in a collection but am having trouble figuring out the
correct way to do so. The documentation also uses "/export" and
"/search", with little explanation of the differences. I would really
appreciate a pointer to some simple samples.

The org.apache.solr.client.solrj.io package provides Java classes that 
compile streaming expressions into streaming API objects. These classes 
can be used to execute streaming expressions from inside a Java 
application. For example:

StreamFactory streamFactory = new StreamFactory()
     .withCollectionZkHost("collection1", zkServer.getZkAddress())
     .withStreamFunction("search", CloudSolrStream.class)
     .withStreamFunction("unique", UniqueStream.class)
     .withStreamFunction("top", RankStream.class)
     .withStreamFunction("group", ReducerStream.class)
     .withStreamFunction("parallel", ParallelStream.class);

ParallelStream pstream = (ParallelStream) streamFactory.constructStream(
     "parallel(collection1, "
     + "group(search(collection1, q=\"*:*\", fl=\"id,a_s,a_i,a_f\", "
     + "sort=\"a_s asc,a_f asc\", partitionKeys=\"a_s\"), by=\"a_s asc\"), "
     + "workers=\"2\", zkHost=\"" + zkHost + "\", sort=\"a_s asc\")");

regards,
Hendrik

Re: streaming with SolrJ

Posted by Joel Bernstein <jo...@gmail.com>.
There isn't much documentation on how to use the Streaming API Java
classes directly. All of the effort has been going into Streaming
Expressions, which you send to the /stream handler to execute. Over time
it has become more and more complicated to use the Java classes, because
there are so many of them and because their initialization can be complex.
All of the test cases are now focused on exercising the underlying classes
through the expressions.
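
For reference, here is a minimal sketch of sending an expression to the
/stream handler over plain HTTP from Java 11+, without any of the SolrJ
stream classes. The class name StreamHandlerRequest, the Solr base URL, and
the collection and field names are assumptions for illustration only. The
expression is passed as the "expr" request parameter.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of SolrJ.
class StreamHandlerRequest {

    // Form-encode the expression as the "expr" parameter.
    static String formBody(String expr) {
        return "expr=" + URLEncoder.encode(expr, StandardCharsets.UTF_8);
    }

    // Build a POST to <solrBaseUrl>/<collection>/stream.
    static HttpRequest build(String solrBaseUrl, String collection, String expr) {
        return HttpRequest.newBuilder(URI.create(solrBaseUrl + "/" + collection + "/stream"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(expr)))
                .build();
    }

    // Send the request and return the raw JSON response body.
    static String send(String solrBaseUrl, String collection, String expr)
            throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(build(solrBaseUrl, collection, expr), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Assuming a local Solr on port 8983 and a collection named gettingstarted,
a call could look like
StreamHandlerRequest.send("http://localhost:8983/solr", "gettingstarted",
"search(gettingstarted, q=\"*:*\", fl=\"id\", sort=\"id asc\", qt=\"/export\")").
With qt="/export" the search function streams the full sorted result set
instead of only the first rows, but the exported and sort fields need
docValues.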


Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Sep 28, 2017 at 4:47 PM, Hendrik Haddorp <he...@gmx.net>
wrote:

> hm, thanks, but why are all those withFunctionName calls required and how
> did you get to this?

Re: streaming with SolrJ

Posted by Hendrik Haddorp <he...@gmx.net>.
hm, thanks, but why are all those withFunctionName calls required and 
how did you get to this?
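
As far as I understand it, the expression string only contains function
names such as "search" or "unique". The factory needs a mapping from each
name to the Java class that implements it, and constructStream looks the
names up in that mapping while parsing. A minimal sketch of that idea
follows. The Toy* classes are made up for illustration and are not the
SolrJ classes or their actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-ins for stream classes; these are NOT the SolrJ classes.
class ToySearchStream {}
class ToyUniqueStream {}

// Hypothetical miniature of a name-to-class registry, for illustration only.
class ToyStreamFactory {
    private final Map<String, Class<?>> functionNames = new HashMap<>();

    // Fluent registration, mirroring the shape of withFunctionName(name, class).
    ToyStreamFactory withFunctionName(String name, Class<?> clazz) {
        functionNames.put(name, clazz);
        return this;
    }

    // Resolve the outermost function of an expression such as "unique(search(...))".
    Class<?> resolve(String expression) {
        int paren = expression.indexOf('(');
        if (paren < 0) {
            throw new IllegalArgumentException("Not a function call: " + expression);
        }
        String name = expression.substring(0, paren).trim();
        Class<?> clazz = functionNames.get(name);
        if (clazz == null) {
            // An unregistered name cannot be turned into an object.
            throw new IllegalArgumentException("Unknown function '" + name + "'");
        }
        return clazz;
    }
}
```

In this sketch, resolving "top(search(c1))" fails unless "top" was
registered first, which is why a factory built by hand needs one "with"
call per function name used in the expression.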

On 28.09.2017 22:01, Susheel Kumar wrote:
> I have this snippet with couple of functions e.g. if that helps


Re: streaming with SolrJ

Posted by Susheel Kumar <su...@gmail.com>.
I have this snippet with a couple of functions, in case that helps

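---
    // Imports assume the SolrJ 6.x/7.x package layout.
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.io.SolrClientCache;
    import org.apache.solr.client.solrj.io.Tuple;
    import org.apache.solr.client.solrj.io.eval.*;
    import org.apache.solr.client.solrj.io.ops.ReplaceOperation;
    import org.apache.solr.client.solrj.io.stream.*;
    import org.apache.solr.client.solrj.io.stream.expr.StreamFactory;
    import org.apache.solr.client.solrj.io.stream.metrics.*;

    TupleStream stream;
    List<Tuple> tuples;
    StreamContext streamContext = new StreamContext();
    SolrClientCache solrClientCache = new SolrClientCache();
    streamContext.setSolrClientCache(solrClientCache);

    // Register every function name that the expression may use.
    StreamFactory factory = new StreamFactory()
      .withCollectionZkHost("gettingstarted", "localhost:2181")
      .withFunctionName("search", CloudSolrStream.class)
      .withFunctionName("select", SelectStream.class)
      .withFunctionName("add", AddEvaluator.class)
      .withFunctionName("if", IfThenElseEvaluator.class)
      .withFunctionName("gt", GreaterThanEvaluator.class)
      .withFunctionName("let", LetStream.class)
      .withFunctionName("get", GetStream.class)
      .withFunctionName("echo", EchoStream.class)
      .withFunctionName("merge", MergeStream.class)
      .withFunctionName("sort", SortStream.class)
      .withFunctionName("tuple", TupStream.class)
      .withFunctionName("rollup", RollupStream.class)
      .withFunctionName("hashJoin", HashJoinStream.class)
      .withFunctionName("complement", ComplementStream.class)
      .withFunctionName("fetch", FetchStream.class)
      .withFunctionName("having", HavingStream.class)
//    .withFunctionName("eq", EqualsEvaluator.class)
      .withFunctionName("count", CountMetric.class)
      .withFunctionName("facet", FacetStream.class)
      .withFunctionName("sum", SumMetric.class)
      .withFunctionName("unique", UniqueStream.class)
      .withFunctionName("uniq", UniqueMetric.class)
      .withFunctionName("innerJoin", InnerJoinStream.class)
      .withFunctionName("intersect", IntersectStream.class)
      .withFunctionName("replace", ReplaceOperation.class);

    try {
      String clause = getClause();
      stream = factory.constructStream(clause);
      stream.setStreamContext(streamContext);
      tuples = getTuples(stream);

      for (Tuple tuple : tuples) {
        System.out.println(tuple.getString("id"));
        System.out.println(tuple.getString("business_email_s"));
    ....

      }

      System.out.println("Total tuples returned " + tuples.size());
    } finally {
      // Release the clients cached for this stream context.
      solrClientCache.close();
    }

---
// Assumed helper (not shown in the original post): drains the stream into a list.
private static List<Tuple> getTuples(TupleStream stream) throws IOException {
  List<Tuple> tuples = new ArrayList<>();
  stream.open();
  try {
    // The stream signals completion with a tuple whose EOF flag is set.
    for (Tuple t = stream.read(); !t.EOF; t = stream.read()) {
      tuples.add(t);
    }
  } finally {
    stream.close();
  }
  return tuples;
}

// Builds the streaming expression that the factory parses.
private static String getClause() {
  String clause = "select(search(gettingstarted,\n" +
      "                        q=*:* NOT personal_email_s:*,\n" +
      "                        fl=\"id,business_email_s\",\n" +
      "                        sort=\"business_email_s asc\"),\n" +
      "id,\n" +
      "business_email_s,\n" +
      "personal_email_s,\n" +
      "replace(personal_email_s,null,withField=business_email_s)\n" +
      ")";
  return clause;
}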

