Posted to dev@lucene.apache.org by "Dennis Gove (JIRA)" <ji...@apache.org> on 2015/06/09 13:31:00 UTC
[jira] [Commented] (SOLR-7560) Parallel SQL Support

    [ https://issues.apache.org/jira/browse/SOLR-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578750#comment-14578750 ] 

Dennis Gove commented on SOLR-7560:
-----------------------------------

Here is a possible expression syntax for the RollupStream:

{code}
rollup(
  someStream(....),
  over="fieldA, fieldB, fieldC",
  min(fieldA),
  max(fieldA),
  min(fieldB),
  mean(fieldD),
  sum(fieldC)
)
{code}
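To pin down what rollup would compute, here is a minimal Java sketch. It assumes the inner stream is already sorted on the over fields, so each group is a contiguous run of tuples, and it models tuples as plain Map<String, Object>. The Metric interface and the SumMetric, MinMetric and RollupSketch classes are hypothetical illustrations, not Solr's actual Tuple or Metric API.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical per-group accumulator (NOT Solr's real Metric class).
interface Metric {
    void update(Map<String, Object> tuple); // fold one tuple into the running value
    double result();                       // value for the finished group
    String name();                         // output key, e.g. "sum(fieldC)"
}

class SumMetric implements Metric {
    private final String field;
    private double sum = 0;
    SumMetric(String field) { this.field = field; }
    public void update(Map<String, Object> t) {
        Object v = t.get(field);
        if (v != null) sum += ((Number) v).doubleValue(); // nulls are skipped
    }
    public double result() { return sum; }
    public String name() { return "sum(" + field + ")"; }
}

class MinMetric implements Metric {
    private final String field;
    private double min = Double.POSITIVE_INFINITY; // stays +Infinity if every value is null
    MinMetric(String field) { this.field = field; }
    public void update(Map<String, Object> t) {
        Object v = t.get(field);
        if (v != null) min = Math.min(min, ((Number) v).doubleValue());
    }
    public double result() { return min; }
    public String name() { return "min(" + field + ")"; }
}

final class RollupSketch {
    private RollupSketch() {}

    // Emits one row per run of consecutive tuples sharing the same `over` values.
    // Assumes the input is already sorted on the `over` fields.
    static List<Map<String, Object>> rollup(List<Map<String, Object>> sorted,
                                            List<String> over,
                                            List<Supplier<Metric>> metricFactories) {
        List<Map<String, Object>> out = new ArrayList<>();
        List<Object> currentKey = null;
        List<Metric> metrics = new ArrayList<>();
        for (Map<String, Object> tuple : sorted) {
            List<Object> key = new ArrayList<>();
            for (String f : over) key.add(tuple.get(f));
            if (!key.equals(currentKey)) {           // group boundary
                if (currentKey != null) out.add(emit(over, currentKey, metrics));
                currentKey = key;
                metrics = new ArrayList<>();          // fresh accumulators per group
                for (Supplier<Metric> s : metricFactories) metrics.add(s.get());
            }
            for (Metric m : metrics) m.update(tuple);
        }
        if (currentKey != null) out.add(emit(over, currentKey, metrics)); // last group
        return out;
    }

    private static Map<String, Object> emit(List<String> over, List<Object> key,
                                            List<Metric> metrics) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < over.size(); i++) row.put(over.get(i), key.get(i));
        for (Metric m : metrics) row.put(m.name(), m.result());
        return row;
    }
}
{code}

Metrics are passed as factories (Supplier<Metric>) so that each group starts with fresh accumulators; rolling up tuples sorted on fieldA with a sum(fieldC) factory yields one row per distinct fieldA value.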

Supporting this syntax would require making the *Metric types Expressible, but I think that would be a good thing. It would make it easy to support additional options on metrics, such as excluding outliers. For example, the sum of values within 3 standard deviations of the mean could be expressed as
{code}
sum(fieldC, limit=standardDev(3))
{code}
(Note: how that particular calculation would be implemented is left as an exercise for the reader; I'm just using it as an example of adding options to a relatively simple metric.)
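Purely as an illustration of the kind of option this enables (not a proposed implementation), here is one possible reading of limit=standardDev(3) as a Java sketch: compute the group's mean and population standard deviation, then sum only the values within k standard deviations of the mean. The class and method names are hypothetical, and using the population rather than the sample standard deviation is an assumption.

{code:java}
import java.util.List;

// Hypothetical sketch of sum(field, limit=standardDev(k)); not a Solr API.
final class StdDevLimitedSum {
    private StdDevLimitedSum() {}

    // Sums the (non-null) values lying within k population standard deviations of the mean.
    static double sumWithinStdDevs(List<Double> values, double k) {
        if (values.isEmpty()) return 0.0;
        // Pass 1: mean and population standard deviation of the whole group.
        double mean = 0;
        for (double v : values) mean += v;
        mean /= values.size();
        double sq = 0;
        for (double v : values) sq += (v - mean) * (v - mean);
        double sd = Math.sqrt(sq / values.size());
        // Pass 2: keep only values whose distance from the mean is at most k * sd.
        double sum = 0;
        for (double v : values) {
            if (Math.abs(v - mean) <= k * sd) sum += v;
        }
        return sum;
    }
}
{code}

Because the filter depends on statistics of the whole group, this version buffers the group's values and makes two passes; unlike a plain sum, a metric with this option cannot decide whether to include a value the moment it arrives.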
Another example option is how to handle null values: in some cases a null should not affect a mean, but in others it should. You could express those cases as
{code}
mean(fieldA, replace(null, 0))  // replace null values with 0, so they count toward the mean
mean(fieldA, includeNull="true") // nulls are counted in the denominator but add nothing to the numerator
mean(fieldA, includeNull="false") // nulls are counted in neither the denominator nor the numerator
mean(fieldA, replace(null, fieldB), includeNull="true") // if fieldA is null, use fieldB; if fieldB is also null, count it in the denominator only
{code}
And so on.
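The null-handling options above can be sketched in Java as follows, with tuples modeled as Map<String, Object> and a missing key treated as null. NullAwareMean and its parameters are hypothetical. For a mean, replace(null, 0) gives the same result as includeNull="true", since both add nothing to the numerator and count the null in the denominator, so the sketch only models the fallback-field form of replace.

{code:java}
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed mean(...) null options; not a Solr API.
final class NullAwareMean {
    private NullAwareMean() {}

    // field:       the field to average, e.g. "fieldA"
    // fallback:    field used when `field` is null (models replace(null, fieldB)); null = none
    // includeNull: true  -> a remaining null counts in the denominator only
    //              false -> a remaining null is ignored entirely
    static double mean(List<Map<String, Object>> tuples, String field,
                       String fallback, boolean includeNull) {
        double sum = 0;
        long count = 0;
        for (Map<String, Object> t : tuples) {
            Object v = t.get(field);
            if (v == null && fallback != null) v = t.get(fallback);
            if (v != null) {
                sum += ((Number) v).doubleValue();
                count++;
            } else if (includeNull) {
                count++; // null adds nothing to the numerator but widens the denominator
            }
        }
        return count == 0 ? Double.NaN : sum / count; // NaN when nothing was counted
    }
}
{code}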

> Parallel SQL Support
> --------------------
>
>                 Key: SOLR-7560
>                 URL: https://issues.apache.org/jira/browse/SOLR-7560
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, search
>            Reporter: Joel Bernstein
>             Fix For: 5.3
>
>         Attachments: SOLR-7560.patch
>
>
> This ticket provides support for executing *Parallel SQL* queries across SolrCloud collections. The SQL engine will be built on top of the Streaming API (SOLR-7082), which provides support for *parallel relational algebra* and *real-time map-reduce*.
> Basic design:
> 1) A new SQLHandler will be added to process SQL requests. The SQL statements will be compiled to live Streaming API objects for parallel execution across SolrCloud worker nodes.
> 2) SolrCloud collections will be abstracted as *Relational Tables*. 
> 3) The Presto SQL parser will be used to parse the SQL statements.
> 4) A JDBC thin client will be added as a Solrj client.
> This ticket will focus on putting the framework in place and providing basic SELECT support and GROUP BY aggregate support.
> Future releases will build on this framework to provide additional SQL features.
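Tying this back to the GROUP BY support described in the ticket, here is a hypothetical Java sketch of how the parts of a simple SELECT ... GROUP BY query might map onto the rollup syntax proposed above. It assumes the inner stream is sorted on the GROUP BY fields; the class and method names are illustrative only and are not the SQLHandler's actual logic.

{code:java}
import java.util.List;

// Hypothetical mapping of
//   SELECT g1, g2, agg(x), ... FROM t GROUP BY g1, g2
// onto the proposed rollup(...) expression syntax.
final class GroupByToRollup {
    private GroupByToRollup() {}

    // innerStream: expression for a stream over the table, assumed sorted on the GROUP BY fields
    // groupBy:     GROUP BY fields, in order
    // aggregates:  aggregate calls such as "sum(fieldC)"
    static String toRollup(String innerStream, List<String> groupBy, List<String> aggregates) {
        StringBuilder sb = new StringBuilder("rollup(\n  ").append(innerStream).append(",\n");
        sb.append("  over=\"").append(String.join(", ", groupBy)).append('"');
        for (String agg : aggregates) sb.append(",\n  ").append(agg);
        return sb.append("\n)").toString();
    }
}
{code}

Calling toRollup("someStream(....)", List.of("fieldA", "fieldB"), List.of("sum(fieldC)", "mean(fieldD)")) produces an expression laid out like the rollup example at the top of this comment.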



