Posted to dev@lucene.apache.org by "Dennis Gove (JIRA)" <ji...@apache.org> on 2015/06/09 13:31:00 UTC
[jira] [Commented] (SOLR-7560) Parallel SQL Support
[ https://issues.apache.org/jira/browse/SOLR-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578750#comment-14578750 ]
Dennis Gove commented on SOLR-7560:
-----------------------------------
Possible expression syntax for the RollupStream:
{code}
rollup(
someStream(....),
over="fieldA, fieldB, fieldC",
min(fieldA),
max(fieldA),
min(fieldB),
mean(fieldD),
sum(fieldC)
)
{code}
This would require making the *Metric types Expressible, but I think that ends up being a good thing. It would make it easy to support other options on metrics, such as excluding outliers. For example, finding the sum of values within 3 standard deviations of the mean could be
{code}
sum(fieldC, limit=standardDev(3))
{code}
(Note: how that particular calculation could be implemented is left as an exercise for the reader. I'm just using it as an example of adding options to a relatively simple metric.)
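For what it's worth, here is a minimal sketch of one naive way the outlier-limited sum might be computed. It is written in Python purely for illustration and is not part of any patch. The function name and the choice of population standard deviation are assumptions. It makes two passes over buffered values, which may not suit a truly streaming metric.

```python
import math

def sum_within_std_devs(values, k=3.0):
    """Sum the non-null values lying within k standard deviations of the mean.

    Hypothetical helper: buffers all values and makes two passes.
    """
    nums = [v for v in values if v is not None]
    if not nums:
        return 0.0
    mean = sum(nums) / len(nums)
    # Assumption: population standard deviation (divide by n, not n - 1).
    std_dev = math.sqrt(sum((v - mean) ** 2 for v in nums) / len(nums))
    # Keep only values whose distance from the mean is at most k * std_dev.
    return sum(v for v in nums if abs(v - mean) <= k * std_dev)
```

With twenty values of 10 and a single value of 1000, the 1000 falls more than 3 standard deviations from the mean and is left out of the sum.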
Another example of an option is how to handle null values. In some cases a null should not affect a mean, but in others it should. You could express those as
{code}
mean(fieldA, replace(null, 0)) // replace null values with 0 thus leading to an impact on the mean
mean(fieldA, includeNull="true") // nulls are counted in the denominator but nothing added to numerator
mean(fieldA, includeNull="false") // nulls neither counted in denominator nor added to numerator
mean(fieldA, replace(null, fieldB), includeNull="true") // if fieldA is null replace it with fieldB, include null fieldB in mean
{code}
And so on.
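To make the intended semantics of the four mean variants concrete, here is a rough sketch in Python. It is illustrative only, not the proposed implementation. The function signature, the `replace_null` callable, and the sample tuples are all assumptions made up for this example.

```python
def mean(tuples, field, replace_null=None, include_null=False):
    """Mean of `field` across tuples under a chosen null policy.

    replace_null: hypothetical callable mapping a tuple to a substitute
                  value, used when the field is null (may itself return None).
    include_null: if True, values still null after replacement count in
                  the denominator but add nothing to the numerator.
    """
    total, count = 0.0, 0
    for t in tuples:
        value = t.get(field)
        if value is None and replace_null is not None:
            value = replace_null(t)
        if value is None:
            if include_null:
                count += 1
            continue
        total += value
        count += 1
    # No contributing tuples: return None rather than dividing by zero.
    return total / count if count else None

# Made-up sample tuples for illustration.
rows = [
    {"fieldA": 4, "fieldB": 1},
    {"fieldA": None, "fieldB": 2},
    {"fieldA": 8, "fieldB": None},
    {"fieldA": None, "fieldB": None},
]

replaced_with_zero = mean(rows, "fieldA", replace_null=lambda t: 0)
nulls_included = mean(rows, "fieldA", include_null=True)
nulls_excluded = mean(rows, "fieldA", include_null=False)
replaced_with_b = mean(rows, "fieldA",
                       replace_null=lambda t: t.get("fieldB"),
                       include_null=True)
```

Note that for a mean, replacing nulls with 0 and counting nulls in the denominator only give the same result. The two options would differ for other metrics such as min.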
> Parallel SQL Support
> --------------------
>
> Key: SOLR-7560
> URL: https://issues.apache.org/jira/browse/SOLR-7560
> Project: Solr
> Issue Type: New Feature
> Components: clients - java, search
> Reporter: Joel Bernstein
> Fix For: 5.3
>
> Attachments: SOLR-7560.patch
>
>
> This ticket provides support for executing *Parallel SQL* queries across SolrCloud collections. The SQL engine will be built on top of the Streaming API (SOLR-7082), which provides support for *parallel relational algebra* and *real-time map-reduce*.
> Basic design:
> 1) A new SQLHandler will be added to process SQL requests. The SQL statements will be compiled to live Streaming API objects for parallel execution across SolrCloud worker nodes.
> 2) SolrCloud collections will be abstracted as *Relational Tables*.
> 3) The Presto SQL parser will be used to parse the SQL statements.
> 4) A JDBC thin client will be added as a Solrj client.
> This ticket will focus on putting the framework in place and providing basic SELECT support and GROUP BY aggregate support.
> Future releases will build on this framework to provide additional SQL features.