Posted to reviews@spark.apache.org by JihongMA <gi...@git.apache.org> on 2015/10/20 01:06:54 UTC
[GitHub] spark pull request: [SPARK-10861] Add range support
GitHub user JihongMA opened a pull request:
https://github.com/apache/spark/pull/9172
[SPARK-10861] Add range support
This adds range support through the DeclarativeAggregate API. I also prototyped an alternative ImperativeAggregate implementation for performance comparison; the DeclarativeAggregate version performs better with codegen enabled.
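For readers unfamiliar with the shape of such an aggregate, here is a minimal sketch in plain Python. It is not Spark's DeclarativeAggregate or ImperativeAggregate API and not the code in this patch; the names RangeBuffer and range_of are hypothetical. It only illustrates the general idea of keeping a running min and max in a buffer, merging partial buffers from different partitions, and evaluating max - min at the end.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RangeBuffer:
    """Hypothetical aggregation buffer: running min and max (None until a value is seen)."""
    lo: Optional[float] = None
    hi: Optional[float] = None

    def update(self, value: Optional[float]) -> None:
        # Assumption: nulls (None) are skipped, as SQL aggregates usually do.
        if value is None:
            return
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def merge(self, other: "RangeBuffer") -> None:
        # Fold in a partial buffer computed on another partition.
        self.update(other.lo)
        self.update(other.hi)

    def evaluate(self) -> Optional[float]:
        # Range is max - min; None when no non-null input was seen.
        if self.lo is None or self.hi is None:
            return None
        return self.hi - self.lo


def range_of(partitions: Iterable[Iterable[Optional[float]]]) -> Optional[float]:
    """Aggregate each partition separately, then merge the partial buffers."""
    total = RangeBuffer()
    for part in partitions:
        partial = RangeBuffer()
        for value in part:
            partial.update(value)
        total.merge(partial)
    return total.evaluate()
```

For example, range_of([[3.0, 1.0], [None, 7.5]]) merges a partition with min 1.0 and max 3.0 with one whose only non-null value is 7.5, and evaluates 7.5 - 1.0.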
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JihongMA/spark-1 SPARK-10861
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9172.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9172
----
commit 3e881437d6cdd23fbe5a11f827d41f03d8188c84
Author: JihongMa <li...@gmail.com>
Date: 2015-10-08T20:17:03Z
rebase with master
commit d417aecd123f4bb1ad342293fc113f49f7619c28
Author: JihongMa <li...@gmail.com>
Date: 2015-10-16T05:35:08Z
range support
commit 6ebf951344045744dcf6620d688122db98ff626d
Author: JihongMa <li...@gmail.com>
Date: 2015-10-16T05:36:17Z
rebase with upstream
commit b25cf7dec0d5f0ca758e078f68eb34e0fb73631c
Author: JihongMa <li...@gmail.com>
Date: 2015-10-19T22:28:27Z
add range support
commit 2655fb5a6f980f50e959ea5634db59f9a7d28e76
Author: JihongMa <li...@gmail.com>
Date: 2015-10-19T22:54:45Z
style fix
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10861] Add range support
Posted by JihongMA <gi...@git.apache.org>.
Github user JihongMA closed the pull request at:
https://github.com/apache/spark/pull/9172
[GitHub] spark pull request: [SPARK-10861] Add range support
Posted by JihongMA <gi...@git.apache.org>.
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9172#issuecomment-173428238
@yhuai sure, I am closing it now.
[GitHub] spark pull request: [SPARK-10861] Add range support
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/9172#issuecomment-149439211
@JihongMA I did a bit of research and also talked to @mengxr. I don't think this range function provides enough value (since it is just max - min for users) to justify adding it, because:
1. It is non-obvious what range means from the name.
2. There is already a range function on SQLContext. This one is just semantically confusing, but not incompatible, since they are in different namespaces.
3. There is already a range function in Python, which is incompatible with the range here.
I'm going to close the JIRA ticket as won't fix. Sorry about it.
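The naming concern in points 1 and 3 can be sketched in plain Python. This is only an illustration, not code from the patch; the helper name value_range is hypothetical. Python's built-in range produces a sequence of integers, whereas the statistical range discussed here is a single number, max - min.

```python
values = [4, 9, 2, 7]

# Python's built-in range: a lazy sequence of integers, unrelated to dispersion.
first_three = list(range(3))


def value_range(xs):
    """Hypothetical helper: statistical range of a non-empty collection (max - min)."""
    return max(xs) - min(xs)


# The statistical range of the sample values is a single number.
spread = value_range(values)
```

Here first_three is the list of integers 0, 1, 2, while spread is 9 - 2. The same word names two unrelated things, and the second is a one-line expression over existing max and min.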
[GitHub] spark pull request: [SPARK-10861] Add range support
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/9172#issuecomment-171385780
@JihongMA How about we close this PR for now and revisit it later if necessary? Thanks!
[GitHub] spark pull request: [SPARK-10861] Add range support
Posted by JihongMA <gi...@git.apache.org>.
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9172#issuecomment-149375416
Range is generally included in univariate statistics, but Hive doesn't support it as a built-in UDF; I just checked.
[GitHub] spark pull request: [SPARK-10861] Add range support
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9172#issuecomment-149372336
Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-10861] Add range support
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/9172#issuecomment-149373514
Is this a common function in other databases?
[GitHub] spark pull request: [SPARK-10861] Add range support
Posted by JihongMA <gi...@git.apache.org>.
Github user JihongMA commented on the pull request:
https://github.com/apache/spark/pull/9172#issuecomment-149378691
It is named "range" as a dispersion measure within univariate statistics. This is a sub-task under the Univariate Stats umbrella JIRA (SPARK-10384).
[GitHub] spark pull request: [SPARK-10861] Add range support
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/9172#issuecomment-149377025
I'm mostly wondering if "range" is the proper name for this. It also seems super easy to compute: max(c) - min(c). Does it really justify having all of this code for it?