Posted to reviews@spark.apache.org by JihongMA <gi...@git.apache.org> on 2015/10/20 01:06:54 UTC

[GitHub] spark pull request: [SPARK-10861] Add range support

GitHub user JihongMA opened a pull request:

    https://github.com/apache/spark/pull/9172

    [SPARK-10861] Add range support

    Adds range support through the DeclarativeAggregate API. I also prototyped an alternative ImperativeAggregate implementation for performance comparison; DeclarativeAggregate performs better with codegen enabled.
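The DeclarativeAggregate approach mentioned above describes an aggregate as a small buffer plus rules for initializing, updating, merging and evaluating it. The following is a minimal pure-Python sketch of how range fits that shape, assuming a (min, max) buffer with None standing in for SQL NULL. It is not the Scala code from this PR; the names `update`, `merge`, `evaluate` and `range_aggregate` are hypothetical, chosen only to mirror the roles of `updateExpressions`, `mergeExpressions` and `evaluateExpression` in Spark's `DeclarativeAggregate`.

```python
from functools import reduce
from typing import Iterable, Optional, Tuple

# Aggregation buffer: (running minimum, running maximum).
# None plays the role of SQL NULL ("no non-NULL input seen yet").
Buffer = Tuple[Optional[float], Optional[float]]
INITIAL: Buffer = (None, None)


def _least(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Minimum of two values, skipping None (NULL)."""
    if a is None:
        return b
    if b is None:
        return a
    return min(a, b)


def _greatest(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Maximum of two values, skipping None (NULL)."""
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)


def update(buf: Buffer, value: Optional[float]) -> Buffer:
    """Fold one input value into a partition's buffer; NULL inputs leave it unchanged."""
    lo, hi = buf
    return (_least(lo, value), _greatest(hi, value))


def merge(left: Buffer, right: Buffer) -> Buffer:
    """Combine two partial buffers, e.g. computed on different partitions."""
    return (_least(left[0], right[0]), _greatest(left[1], right[1]))


def evaluate(buf: Buffer) -> Optional[float]:
    """Final result: max - min, or None if every input was NULL."""
    lo, hi = buf
    if lo is None or hi is None:
        return None
    return hi - lo


def range_aggregate(partitions: Iterable[Iterable[Optional[float]]]) -> Optional[float]:
    """Run update within each partition, then merge the partial buffers and evaluate."""
    partials = [reduce(update, part, INITIAL) for part in partitions]
    return evaluate(reduce(merge, partials, INITIAL))


print(range_aggregate([[3.0, None, 7.5], [1.0, 4.0], []]))  # 6.5
```

Keeping only two values per group is what allows partial buffers from different partitions to be merged in any order before the final subtraction.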

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JihongMA/spark-1 SPARK-10861

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9172.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9172
    
----
commit 3e881437d6cdd23fbe5a11f827d41f03d8188c84
Author: JihongMa <li...@gmail.com>
Date:   2015-10-08T20:17:03Z

    rebase with master

commit d417aecd123f4bb1ad342293fc113f49f7619c28
Author: JihongMa <li...@gmail.com>
Date:   2015-10-16T05:35:08Z

    range support

commit 6ebf951344045744dcf6620d688122db98ff626d
Author: JihongMa <li...@gmail.com>
Date:   2015-10-16T05:36:17Z

    rebase with upstream

commit b25cf7dec0d5f0ca758e078f68eb34e0fb73631c
Author: JihongMa <li...@gmail.com>
Date:   2015-10-19T22:28:27Z

    add range support

commit 2655fb5a6f980f50e959ea5634db59f9a7d28e76
Author: JihongMa <li...@gmail.com>
Date:   2015-10-19T22:54:45Z

    style fix

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10861] Add range support

Posted by JihongMA <gi...@git.apache.org>.
Github user JihongMA closed the pull request at:

    https://github.com/apache/spark/pull/9172




[GitHub] spark pull request: [SPARK-10861] Add range support

Posted by JihongMA <gi...@git.apache.org>.
Github user JihongMA commented on the pull request:

    https://github.com/apache/spark/pull/9172#issuecomment-173428238
  
    @yhuai Sure, I am closing it now. 




[GitHub] spark pull request: [SPARK-10861] Add range support

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/9172#issuecomment-149439211
  
    @JihongMA I did a bit of research and also talked to @mengxr. I don't think this range function provides enough value (since for users it is just max - min) to justify adding it, because:
    
    1. It is not obvious from the name what range means.
    2. There is already a range function on SQLContext. This one is merely semantically confusing, not incompatible, since they are in different namespaces.
    3. There is already a range function in Python, which is incompatible with the range here.
    
    I'm going to close the JIRA ticket as won't fix. Sorry about that.
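Point 3 above is a clash of meanings: Python's built-in `range` generates an integer sequence, while the statistical range is a single number measuring spread. A small illustration, where `statistical_range` is a hypothetical helper name used only for this comparison:

```python
def statistical_range(values):
    """Dispersion measure: largest value minus smallest value (hypothetical helper)."""
    return max(values) - min(values)


# Python's built-in range(start, stop, step) builds an arithmetic sequence.
sequence = list(range(2, 10, 3))
print(sequence)                     # [2, 5, 8]

# The statistical range of that same data is one number: 8 - 2.
print(statistical_range(sequence))  # 6
```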
    
    





[GitHub] spark pull request: [SPARK-10861] Add range support

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9172#issuecomment-171385780
  
    @JihongMA How about we close this PR for now and revisit it later if necessary? Thanks!




[GitHub] spark pull request: [SPARK-10861] Add range support

Posted by JihongMA <gi...@git.apache.org>.
Github user JihongMA commented on the pull request:

    https://github.com/apache/spark/pull/9172#issuecomment-149375416
  
    Range is generally included in univariate statistics, but Hive doesn't support it as a built-in UDF (I just checked).




[GitHub] spark pull request: [SPARK-10861] Add range support

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9172#issuecomment-149372336
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-10861] Add range support

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/9172#issuecomment-149373514
  
    Is this a common function in other databases?





[GitHub] spark pull request: [SPARK-10861] Add range support

Posted by JihongMA <gi...@git.apache.org>.
Github user JihongMA commented on the pull request:

    https://github.com/apache/spark/pull/9172#issuecomment-149378691
  
    It is named "range" as one of the dispersion measures in univariate statistics. This is a sub-task under the Univariate Stats umbrella JIRA (SPARK-10384).




[GitHub] spark pull request: [SPARK-10861] Add range support

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/9172#issuecomment-149377025
  
    I'm mostly wondering if "range" is the proper name for this. It also seems super easy to compute: max(c) - min(c). Does it really justify having all of this code for it?
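To illustrate that range is already expressible by composing existing aggregates, here is a sketch that uses Python's standard `sqlite3` module as a stand-in for a SQL engine. In Spark SQL the same query shape would be `SELECT max(c) - min(c) FROM t`, and in the Scala DataFrame API something like `df.agg(max("c") - min("c"))`. The helper name `column_range` and the table `t` are hypothetical.

```python
import sqlite3


def column_range(rows):
    """Compute max(c) - min(c) with built-in SQL aggregates (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (c REAL)")
        conn.executemany("INSERT INTO t (c) VALUES (?)", [(r,) for r in rows])
        # MAX and MIN skip NULLs; on empty input both are NULL, so the difference is NULL.
        (result,) = conn.execute("SELECT MAX(c) - MIN(c) FROM t").fetchone()
        return result
    finally:
        conn.close()


print(column_range([3.0, None, 7.5, 1.0]))  # 6.5
print(column_range([]))                     # None
```

Because MAX and MIN already skip NULLs and return NULL on empty input, the composed expression inherits that behaviour without any extra code.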


