You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by terma <gi...@git.apache.org> on 2017/01/30 23:38:53 UTC

[GitHub] spark pull request #16747: SPARK-16636 Add CalendarIntervalType to documenta...

GitHub user terma opened a pull request:

    https://github.com/apache/spark/pull/16747

    SPARK-16636 Add CalendarIntervalType to documentation

    ## What changes were proposed in this pull request?
    
    Add CalendarIntervalType to SQL Data Types in documentation
    
    ## How was this patch tested?
    
    unit tests
    
    @marmbrus please review

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/terma/spark fix-SPARK-16636

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16747.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16747
    
----
commit 7dde0eea30cf0ec420795e177619c4c18f83f217
Author: terma <ar...@gmail.com>
Date:   2017-01-30T23:34:15Z

    SPARK-16636 Add CalendarIntervalType to documentation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16747
  
    CC @rxin, if we are going to expose `CalendarInterval` and `CalendarIntervalType` officially, shall we move `CalendarInterval` to the same package as `Decimal`, or create a new class as the external representation of `CalendarIntervalType`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/16747
  
    CC @cloud-fan for https://github.com/apache/spark/pull/13008#r62947902 and @yhuai for https://github.com/apache/spark/pull/8597#r38769233 as they might be what you're referring to?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16747
  
    It seems there are several ones here and there. Maybe https://github.com/apache/spark/pull/15751#issuecomment-258518577 is related too because it is about supporting reading/writing out that type where it might refer that we can explicitly give the schema with that type.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16747
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16747
  
    (FWIW, I am OK but just worried if it might be supposed to be internal type, maybe in the future)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16747
  
    @terma To avoid confusing the Spark SQL users, we might not document it? How about closing this PR now? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16747
  
    Actually `CalendarInterval` is already exposed to users, e.g. we can call `collect` on a DataFrame with `CalendarIntervalType` field, and get rows containing `CalendarInterval`. We don't support reading/writing `CalendarIntervalType` though.
    
    So I'm ok to add documents for it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16747
  
    Then, It looks okay to me as describing the current state and I just checked it after building the doc with this, and also
    
    we can already use it as below:
    
    ```scala
    scala> sql("SELECT interval 1 second").schema(0).dataType.getClass
    res0: Class[_ <: org.apache.spark.sql.types.DataType] = class org.apache.spark.sql.types.CalendarIntervalType$
    
    scala> sql("SELECT interval 1 second").collect()(0).get(0).getClass
    res1: Class[_] = class org.apache.spark.unsafe.types.CalendarInterval
    ```
    
    ```scala
    scala> val rdd = spark.sparkContext.parallelize(Seq(Row(new CalendarInterval(0, 0))))
    rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[0] at parallelize at <console>:32
    
    scala> spark.createDataFrame(rdd, StructType(StructField("a", CalendarIntervalType) :: Nil))
    res1: org.apache.spark.sql.DataFrame = [a: calendarinterval]
    ```
    
    Another meta concern is, `org.apache.spark.unsafe.types.CalendarInterval` seems undocumented in both scaladoc/javadoc (entire `unsafe` module). Once we document this as a weak promise for this API, then we might have to keep this for backward compatibility.
    
    Maybe just describe it as SQL dedicated type or not supported for now with some `Note:` rather than describing it?
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/16747
  
    @HyukjinKwon is this OK by you?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16747
  
    I am OK but I remember there are some discussions about whether this type should be exposed or not and I could not track down the conclusion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16747
  
    ^ I want to be very sure if we are not going to expose this or not. Could any SQL committer guy or PMC confirm this?
    
    > `CalendarIntervalType` only for compatibility with similar type in `Hive`.
    
    This means a lot of things for example,
    
    - we might have to hide this
    
    ```scala
    scala> import org.apache.spark.sql.types.CalendarIntervalType
    import org.apache.spark.sql.types.CalendarIntervalType
    ```
    
    -  maybe disallow explicitly giving this type as a schema as it is not exposed.
    
    -  maybe we should not support reading/writing out this type as raised in https://github.com/apache/spark/pull/15751#issuecomment-258518577 
    
    and etc.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16747: SPARK-16636 Add CalendarIntervalType to documentation

Posted by terma <gi...@git.apache.org>.
Github user terma commented on the issue:

    https://github.com/apache/spark/pull/16747
  
    @srowen As I understood ```CalendarIntervalType``` only for compatibility with similar type in ```Hive```. So probably better to mark it as internal and close jira?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16747: SPARK-16636 Add CalendarIntervalType to documenta...

Posted by terma <gi...@git.apache.org>.
Github user terma closed the pull request at:

    https://github.com/apache/spark/pull/16747


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org