You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by Apo1 <gi...@git.apache.org> on 2015/12/18 10:11:15 UTC

[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

GitHub user Apo1 opened a pull request:

    https://github.com/apache/spark/pull/10374

    [SPARK-12421][SQL] Fixed copy() method of GenericRow.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Apo1/spark myBranch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10374.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10374
    
----
commit e09e898fbb17ad61e31c04b9b1b89d738fe1591f
Author: user1 <us...@localhost.localdomain>
Date:   2015-12-18T08:44:04Z

    Fixed copy() method of GenericRow.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-166596638
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-166819637
  
    agree with @hvanhovell about the meaning of that comment. However, I don't think we need to change the `copy` method as `Row` is meant to be immutable. `row.toSeq.toArray` seems a hack to get the underlying array and change value of the `row`. We should either add doc to say not do this, or fix the `toSeq` to not leak the backing array.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by Apo1 <gi...@git.apache.org>.
Github user Apo1 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10374#discussion_r48231498
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala ---
    @@ -201,7 +201,7 @@ class GenericRow(protected[sql] val values: Array[Any]) extends Row {
     
       override def toSeq: Seq[Any] = values.toSeq
     
    -  override def copy(): Row = this
    +  override def copy(): Row = new GenericRow(values.clone())
    --- End diff --
    
    I'm not sure what's the best solution. Maybe we need some help from more experienced committers here. I think we are agreed that we need a change here, but we differ how the change should look like. 
    
    Anyway, it's not urgent, because the wrong behave should only affect very little developers/users.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-165729536
  
    @Apo1 LGTM but could you make the description?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-165753256
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47998/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-166596646
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48190/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-166731474
  
    The comment in GenericRow explicitly says not to do this.
    
    "A row implementation that uses an array of objects as the underlying storage.  Note that, while
    the array is not copied, and thus could technically be mutated after creation, this is not
    allowed."
    
    If we change the behavior, we should update that too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-165753015
  
    **[Test build #47998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47998/consoleFull)** for PR 10374 at commit [`e09e898`](https://github.com/apache/spark/commit/e09e898fbb17ad61e31c04b9b1b89d738fe1591f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-166587818
  
    **[Test build #48190 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48190/consoleFull)** for PR 10374 at commit [`4179fd9`](https://github.com/apache/spark/commit/4179fd9d5393c52a5b8d682b39eb2ca2a661d572).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10374#discussion_r48033415
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala ---
    @@ -201,7 +201,7 @@ class GenericRow(protected[sql] val values: Array[Any]) extends Row {
     
       override def toSeq: Seq[Any] = values.toSeq
     
    -  override def copy(): Row = this
    +  override def copy(): Row = new GenericRow(values.clone())
    --- End diff --
    
    I have been doing some digging. Using an adapted version of your code (only made it work):
    
        import org.apache.spark.sql.Row;
        val df = sqlContext.range(1, 9).toDF
        val firstRow = df.first
        val firstRowCopied = firstRow.copy()  // <-- HERE the row will not be copied.
        val arr = firstRowCopied.toSeq.toArray
        arr(0) =  arr(0).asInstanceOf[Long] * 10
        val newRow = Row.fromSeq(arr)
        val newDf = sqlContext.createDataFrame(sc.parallelize(Seq(firstRow, newRow)), df.schema)
    
    ```newDF.show``` yields the following result:
    
        +---+
        | id|
        +---+
        | 10|
        | 10|
        +---+
    
    Which is wrong. What happens is that the ```firstRowCopied.toSeq``` wraps the value array in a ```ArrayOps``` object, this object returns wrapped backing array (instead of a copy) when the you invoke ```toArray```. This really shouldn't happen, because now you mutable gain access to a structure which is supposed to be immutable. I think we should change the ```toSeq``` method instead.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-166700199
  
    **[Test build #48206 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48206/consoleFull)** for PR 10374 at commit [`ef4a636`](https://github.com/apache/spark/commit/ef4a636d6c62aa0db7a769e526c79437ef2133f9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-166675586
  
    **[Test build #48206 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48206/consoleFull)** for PR 10374 at commit [`ef4a636`](https://github.com/apache/spark/commit/ef4a636d6c62aa0db7a769e526c79437ef2133f9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-165719708
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-166700377
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10374#discussion_r48020875
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala ---
    @@ -201,7 +201,7 @@ class GenericRow(protected[sql] val values: Array[Any]) extends Row {
     
       override def toSeq: Seq[Any] = values.toSeq
     
    -  override def copy(): Row = this
    +  override def copy(): Row = new GenericRow(values.clone())
    --- End diff --
    
    You cannot a put value back in the ```GenericRow``` when you have taken it out, unless it is a mutable object. Could you provide an example for this? Which is also a good basis for a unit test.
    
    ```copy``` is merely a contract all rows need to adhere to. If a row is inmutable, why copy it? It also avoids a lot object allocations. I think ```GenericInternalRow``` should also return reference to it self on ```copy```.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-166556640
  
    @Apo1 Could you add a unit test for this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-166949939
  
    I would prefer to change the implementation of toSeq.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-165753253
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #10374: [SPARK-12421][SQL] Fixed copy() method of Generic...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10374


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #10374: [SPARK-12421][SQL] Fixed copy() method of GenericRow.

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/10374
  
    Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-166596427
  
    **[Test build #48190 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48190/consoleFull)** for PR 10374 at commit [`4179fd9`](https://github.com/apache/spark/commit/4179fd9d5393c52a5b8d682b39eb2ca2a661d572).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by Apo1 <gi...@git.apache.org>.
Github user Apo1 commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-166564092
  
    @hvanhovell Ok, I try. But I'm not sure where's the right place for the unit test. `sql/catalyst/src/test/scala/org/apache/spark/sql/Rowtest` seems to initialize a GenericRow to test some stuff, so I add the test below there, ok?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-166556560
  
    CC: @rxin @marmbrus 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by Apo1 <gi...@git.apache.org>.
Github user Apo1 commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-165742336
  
    @sarutak  Do you mean the description of this pull-request? I added the link to the Jira. Ok?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-166738733
  
    @nongli I think the comment that comment is mainly meant for developers passing an array into the Row constructor; they should not reuse the array. We currently leak a reference to the backing array by accident, which can be a problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by Apo1 <gi...@git.apache.org>.
Github user Apo1 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10374#discussion_r48016931
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala ---
    @@ -201,7 +201,7 @@ class GenericRow(protected[sql] val values: Array[Any]) extends Row {
     
       override def toSeq: Seq[Any] = values.toSeq
     
    -  override def copy(): Row = this
    +  override def copy(): Row = new GenericRow(values.clone())
    --- End diff --
    
    Yes it is immutable, but if you want to extract the values of a GenericRows, change them and create a new row, then it's better to not change the original values. 
    So you would first want to copy the original row. 
    
    Also the method name copy() implies that it returns a real copy and not the reference on the original object.
    
    Other rows like GenericInternalRow implement a correct the copy method:
      override def copy(): InternalRow = new GenericInternalRow(values.clone())



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-165744449
  
    OK, Thanks! Also, it's better if you add a test for change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-165728774
  
    ok to test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by Apo1 <gi...@git.apache.org>.
Github user Apo1 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10374#discussion_r48025796
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala ---
    @@ -201,7 +201,7 @@ class GenericRow(protected[sql] val values: Array[Any]) extends Row {
     
       override def toSeq: Seq[Any] = values.toSeq
     
    -  override def copy(): Row = this
    +  override def copy(): Row = new GenericRow(values.clone())
    --- End diff --
    
    I try to provide a simple example and I could not try the code below right now. Imagine you want to query a row, copy it, change a value and then build a new one. Then you want to build a DF out of the new and the old row. I know that this is a strange example and there are a lot of better method to implement it, but you could implement it this way:
    
        import org.apache.spark.sql.Row;
        val df = sc.parallelize(1,2,3,5,6,7,8,9).toDF.sort
        val firstRow = df.first.copy() // <-- HERE the row will not be copied. 
        val arr = firstRow.toSeq.toArray
        arr(0) =  arr(0) * 10
        val newRow = Row.fromSeq(arr)
        sqlContext.createDataframe(List(firstRow, newRow), df.schema)  // Both row will contain the value 10 and share the same value Sequence
    
    If you don't want that someone uses the copy method, because it does not do what it implies, then i think we should provide another trait/contract which does not include this method. It's just confusing if the copy method does not copy the object. 
    `row.copy()` and just `row` are identical so far! 
    
    I don't see the point, why we should'nt provide this copy method for the users who want to use it.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-166700379
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48206/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10374#discussion_r48015896
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala ---
    @@ -201,7 +201,7 @@ class GenericRow(protected[sql] val values: Array[Any]) extends Row {
     
       override def toSeq: Seq[Any] = values.toSeq
     
    -  override def copy(): Row = this
    +  override def copy(): Row = new GenericRow(values.clone())
    --- End diff --
    
    ```GenericRow``` is supposed to be immutable. So there we really shouldn't need to copy its values. Could you explain why this is needed?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12421][SQL] Fixed copy() method of Gene...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10374#issuecomment-165733716
  
    **[Test build #47998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47998/consoleFull)** for PR 10374 at commit [`e09e898`](https://github.com/apache/spark/commit/e09e898fbb17ad61e31c04b9b1b89d738fe1591f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org