Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/19 16:57:49 UTC

[GitHub] [spark] nkronenfeld opened a new pull request, #36613: [WIP][SPARK-30983] Support typed select in Datasets up to the max tuple size

nkronenfeld opened a new pull request, #36613:
URL: https://github.com/apache/spark/pull/36613

   ### What changes were proposed in this pull request?
   
   This PR simply adds typed select methods to Dataset up to the max Tuple size of 22.
   
   This has been bugging me for years, so I finally decided to get off my backside and do something about it :-).
   
   As noted in the JIRA issue, this is technically a breaking change: I had to remove an old test that specifically checked that Spark didn't support typed select for tuples larger than 5.  However, code would only break if it explicitly relied on select returning a DataFrame instead of a Dataset when selecting more than 5 typed columns (the test I had to remove is one case where this happens).
   
   I've set the PR as WIP because I've been unable to run all tests so far - not due to the fix, but rather due to not having things set up correctly on my computer.  Still working on that.
   
   ### Why are the changes needed?
   Arbitrarily supporting only up to 5-tuples is weird and unpredictable.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. A select on more than 5 columns, all of them typed, will now return a Dataset instead of a DataFrame.
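   
   To make the change concrete, here is a minimal sketch of how select behaves on a Spark release without this PR. The object name `TypedSelectSketch`, the helper `run`, and the sample data are made up for illustration and are not part of the patch. With this PR applied, the six-column call would return a `Dataset` of 6-tuples, so the `DataFrame` annotation on `six` would presumably no longer compile.
   
   ```scala
   import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

   object TypedSelectSketch {
     // Hypothetical helper for illustration only; not part of the PR.
     def run(spark: SparkSession): (Dataset[(Int, Int, Int, Int, Int)], DataFrame) = {
       import spark.implicits._

       // Hypothetical sample data: one row with six Int columns.
       val df = Seq((1, 2, 3, 4, 5, 6)).toDF("a", "b", "c", "d", "e", "f")

       // Five typed columns resolve to the typed overload and yield a Dataset of 5-tuples.
       val five: Dataset[(Int, Int, Int, Int, Int)] =
         df.select($"a".as[Int], $"b".as[Int], $"c".as[Int], $"d".as[Int], $"e".as[Int])

       // Six typed columns have no typed overload without this PR, so the call
       // falls back to select(cols: Column*) and yields an untyped DataFrame.
       val six: DataFrame =
         df.select($"a".as[Int], $"b".as[Int], $"c".as[Int], $"d".as[Int], $"e".as[Int], $"f".as[Int])

       (five, six)
     }

     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .master("local[1]")
         .appName("typed-select-sketch")
         .getOrCreate()
       try {
         val (five, six) = run(spark)
         five.printSchema()
         six.printSchema()
       } finally {
         spark.stop()
       }
     }
   }
   ```
   
   Without the PR, the usual workaround is to recover the type afterwards with `six.as[(Int, Int, Int, Int, Int, Int)]`.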
   
   ### How was this patch tested?
   I've run all sql tests, and they all pass (though testing itself still fails on my machine, I think with a path-too-long error).
   I've added a test to make sure the typed select works on all sizes. Mostly this is a compile issue, not a run-time issue, but I checked values too, to make sure I didn't miss anything (copy-paste errors are a big potential problem with long tuples).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #36613: [WIP][SPARK-30983] Support typed select in Datasets up to the max tuple size

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #36613:
URL: https://github.com/apache/spark/pull/36613#issuecomment-1254360704

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




[GitHub] [spark] nkronenfeld commented on pull request #36613: [WIP][SPARK-30983] Support typed select in Datasets up to the max tuple size

Posted by GitBox <gi...@apache.org>.
nkronenfeld commented on PR #36613:
URL: https://github.com/apache/spark/pull/36613#issuecomment-1257319840

   Also, I don't see a button to re-open it - does anyone know where that is?




[GitHub] [spark] github-actions[bot] closed pull request #36613: [WIP][SPARK-30983] Support typed select in Datasets up to the max tuple size

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #36613: [WIP][SPARK-30983] Support typed select in Datasets up to the max tuple size
URL: https://github.com/apache/spark/pull/36613




[GitHub] [spark] nkronenfeld commented on pull request #36613: [WIP][SPARK-30983] Support typed select in Datasets up to the max tuple size

Posted by GitBox <gi...@apache.org>.
nkronenfeld commented on PR #36613:
URL: https://github.com/apache/spark/pull/36613#issuecomment-1257319350

   I haven't done anything on the branch because I was waiting for comments - but as far as I know, no one even looked at it.  Am I missing something for it to get considered in the first place?
   

