Posted to reviews@spark.apache.org by "siying (via GitHub)" <gi...@apache.org> on 2023/07/18 00:31:49 UTC

[GitHub] [spark] siying opened a new pull request, #42046: [SPARK-40434][SS] Implement applyInPandasWithState in PySpark

siying opened a new pull request, #42046:
URL: https://github.com/apache/spark/pull/42046

   ### What changes were proposed in this pull request?
   Change the serialization format for group-by-with-state outputs: include an explicit hidden column indicating how many data and state records there are.
   
   ### Why are the changes needed?
   The current implementation of ApplyInPandasWithStatePythonRunner cannot handle outputs where the first column of a row is null. It cannot distinguish a row whose first column is genuinely null from a row that was filled in as padding because there are fewer data records than state records. In the former case it produces incorrect results.
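
The ambiguity described above can be illustrated with a small, self-contained sketch. This is not the Spark implementation: the row shape, the `PAD` value, and the helper names (`pack`, `unpack_by_first_column`, `pack_with_counts`, `unpack_with_counts`) are all hypothetical, and the explicit counts are modeled as a simple tuple rather than a hidden column. It only shows why inferring padding from a null first column loses rows, and how carrying explicit record counts avoids that.

```python
from typing import List, Optional, Tuple

# Hypothetical output row: (name, value). Either field may legitimately be None.
Row = Tuple[Optional[str], Optional[int]]

# Hypothetical padding row used when there are fewer data rows than state rows.
PAD: Row = (None, None)


def pack(data: List[Row], state: List[str]):
    """Lay data and state records side by side, padding the shorter side."""
    n = max(len(data), len(state))
    data_col = data + [PAD] * (n - len(data))
    state_col = state + [None] * (n - len(state))
    return data_col, state_col


def unpack_by_first_column(data_col: List[Row]) -> List[Row]:
    """Heuristic decoding: treat a row as padding if its first column is None."""
    return [row for row in data_col if row[0] is not None]


def pack_with_counts(data: List[Row], state: List[str]):
    """Same layout, plus explicit (data count, state count) metadata."""
    data_col, state_col = pack(data, state)
    return (len(data), len(state)), data_col, state_col


def unpack_with_counts(counts: Tuple[int, int], data_col: List[Row]) -> List[Row]:
    """Count-based decoding: take exactly the declared number of data rows."""
    return data_col[: counts[0]]


data = [(None, 1), ("b", 2)]   # first row has a genuine null in column one
state = ["s1", "s2", "s3"]     # more state records than data records

data_col, _ = pack(data, state)
heuristic_result = unpack_by_first_column(data_col)

counts, data_col2, _ = pack_with_counts(data, state)
counted_result = unpack_with_counts(counts, data_col2)
```

In this sketch the heuristic decoder drops the legitimate row `(None, 1)` along with the padding, while the count-based decoder returns both data rows unchanged.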
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added unit tests that cover null cases and various other scenarios.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] siying commented on pull request #42046: [SPARK-44464][SS] Implement applyInPandasWithState in PySpark

Posted by "siying (via GitHub)" <gi...@apache.org>.
siying commented on PR #42046:
URL: https://github.com/apache/spark/pull/42046#issuecomment-1642557386

   > @siying There was a conflict. Could you please create a PR against branch-3.4? Thanks in advance!
   > 
   > (Btw, I didn't indicate that the title is not accurate. Could you please fix the title when you submit a PR for 3.4?)
   
   Oops. I didn't realize the title was wrong. Perhaps I made a copy-and-paste mistake.




[GitHub] [spark] HeartSaVioR closed pull request #42046: [SPARK-44464][SS] Implement applyInPandasWithState in PySpark

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR closed pull request #42046: [SPARK-44464][SS] Implement applyInPandasWithState in PySpark
URL: https://github.com/apache/spark/pull/42046




[GitHub] [spark] siying commented on pull request #42046: [SPARK-44464][SS] Implement applyInPandasWithState in PySpark

Posted by "siying (via GitHub)" <gi...@apache.org>.
siying commented on PR #42046:
URL: https://github.com/apache/spark/pull/42046#issuecomment-1642555659

   Sure. Will do that.




[GitHub] [spark] HeartSaVioR commented on pull request #42046: [SPARK-44464][SS] Implement applyInPandasWithState in PySpark

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #42046:
URL: https://github.com/apache/spark/pull/42046#issuecomment-1641074657

   Thanks! Merging to master/3.5/3.4!




[GitHub] [spark] HeartSaVioR commented on pull request #42046: [SPARK-44464][SS] Implement applyInPandasWithState in PySpark

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #42046:
URL: https://github.com/apache/spark/pull/42046#issuecomment-1641236418

   @siying 
   There was a conflict. Could you please create a PR against branch-3.4? Thanks in advance!
   
   (Btw, I didn't indicate that the title is not accurate. Could you please fix the title when you submit a PR for 3.4?)

