Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/17 00:09:49 UTC

[GitHub] [spark] davidrabinowitz opened a new pull request #30071: [SPARK-33172] Adding support for UserDefinedType for Spark SQL Code generator

davidrabinowitz opened a new pull request #30071:
URL: https://github.com/apache/spark/pull/30071


   
   ### What changes were proposed in this pull request?
   Make `CodeGenerator.getValueFromVector()` treat `UserDefinedType`s correctly, the same way `CodeGenerator.javaType()` does.
   
   ### Why are the changes needed?
   Without it, the generated Java code does not compile. The error was:
   ```
   org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 153, Column 126: No applicable constructor/method found for actual parameters "int, int"; candidates are: "public org.apache.spark.sql.vectorized.ColumnarRow org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"
   ```
   The fix makes sure the generated `getStruct` call takes just one parameter.
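   
   To illustrate the mismatch, here is a simplified, self-contained Java model. It is not Spark's actual code, and every class and method name in it is a hypothetical stand-in. It assumes that the row-style accessor emits `getStruct(ordinal, numFields)`, that the vector-style accessor must emit `getStruct(ordinal)`, and that a UDT has to be unwrapped to its underlying SQL type before that choice is made:
   
   ```java
   // Simplified model of the accessor-string generation; NOT Spark's real classes.
   class UdtAccessorSketch {
       // Hypothetical stand-ins for the data type hierarchy.
       interface DataType {}
       static final class IntegerType implements DataType {}
       static final class StructType implements DataType {
           final int numFields;
           StructType(int numFields) { this.numFields = numFields; }
       }
       static final class UserDefinedType implements DataType {
           final DataType sqlType; // the underlying storage type of the UDT
           UserDefinedType(DataType sqlType) { this.sqlType = sqlType; }
       }
   
       // Row-style accessor: struct reads take (ordinal, numFields); UDTs are unwrapped.
       static String getValue(String input, DataType dt, String ordinal) {
           if (dt instanceof UserDefinedType) {
               return getValue(input, ((UserDefinedType) dt).sqlType, ordinal);
           }
           if (dt instanceof StructType) {
               return input + ".getStruct(" + ordinal + ", " + ((StructType) dt).numFields + ")";
           }
           return input + ".getInt(" + ordinal + ")";
       }
   
       // Vector-style accessor that checks for a struct BEFORE unwrapping the UDT,
       // so a struct-backed UDT falls through to the row-style two-argument call.
       static String vectorAccessorWithoutUnwrap(String vector, DataType dt, String rowId) {
           if (dt instanceof StructType) {
               return vector + ".getStruct(" + rowId + ")";
           }
           return getValue(vector, dt, rowId);
       }
   
       // Vector-style accessor that unwraps the UDT first, then picks the accessor.
       static String vectorAccessorWithUnwrap(String vector, DataType dt, String rowId) {
           if (dt instanceof UserDefinedType) {
               return vectorAccessorWithUnwrap(vector, ((UserDefinedType) dt).sqlType, rowId);
           }
           if (dt instanceof StructType) {
               return vector + ".getStruct(" + rowId + ")";
           }
           return getValue(vector, dt, rowId);
       }
   
       public static void main(String[] args) {
           DataType structBackedUdt = new UserDefinedType(new StructType(4));
           // prints "col.getStruct(i, 4)": two arguments, no match for a one-argument getStruct
           System.out.println(vectorAccessorWithoutUnwrap("col", structBackedUdt, "i"));
           // prints "col.getStruct(i)": one argument
           System.out.println(vectorAccessorWithUnwrap("col", structBackedUdt, "i"));
       }
   }
   ```
   
   In this model the only difference between the two vector-style methods is the order of the checks: unwrapping the UDT before testing for a struct yields the one-argument call.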
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   I've added a unit test to verify that the proper code is generated: `getStruct(ordinal)`.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30071: [SPARK-33172] Adding support for UserDefinedType for Spark SQL Code generator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30071:
URL: https://github.com/apache/spark/pull/30071#issuecomment-710705694


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins commented on pull request #30071: [SPARK-33172] Adding support for UserDefinedType for Spark SQL Code generator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30071:
URL: https://github.com/apache/spark/pull/30071#issuecomment-710705366


   Can one of the admins verify this patch?




[GitHub] [spark] HyukjinKwon commented on pull request #30071: [SPARK-33172] Adding support for UserDefinedType for Spark SQL Code generator

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30071:
URL: https://github.com/apache/spark/pull/30071#issuecomment-711132812


   @davidrabinowitz:
   - The changes should target the `master` branch, and are then ported back to other branches.
   - `CodeGenerator` is internal. Can you elaborate on how we can reproduce the failure you reported in the JIRA?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30071: [SPARK-33172] Adding support for UserDefinedType for Spark SQL Code generator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30071:
URL: https://github.com/apache/spark/pull/30071#issuecomment-710705366


   Can one of the admins verify this patch?




[GitHub] [spark] davidrabinowitz commented on pull request #30071: [SPARK-33172][SQL] Adding support for UserDefinedType for Spark SQL Code generator

Posted by GitBox <gi...@apache.org>.
davidrabinowitz commented on pull request #30071:
URL: https://github.com/apache/spark/pull/30071#issuecomment-727045215


   Closed in favour of #30372 




[GitHub] [spark] davidrabinowitz commented on pull request #30071: [SPARK-33172][SQL] Adding support for UserDefinedType for Spark SQL Code generator

Posted by GitBox <gi...@apache.org>.
davidrabinowitz commented on pull request #30071:
URL: https://github.com/apache/spark/pull/30071#issuecomment-721355772


   @HyukjinKwon
   
   Should I create another PR aimed at master?
   
   To test it, you first need to create a table in BigQuery as follows:
   ```
   bq load --source_format NEWLINE_DELIMITED_JSON <TABLE> vector_test.data.json vector_test.schema.json
   ```
   The files are:
   
   - vector_test.data.json:
   ```
   {"name":"row1","num":"1","vector":{"type":"1","indices":[],"values":[1,2,3]}}
   {"name":"row2","num":"2","vector":{"type":"1","indices":[],"values":[4,5,6]}}
   {"name":"row3","num":"3","vector":{"type":"1","indices":[],"values":[7,8,9]}}
   ```
   
   - vector_test.schema.json:
   ```
   [
     {
       "mode": "NULLABLE",
       "name": "name",
       "type": "STRING"
     },
     {
       "mode": "NULLABLE",
       "name": "num",
       "type": "INTEGER"
     },
     {
       "description": "{spark.type=vector}",
       "fields": [
         {
           "mode": "NULLABLE",
           "name": "type",
           "type": "INTEGER"
         },
         {
           "mode": "NULLABLE",
           "name": "size",
           "type": "INTEGER"
         },
         {
           "mode": "REPEATED",
           "name": "indices",
           "type": "INTEGER"
         },
         {
           "mode": "REPEATED",
           "name": "values",
           "type": "FLOAT"
         }
       ],
       "mode": "NULLABLE",
       "name": "vector",
       "type": "RECORD"
     }
   ]
   ```
   A GCP account is needed for that, but the amount of data and the operations involved are well within the free tier.
   
   Run `spark-shell --packages com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.17.3` and enter the following commands:
   ```
   val df = spark.read.format("com.google.cloud.spark.bigquery.v2.BigQueryDataSourceV2").load("<TABLE>")
   df.schema
   df.show()
   ```
   
   Note that when the format is changed to `bigquery`, a different code path is used that does not rely on the code generator and hence does not suffer from this issue.




[GitHub] [spark] HyukjinKwon commented on pull request #30071: [SPARK-33172][SQL] Adding support for UserDefinedType for Spark SQL Code generator

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30071:
URL: https://github.com/apache/spark/pull/30071#issuecomment-721471134


   @davidrabinowitz, yes, the PR should target `master`, and it will then be assessed whether it should be ported back. cc @gengliangwang FYI, since he knows the BigQuery connector.




[GitHub] [spark] davidrabinowitz closed pull request #30071: [SPARK-33172][SQL] Adding support for UserDefinedType for Spark SQL Code generator

Posted by GitBox <gi...@apache.org>.
davidrabinowitz closed pull request #30071:
URL: https://github.com/apache/spark/pull/30071


   

