Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/15 03:32:26 UTC

[GitHub] [spark] mcdull-zhang opened a new pull request, #37440: [SPARK-40076] [SQL] Support number-only column names in ORC data sources when orc impl is hive

mcdull-zhang opened a new pull request, #37440:
URL: https://github.com/apache/spark/pull/37440

   ### What changes were proposed in this pull request?
   This PR aims to support number-only column names in ORC data sources when the ORC implementation is `hive` (that is, `spark.sql.orc.impl=hive`).
   On current master, the ORC data source can write a DataFrame that contains such columns to ORC files:
   ```scala
   spark.sql("SELECT 'a' as `1`, 'b' as `2`, 'c' as `3`").write.orc(path)
   ```
   But reading those ORC files back fails:
   ```text
   val df = spark.read.orc(path)
   ...
   == SQL ==
   struct<1:string,2:string,3:string>
   -------^^^
   
   	at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:265)
   	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:126)
   	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseDataType(ParseDriver.scala:40)
   	at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$readSchema$2.applyOrElse(OrcFileOperator.scala:101)
   ```
   The cause is that `CatalystSqlParser.parseDataType` fails to parse the schema string when a column name (or a nested field name) consists only of digits.
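
   As an illustration only (this is not the patch itself), one way to avoid this kind of parse failure is to backtick-quote every field name before the schema string reaches the parser. The sketch below assumes a tiny stand-in schema model; `QuotedSchemaSketch`, `FieldType`, `Primitive`, `Struct`, `quoteFieldName` and `typeString` are hypothetical names and are not part of Spark or of this PR.

   ```scala
   // Hypothetical helpers for illustration; none of these names exist in Spark.
   object QuotedSchemaSketch {

     // Wrap a field name in backticks, doubling any backtick inside the name
     // (the usual SQL identifier escaping).
     def quoteFieldName(name: String): String =
       "`" + name.replace("`", "``") + "`"

     // Minimal stand-in for a schema: a field is either a primitive type name
     // or a nested struct of named fields.
     sealed trait FieldType
     final case class Primitive(typeName: String) extends FieldType
     final case class Struct(fields: Seq[(String, FieldType)]) extends FieldType

     // Render the schema as a type string with every field name quoted,
     // recursing into nested structs.
     def typeString(t: FieldType): String = t match {
       case Primitive(name) => name
       case Struct(fields) =>
         fields
           .map { case (name, tpe) => s"${quoteFieldName(name)}:${typeString(tpe)}" }
           .mkString("struct<", ",", ">")
     }

     // Same shape as the failing example, plus one nested field.
     val schema: FieldType = Struct(Seq(
       "1" -> Primitive("string"),
       "2" -> Primitive("string"),
       "3" -> Struct(Seq("4" -> Primitive("int")))
     ))

     def main(args: Array[String]): Unit =
       println(typeString(schema))
       // struct<`1`:string,`2`:string,`3`:struct<`4`:int>>
   }
   ```

   In a Spark shell, the resulting string could then be handed to `CatalystSqlParser.parseDataType`. Backtick-quoted identifiers are expected to parse there, but the sketch itself does not verify that.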
   
   
   ### Why are the changes needed?
   For better usability: ORC files that Spark can write should also be readable by Spark.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Unit Tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mcdull-zhang commented on pull request #37440: [SPARK-36663] [FOLLOWUP] [SQL] Support number-only column names in ORC data sources when orc impl is hive

Posted by GitBox <gi...@apache.org>.
mcdull-zhang commented on PR #37440:
URL: https://github.com/apache/spark/pull/37440#issuecomment-1208945336

   @cloud-fan please take a look



[GitHub] [spark] AmplabJenkins commented on pull request #37440: [SPARK-36663] [FOLLOWUP] [SQL] Support number-only column names in ORC data sources when orc impl is hive

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #37440:
URL: https://github.com/apache/spark/pull/37440#issuecomment-1208589381

   Can one of the admins verify this patch?



[GitHub] [spark] mcdull-zhang commented on pull request #37440: [SPARK-40076] [SQL] Support number-only column names in ORC data sources when orc impl is hive

Posted by GitBox <gi...@apache.org>.
mcdull-zhang commented on PR #37440:
URL: https://github.com/apache/spark/pull/37440#issuecomment-1214580554

   @dongjoon-hyun I created a new JIRA; please take a look.



[GitHub] [spark] mcdull-zhang closed pull request #37440: [SPARK-40076] [SQL] Support number-only column names in ORC data sources when orc impl is hive

Posted by GitBox <gi...@apache.org>.
mcdull-zhang closed pull request #37440: [SPARK-40076] [SQL] Support number-only column names in ORC data sources when orc impl is hive
URL: https://github.com/apache/spark/pull/37440



[GitHub] [spark] github-actions[bot] commented on pull request #37440: [SPARK-40076] [SQL] Support number-only column names in ORC data sources when orc impl is hive

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #37440:
URL: https://github.com/apache/spark/pull/37440#issuecomment-1325796396

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!



[GitHub] [spark] github-actions[bot] closed pull request #37440: [SPARK-40076] [SQL] Support number-only column names in ORC data sources when orc impl is hive

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #37440: [SPARK-40076] [SQL] Support number-only column names in ORC data sources when orc impl is hive
URL: https://github.com/apache/spark/pull/37440

