Posted to dev@orc.apache.org by GitBox <gi...@apache.org> on 2021/11/06 13:00:37 UTC

[GitHub] [orc] noirello opened a new pull request #959: ORC-1047: [C++] Handle quoted field names during string schema parsing

noirello opened a new pull request #959:
URL: https://github.com/apache/orc/pull/959


   
   ### What changes were proposed in this pull request?
   Improve schema string parsing in `Type::buildTypeFromString` so that it handles quoted field names and validates its input more strictly.
   
   ### Why are the changes needed?
   The current implementation cannot handle quoted field names, and it accepts schema strings that the Java implementation would reject (e.g. `struct<bigint>`, `map(boolean,float)`). It also cannot parse a schema whose root type is `timestamp with local time zone`.
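
For illustration only, here is a minimal sketch of the quoting side of this change. The names `isUnquotedFieldNameSketch` and `quoteFieldNameSketch` are hypothetical, and the rule for which names may stay unquoted (non-empty, ASCII letters, digits, and underscores only) is an assumption, not necessarily what the patch implements. The backtick doubling mirrors the loop in the `TypeImpl.cc` hunk quoted later in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for isUnquotedFieldName(). ASSUMPTION: a name can be
// printed without quotes when it is non-empty and consists only of ASCII
// letters, digits, or underscores. The real rule in the patch may differ.
bool isUnquotedFieldNameSketch(const std::string& name) {
  if (name.empty()) {
    return false;
  }
  for (char c : name) {
    bool ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
              (c >= '0' && c <= '9') || c == '_';
    if (!ok) {
      return false;
    }
  }
  return true;
}

// Hypothetical helper: returns the name as it would appear in a schema
// string. Names that need quoting are wrapped in backticks, and every
// embedded backtick is doubled.
std::string quoteFieldNameSketch(const std::string& name) {
  if (isUnquotedFieldNameSketch(name)) {
    return name;
  }
  std::string escaped(name);
  std::string::size_type pos = 0;
  while ((pos = escaped.find('`', pos)) != std::string::npos) {
    escaped.replace(pos, 1, "``");
    pos += 2;  // step over both backticks so the search makes progress
  }
  return "`" + escaped + "`";
}

// Small usage example for the two helpers above.
void demoQuoting() {
  assert(quoteFieldNameSketch("col_1") == "col_1");
  assert(quoteFieldNameSketch("my field") == "`my field`");
  assert(quoteFieldNameSketch("a`b") == "`a``b`");
}
```

Advancing `pos` by two after each replacement matters: without it, the search would find the second of the two backticks it just wrote and never terminate.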
   
   ### How was this patch tested?
   Ran the existing test suites locally, together with the newly added tests for quoted field names and invalid schemas.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [orc] noirello commented on a change in pull request #959: ORC-1047: [C++] Handle quoted field names during string schema parsing

Posted by GitBox <gi...@apache.org>.
noirello commented on a change in pull request #959:
URL: https://github.com/apache/orc/pull/959#discussion_r744532862



##########
File path: c++/src/TypeImpl.cc
##########
@@ -218,7 +227,19 @@ namespace orc {
         if (i != 0) {
           result += ",";
         }
-        result += fieldNames[i];
+        if (isUnquotedFieldName(fieldNames[i])) {
+          result += fieldNames[i];
+        } else {
+          std::string name(fieldNames[i]);
+          size_t pos = 0;
+          while ((pos = name.find("`", pos)) != std::string::npos) {
+            name.replace(pos, 1, "``");

Review comment:
       I'm not sure I understand. Could you give me an example? 







[GitHub] [orc] noirello commented on pull request #959: ORC-1047: [C++] Handle quoted field names during string schema parsing

Posted by GitBox <gi...@apache.org>.
noirello commented on pull request #959:
URL: https://github.com/apache/orc/pull/959#issuecomment-962602508


   I had to remove the field name with the non-ASCII characters (`èœ`).
   It caused a runtime error:
   
   ![appveyor](https://user-images.githubusercontent.com/615790/140644889-054d0e7a-47a3-4bfa-a208-fe3123ffba02.png)
   
   It looks like this has been improved in recent MSVC versions.
    





[GitHub] [orc] noirello commented on pull request #959: ORC-1047: [C++] Handle quoted field names during string schema parsing

Posted by GitBox <gi...@apache.org>.
noirello commented on pull request #959:
URL: https://github.com/apache/orc/pull/959#issuecomment-962508232


   Unfortunately, I have no idea why the new test case got stuck during the AppVeyor build. I cannot build the exact version locally, but using a newer MSVC++ (2016) and gtest (1.10.0) works as expected.





[GitHub] [orc] wgtmac commented on a change in pull request #959: ORC-1047: [C++] Handle quoted field names during string schema parsing

Posted by GitBox <gi...@apache.org>.
wgtmac commented on a change in pull request #959:
URL: https://github.com/apache/orc/pull/959#discussion_r744389708



##########
File path: c++/src/TypeImpl.cc
##########
@@ -218,7 +227,19 @@ namespace orc {
         if (i != 0) {
           result += ",";
         }
-        result += fieldNames[i];
+        if (isUnquotedFieldName(fieldNames[i])) {
+          result += fieldNames[i];
+        } else {
+          std::string name(fieldNames[i]);
+          size_t pos = 0;
+          while ((pos = name.find("`", pos)) != std::string::npos) {
+            name.replace(pos, 1, "``");

Review comment:
       If we call toString() on a type with a quoted field name and get a string, can we call buildTypeFromString() on that string to get back the original type object?
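
The round trip asked about here can be sketched in isolation, at the level of a single field name. This is only an illustration: `quoteSketch` and `unquoteSketch` are hypothetical helpers, not functions from the ORC code base, and the sketch assumes that a quoted name is delimited by backticks with embedded backticks doubled, as in the hunk above.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: always wraps the name in backticks and doubles any
// embedded backtick.
std::string quoteSketch(const std::string& name) {
  std::string escaped(name);
  std::string::size_type pos = 0;
  while ((pos = escaped.find('`', pos)) != std::string::npos) {
    escaped.replace(pos, 1, "``");
    pos += 2;  // step over the pair that was just written
  }
  return "`" + escaped + "`";
}

// Hypothetical inverse of quoteSketch(): strips the surrounding backticks
// and collapses each doubled backtick back into one. Throws on input that
// is not a well-formed quoted name.
std::string unquoteSketch(const std::string& quoted) {
  if (quoted.size() < 2 || quoted.front() != '`' || quoted.back() != '`') {
    throw std::logic_error("name is not enclosed in backticks");
  }
  std::string name;
  for (std::string::size_type i = 1; i + 1 < quoted.size(); ++i) {
    if (quoted[i] == '`') {
      // Inside the quotes, a backtick must be followed by a second one.
      if (i + 2 >= quoted.size() || quoted[i + 1] != '`') {
        throw std::logic_error("unescaped backtick inside quoted name");
      }
      ++i;  // skip the first backtick of the pair
    }
    name += quoted[i];
  }
  return name;
}

// Checks that quoting followed by unquoting returns the original name.
void checkRoundTrip(const std::string& name) {
  assert(unquoteSketch(quoteSketch(name)) == name);
}
```

For example, `checkRoundTrip("a`b")` passes because the name is serialized as `` `a``b` `` and parsed back to the original three characters.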

##########
File path: c++/src/TypeImpl.cc
##########
@@ -678,79 +795,60 @@ namespace orc {
     } else if (category == "date") {
       return std::unique_ptr<Type>(new TypeImpl(DATE));

Review comment:
       Missing validatePrimitiveType before return.







[GitHub] [orc] dongjoon-hyun commented on pull request #959: ORC-1047: [C++] Handle quoted field names during string schema parsing

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #959:
URL: https://github.com/apache/orc/pull/959#issuecomment-965896652


   Please feel free to merge this. :)





[GitHub] [orc] dongjoon-hyun commented on pull request #959: ORC-1047: Handle quoted field names during string schema parsing

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #959:
URL: https://github.com/apache/orc/pull/959#issuecomment-966572351


   Merged to `main` for Apache ORC 1.8.





[GitHub] [orc] wgtmac commented on a change in pull request #959: ORC-1047: [C++] Handle quoted field names during string schema parsing

Posted by GitBox <gi...@apache.org>.
wgtmac commented on a change in pull request #959:
URL: https://github.com/apache/orc/pull/959#discussion_r746301261



##########
File path: c++/src/TypeImpl.cc
##########
@@ -218,7 +227,19 @@ namespace orc {
         if (i != 0) {
           result += ",";
         }
-        result += fieldNames[i];
+        if (isUnquotedFieldName(fieldNames[i])) {
+          result += fieldNames[i];
+        } else {
+          std::string name(fieldNames[i]);
+          size_t pos = 0;
+          while ((pos = name.find("`", pos)) != std::string::npos) {
+            name.replace(pos, 1, "``");

Review comment:
       Sorry for the late reply. I got the answer from your test case (TEST(TestType, quotedFieldNames)) already.







[GitHub] [orc] dongjoon-hyun merged pull request #959: ORC-1047: Handle quoted field names during string schema parsing

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun merged pull request #959:
URL: https://github.com/apache/orc/pull/959


   





[GitHub] [orc] dongjoon-hyun commented on pull request #959: ORC-1047: [C++] Handle quoted field names during string schema parsing

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #959:
URL: https://github.com/apache/orc/pull/959#issuecomment-962702801


   Thank you for updating, @noirello .





[GitHub] [orc] dongjoon-hyun commented on pull request #959: ORC-1047: [C++] Handle quoted field names during string schema parsing

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #959:
URL: https://github.com/apache/orc/pull/959#issuecomment-962702886


   cc @wgtmac 





[GitHub] [orc] guiyanakuang commented on pull request #959: ORC-1047: [C++] Handle quoted field names during string schema parsing

Posted by GitBox <gi...@apache.org>.
guiyanakuang commented on pull request #959:
URL: https://github.com/apache/orc/pull/959#issuecomment-962540877


   > Unfortunately, I have no idea why the new test case got stuck during the AppVeyor build. I cannot build the exact version locally, but using a newer MSVC++ (2016) and gtest (1.10.0) works as expected.
   
   You can log in to https://ci.appveyor.com/ with your GitHub account and select your own fork of the ORC project so that you can run test builds yourself. The default is the main branch, which you can change in the settings.
   
   

