You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@orc.apache.org by "ffacs (via GitHub)" <gi...@apache.org> on 2023/03/03 09:45:34 UTC

[GitHub] [orc] ffacs opened a new pull request, #1427: ORC-1376: [C++] Support schema evolution

ffacs opened a new pull request, #1427:
URL: https://github.com/apache/orc/pull/1427

   Add schema evolution and check PPD security
   
   ### What changes were proposed in this pull request?
   
   This pr adds a class `schema evolution` which checks validity and  PDD security of a type conversion. The convertions would be finished later.
   
   ### Why are the changes needed?
   
   To support schema evolution in c++
   
   
   ### How was this patch tested?
   UT
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] coderex2522 commented on pull request #1427: ORC-1376: [C++] Scaffolding of schema evolution

Posted by "coderex2522 (via GitHub)" <gi...@apache.org>.
coderex2522 commented on PR #1427:
URL: https://github.com/apache/orc/pull/1427#issuecomment-1467212389

   +1, LGTM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] ffacs commented on a diff in pull request #1427: ORC-1376: [C++] Scaffolding of schema evolution

Posted by "ffacs (via GitHub)" <gi...@apache.org>.
ffacs commented on code in PR #1427:
URL: https://github.com/apache/orc/pull/1427#discussion_r1129403148


##########
c++/include/orc/Type.hh:
##########
@@ -65,6 +67,7 @@ namespace orc {
     virtual std::vector<std::string> getAttributeKeys() const = 0;
     virtual std::string getAttributeValue(const std::string& key) const = 0;
     virtual std::string toString() const = 0;
+    virtual std::optional<const Type*> getTypeByColumnId(uint64_t colId) const = 0;

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] dongjoon-hyun commented on pull request #1427: ORC-1376: [C++] Scaffolding of schema evolution

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #1427:
URL: https://github.com/apache/orc/pull/1427#issuecomment-1469027619

   Thank you all!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] dongjoon-hyun commented on pull request #1427: ORC-1376: [C++] Scaffolding of schema evolution

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #1427:
URL: https://github.com/apache/orc/pull/1427#issuecomment-1612184977

   Is the JIRA ID correct?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] ffacs commented on pull request #1427: ORC-1376: [C++] Scaffolding of schema evolution

Posted by "ffacs (via GitHub)" <gi...@apache.org>.
ffacs commented on PR #1427:
URL: https://github.com/apache/orc/pull/1427#issuecomment-1459886964

   I`ve uploaded a new patch, please take a look. @wgtmac @coderex2522 @stiga-huang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] ffacs commented on a diff in pull request #1427: ORC-1376: [C++] Scaffolding of schema evolution

Posted by "ffacs (via GitHub)" <gi...@apache.org>.
ffacs commented on code in PR #1427:
URL: https://github.com/apache/orc/pull/1427#discussion_r1133425730


##########
c++/src/SchemaEvolution.hh:
##########
@@ -0,0 +1,45 @@
+#ifndef ORC_SCHEMA_EVOLUTION_HH
+#define ORC_SCHEMA_EVOLUTION_HH
+
+#include <unordered_map>
+#include <unordered_set>
+#include "orc/Type.hh"

Review Comment:
   done



##########
c++/src/SchemaEvolution.cc:
##########
@@ -0,0 +1,259 @@
+#include "SchemaEvolution.hh"
+#include "orc/Exceptions.hh"
+
+#include <unordered_map>

Review Comment:
   donwe



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] wgtmac commented on pull request #1427: ORC-1376: [C++] Scaffolding of schema evolution

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #1427:
URL: https://github.com/apache/orc/pull/1427#issuecomment-1612314803

   > Is the JIRA ID correct?
   
   It seems that there isn't any JIRA for this subtask.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] wgtmac commented on a diff in pull request #1427: ORC-1376: [C++] Scaffolding of schema evolution

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on code in PR #1427:
URL: https://github.com/apache/orc/pull/1427#discussion_r1127394807


##########
c++/include/orc/Type.hh:
##########
@@ -65,6 +65,7 @@ namespace orc {
     virtual std::vector<std::string> getAttributeKeys() const = 0;
     virtual std::string getAttributeValue(const std::string& key) const = 0;
     virtual std::string toString() const = 0;
+    virtual const Type* getTypeByColumnId(uint64_t colId) const = 0;

Review Comment:
   We have supported C++17 now. It can be replaced by std::optional. Or we need to add a comment to say what will happen if the column does not exist.



##########
c++/include/orc/Type.hh:
##########
@@ -97,6 +98,13 @@ namespace orc {
     static std::unique_ptr<Type> buildTypeFromString(const std::string& input);
   };
 
+  struct EnumClassHash {

Review Comment:
   Can we move it into `.cc` file?



##########
c++/src/SchemaEvolution.cc:
##########
@@ -0,0 +1,233 @@
+#include "SchemaEvolution.hh"
+#include <unordered_map>
+#include <unordered_set>
+#include "orc/Exceptions.hh"
+
+namespace orc {
+
+  SchemaEvolution::SchemaEvolution(const std::shared_ptr<Type>& _readType, const Type* fileType)
+      : readType(_readType) {
+    if (readType) {
+      buildConversion(readType.get(), fileType);
+    } else {
+      for (uint64_t i = 0; i <= fileType->getMaximumColumnId(); ++i) {
+        safePPDConversionMap.insert(i);
+      }
+    }
+  }
+
+  const Type* SchemaEvolution::getReadType(const Type& fileType) const {
+    auto ret = readTypeMap.find(fileType.getColumnId());
+    return ret == readTypeMap.cend() ? &fileType : ret->second;
+  }
+
+  inline void invalidConversion(const Type* readType, const Type* fileType) {
+    throw SchemaEvolutionError("Cannot convert from " + fileType->toString() + " to " +
+                               readType->toString());
+  }
+
+  // map from file type to read type. it does not contain identity mapping.
+  using TypeSet = std::unordered_set<TypeKind, EnumClassHash>;
+  using ConvertMap = std::unordered_map<TypeKind, TypeSet, EnumClassHash>;
+
+  inline bool supportConversion(const Type& readType, const Type& fileType) {
+    static const ConvertMap& SUPPORTED_CONVERSIONS = *new ConvertMap{
+        // support nothing now
+    };
+    auto iter = SUPPORTED_CONVERSIONS.find(fileType.getKind());
+    if (iter == SUPPORTED_CONVERSIONS.cend()) {
+      return false;
+    }
+    return iter->second.find(readType.getKind()) != iter->second.cend();
+  }
+
+  // return value: <valid convert, need convert>

Review Comment:
   Can we use a struct for better readability? 



##########
c++/src/SchemaEvolution.cc:
##########
@@ -0,0 +1,233 @@
+#include "SchemaEvolution.hh"
+#include <unordered_map>
+#include <unordered_set>
+#include "orc/Exceptions.hh"

Review Comment:
   ```suggestion
   #include "SchemaEvolution.hh"
   #include "orc/Exceptions.hh"
   #include <unordered_map>
   #include <unordered_set>
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] coderex2522 merged pull request #1427: ORC-1376: [C++] Scaffolding of schema evolution

Posted by "coderex2522 (via GitHub)" <gi...@apache.org>.
coderex2522 merged PR #1427:
URL: https://github.com/apache/orc/pull/1427


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] wgtmac commented on a diff in pull request #1427: ORC-1376: [C++] Scaffolding of schema evolution

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on code in PR #1427:
URL: https://github.com/apache/orc/pull/1427#discussion_r1129345134


##########
c++/include/orc/Type.hh:
##########
@@ -65,6 +67,7 @@ namespace orc {
     virtual std::vector<std::string> getAttributeKeys() const = 0;
     virtual std::string getAttributeValue(const std::string& key) const = 0;
     virtual std::string toString() const = 0;
+    virtual std::optional<const Type*> getTypeByColumnId(uint64_t colId) const = 0;

Review Comment:
   I didn’t notice that this is a public header. As Apache impala is still on c++14 and ‘optional’ is a c++17 feature, it will break impala. Could you revert this and add a comment for the possible nullptr of return? Sorry about that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] wgtmac commented on pull request #1427: ORC-1376: [C++] Support schema evolution

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #1427:
URL: https://github.com/apache/orc/pull/1427#issuecomment-1453613762

   Thanks @ffacs for contribution! I will take a look next week.
   
   cc @stiga-huang @coderex2522 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [orc] coderex2522 commented on a diff in pull request #1427: ORC-1376: [C++] Scaffolding of schema evolution

Posted by "coderex2522 (via GitHub)" <gi...@apache.org>.
coderex2522 commented on code in PR #1427:
URL: https://github.com/apache/orc/pull/1427#discussion_r1133266823


##########
c++/src/SchemaEvolution.cc:
##########
@@ -0,0 +1,259 @@
+#include "SchemaEvolution.hh"
+#include "orc/Exceptions.hh"
+
+#include <unordered_map>

Review Comment:
   These two header files are already included in the SchemaEvolution.hh. It is a redundant code.



##########
c++/src/SchemaEvolution.hh:
##########
@@ -0,0 +1,45 @@
+#ifndef ORC_SCHEMA_EVOLUTION_HH
+#define ORC_SCHEMA_EVOLUTION_HH
+
+#include <unordered_map>
+#include <unordered_set>
+#include "orc/Type.hh"

Review Comment:
   It is recommended that this line ' #include "orc/Type.hh" ' be moved to precede line ' #include <unordered_map>'.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org