You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/30 16:22:16 UTC

[GitHub] [arrow] nealrichardson commented on a change in pull request #9367: ARROW-7288: [C++][Parquet] Don't use regular expression to parse application version

nealrichardson commented on a change in pull request #9367:
URL: https://github.com/apache/arrow/pull/9367#discussion_r567267561



##########
File path: cpp/cmake_modules/ThirdpartyToolchain.cmake
##########
@@ -854,13 +854,6 @@ else()
   set(THRIFT_REQUIRES_BOOST FALSE)
 endif()
 
-# Parquet requires boost only with gcc 4.8 (because of missing std::regex).
-if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU" AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS "4.9")
-  set(PARQUET_REQUIRES_BOOST TRUE)
-else()
-  set(PARQUET_REQUIRES_BOOST FALSE)
-endif()
-

Review comment:
       Should also remove the lines with `BOOST_REGEX_LIBRARY` below, right? Nothing else uses boost::regex in our code. 
   
   We can also remove regex.hpp from cpp/build-support/trim-boost.sh. That should probably be a followup JIRA, but could you edit the comment on L38 of that script to say that we actually don't need it anymore?

##########
File path: cpp/src/parquet/metadata.cc
##########
@@ -949,43 +910,278 @@ ApplicationVersion::ApplicationVersion(std::string application, int major, int m
                                        int patch)
     : application_(std::move(application)), version{major, minor, patch, "", "", ""} {}
 
-ApplicationVersion::ApplicationVersion(const std::string& created_by) {
-  // Use singletons to compile only once (ARROW-9863)
-  static regex app_regex{ApplicationVersion::APPLICATION_FORMAT};
-  static regex ver_regex{ApplicationVersion::VERSION_FORMAT};
-  smatch app_matches;
-  smatch ver_matches;
-
-  std::string created_by_lower = created_by;
-  std::transform(created_by_lower.begin(), created_by_lower.end(),
-                 created_by_lower.begin(), ::tolower);
-
-  bool app_success = regex_match(created_by_lower, app_matches, app_regex);
-  bool ver_success = false;
-  std::string version_str;
-
-  if (app_success && app_matches.size() >= 4) {
-    // first match is the entire string. sub-matches start from second.
-    application_ = app_matches[1];
-    version_str = app_matches[3];
-    build_ = app_matches[4];
-    ver_success = regex_match(version_str, ver_matches, ver_regex);
-  } else {
-    application_ = "unknown";
-  }
-
-  if (ver_success && ver_matches.size() >= 7) {
-    version.major = atoi(ver_matches[1].str().c_str());
-    version.minor = atoi(ver_matches[2].str().c_str());
-    version.patch = atoi(ver_matches[3].str().c_str());
-    version.unknown = ver_matches[4].str();
-    version.pre_release = ver_matches[5].str();
-    version.build_info = ver_matches[6].str();
-  } else {
-    version.major = 0;
-    version.minor = 0;
-    version.patch = 0;
+namespace {
+// Parse the application version format and set parsed values to
+// ApplicationVersion.
+//
+// The application version format:
+//   "${APPLICATION_NAME}"
+//   "${APPLICATION_NAME} version ${VERSION}"
+//   "${APPLICATION_NAME} version ${VERSION} (build ${BUILD_NAME})"
+//
+// Eg:
+//   parquet-cpp
+//   parquet-cpp version 1.5.0ab-xyz5.5.0+cd
+//   parquet-cpp version 1.5.0ab-xyz5.5.0+cd (build abcd)
+//
+// The VERSION format:
+//   "${MAJOR}.${MINOR}.${PATCH}"
+//   "${MAJOR}.${MINOR}.${PATCH}${UNKNOWN}"
+//   "${MAJOR}.${MINOR}.${PATCH}${UNKNOWN}-${PRE_RELEASE}"
+//   "${MAJOR}.${MINOR}.${PATCH}${UNKNOWN}-${PRE_RELEASE}+${BUILD_INFO}"
+//   "${MAJOR}.${MINOR}.${PATCH}${UNKNOWN}+${BUILD_INFO}"
+//   "${MAJOR}.${MINOR}.${PATCH}-${PRE_RELEASE}"
+//   "${MAJOR}.${MINOR}.${PATCH}-${PRE_RELEASE}+${BUILD_INFO}"
+//   "${MAJOR}.${MINOR}.${PATCH}+${BUILD_INFO}"
+//

Review comment:
       Perhaps link to where parquet-mr does versioning, for reference? https://github.com/apache/parquet-mr/blob/master/parquet-common/src/main/java/org/apache/parquet/SemanticVersion.java#L31-L39




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org