You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by GitBox <gi...@apache.org> on 2021/05/18 16:13:24 UTC

[GitHub] [tez] belugabehr opened a new pull request #125: TEZ-4285: Drop Commons Codec Dependency

belugabehr opened a new pull request #125:
URL: https://github.com/apache/tez/pull/125


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tez] jteagles commented on pull request #125: TEZ-4285: Drop Commons Codec Dependency

Posted by GitBox <gi...@apache.org>.
jteagles commented on pull request #125:
URL: https://github.com/apache/tez/pull/125#issuecomment-896294319


   @belugabehr, In general, it could be a good change. Couple of thing on my mind about this.
   
   I have been thinking about replacing apache base64 due to performance after reading
   http://java-performance.info/base64-encoding-and-decoding-performance/
   
   and seeing native java8 performance.
   
   However, as Hadoop depends directly on commons-codec, thing won't truly rid tez of the transitive dependency, only the direct dependency.
   
   It's really important to switch from apache to java.util.Base64 in the backwards compatible way
   The org.apache.commons.codec.binary.Base64 encoder says it returns an RFC 2045 compliant encoder.
   https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html
   
   java.util.Base64.getEncoder() returns an RFC 4648 compliant encoder, which uses the same alphabet an RFC 2045, but without line lengths and line endings.
   https://docs.oracle.com/javase/8/docs/api/java/util/Base64.html
   
   I haven't checked this with code, but they will need to be exactly the same as, the shuffle handler could be running a different version of tez than the job. So they have to be backwards compatible. 
   
   I thought Base64.getMimeEncoder() will give a RFC 2045 compliant encoder in java.util.Base64 (76 line lengths with CRLF endings), but they do not match
   
   ``` java
     @Test
     public void testCodecEquivalence() throws Exception {
       byte[] bytes = new byte[100];
       String enc1 = Base64.getEncoder().encodeToString(bytes);
       String enc2 = new String(org.apache.commons.codec.binary.Base64.encodeBase64(bytes));
       String enc3 = Base64.getMimeEncoder().encodeToString(bytes);
       System.out.println("enc1 = <" + enc1 + ">");
       System.out.println("enc2 = <" + enc2 + ">");
       System.out.println("enc3 = <" + enc3 + ">");
       System.out.println(enc1.equals(enc2));
       System.out.println(enc1.equals(enc3));
     }
   ```
   ```
   true
   false
   ```
   So perhaps line lengths are skipped with apache and makes it RFC 4648 compatible.
   
   
   As for the sha384 bit, that's more complicated. I would rather not add more dependency on guava until we are using a shaded version as that library has not shown very good API stability and causes uses issues. That being said, if you look at how the sha384 is being used, it is very much overkill. In DAPAppMaster.isSameFile, two files are being read fully to compute sha384 and then SHAs are compared to see if they are equal. Same for DAG.verifyLocalResources. Perhaps better would be just to walk both file streams and stop when they differ. It will give an earlier exit for failed case. Sha384 seems just an added complication. What do you think?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tez] hadoop-yetus commented on pull request #125: TEZ-4285: Drop Commons Codec Dependency

Posted by GitBox <gi...@apache.org>.
hadoop-yetus commented on pull request #125:
URL: https://github.com/apache/tez/pull/125#issuecomment-843519523


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |  16m 40s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 1 new or modified test files.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   4m 59s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   8m 35s |  master passed  |
   | +1 :green_heart: |  compile  |   4m 22s |  master passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   3m 53s |  master passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   2m 40s |  master passed  |
   | +1 :green_heart: |  javadoc  |   3m 57s |  master passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   3m 21s |  master passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +0 :ok: |  spotbugs  |   5m 55s |  Used deprecated FindBugs config; considering switching to SpotBugs.  |
   | +0 :ok: |  findbugs  |   5m 51s |  root in master has 2 extant findbugs warnings.  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   5m 22s |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 21s |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   4m 21s |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 53s |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   3m 53s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   2m 16s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  xml  |   0m  4s |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  javadoc  |   3m 51s |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   3m 22s |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  findbugs  |  10m 23s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m  2s |  tez-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   5m  6s |  tez-runtime-library in the patch passed.  |
   | +1 :green_heart: |  unit  |   4m 15s |  tez-dag in the patch passed.  |
   | -1 :x: |  unit  |  57m 12s |  tez-tests in the patch failed.  |
   | -1 :x: |  unit  |  69m 25s |  root in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   1m 41s |  The patch does not generate ASF License warnings.  |
   |  |   | 234m  3s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-125/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/tez/pull/125 |
   | JIRA Issue | TEZ-4285 |
   | Optional Tests | dupname asflicense javac javadoc unit xml compile spotbugs findbugs checkstyle |
   | uname | Linux 11f4c517bf83 4.15.0-128-generic #131-Ubuntu SMP Wed Dec 9 06:57:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/tez.sh |
   | git revision | master / 01847814d |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | unit | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-125/1/artifact/out/patch-unit-tez-tests.txt |
   | unit | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-125/1/artifact/out/patch-unit-root.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-125/1/testReport/ |
   | Max. process+thread count | 1210 (vs. ulimit of 5500) |
   | modules | C: tez-api tez-runtime-library tez-dag tez-tests . U: . |
   | Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-125/1/console |
   | versions | git=2.25.1 maven=3.6.3 findbugs=3.0.1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tez] belugabehr commented on pull request #125: TEZ-4285: Drop Commons Codec Dependency

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #125:
URL: https://github.com/apache/tez/pull/125#issuecomment-887625898


   @jteagles Just wanted to get your temperature on this change.  Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@tez.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org