You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tika.apache.org by GitBox <gi...@apache.org> on 2021/05/12 23:17:53 UTC

[GitHub] [tika] kamaci opened a new pull request #441: fix for TIKA-3400 contributed by kamaci

kamaci opened a new pull request #441:
URL: https://github.com/apache/tika/pull/441


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tika] kamaci commented on a change in pull request #441: fix for TIKA-3400 contributed by kamaci

Posted by GitBox <gi...@apache.org>.
kamaci commented on a change in pull request #441:
URL: https://github.com/apache/tika/pull/441#discussion_r633748947



##########
File path: tika-eval/tika-eval-core/src/main/java/org/apache/tika/eval/core/tokens/URLEmailNormalizingFilterFactory.java
##########
@@ -69,11 +69,10 @@ public boolean incrementToken() throws IOException {
                 return false;
             }
             //== is actually substantially faster than .equals(String)
-            if (typeAtt.type() == UAX29URLEmailTokenizer.TOKEN_TYPES[UAX29URLEmailTokenizer.URL]) {
+            if (typeAtt.type().equals(UAX29URLEmailTokenizer.TOKEN_TYPES[UAX29URLEmailTokenizer.URL])) {

Review comment:
       So, parameter of the `TypeAttribute#setType` can be exactly that String (`UAX29URLEmailTokenizer.TOKEN_TYPES[UAX29URLEmailTokenizer.URL]`) ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tika] tballison merged pull request #441: fix for TIKA-3400 contributed by kamaci

Posted by GitBox <gi...@apache.org>.
tballison merged pull request #441:
URL: https://github.com/apache/tika/pull/441


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tika] tballison commented on a change in pull request #441: fix for TIKA-3400 contributed by kamaci

Posted by GitBox <gi...@apache.org>.
tballison commented on a change in pull request #441:
URL: https://github.com/apache/tika/pull/441#discussion_r633736826



##########
File path: tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/onenote/FileNode.java
##########
@@ -257,11 +257,11 @@ public void print(OneNoteDocument document, OneNotePtr pointer, int indentLevel)
                     subType.revisionManifest.revisionRole);
 
         }
-        if ((gctxid != ExtendedGUID.nil() ||
+        if ((!gctxid.equals(ExtendedGUID.nil()) ||

Review comment:
       Good catch!  We should probably make a static constant ExtendedGUID.NIL to avoid unnecessary object creation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tika] tballison commented on a change in pull request #441: fix for TIKA-3400 contributed by kamaci

Posted by GitBox <gi...@apache.org>.
tballison commented on a change in pull request #441:
URL: https://github.com/apache/tika/pull/441#discussion_r633735419



##########
File path: tika-eval/tika-eval-core/src/main/java/org/apache/tika/eval/core/tokens/URLEmailNormalizingFilterFactory.java
##########
@@ -69,11 +69,10 @@ public boolean incrementToken() throws IOException {
                 return false;
             }
             //== is actually substantially faster than .equals(String)
-            if (typeAtt.type() == UAX29URLEmailTokenizer.TOKEN_TYPES[UAX29URLEmailTokenizer.URL]) {
+            if (typeAtt.type().equals(UAX29URLEmailTokenizer.TOKEN_TYPES[UAX29URLEmailTokenizer.URL])) {

Review comment:
       This was done out of a notional sense of efficiency. I'm not sure we need to change it.

##########
File path: tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/onenote/FileNode.java
##########
@@ -257,11 +257,11 @@ public void print(OneNoteDocument document, OneNotePtr pointer, int indentLevel)
                     subType.revisionManifest.revisionRole);
 
         }
-        if ((gctxid != ExtendedGUID.nil() ||
+        if ((!gctxid.equals(ExtendedGUID.nil()) ||

Review comment:
       Good catch!  We should probably make a static constant ExtendedGUID.NIL to avoid unnecessary object creation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tika] kamaci commented on pull request #441: fix for TIKA-3400 contributed by kamaci

Posted by GitBox <gi...@apache.org>.
kamaci commented on pull request #441:
URL: https://github.com/apache/tika/pull/441#issuecomment-851075197


   @tballison is there anything left to do for this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tika] tballison commented on pull request #441: fix for TIKA-3400 contributed by kamaci

Posted by GitBox <gi...@apache.org>.
tballison commented on pull request #441:
URL: https://github.com/apache/tika/pull/441#issuecomment-846109800


   I was just able to replicate that in Java 11 on a Mac.  ubuntu w Java 8 passes... Ugh... I pushed a simple fix for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tika] kamaci commented on a change in pull request #441: fix for TIKA-3400 contributed by kamaci

Posted by GitBox <gi...@apache.org>.
kamaci commented on a change in pull request #441:
URL: https://github.com/apache/tika/pull/441#discussion_r633784794



##########
File path: tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/onenote/FileNode.java
##########
@@ -257,11 +257,11 @@ public void print(OneNoteDocument document, OneNotePtr pointer, int indentLevel)
                     subType.revisionManifest.revisionRole);
 
         }
-        if ((gctxid != ExtendedGUID.nil() ||
+        if ((!gctxid.equals(ExtendedGUID.nil()) ||

Review comment:
       You are welcome!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tika] tballison commented on a change in pull request #441: fix for TIKA-3400 contributed by kamaci

Posted by GitBox <gi...@apache.org>.
tballison commented on a change in pull request #441:
URL: https://github.com/apache/tika/pull/441#discussion_r633735419



##########
File path: tika-eval/tika-eval-core/src/main/java/org/apache/tika/eval/core/tokens/URLEmailNormalizingFilterFactory.java
##########
@@ -69,11 +69,10 @@ public boolean incrementToken() throws IOException {
                 return false;
             }
             //== is actually substantially faster than .equals(String)
-            if (typeAtt.type() == UAX29URLEmailTokenizer.TOKEN_TYPES[UAX29URLEmailTokenizer.URL]) {
+            if (typeAtt.type().equals(UAX29URLEmailTokenizer.TOKEN_TYPES[UAX29URLEmailTokenizer.URL])) {

Review comment:
       This was done out of a notional sense of efficiency. I'm not sure we need to change it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tika] tballison commented on a change in pull request #441: fix for TIKA-3400 contributed by kamaci

Posted by GitBox <gi...@apache.org>.
tballison commented on a change in pull request #441:
URL: https://github.com/apache/tika/pull/441#discussion_r633762174



##########
File path: tika-eval/tika-eval-core/src/main/java/org/apache/tika/eval/core/tokens/URLEmailNormalizingFilterFactory.java
##########
@@ -69,11 +69,10 @@ public boolean incrementToken() throws IOException {
                 return false;
             }
             //== is actually substantially faster than .equals(String)
-            if (typeAtt.type() == UAX29URLEmailTokenizer.TOKEN_TYPES[UAX29URLEmailTokenizer.URL]) {
+            if (typeAtt.type().equals(UAX29URLEmailTokenizer.TOKEN_TYPES[UAX29URLEmailTokenizer.URL])) {

Review comment:
       This relies on the Lucene not changing the underlying static strings: https://github.com/apache/lucene/blob/main/lucene/analysis/common/src/java/org/apache/lucene/analysis/email/UAX29URLEmailTokenizer.java#L61

##########
File path: tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/onenote/FileNode.java
##########
@@ -257,11 +257,11 @@ public void print(OneNoteDocument document, OneNotePtr pointer, int indentLevel)
                     subType.revisionManifest.revisionRole);
 
         }
-        if ((gctxid != ExtendedGUID.nil() ||
+        if ((!gctxid.equals(ExtendedGUID.nil()) ||

Review comment:
       To be clear, I'm not asking you to do the static thing on this issue.  Your catch is important.  Thank you!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tika] kamaci commented on a change in pull request #441: fix for TIKA-3400 contributed by kamaci

Posted by GitBox <gi...@apache.org>.
kamaci commented on a change in pull request #441:
URL: https://github.com/apache/tika/pull/441#discussion_r633781531



##########
File path: tika-eval/tika-eval-core/src/main/java/org/apache/tika/eval/core/tokens/URLEmailNormalizingFilterFactory.java
##########
@@ -69,11 +69,10 @@ public boolean incrementToken() throws IOException {
                 return false;
             }
             //== is actually substantially faster than .equals(String)
-            if (typeAtt.type() == UAX29URLEmailTokenizer.TOKEN_TYPES[UAX29URLEmailTokenizer.URL]) {
+            if (typeAtt.type().equals(UAX29URLEmailTokenizer.TOKEN_TYPES[UAX29URLEmailTokenizer.URL])) {

Review comment:
       OK. I think that they had to use enum instead of a string array for such a thing :blush: I'll rollback that lines at my PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tika] kamaci commented on pull request #441: fix for TIKA-3400 contributed by kamaci

Posted by GitBox <gi...@apache.org>.
kamaci commented on pull request #441:
URL: https://github.com/apache/tika/pull/441#issuecomment-845736121


   @tballison I've updated the PR. Checks fail due to `MP4ParserTest.java:101`. I don't get same exception at my local environment. Do you have any idea about the reason?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org