Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2022/02/09 17:33:45 UTC

[GitHub] [solr] risdenk opened a new pull request #621: SOLR-15989: Upgrade to Tika 1.28

risdenk opened a new pull request #621:
URL: https://github.com/apache/solr/pull/621


   https://issues.apache.org/jira/browse/SOLR-15989


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a change in pull request #621: SOLR-15989: Upgrade to Tika 1.28

Posted by GitBox <gi...@apache.org>.
risdenk commented on a change in pull request #621:
URL: https://github.com/apache/solr/pull/621#discussion_r806088523



##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       Addressed the jar duplication in e65f2f559b5a8a36c909f34574289e2e6b815dd1






[GitHub] [solr] risdenk commented on a change in pull request #621: SOLR-15989: Upgrade to Tika 1.28

Posted by GitBox <gi...@apache.org>.
risdenk commented on a change in pull request #621:
URL: https://github.com/apache/solr/pull/621#discussion_r806345824



##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       I'm on the fence about this with Tika, since `tika-core` is NOT the only thing needed to do language detection, as far as I know. It would mean a lot more duplication of jars just to get language detection working. (I'm not even 100% sure that the Tika parsers in the extraction module are the right ones for Tika to do language detection with the langid module.) I think leaving it like this is going to be the most expected behavior for now. I do think this needs more thought separately, though.






[GitHub] [solr] risdenk commented on pull request #621: SOLR-15989: Upgrade to Tika 1.28.1

Posted by GitBox <gi...@apache.org>.
risdenk commented on pull request #621:
URL: https://github.com/apache/solr/pull/621#issuecomment-1040307583


   I decided to bump this to 1.28.1, which was just released, since it addresses a few security issues in Tika dependencies.




[GitHub] [solr] dsmiley commented on a change in pull request #621: SOLR-15989: Upgrade to Tika 1.28

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #621:
URL: https://github.com/apache/solr/pull/621#discussion_r806291185



##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       I don't think the jar duplication is a big enough problem for us to make modules non-independent.

##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       Ok; makes sense for now.  Hopefully this is documented for our users.






[GitHub] [solr] risdenk commented on pull request #621: SOLR-15989: Upgrade to Tika 1.28.1

Posted by GitBox <gi...@apache.org>.
risdenk commented on pull request #621:
URL: https://github.com/apache/solr/pull/621#issuecomment-1044608615


   Thanks @dsmiley. I'll merge this after PR https://github.com/apache/solr/pull/553 because of the Guava upgrade.
   
   @tballison if you have any comments/thoughts on this please do comment :D






[GitHub] [solr] janhoy commented on a change in pull request #621: SOLR-15989: Upgrade to Tika 1.28

Posted by GitBox <gi...@apache.org>.
janhoy commented on a change in pull request #621:
URL: https://github.com/apache/solr/pull/621#discussion_r804049578



##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       The langid module uses Tika; its README says:
   > The Tika detector depends on Tika Core (which is part of the extraction module)
   
   But it also puts tika-core in its own `lib/`, so this may be old information? 
   Did we duplicate tika-core.jar in 8.x?






[GitHub] [solr] risdenk commented on a change in pull request #621: SOLR-15989: Upgrade to Tika 1.28

Posted by GitBox <gi...@apache.org>.
risdenk commented on a change in pull request #621:
URL: https://github.com/apache/solr/pull/621#discussion_r806848496



##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       There is an entire `tika-langdetect` dependency - https://search.maven.org/artifact/org.apache.tika/tika-langdetect/1.28.1/bundle or https://mvnrepository.com/artifact/org.apache.tika/tika-langdetect/1.28.1. I played with `langid` years ago and remember having to add jars to get it to actually work. I could add `tika-langdetect` as `runtimeOnly` in the `langid` module, but that adds more jars.
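
   For illustration, a minimal sketch of what that `runtimeOnly` option might look like in the `langid` module's `build.gradle`. This is a hypothetical fragment, not part of the PR: the `tika-core` line and the unversioned coordinates are assumed from the style of the extraction module's build file quoted above, and a version for `tika-langdetect` would still have to be pinned wherever the build manages versions.

   ```groovy
   // Hypothetical fragment of solr/modules/langid/build.gradle (not the actual file).
   dependencies {
     // Assumed existing dependency: langid code compiles against tika-core.
     implementation 'org.apache.tika:tika-core'

     // The option floated above: tika-langdetect would only be needed when the
     // module runs, so it stays off the compile classpath and is added to the
     // runtime classpath only. That is what would add more jars to the module.
     runtimeOnly 'org.apache.tika:tika-langdetect'
   }
   ```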






[GitHub] [solr] risdenk commented on a change in pull request #621: SOLR-15989: Upgrade to Tika 1.28

Posted by GitBox <gi...@apache.org>.
risdenk commented on a change in pull request #621:
URL: https://github.com/apache/solr/pull/621#discussion_r806070690



##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       I checked this and tika-core is not included in the langid lib dir.

##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       So I looked into this more and it doesn't seem like there is clear documentation on how this worked in Solr 8.x. It looks like there is an implicit assumption that both `extraction` and `langid` would be on the Solr classpath for `langid` to work with Tika. 
   
   Personally it feels way cleaner to me for each `module` to be independent and therefore be included as standalone. This would end up with duplicate jars. I think this is how packages would work though.
   
   I understand the concern about not having duplicate jars in the distribution due to size though. If we are that concerned about distribution size, wouldn't it make more sense to have each contrib as a separate download instead of trying to play tricks with these specific jars?
   
   I played around with `api` vs `implementation` for this change - and it has no effect on the `langid` module lib. The only way to remove the jar from `langid` is to switch to `compileOnly` instead of `implementation` in the `langid` module. I can make that change if that would be the preferred approach.
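
   As a rough sketch of the three options being compared, the fragment below shows how each Gradle configuration would treat `tika-core` in a hypothetical `langid` build file. The file path and dependency coordinates are assumptions for illustration only; the configuration semantics are standard Gradle behavior (`api` requires the `java-library` plugin), and the packaging remarks assume a build that ships the module's runtime classpath in its lib directory.

   ```groovy
   // Hypothetical fragment of solr/modules/langid/build.gradle (illustrative only).
   dependencies {
     // implementation: on this module's compile and runtime classpaths, so the jar
     // ships with the module; consumers do not get it on their compile classpath.
     implementation 'org.apache.tika:tika-core'

     // api (java-library plugin): same as implementation for this module, but also
     // exposed on the compile classpath of modules that depend on this one.
     // api 'org.apache.tika:tika-core'

     // compileOnly: compile classpath only; nothing reaches the runtime classpath,
     // so the jar has to be provided some other way at runtime (for example by
     // also having the extraction module on the Solr classpath).
     // compileOnly 'org.apache.tika:tika-core'
   }
   ```

   This would be consistent with the observation above: both `api` and `implementation` keep the jar on the runtime classpath of the module that declares it, so only `compileOnly` removes it from the `langid` lib.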

##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       Created https://issues.apache.org/jira/browse/SOLR-16010 to follow up separately on this.

##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       Thanks to Jan for looking into this in SOLR-16010. I reverted the `langid` changes since what is on `main` works and only duplicates the `tika-core` jar, as expected.








[GitHub] [solr] janhoy commented on a change in pull request #621: SOLR-15989: Upgrade to Tika 1.28

Posted by GitBox <gi...@apache.org>.
janhoy commented on a change in pull request #621:
URL: https://github.com/apache/solr/pull/621#discussion_r806642830



##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       `tika-core` should contain everything needed for language identification. It is 718 KB, so not a big deal.
   If we don't care about Windows users, we could symlink that jar from extraction to langid in the tarball :)

##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       Yeah, I think Tika 1.x has the original Tika detector in tika-core, while 2.x has moved it out to a separate tika-langdetect. Solr still depends on 1.x. Solr also has a choice of detectors (https://github.com/apache/solr/tree/main/solr/modules/langid/src/java/org/apache/solr/update/processor) behind our own abstraction, which overlaps with Tika's own (https://github.com/apache/tika/tree/main/tika-langdetect). Perhaps it would be better to delegate everything to tika-langdetect and kill our own custom ones. We'd still duplicate tika-core, but our langid module would use tika-langdetect, which would not necessarily be needed in the extraction module (unless we want to detect language during extraction).






[GitHub] [solr] risdenk commented on pull request #621: SOLR-15989: Upgrade to Tika 1.28

Posted by GitBox <gi...@apache.org>.
risdenk commented on pull request #621:
URL: https://github.com/apache/solr/pull/621#issuecomment-1035149757


   Sadly, it looks like this needs to wait until after SOLR-15942 (the Hadoop 3.3.x upgrade), since this bumps Guava. The problem doesn't show up in the regular tests, just the HDFS-specific tests. I'm running through some other checks as well.




[GitHub] [solr] dsmiley commented on a change in pull request #621: SOLR-15989: Upgrade to Tika 1.28

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #621:
URL: https://github.com/apache/solr/pull/621#discussion_r803850807



##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       I'm glad this was reversed (api -> implementation); maybe the comment was wrong/obsolete












[GitHub] [solr] risdenk commented on a change in pull request #621: SOLR-15989: Upgrade to Tika 1.28

Posted by GitBox <gi...@apache.org>.
risdenk commented on a change in pull request #621:
URL: https://github.com/apache/solr/pull/621#discussion_r803880300



##########
File path: solr/modules/extraction/build.gradle
##########
@@ -26,55 +26,18 @@ dependencies {
   implementation 'org.apache.lucene:lucene-core'
   implementation 'org.slf4j:slf4j-api'
 
-  // We export tika because other modules depend on it (and its submodules)

Review comment:
       Yeah, I'm not sure about the comment. I looked into it some, and I'm not sure where the extraction module would be referenced.










[GitHub] [solr] risdenk merged pull request #621: SOLR-15989: Upgrade to Tika 1.28.1

Posted by GitBox <gi...@apache.org>.
risdenk merged pull request #621:
URL: https://github.com/apache/solr/pull/621


   





