Posted to dev@tika.apache.org by GitBox <gi...@apache.org> on 2021/05/26 16:24:03 UTC

[GitHub] [tika-docker] lewismc commented on a change in pull request #2: set tesseract ocr languages as docker build args

lewismc commented on a change in pull request #2:
URL: https://github.com/apache/tika-docker/pull/2#discussion_r639888802



##########
File path: README.md
##########
@@ -11,7 +11,11 @@ There is a minimal version, which contains only Apache Tika and it's core depend
 * Italian
 * Spanish.
 
-To install more languages simply update the apt-get command to include the package containing the language you required, or include your own custom packs using an ADD command.
+To install more languages simply use `docker-build.sh` or manually using [docker --build-arg](https://docs.docker.com/engine/reference/commandline/build/#set-build-time-variables---build-arg)
+
+For see with version is supported by tesseract on official package:

Review comment:
       > For see with version is supported by tesseract on official package:
   
   Change to 
   
   > Obtain a list of official Tesseract packages by executing (on Linux):
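
The README hunk above points readers to `docker --build-arg` for a manual build. What follows is a minimal dry-run sketch of the equivalent full-image command. It makes two assumptions, both placeholders:

* the Tika version is 1.26;
* `TESSERACT_LANGUAGES` holds a space-separated list of package names.

The format that `full/Dockerfile` actually expects is not shown in this diff. The sketch relies on Docker's documented behaviour that `--build-arg NAME` without `=value` takes the value from the caller's exported environment, which avoids quoting the list on the command line.

```shell
#!/usr/bin/env bash
# Dry-run sketch of a manual full-image build with --build-arg.
# Placeholders (assumptions, not taken from the PR): the Tika version and the
# package-name format of TESSERACT_LANGUAGES; check full/Dockerfile for the real one.
set -euo pipefail

export TIKA_VERSION=1.26
export TESSERACT_LANGUAGES="tesseract-ocr-eng tesseract-ocr-fra"

# `--build-arg NAME` without `=value` makes docker build read NAME from the
# exported environment, so the space-separated list needs no extra quoting.
full_build_args=(
  -t "apache/tika:${TIKA_VERSION}-full"
  --build-arg TIKA_VERSION
  --build-arg TESSERACT_LANGUAGES
  --no-cache
)

# Print the command instead of running it. To build for real, run:
#   docker build "${full_build_args[@]}" - < full/Dockerfile
echo docker build "${full_build_args[@]}" - "< full/Dockerfile"
```

The variables must be exported, not just set, for Docker to see them.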

##########
File path: docker-tool.sh
##########
@@ -58,13 +60,18 @@ test_docker_image() {
 shift $((OPTIND -1))
 subcommand=$1; shift
 version=$1; shift
+tesseract_languages=$1; shift
 
 case "$subcommand" in
   build)
+    build_args="--build-arg TIKA_VERSION=${version}"
+    if [[ ! -z "$tesseract_languages" ]]; then
+      build_args="$build_args --build-arg TESSERACT_LANGUAGES='${tesseract_languages}'"
+    fi
     # Build slim version with minimal dependencies
     docker build -t apache/tika:${version} --build-arg TIKA_VERSION=${version} - < minimal/Dockerfile --no-cache
     # Build full version with OCR, Fonts and GDAL
-    docker build -t apache/tika:${version}-full --build-arg TIKA_VERSION=${version} - < full/Dockerfile --no-cache
+    docker build -t apache/tika:${version}-full ${build_args} - < full/Dockerfile --no-cache

Review comment:
       @mhf-ir this is the same as @dameikle has suggested... 
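
One detail in the `docker-tool.sh` hunk: quotes stored inside `build_args` are not re-parsed when `${build_args}` is expanded unquoted. This has two effects:

* The `'...'` around `${tesseract_languages}` reach docker as literal characters, even for a single language.
* A multi-word value is split into separate arguments.

The sketch below uses hypothetical values and assumes bash, which the script already requires for `[[ ]]`. It reproduces the split and shows a bash-array variant that keeps `TESSERACT_LANGUAGES=...` as one argument.

```shell
#!/usr/bin/env bash
# Sketch: string-built build args vs. a bash array. Values are hypothetical.
set -euo pipefail

version=1.26
tesseract_languages="eng fra"   # hypothetical value containing a space

# String form, as in the diff. Expanding it unquoted word-splits on spaces,
# and the embedded single quotes stay literal characters.
build_args="--build-arg TIKA_VERSION=${version}"
build_args="$build_args --build-arg TESSERACT_LANGUAGES='${tesseract_languages}'"
split_words=( $build_args )     # deliberately unquoted, mimicking `${build_args}`

# Array form: each element is passed to docker as exactly one argument.
build_args_arr=(--build-arg "TIKA_VERSION=${version}")
if [[ -n "$tesseract_languages" ]]; then
  build_args_arr+=(--build-arg "TESSERACT_LANGUAGES=${tesseract_languages}")
fi

printf 'string form -> %d args:\n' "${#split_words[@]}"
printf '  [%s]\n' "${split_words[@]}"
printf 'array form  -> %d args:\n' "${#build_args_arr[@]}"
printf '  [%s]\n' "${build_args_arr[@]}"
```

With the array, the full-image line would read `docker build -t apache/tika:${version}-full "${build_args_arr[@]}" - < full/Dockerfile --no-cache`. The minimal image keeps passing only `TIKA_VERSION`.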

##########
File path: docker-tool.sh
##########
@@ -21,11 +21,14 @@ while getopts ":h" opt; do
   case ${opt} in
     h )
       echo "Usage:"
-      echo "    docker-tool.sh -h                      Display this help message."
-      echo "    docker-tool.sh build <TIKA_VERSION>    Builds images for <TIKA_VERSION>."
-      echo "    docker-tool.sh test <TIKA_VERSION>     Tests images for <TIKA_VERSION>."
-      echo "    docker-tool.sh publish <TIKA_VERSION>  Publishes images for <TIKA_VERSION> to Docker Hub."
-      echo "    docker-tool.sh latest <TIKA_VERSION>   Tags images for <TIKA_VERSION> as latest on Docker Hub."
+      echo "    docker-tool.sh -h                                              Display this help message."
+      echo "    docker-tool.sh build <TIKA_VERSION> [<TESSERACT_LANGUAGES>]    Builds images for <TIKA_VERSION> via special [<TESSERACT_LANGUAGES>]."
+      echo "    docker-tool.sh test <TIKA_VERSION>                             Tests images for <TIKA_VERSION>."
+      echo "    docker-tool.sh publish <TIKA_VERSION>                          Publishes images for <TIKA_VERSION> to Docker Hub."
+      echo "    docker-tool.sh latest <TIKA_VERSION>                           Tags images for <TIKA_VERSION> as latest on Docker Hub."
+      echo ""
+      echo "Note: [<TESSERACT_LANGUAGES>] is optional for full image,"
+      echo "      for change default tesseract-ocr packages."

Review comment:
       Change 
   
   > ...for change default tesseract-ocr packages.
   
   to
   
   > ...to customize various tesseract-ocr packages. Otherwise the default packages are installed.
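
The help text makes `<TESSERACT_LANGUAGES>` optional, but the diff reads it with `tesseract_languages=$1; shift`. When the argument is omitted, `shift` has nothing to remove and returns a non-zero status. That aborts the script if it runs under `set -e`, and `$1` itself errors under `set -u`; the hunks shown do not say whether either option is set. Below is a hedged sketch of tolerant parsing, using a hypothetical helper `parse_build_args`.

```shell
#!/usr/bin/env bash
# Sketch: tolerant parsing of `build <TIKA_VERSION> [<TESSERACT_LANGUAGES>]`.
# parse_build_args is a hypothetical helper, not part of docker-tool.sh.
set -euo pipefail

parse_build_args() {
  # Required: fail with a usage message if TIKA_VERSION is missing.
  version=${1:?"usage: docker-tool.sh build <TIKA_VERSION> [<TESSERACT_LANGUAGES>]"}
  shift
  # Optional: ${1:-} yields "" rather than an unbound-variable error under set -u.
  tesseract_languages=${1:-}
  # Only shift when something is left; a bare shift returns 1 otherwise.
  if (( $# > 0 )); then shift; fi
  remaining_args=$#
}

parse_build_args 1.26
echo "TIKA_VERSION=${version} TESSERACT_LANGUAGES=${tesseract_languages:-<Dockerfile default>}"

parse_build_args 1.26 "eng fra"
echo "TIKA_VERSION=${version} TESSERACT_LANGUAGES=${tesseract_languages:-<Dockerfile default>}"
```

An empty `tesseract_languages` then fails the diff's existing `[[ ! -z ... ]]` check, so no `TESSERACT_LANGUAGES` build arg is passed and the Dockerfile's default packages apply. This matches the reviewer's suggested wording.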




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org