Posted to dev@tika.apache.org by "Caleb Cushing (Jira)" <ji...@apache.org> on 2021/06/01 14:56:00 UTC

[jira] [Created] (TIKA-3429) Performance problems partially caused by tika eagerly loading configuration

Caleb Cushing created TIKA-3429:
-----------------------------------

             Summary: Performance problems partially caused by tika eagerly loading configuration
                 Key: TIKA-3429
                 URL: https://issues.apache.org/jira/browse/TIKA-3429
             Project: Tika
          Issue Type: New Feature
            Reporter: Caleb Cushing


referencing https://github.com/spring-projects/spring-boot/issues/26709#issuecomment-851953515

{quote}
the tika configuration (eagerly loading a 7K lines XML file)
{quote}
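
To illustrate what deferring that kind of load could look like, here is a minimal sketch in plain Java. It is not Tika's code and does not use any Tika API: `LazyConfigSketch`, `Lazy`, and `loadExpensiveConfig` are hypothetical names, and the string returned stands in for whatever a large XML parse would produce. The point is only that a code path which exits early (for example on a usage error) never pays for the load.

```java
import java.util.function.Supplier;

class LazyConfigSketch {

    /** Memoizing supplier: runs the loader at most once, on the first get(). */
    static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> loader;
        private volatile T value;

        Lazy(Supplier<T> loader) {
            this.loader = loader;
        }

        @Override
        public T get() {
            T v = value;
            if (v == null) {
                synchronized (this) {
                    v = value;
                    if (v == null) {
                        v = loader.get();
                        value = v;
                    }
                }
            }
            return v;
        }
    }

    // Counts how often the expensive load actually ran (for demonstration only).
    static int loadCount = 0;

    // Hypothetical stand-in for parsing a large XML configuration file.
    static String loadExpensiveConfig() {
        loadCount++;
        return "parsed-config";
    }

    // Creating the holder is cheap; nothing is parsed until get() is called.
    static final Lazy<String> CONFIG = new Lazy<>(LazyConfigSketch::loadExpensiveConfig);

    public static void main(String[] args) {
        if (args.length == 0) {
            // Early exit, e.g. missing required parameters: the config is never loaded.
            System.out.println("usage error; config loads so far: " + loadCount);
            return;
        }
        System.out.println(CONFIG.get() + "; config loads: " + loadCount);
    }
}
```

With an eager field initializer instead of the `Lazy` holder, the load would run during class initialization even on the usage-error path.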

Here's the text of that issue:

I'm not sure the problem is Spring Boot, but I'm having trouble finding it. The jar currently takes 3 seconds (9 if I leave out tiered) to run on my system, just to error out due to missing options and do nothing.

https://github.com/xenoterracide/brix/tree/8e3d86bcf773e564cc24b51572b0bbd8bb60b73f

{code}
time java -Xverify:none -XX:TieredStopAtLevel=1 -jar modules/app/build/libs/app-0.1.0.jar
Missing required parameters: '<language>', '<moduleType>', '<project>'
Usage: <main class> [--repo=<repo>] [--workdir=<workdir>] <language>
                    <moduleType> <project>  [COMMAND]
      <language>            The programming language you're generating code
                              for. Directory under --dir
      <moduleType>          The type of code you're generating e.g controller,
                              also the name of the config file without the
                              extension.
      <project>             The name of the project you're generating code for.
                            The name of the module to be created within the
                              project.
      --repo=<repo>         Repository path from the current working directory.
                              Templates and configs are looked up relative to
                              here. If the config isn't found here, then we
                              will search ~/.config/brix
      --workdir=<workdir>   The working directory you want your destination
                              paths to be relative to. Defaults to current
                              working directory
                              Default:
Commands:
  run
java -Xverify:none -XX:TieredStopAtLevel=1 -jar   3.15s user 0.26s system 142% cpu 2.386 total
{code}

Since it's a CLI app, lazy init isn't helpful. This is worded like a question, one that really would not be suitable for Stack Overflow. (I hate that SO is the support forum for things now. It's terrible because the attitude of people there is that the objective is not to help people, and it's bad at getting answers for harder problems. Spring should get a Discourse or something again.) But I also know I had a Tika CLI app in the past that loaded in less than 1s without tiered, so I'm also concerned it's a Spring Boot bug. I'm going to connect a profiler later to see what I can find, but I'm not sure that will do it.

{code}
Fedora 33
5.11.16-200.fc33.x86_64
 14:08:34 up 3 days,  2:04,  1 user,  load average: 0.79, 1.10, 1.66
              total        used        free      shared  buff/cache   available
Mem:            15G         11G        1.0G        1.4G        3.0G        2.3G
Swap:           12G        1.5G         10G
{code}


