Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2022/10/07 13:47:00 UTC

[jira] [Created] (TIKA-3872) Improve namespacing in metadata keys

Tim Allison created TIKA-3872:
---------------------------------

             Summary: Improve namespacing in metadata keys
                 Key: TIKA-3872
                 URL: https://issues.apache.org/jira/browse/TIKA-3872
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


I recently ran a group-by on the metadata keys in roughly 1 million files from our regression corpus.  The UTF-8 CSVs are available here: https://corpora.tika.apache.org/base/share/metadata-keys-1m-20221006.tgz

My gut feeling is that we should namespace everything.  I don't think we should make any changes in 2.x, but I'm opening this for longer-range planning.
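
To make the idea concrete, here is a minimal sketch in Java of grouping metadata keys by namespace. It is not the code used to produce the CSVs above. It assumes a key counts as namespaced when it has a non-empty prefix before a colon (as in dc:title); the class name, method names, and sample keys are hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Tika.
final class MetadataKeyGrouping {

    // Assumption: the namespace is the text before the first ':'.
    // Keys with no colon, or with a leading colon, map to "" (un-namespaced).
    static String namespaceOf(String key) {
        int colon = key.indexOf(':');
        return colon > 0 ? key.substring(0, colon) : "";
    }

    // Counts how many keys fall under each namespace prefix.
    static Map<String, Integer> countByNamespace(List<String> keys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(namespaceOf(key), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample keys only; a real run would read keys from the corpus.
        List<String> keys = List.of("dc:title", "dc:creator", "Content-Type", "Author");
        System.out.println(countByNamespace(keys));
    }
}
```

In a sketch like this, the count under the empty prefix is the interesting number: it gives a rough measure of how many keys would need a namespace if everything were namespaced.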



--
This message was sent by Atlassian Jira
(v8.20.10#820010)