Posted to dev@tika.apache.org by "Keith R. Bennett (JIRA)" <ji...@apache.org> on 2007/10/02 22:32:50 UTC

[jira] Created: (TIKA-41) Resource files occur twice in jar file.

Resource files occur twice in jar file.
---------------------------------------

                 Key: TIKA-41
                 URL: https://issues.apache.org/jira/browse/TIKA-41
             Project: Tika
          Issue Type: Improvement
    Affects Versions: 0.1-incubator
            Reporter: Keith R. Bennett
            Priority: Minor
             Fix For: 0.1-incubator


The Tika and Mime config files occur in two places in the jar file.  This is because they are not stored in our src/main/resources directory tree in the same place that they need to be in the target/classes directory tree, and there is a copy directive in the POM file that copies the files to a different directory.

For example, tika-config.xml is in src/main/resources, but needs to go to target/classes/org/apache/tika.  Maven automatically copies the files in src/main/resources to the same location in target/classes, so tika-config.xml is copied to target/classes.  Then, the copy directive in the POM file copies the file to target/classes/org/apache/tika.  So the file is copied twice.

I recommend the following to fix this:

* Move tika-config.xml to src/main/resources/org/apache/tika.
* Move tika-mimetypes.xml to src/main/resources/org/apache/tika/mime.
* Remove the copy directives for the above two from the POM file.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TIKA-41) Resource files occur twice in jar file.

Posted by "Thilo Goetz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535255 ] 

Thilo Goetz commented on TIKA-41:
---------------------------------

There's a similar issue in Eclipse (I think it has the same root cause).  After doing mvn eclipse:eclipse, I end up with an invalid .classpath file in Tika that gives me this error: Cannot nest output folder 'tika/target/classes/org/apache/tika' inside output folder 'tika/target/classes'.  I can manually fix the classpath, but then, just like for Keith, the test cases don't go through.  I have not investigated further.

I would like to throw in my support for a build environment that supports both Maven and the popular IDEs.  We do this in UIMA, and it does cause us some headaches now and then, but for us seamless Eclipse support was non-negotiable.  I understand your reluctance to make compromises for the support of IDEs, but many developers do use them.  The easier it is to set up the development environment in your favorite IDE, the more likely you are to get more contributors.  If it's just a question of maintaining a parallel directory structure in the resources and the target directories, I would consider this a small sacrifice.  Then again, opinions may vary ;-)

And just to be clear, when working in Eclipse (just for example), I don't build with Maven.  I do an svn checkout from inside Eclipse, then I go to the command line and do mvn clean install; mvn eclipse:eclipse.  After that, I expect to be all set to work in Eclipse (unless I create new dependencies of course).

Just something to consider...


> Resource files occur twice in jar file.
> ---------------------------------------
>
>                 Key: TIKA-41
>                 URL: https://issues.apache.org/jira/browse/TIKA-41
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 0.1-incubator
>            Reporter: Keith R. Bennett
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 0.1-incubator
>
>         Attachments: TIKA-41.patch


[jira] Commented: (TIKA-41) Resource files occur twice in jar file.

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531907 ] 

Chris A. Mattmann commented on TIKA-41:
---------------------------------------

Hi Keith,

 I'd like to get away from having to dictate where files go based on their runtime necessity -- I think we can do better than that, and have a cleaner separation of build-time source tree versus runtime jar file needs.

 Let's investigate whether or not there's a way in Maven to prevent it from copying src/main/resources to target/classes, or, alternatively, investigate a way to do a move rather than a copy of target/classes/tika-config.xml and target/classes/mime/tika-mimetypes.xml to target/classes/org/apache/tika and target/classes/org/apache/tika/mime, respectively.

-1 for placing the files in the directories required at runtime.

Cheers,
  Chris




[jira] Commented: (TIKA-41) Resource files occur twice in jar file.

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532694 ] 

Jukka Zitting commented on TIKA-41:
-----------------------------------

BTW, it's not only new users/developers who benefit from us using the standard conventions, it's also IDEs like Eclipse or IntelliJ IDEA that'll have an easier time figuring out the project layout.



[jira] Commented: (TIKA-41) Resource files occur twice in jar file.

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535262 ] 

Chris A. Mattmann commented on TIKA-41:
---------------------------------------

Hi Thilo:

While I understand both your and Keith's concerns, what's nagging me is that I too use Eclipse to work on code, both on Tika and Nutch. However, when the time to build comes along, I pull out my trusty command line and go from there. I've never had much luck with integrating IDEs and build tools. I never got Mevenide to work properly (really buggy in my environment), and despite the fact that it works most of the time, I even have some trouble with Ant projects within Eclipse.

I've found it's far easier, and doesn't cause one side or the other to sacrifice anything really (besides the "feeling" of building within an IDE) if you use an IDE to edit/code (and take advantage of all the language features, and auto-compilation, etc.). Then when the time comes to make a delivery, or build, or test the software, pull up that trusty command line and build the project using the command line version of the tool which developers outside an IDE environment can also use. It's also important to recognize that if both IDE developers and vi/command line guys use the command line, they will have more reproducible results. I think it may be harder, say, to track down build problems when you're using the integrated build environment (that wraps an external build tool like Maven or Ant). I have had trouble in the past, where it's actually been a problem with the Eclipse/Maven plugin (rather than a problem with the actual POM file), and I've spent way too many hours tracking this down.

I'm not trying to prevent people from using the software or contributing, but to me (as I stated before in a comment on this post), I think we're on a slippery slope here. I agree that in the short term, and in isolation, this is a small concession to make, one that helps out folks who live in the IDE-for-everything world. I don't want to shut them out. However, I also think it's a bad idea for Tika as a project to make code-level/build-level concessions simply to support technology choices that users make. To me, that's letting technology dictate the (implementation) architecture, which in my experience as a software architect, never leads to a good thing.

My 2 cents,
  Chris




[jira] Commented: (TIKA-41) Resource files occur twice in jar file.

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532607 ] 

Jukka Zitting commented on TIKA-41:
-----------------------------------

I don't think this is a Maven issue at all.

The files need to be in org/apache/tika within the jar file, and I don't see why we should have them in some other place within src. To me the cleanest and simplest solution is to have a direct one-to-one mapping between the src tree and the resulting jar file entries. Otherwise you need to dig into the build script to find out where and how the files are being copied or moved around.



[jira] Commented: (TIKA-41) Resource files occur twice in jar file.

Posted by "Keith R. Bennett (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534949 ] 

Keith R. Bennett commented on TIKA-41:
--------------------------------------

Guys -

I just spent a lot of time trying to figure out why I was getting a MalformedURLException when running TestParsers within IntelliJ IDEA.  The reason was that although IDEA knows to copy resources to the target directory a la Maven, we had thwarted it by putting tika-config.xml in a different directory and using a copy directive to put it in the right place.

I guess I hadn't experienced this before because I had done a mvn compile/test/install before going into my IDE.

This is exactly the kind of thing I was referring to when I suggested that we follow the Maven convention in this case.  I don't want to be a pain, but the fact that I actually experienced a problem due to this approach is IMHO significant.




[jira] Commented: (TIKA-41) Resource files occur twice in jar file.

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534975 ] 

Chris A. Mattmann commented on TIKA-41:
---------------------------------------

I don't think we have a requirement that folks who use Tika must be using IntelliJ, no? I'm also fairly confident that it's probably a preference change in IntelliJ that would solve this problem. Additionally, why is IntelliJ copying resources to the target directory? That's Maven's duty, right, seeing as it's the build facility we've adopted for Tika?



[jira] Updated: (TIKA-41) Resource files occur twice in jar file.

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated TIKA-41:
------------------------------

    Attachment: TIKA-41.patch

Attached a patch (TIKA-41.patch) that uses declarative Maven configuration instead of <copy/> directives to place the resources in the correct location. I guess this should satisfy all the requirements expressed here.
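
The patch itself is not reproduced in this thread, so the following is only a rough sketch of what a declarative resource mapping in a POM can look like, not the contents of TIKA-41.patch. It assumes the source files stay at src/main/resources/tika-config.xml and src/main/resources/mime/tika-mimetypes.xml (inferred from the target/classes paths mentioned in the other comments) and uses Maven's standard <targetPath> element, which is resolved relative to target/classes.

```xml
<!-- Sketch under stated assumptions; source paths are inferred, not confirmed by the patch. -->
<build>
  <resources>
    <!-- Place tika-config.xml directly under org/apache/tika in the output -->
    <resource>
      <directory>src/main/resources</directory>
      <targetPath>org/apache/tika</targetPath>
      <includes>
        <include>tika-config.xml</include>
      </includes>
    </resource>
    <!-- Place tika-mimetypes.xml under org/apache/tika/mime in the output -->
    <resource>
      <directory>src/main/resources/mime</directory>
      <targetPath>org/apache/tika/mime</targetPath>
      <includes>
        <include>tika-mimetypes.xml</include>
      </includes>
    </resource>
  </resources>
</build>
```

Because an explicit <resources> section replaces Maven's default resource handling, each file would be copied once, to its mapped location only, and no separate copy step is needed.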



[jira] Commented: (TIKA-41) Resource files occur twice in jar file.

Posted by "Keith R. Bennett (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532178 ] 

Keith R. Bennett commented on TIKA-41:
--------------------------------------

Chris -

I understand what you're saying.  I think it's a question of balancing interests.

On the one hand, we would like to put the files where they would go logically, according to the way we think of them.

On the other hand, we assess the value of doing things the Maven way; convention over configuration.  One of the strengths of Maven is that you can look at any Maven project and, assuming the conventions are not overridden (as in our case), you can know where to find files (e.g. src/main/java), and where they will go in the jar file.  By putting the files in the directories where Maven can deal with them automatically, we reduce the amount of intervention on our part (special cases in the POM file), reduce the amount of learning required by new readers, and reduce the risk that somewhere along the way, an automated process that assumes Maven directory structures is thwarted.

- Keith






[jira] Commented: (TIKA-41) Resource files occur twice in jar file.

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532214 ] 

Chris A. Mattmann commented on TIKA-41:
---------------------------------------

I'm not sure I agree with your point, Keith: just because Maven made "guesses" as to sensible defaults for where to place resource files within a jar, that does not mean that's the quote-unquote standard place for things to go, or that deviating from it ruins the benefits that we receive from using Maven. I've seen numerous cases such as this -- in fact it's why frameworks like Maven are extensible in the first place. They recognize that they aren't the Oracle and don't know where everything should go across the board to meet people's needs. That's why we can change it.

I think the benefit of keeping the files within CM within their logically correct place outweighs the convenience of not having to understand the small customization of the POM file to get it to put the files where we want them. In addition, the separation of concerns between CM source tree layout and build layout is something that needs to be maintained. It allows both things to evolve independently over time, which is a great benefit.

So, I'm -1 on placing things in org/apache/tika, etc. within the src CM layout, and +1 on figuring out a way to add a command or two to the pom.xml file to place the files where we want them, and only once.

Cheers,
  Chris




[jira] Commented: (TIKA-41) Resource files occur twice in jar file.

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532676 ] 

Chris A. Mattmann commented on TIKA-41:
---------------------------------------

> To me the cleanest and simplest solution is to have a direct one-to-one mapping between the src tree and the resulting jar file entries

I am on the complete opposite end of the spectrum with this. If there should always be a 1-to-1 mapping between build and src, then why have jar files in the first place? Why not just compile class files directly into the directories that contain their source code and deliver that as the build time package?

The reason to have decoupling between src and build structures is for independent evolution. It's to apply a filter on the things that exist in the source area, and the things that get delivered as part of the build. As an example of this, what if I wanted to drop an MS Word document containing some diagrams/figures for Tika in src/main/resources, because logically, to me, that's the place where that file should live in src (and subsequently CM)? Now, when I go to build Tika, should this MS Word file be placed in the delivered jar file? IMO, the answer is no.

Here we have a somewhat different, but also similar, issue: config files that need to end up in some build-time location as a run-time dependency within the Tika jar file. Why do we have to mandate within the src tree in CM that this file (which in some ways is just as much of a resource as that MS Word Tika document) be placed in its build-time location, which is namespace-delimited and 3 levels deeper within the already deep enough directory hierarchy?

So, anyways, I agree with one of your points, Jukka. The proposed method of placing those config files within org/apache/tika is definitely the simplest solution: I'm just not sure it's the cleanest.
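
For what it's worth, the kind of filtering described in the MS Word example above can be expressed declaratively with Maven's standard <excludes> element. This is a minimal sketch only; the .doc pattern is a hypothetical stand-in for whatever documents one might want to keep in src/main/resources but out of the jar.

```xml
<!-- Sketch: the *.doc pattern is an illustrative assumption, not part of Tika's POM. -->
<build>
  <resources>
    <resource>
      <directory>src/main/resources</directory>
      <excludes>
        <!-- Keep Word documents in the source tree but out of target/classes and the jar -->
        <exclude>**/*.doc</exclude>
      </excludes>
    </resource>
  </resources>
</build>
```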



[jira] Commented: (TIKA-41) Resource files occur twice in jar file.

Posted by "Keith R. Bennett (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532876 ] 

Keith R. Bennett commented on TIKA-41:
--------------------------------------

Chris -

> ...mandating that the build time representation of the tika library 
> (currently a jar file) look 100% the same as the source tree 
> takes us down a slippery slope...

This mandate only applies to the src directory tree. If there are files we want to associate with the project for CM purposes, but that we do not need or want under Maven's control, then maybe we can create another directory for them? So under tika we'd have the Maven-style directories (src, target (when built), etc.), and then other directories as we need them:

tika
--- src
--- foo

I don't think there's anything in Maven that would touch anything in a directory unknown to it, is there?  When it does a clean, it only removes the target directory tree to my knowledge.

Regarding Jukka's point about IDEs, I think he was referring to the IDE itself, not the IDE user.  That is, the IDE can infer from the files' placement in the directory tree what they are and what needs to be done with them, as opposed to having to figure out a directive in a POM file.

Regards,
Keith





[jira] Commented: (TIKA-41) Resource files occur twice in jar file.

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532692 ] 

Jukka Zitting commented on TIKA-41:
-----------------------------------

In any case you need some place where you define which resource ends up where in the final jar artifact. And I admit that there's little difference in whether that information is defined in a build script or encoded in the directory hierarchy. However, all other things equal, I think we should go with the conventions as codified by Maven, as that's what the average new user/developer will expect.

If you have a document that shouldn't go into the jar artifact, then by the Maven conventions you'd place it in src/test/resources (if it's needed for testing), src/site/resources (if it should go to the web site), or src/main/javadoc (if it should be a part of the javadocs). I don't see why we should reinvent the wheel by maintaining our own build rules for such resources.

> So, anyways, I agree with one of your points, Jukka. The proposed method of placing those config files within org/apache/tika
> is definitely the simplest solution: I'm just not sure it's the cleanest.

Fair enough. :-)

I don't feel too strongly on using the Maven conventions, so I won't mind if we do have the resources somewhere else. However, we should in any case fix the issue of having the resources duplicated in the jar artifact.
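
To make the duplication concrete, here is a minimal sketch of the copy rule as it is described in the issue: a file under src/main/resources lands at the same relative location under target/classes. The class and method names are hypothetical, and this only models the rule; it is not Maven's actual implementation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; models the "same relative location" rule from the issue text.
public class ResourceCopyDemo {

    // Returns where a file under resourceRoot would land under outputRoot.
    public static Path defaultTarget(Path resourceRoot, Path outputRoot, Path sourceFile) {
        // Strip the resource root, then re-root the remainder under the output directory.
        return outputRoot.resolve(resourceRoot.relativize(sourceFile));
    }

    public static void main(String[] args) {
        Path root = Paths.get("src/main/resources");
        Path out = Paths.get("target/classes");

        // Current layout: the file lands at the top of target/classes,
        // so a second copy step is needed to reach org/apache/tika.
        System.out.println(defaultTarget(root, out, root.resolve("tika-config.xml")));

        // Proposed layout: the file lands in its final location in one step.
        System.out.println(defaultTarget(root, out, root.resolve("org/apache/tika/tika-config.xml")));
    }
}
```

With the proposed layout the default copy already produces the final location, which is why the extra copy directive (and the duplicate it creates) can go away.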




[jira] Resolved: (TIKA-41) Resource files occur twice in jar file.

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-41.
-------------------------------

    Resolution: Fixed
      Assignee: Jukka Zitting

Committed the proposed patch in revision 582999.

> Resource files occur twice in jar file.
> ---------------------------------------
>
>                 Key: TIKA-41
>                 URL: https://issues.apache.org/jira/browse/TIKA-41
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 0.1-incubator
>            Reporter: Keith R. Bennett
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 0.1-incubator
>
>         Attachments: TIKA-41.patch
>
>
> The Tika and Mime config files occur in two places in the jar file.  This is because they are not stored in our src/test/resources directory tree in the same place that they need to be in the target/classes directory tree, and there is a copy directive in the POM file that copies the files to different directory.
> For example, tika-config.xml is in src/main/resources, but needs to go to target/class/org/apache/tika.  Maven automatically copies the files in src/main/resources to the same location in target/classes, so tika-config.xml is copied to target/classes.  Then, the copy directive in the POM file copies the file to target/classes/org/apache/tika.  So the file is copied twice.
> I recommend the following to fix this:
> * Move tika-config.xml to src/main/resources/org/apache/tika.
> * Move tika-mimetypes.xml to src/main/resources/org/apache/tika/mime.
> * Remove the copy directives for the above two from the POM file.



[jira] Commented: (TIKA-41) Resource files occur twice in jar file.

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532802 ] 

Chris A. Mattmann commented on TIKA-41:
---------------------------------------

> If you have a document that shouldn't go into the jar artifact, then by the Maven conventions you'd place it in src/test/resources (if it's needed for testing), src/site/resources (if it should go to the web site), or src/main/javadoc (if it should be a part of the javadocs). I don't see why we should reinvent the wheel by maintaining our own build rules for such resources.

What about resources that don't fit into any of these buckets? What about design docs that aren't meant to be published on the website, but should be CM'ed for tracking purposes? Figures? Diagrams? Things such as these that aren't necessarily for the website, for the unit tests, or for the source code to be delivered? Where do these go?

Of course, I'm being facetious here, as the files we're debating are XML config files that *are* definitely part of the build. My only concern is that mandating that the build time representation of the tika library (currently a jar file) look 100% the same as the source tree takes us down a slippery slope. I'll admit that Tika is currently in its nascent stages and its current deliverable is most likely going to be a jar file. However, how do we want to handle things like dependencies? Or packaging up scripts to go along with Tika? I'm not sure it makes sense to put *everything* in a jar file, right?

Finally, with respect to your point about IDEs, I'm not sure I agree that putting something in:

src/main/resources/org/apache/tika/tika-config.xml
src/main/resources/org/apache/tika/mime/tika-mimetypes.xml

is easier to figure out/understand than:

src/main/resources/tika-config.xml
src/main/resources/mime/tika-mimetypes.xml

which, to me, have a much cleaner structure and aren't nested 3 levels deeper?

I may just not be seeing the point here :), so maybe you'll have to enlighten me.
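
For reference, one reason the nested location may matter at run time: if the config is looked up relative to a class (an assumption here; the thread does not show how Tika actually loads it), then Class.getResource prefixes a relative name with the class's package path, while a name starting with "/" is taken from the classpath root. The sketch below mirrors that documented rule in a hypothetical helper so it can be run without the Tika jar.

```java
// Hypothetical helper; mirrors the name-resolution rule documented for
// java.lang.Class.getResource. It does not touch the classpath itself.
public class ResourceNameDemo {

    // packageName: package of the class doing the lookup, e.g. "org.apache.tika".
    // name: the resource name passed to getResource.
    public static String resolveName(String packageName, String name) {
        if (name.startsWith("/")) {
            // Absolute name: looked up from the classpath root.
            return name.substring(1);
        }
        if (packageName.isEmpty()) {
            // Class in the default package: nothing to prefix.
            return name;
        }
        // Relative name: prefixed with the package path.
        return packageName.replace('.', '/') + "/" + name;
    }

    public static void main(String[] args) {
        // Relative lookup from a class in org.apache.tika (assumed caller).
        System.out.println(resolveName("org.apache.tika", "tika-config.xml"));
        // Absolute lookup, which would match the flat src/main/resources layout.
        System.out.println(resolveName("org.apache.tika", "/tika-config.xml"));
    }
}
```

So the flatter layout would be workable too, provided the loading code uses absolute resource names; which of the two is cleaner is the judgment call being debated here.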

