Posted to dev@oodt.apache.org by "Mallder, Valerie" <Va...@jhuapl.edu> on 2015/02/04 21:57:01 UTC

RE: FW: Tyler - I may need your help

Hi Tyler, 

I am going back through all of the local changes that I had to make to OODT 0.8, just to make sure I had written them all down, and I noticed that the change to MimeTypeUtils.java is different now than it was on Jan 24, when you sent me the link below.  In the version of the change that I looked at on Jan 24, line 64 had been changed by adding a "/" at the front of the filename on that line. But now that "/" is gone (i.e., "/tika-mimetypes.xml" is now "tika-mimetypes.xml").  Can you verify that removing the "/" was intended?

Thanks,
Val



Valerie A. Mallder
New Horizons Deputy Mission System Engineer
Johns Hopkins University/Applied Physics Laboratory
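
A note on why the leading "/" matters: in Java, Class.getResource and Class.getResourceAsStream treat a name that starts with "/" as relative to the classpath root and any other name as relative to the class's own package, while the ClassLoader lookup methods expect root-relative names with no leading "/". Which of these MimeTypeUtils uses on line 64 is not stated in this thread, so the following is only a minimal sketch of the general behavior, not the OODT code. The class name ResourceNameDemo and its helper methods are made up for illustration, and java/lang/Object.class stands in for tika-mimetypes.xml so that the example runs without any extra files.

```java
import java.net.URL;

/** Sketch: how a leading "/" changes classpath resource lookup in Java. */
public class ResourceNameDemo {

    /**
     * Lookup through a Class: a leading "/" means "from the classpath root";
     * without it, the name is resolved against the class's own package.
     */
    public static boolean foundViaClass(Class<?> anchor, String name) {
        URL url = anchor.getResource(name);
        return url != null;
    }

    /**
     * Lookup through the system ClassLoader: names are always root-relative
     * and are expected to be written without a leading "/".
     */
    public static boolean foundViaLoader(String name) {
        return ClassLoader.getSystemResource(name) != null;
    }

    public static void main(String[] args) {
        // Relative to Object's package (java/lang): resolves.
        System.out.println(foundViaClass(Object.class, "Object.class"));
        // Absolute from the classpath root: resolves.
        System.out.println(foundViaClass(Object.class, "/java/lang/Object.class"));
        // Absolute, but there is no Object.class at the root: does not resolve.
        System.out.println(foundViaClass(Object.class, "/Object.class"));
        // ClassLoader form, root-relative with no leading slash: resolves.
        System.out.println(foundViaLoader("java/lang/Object.class"));
    }
}
```

So the same file name can resolve or fail depending on which lookup method is used and where the file sits on the classpath. That is one plausible reason a single "/" could fix one caller and break another, but it is an assumption about this change rather than something the thread confirms.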

> -----Original Message-----
> From: Tyler Palsulich [mailto:tpalsulich@gmail.com]
> Sent: Saturday, January 24, 2015 11:02 AM
> To: dev
> Subject: Re: FW: Tyler - I may need your help
> 
> Hi Val,
> 
> Please see the update for OODT-805 here:
> <https://github.com/apache/oodt/commit/a44c0e871f4db4af645f00533b00de1e91d5bcc3>.
> 
> Re: Radix, is MimeTypeUtils used in Radix? I don't see any references to it.
> 
> Tyler
> 
> On Fri, Jan 23, 2015 at 4:02 PM, Mallder, Valerie < Valerie.Mallder@jhuapl.edu>
> wrote:
> 
> > Hi Tyler,
> >
> > Yes, this fix did take care of my problem. Thanks so much!
> >
> > Chris, if you want to make a new OODT 0.8.1, be sure to also include
> > the fix for OODT-805 below in the radix installation.  My system is
> > back up and running now.
> >
> > Thanks,
> >
> > Val
> >
> >
> >
> >
> > Valerie A. Mallder
> > New Horizons Deputy Mission System Engineer
> > Johns Hopkins University/Applied Physics Laboratory
> >
> > > -----Original Message-----
> > > From: Tyler Palsulich [mailto:tpalsulich@gmail.com]
> > > Sent: Thursday, January 22, 2015 10:35 PM
> > > To: dev
> > > Subject: Re: FW: Tyler - I may need your help
> > >
> > > Hi Val,
> > >
> > > Please see OODT-805 and
> > > https://github.com/apache/oodt/commit/cf1220d4ac66ccefc8e510c62fb6b38cf529ffb2
> > > for what I believe is the fix.
> > >
> > > Can you make the MimeTypeUtils changes locally or try out trunk?
> > >
> > > Let me know!
> > > Tyler
> > >
> > > On Thu, Jan 22, 2015 at 5:40 PM, Tyler Palsulich
> > > <tp...@gmail.com>
> > > wrote:
> > >
> > > > Hi Val,
> > > >
> > > > Yes, I think you've hit the nail on the head -- if Tika isn't
> > > > passed your updated mimetypes configuration file (with your custom
> > > > types), then those files will not be properly identified. I'll
> > > > look into this issue more tonight and hopefully find a fix. :)
> > > >
> > > > > by default tika only knows about xml files, text files,
> > > > > application/octet-stream files.
> > > > I'm not sure what you mean by this? Tika knows about much more
> > > > than that, but is there an OODT config that overrides that?
> > > >
> > > > > I'm a newbie with Java and I can't guarantee I would be able to
> > > > > build a JUnit test program very easily. But I will continue to
> > > > > investigate and see what I can do.
> > > > No worries! :) If you have time and want to try your hand at it,
> > > > the best way to learn is by looking at the existing tests, like in
> > > > https://github.com/apache/oodt/blob/trunk/metadata/src/test/org/apache/oodt/cas/metadata/util/TestMimeTypeUtils.java
> > > >
> > > > Have a good night,
> > > > Tyler
> > > >
> > > > On Thu, Jan 22, 2015 at 2:22 PM, Mallder, Valerie <
> > > > Valerie.Mallder@jhuapl.edu> wrote:
> > > >
> > > >> Hi Tyler,
> > > >>
> > > > >> Can you tell me more about the tika-mimetypes.xml file? Is this a
> > > > >> new 'required' file?  I'm not 100% sure about this yet, but it
> > > > >> seems to me that, since MimeTypeUtils.java instantiates Tika with
> > > > >> the default constructor and never explicitly tells Tika which
> > > > >> mime-types file to use (even though the correct mime-types.xml
> > > > >> file is passed to the MimeTypeUtils constructor from
> > > > >> MimeExtractorRepo), there is no place where the contents of my
> > > > >> mime-types.xml file are being read and stored in Tika's
> > > > >> MimeTypeRegistry, and by default tika only knows about xml files,
> > > > >> text files, application/octet-stream files.
> > > > >>
> > > > >> I will keep looking at this tomorrow and verify which file is
> > > > >> passed to Tika's MimeTypesFactory class, but I have to head home
> > > > >> now.
> > > >>
> > > >> Val
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> Valerie A. Mallder
> > > > >> New Horizons Deputy Mission System Engineer
> > > > >> Johns Hopkins University/Applied Physics Laboratory
> > > >>
> > > >>
> > > >> -----Original Message-----
> > > >> From: Mallder, Valerie
> > > >> Sent: Thursday, January 22, 2015 11:42 AM
> > > >> To: dev
> > > >> Subject: RE: Tyler - I may need your help
> > > >>
> > > >> Hi Tyler,
> > > >>
> > > > >> I have defined a few custom mime types in my
> > > > >> filemgr/etc/mime-types.xml file. The contents of my file look
> > > > >> exactly like the contents of
> > > > >> http://svn.apache.org/viewvc/oodt/tags/0.8/filemgr/src/main/resources/mime-types.xml
> > > > >> with the addition of project-specific mime-types.
> > > > >> The tika-mimetypes.xml file you pointed me to has ~2000
> > > > >> additional lines in it as compared to the
> > > > >> http://svn.apache.org/viewvc/oodt/tags/0.8/filemgr/src/main/resources/mime-types.xml
> > > > >> file and the
> > > > >> http://svn.apache.org/viewvc/oodt/tags/0.8/mvn/archetypes/radix/src/main/resources/archetype-resources/filemgr/src/main/resources/etc/mime-types.xml
> > > > >> file. So, it is definitely different from the one I've been
> > > > >> using. But I copied it over and added my mime types to it, and
> > > > >> it didn't help.  The mime types it is returning are 'reasonable'
> > > > >> mime-types to return; they are just not the mime-types that I
> > > > >> defined them as.  For instance, I have *.sfdu files and *.out
> > > > >> files that contain binary data, and tika says they are
> > > > >> "application/octet-stream" files.  I also have *.ecsv files that
> > > > >> contain text, and tika says they are "text/plain" files.
> > > > >>
> > > > >> But here are the mime-types I defined for these files for my
> > > > >> project, and these are the mime-types that I have defined
> > > > >> extractors for.  None of these filename extensions ("*.out",
> > > > >> "*.ecsv", and "*.sfdu") are defined elsewhere in the
> > > > >> mime-types.xml file.
> > > >>
> > > >> <mime-type type="product/fei-out">
> > > >>     <glob pattern="*.out"/>
> > > >> </mime-type>
> > > >>
> > > >> <mime-type type="product/fei-ecsv">
> > > >>     <glob pattern="*.ecsv"/>
> > > >> </mime-type>
> > > >>
> > > >> <mime-type type="product/fei-sfdu">
> > > > >>     <glob pattern="*.sfdu"/>
> > > >> </mime-type>
> > > >>
> > > >> I'm a newbie with Java and I can't guarantee I would be able to
> > > >> build a JUnit test program very easily. But I will continue to
> > > >> investigate and see what I can do.
> > > >>
> > > >> Thanks!
> > > >>
> > > >> Val
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> Valerie A. Mallder
> > > > >> New Horizons Deputy Mission System Engineer
> > > > >> Johns Hopkins University/Applied Physics Laboratory
> > > >>
> > > >>
> > > >> > -----Original Message-----
> > > >> > From: Tyler Palsulich [mailto:tpalsulich@gmail.com]
> > > >> > Sent: Wednesday, January 21, 2015 5:13 PM
> > > >> > To: dev
> > > >> > Subject: Re: Tyler - I may need your help
> > > >> >
> > > >> > Hi Val,
> > > >> >
> > > > >> > Hmm... Is there a particular (wrong) mime-type that keeps
> > > > >> > getting detected (like text/plain, or something)? I'm curious
> > > > >> > if the type is just returning a default. Or, is it a seemingly
> > > > >> > random file type? What are the contents of your mime-types.xml
> > > > >> > file? If it's different than
> > > > >> > <https://raw.githubusercontent.com/apache/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml>,
> > > > >> > can you try copying it over?
> > > >> >
> > > >> > I'm not sure I'll be able to replicate your error on my
> > > >> > computer without a bit of difficulty. Do you think there is any
> > > >> > way you could create a JUnit test case with the problem?
> > > >> >
> > > >> > Tyler
> > > >> >
> > > >> >
> > > >> > On Wed, Jan 21, 2015 at 1:26 PM, Mallder, Valerie <
> > > >> > Valerie.Mallder@jhuapl.edu>
> > > >> > wrote:
> > > >> >
> > > >> > > Hi Tyler,
> > > >> > >
> > > > >> > > I have been looking into an issue that cropped up in my
> > > > >> > > OODT system when I upgraded to OODT 0.8. The issue is that my
> > > > >> > > AutoDetectProductCrawler, which is launched from a
> > > > >> > > PGETaskInstance, is unable to determine the mime-type for my
> > > > >> > > product files.  I am using the same
> > > > >> > > filemgr/etc/mime-types.xml file that I was using with OODT
> > > > >> > > 0.7, and I am using the same
> > > > >> > > oodt/extensions/policy/mime-extractor-map.xml file that I was
> > > > >> > > using with OODT 0.7, but now, in
> > > > >> > > MimeTypeRepo::getExtractorSpecsForFile, the call to
> > > > >> > > this.mimeRepo.getMimeType(file) is returning the wrong
> > > > >> > > mime-types for all of my files, and so the
> > > > >> > > AutoDetectProductCrawler is telling me I have no extractor
> > > > >> > > specs for my files.
> > > > >> > >
> > > > >> > > I noticed that you did some work on MimeTypeUtils for
> > > > >> > > OODT-630 in OODT 0.8. At first glance, it doesn't look like
> > > > >> > > any of this work would be directly responsible. Can you think
> > > > >> > > of anything that might be causing this to happen? I don't
> > > > >> > > know anything about tika. Do I need to make any changes to my
> > > > >> > > policy files to remain compatible?
> > > > >> > > Just looking for clues on how to resolve this.  I have
> > > > >> > > verified, by adding log messages throughout the code, that
> > > > >> > > prior to launching the AutoDetectProductCrawler, all of the
> > > > >> > > policy files are read correctly.
> > > > >> > > The MimeExtractorConfigReader is reading the correct
> > > > >> > > mime-extractor-map.xml file, and it is calling setMimeRepoFile
> > > > >> > > with the correct mime-types.xml file, and it is setting the
> > > > >> > > correct extractor config file, etc. But once
> > > > >> > > AutoDetectProductCrawler starts crawling, it tries to
> > > > >> > > getExtractorSpecsForFile but determines the wrong mime type
> > > > >> > > and then can't find the extractor spec.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Val
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > Valerie A. Mallder
> > > >> > >
> > > > >> > > New Horizons Deputy Mission System Engineer
> > > > >> > > The Johns Hopkins University/Applied Physics Laboratory
> > > >> > > 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
> > > >> > > 240-228-7846 (Office) 410-504-2233 (Blackberry)
> > > >> > >
> > > >> > >
> > > >>
> > > >
> > > >
> >
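
The symptom described in the thread above (custom *.out, *.ecsv, and *.sfdu files being reported with generic types) is what a glob-based lookup produces when the custom entries never reach the detector. The following is a simplified sketch of that idea only. It is not Tika's or OODT's implementation, the class GlobMimeLookup and its methods are hypothetical, and it falls back to application/octet-stream for everything, whereas Tika also inspects file content (which is how text files come back as text/plain).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: suffix-glob lookup with a generic fallback (not Tika's implementation). */
public class GlobMimeLookup {

    // Maps a file-name suffix such as ".out" to a mime type.
    private final Map<String, String> suffixToType = new LinkedHashMap<>();

    /** Registers a glob of the form "*.ext", mirroring <glob pattern="*.ext"/>. */
    public GlobMimeLookup register(String glob, String mimeType) {
        if (!glob.startsWith("*.")) {
            throw new IllegalArgumentException("only \"*.ext\" globs are handled in this sketch");
        }
        suffixToType.put(glob.substring(1), mimeType); // "*.out" -> ".out"
        return this;
    }

    /** Returns the first registered type whose suffix matches, else a generic default. */
    public String detect(String fileName) {
        for (Map.Entry<String, String> entry : suffixToType.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return "application/octet-stream";
    }

    /** The three project-specific types quoted in the thread. */
    public static GlobMimeLookup withProjectTypes() {
        return new GlobMimeLookup()
                .register("*.out", "product/fei-out")
                .register("*.ecsv", "product/fei-ecsv")
                .register("*.sfdu", "product/fei-sfdu");
    }

    public static void main(String[] args) {
        // Without the custom entries, the lookup can only return the default.
        System.out.println(new GlobMimeLookup().detect("data.sfdu"));
        // With them, the project-specific type is returned.
        System.out.println(withProjectTypes().detect("data.sfdu"));
    }
}
```

The point of the sketch is that the custom definitions can be perfectly valid and still have no effect if the detector is built from a different (default) set of definitions.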

Re: FW: Tyler - I may need your help

Posted by Tyler Palsulich <tp...@gmail.com>.
Awesome! Thanks, Val.

On Wed, Feb 4, 2015 at 5:05 PM, Mallder, Valerie <Valerie.Mallder@jhuapl.edu> wrote:

> Oh absolutely. That's one of the reasons I'm writing all this stuff down :)
>
>
> Valerie A. Mallder
> New Horizons Deputy Mission System Engineer
> Johns Hopkins University/Applied Physics Laboratory
>
>
> > -----Original Message-----
> > From: Tyler Palsulich [mailto:tpalsulich@gmail.com]
> > Sent: Wednesday, February 04, 2015 5:03 PM
> > To: dev
> > Subject: RE: FW: Tyler - I may need your help
> >
> > You're welcome! I'm happy to help. If you get a chance, can you create a
> > wiki page with how you set up your deployment?
> >
> > Thanks,
> > Tyler
> > On Feb 4, 2015 4:44 PM, "Mallder, Valerie" <Va...@jhuapl.edu>
> > wrote:
> >
> > > Ok, thanks!
> > >
> > >
> > > Valerie A. Mallder
> > > New Horizons Deputy Mission System Engineer
> > > Johns Hopkins University/Applied Physics Laboratory
> > >
> > >
> > > > -----Original Message-----
> > > > From: Tyler Palsulich [mailto:tpalsulich@gmail.com]
> > > > Sent: Wednesday, February 04, 2015 4:40 PM
> > > > To: dev
> > > > Subject: RE: FW: Tyler - I may need your help
> > > >
> > > > Hi,
> > > >
> > > > Yes, removing the / was intended. Having it broke some of the http
> > > > tests.
> > > >
> > > > Have a good day!
> > > > Tyler

RE: FW: Tyler - I may need your help

Posted by "Mallder, Valerie" <Va...@jhuapl.edu>.
Oh absolutely. That's one of the reasons I'm writing all this stuff down :)


Valerie A. Mallder
New Horizons Deputy Mission System Engineer
Johns Hopkins University/Applied Physics Laboratory


> -----Original Message-----
> From: Tyler Palsulich [mailto:tpalsulich@gmail.com]
> Sent: Wednesday, February 04, 2015 5:03 PM
> To: dev
> Subject: RE: FW: Tyler - I may need your help
> 
> You're welcome! I'm happy to help. If you get a chance, can you create a wiki page
> with how you set up your deployment?
> 
> Thanks,
> Tyler
> On Feb 4, 2015 4:44 PM, "Mallder, Valerie" <Va...@jhuapl.edu>
> wrote:
> 
> > Ok, thanks!
> >
> >
> > Valerie A. Mallder
> > New Horizons Deputy Mission System Engineer Johns Hopkins
> > University/Applied Physics Laboratory
> >
> >
> > > -----Original Message-----
> > > From: Tyler Palsulich [mailto:tpalsulich@gmail.com]
> > > Sent: Wednesday, February 04, 2015 4:40 PM
> > > To: dev
> > > Subject: RE: FW: Tyler - I may need your help
> > >
> > > Hi,
> > >
> > > Yes, removing the / was intended. Having it broke some of the http tests.
> > >
> > > Have a good day!
> > > Tyler
> > > On Feb 4, 2015 3:58 PM, "Mallder, Valerie"
> > > <Va...@jhuapl.edu>
> > > wrote:
> > >
> > > > Hi Tyler,
> > > >
> > > > I am running back through all of the local changes that I had to
> > > > make to OODT 0.8 just to make sure I had written them all down.
> > > > And I noticed that the change to MimeTypeUtils.java is different
> > > > now than it was on Jan 24 when you sent me the link below.  In the
> > > > version of the change that I looked at on Jan 24, line 64 had been
> > > > changed by adding a '/" at the front of the filename on that line.
> > > > But, now that "/" is
> > gone.  (i.e.
> > > > "/tika-mimetypes.xml" is now "tika-mimetypes.xml."  Can you verify
> > > > that removing the "/" was intended?
> > > >
> > > > Thanks,
> > > > Val
> > > >
> > > >
> > > >
> > > > Valerie A. Mallder
> > > > New Horizons Deputy Mission System Engineer Johns Hopkins
> > > > University/Applied Physics Laboratory
> > > >
> > > > > -----Original Message-----
> > > > > From: Tyler Palsulich [mailto:tpalsulich@gmail.com]
> > > > > Sent: Saturday, January 24, 2015 11:02 AM
> > > > > To: dev
> > > > > Subject: Re: FW: Tyler - I may need your help
> > > > >
> > > > > Hi Val,
> > > > >
> > > > > Please see the update for OODT-805 here <
> > > > https://github.com/apache/oodt/commit/a44c0e871f4db4af645f00533b00
> > > > de1e
> > > > 91d
> > > > > 5bcc3>
> > > > > .
> > > > >
> > > > > Re: Radix, is MimeTypeUtils used in Radix? I don't see any
> > > > > references to
> > > > it.
> > > > >
> > > > > Tyler
> > > > >
> > > > > On Fri, Jan 23, 2015 at 4:02 PM, Mallder, Valerie <
> > > > Valerie.Mallder@jhuapl.edu>
> > > > > wrote:
> > > > >
> > > > > > Hi Tyler,
> > > > > >
> > > > > > Yes, this fix did take care of my problem. Thanks so much!
> > > > > >
> > > > > > Chris, if you want to make a new OODT 0.8.1, be sure to also
> > > > > > include the fix for OODT-805 below in the radix installation.
> > > > > > My system is back up and running now.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Val
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Valerie A. Mallder
> > > > > > New Horizons Deputy Mission System Engineer Johns Hopkins
> > > > > > University/Applied Physics Laboratory
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Tyler Palsulich [mailto:tpalsulich@gmail.com]
> > > > > > > Sent: Thursday, January 22, 2015 10:35 PM
> > > > > > > To: dev
> > > > > > > Subject: Re: FW: Tyler - I may need your help
> > > > > > >
> > > > > > > Hi Val,
> > > > > > >
> > > > > > > Please see OODT-805 and
> > > > > > >
> > > > > > https://github.com/apache/oodt/commit/cf1220d4ac66ccefc8e510c6
> > > > > > 2fb6
> > > > > > b38c
> > > > > > f529f
> > > > > > > fb2
> > > > > > > for what I believe is the fix.
> > > > > > >
> > > > > > > Can you make the MimeTypeUtils changes locally or try out trunk?
> > > > > > >
> > > > > > > Let me know!
> > > > > > > Tyler
> > > > > > >
> > > > > > > On Thu, Jan 22, 2015 at 5:40 PM, Tyler Palsulich
> > > > > > > <tp...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Val,
> > > > > > > >
> > > > > > > > Yes, I think you've hit the nail on the head -- if Tika
> > > > > > > > isn't passed your updated mimetypes configuration file
> > > > > > > > (with your custom types), then those files will not be
> > > > > > > > properly identified. I'll look into this issue more
> > > > > > > > tonight and hopefully find a fix. :)
> > > > > > > >
> > > > > > > > > by default tika only knows about xml files, text files,
> > > > > > > > application/octet-stream files.
> > > > > > > > I'm not sure what you mean by this? Tika knows about much
> > > > > > > > more than that, but is there an OODT config that overrides that?
> > > > > > > >
> > > > > > > > > > I'm a newbie with Java and I can't guarantee I would be
> > > > > > > > > > able to build a JUnit test program very easily. But I will
> > > > > > > > > > continue to investigate and see what I can do.
> > > > > > > > No worries! :) If you have time and want to try your hand
> > > > > > > > at it, the best way to learn is by looking at the existing
> > > > > > > > tests, like in
> > > > > > > > > https://github.com/apache/oodt/blob/trunk/metadata/src/test/org/apache/oodt/cas/metadata/util/TestMimeTypeUtils.java
> > > > > > > > > .
> > > > > > > >
> > > > > > > > Have a good night,
> > > > > > > > Tyler
> > > > > > > >
> > > > > > > > On Thu, Jan 22, 2015 at 2:22 PM, Mallder, Valerie <
> > > > > > > > Valerie.Mallder@jhuapl.edu> wrote:
> > > > > > > >
> > > > > > > >> Hi Tyler,
> > > > > > > >>
> > > > > > > > >> Can you tell me more about the tika-mimetypes.xml file?
> > > > > > > > >> Is this a new 'required' file?  I'm not 100% sure about
> > > > > > > > >> this yet, but it seems to me that, since
> > > > > > > > >> MimeTypeUtils.java instantiates Tika with the default
> > > > > > > > >> constructor and never explicitly tells Tika which
> > > > > > > > >> mime-types file to use (even though the correct
> > > > > > > > >> mime-types.xml file is passed to the MimeTypeUtils
> > > > > > > > >> constructor from MimeExtractorRepo), there is no place
> > > > > > > > >> where the contents of my mime-types.xml file are read
> > > > > > > > >> and stored in Tika's MimeTypeRegistry, and by default
> > > > > > > > >> tika only knows about xml files, text files,
> > > > > > > > >> application/octet-stream files.
> > > > > > > >>
> > > > > > > > >> I will keep looking at this tomorrow and verify which
> > > > > > > > >> file is passed to Tika's MimeTypesFactory class, but I
> > > > > > > > >> have to head home now.
> > > > > > > >>
> > > > > > > >> Val
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> Valerie A. Mallder
> > > > > > > >> New Horizons Deputy Mission System Engineer Johns Hopkins
> > > > > > > >> University/Applied Physics Laboratory
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> -----Original Message-----
> > > > > > > >> From: Mallder, Valerie
> > > > > > > >> Sent: Thursday, January 22, 2015 11:42 AM
> > > > > > > >> To: dev
> > > > > > > >> Subject: RE: Tyler - I may need your help
> > > > > > > >>
> > > > > > > >> Hi Tyler,
> > > > > > > >>
> > > > > > > > >> I have defined a few custom mime types in my
> > > > > > > > >> filemgr/etc/mime-types.xml file. The contents of my file
> > > > > > > > >> look exactly like the contents of
> > > > > > > > >> http://svn.apache.org/viewvc/oodt/tags/0.8/filemgr/src/main/resources/mime-types.xml
> > > > > > > > >> with the addition of project-specific mime-types.
> > > > > > > > >> The tika-mimetypes.xml file you pointed me to has ~2000
> > > > > > > > >> additional lines in it as compared to the
> > > > > > > > >> http://svn.apache.org/viewvc/oodt/tags/0.8/filemgr/src/main/resources/mime-types.xml
> > > > > > > > >> file and the
> > > > > > > > >> http://svn.apache.org/viewvc/oodt/tags/0.8/mvn/archetypes/radix/src/main/resources/archetype-resources/filemgr/src/main/resources/etc/mime-types.xml
> > > > > > > > >> file. So, it is definitely different than the one I've
> > > > > > > > >> been using. But, I copied it over and added my mime types
> > > > > > > > >> to it, and it didn't help.  The mime types it is returning
> > > > > > > > >> are 'reasonable' mime-types to return, they are just not
> > > > > > > > >> the mime-types that I defined them as.  For instance, I
> > > > > > > > >> have *.sfdu files and *.out files that contain binary
> > > > > > > > >> data, and tika says they are "application/octet-stream"
> > > > > > > > >> files.  I also have *.ecsv files that contain text, and
> > > > > > > > >> tika says they are "text/plain" files.
> > > > > > > >>
> > > > > > > > >> But here are the mime-types I defined for these files for
> > > > > > > > >> my project, and these are the mime-types that I have
> > > > > > > > >> defined extractors for.  None of these filename
> > > > > > > > >> extensions "*.out, *.ecsv, and *.sfdu" are defined
> > > > > > > > >> elsewhere in the mime-types.xml file.
> > > > > > > >>
> > > > > > > > >> <mime-type type="product/fei-out">
> > > > > > > > >>     <glob pattern="*.out"/>
> > > > > > > > >> </mime-type>
> > > > > > > > >>
> > > > > > > > >> <mime-type type="product/fei-ecsv">
> > > > > > > > >>     <glob pattern="*.ecsv"/>
> > > > > > > > >> </mime-type>
> > > > > > > > >>
> > > > > > > > >> <mime-type type="product/fei-sfdu">
> > > > > > > > >>     <glob pattern="*.sfdu"/>
> > > > > > > > >> </mime-type>
> > > > > > > >>
> > > > > > > >> I'm a newbie with Java and I can't guarantee I would be
> > > > > > > >> able to build a JUnit test program very easily. But I
> > > > > > > >> will continue to investigate and see what I can do.
> > > > > > > >>
> > > > > > > >> Thanks!
> > > > > > > >>
> > > > > > > >> Val
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> Valerie A. Mallder
> > > > > > > >> New Horizons Deputy Mission System Engineer Johns Hopkins
> > > > > > > >> University/Applied Physics Laboratory
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> > -----Original Message-----
> > > > > > > >> > From: Tyler Palsulich [mailto:tpalsulich@gmail.com]
> > > > > > > >> > Sent: Wednesday, January 21, 2015 5:13 PM
> > > > > > > >> > To: dev
> > > > > > > >> > Subject: Re: Tyler - I may need your help
> > > > > > > >> >
> > > > > > > >> > Hi Val,
> > > > > > > >> >
> > > > > > > > >> > Hmm... Is there a particular (wrong) mime-type that
> > > > > > > > >> > keeps getting detected (like text/plain, or something)?
> > > > > > > > >> > I'm curious if the type is just returning a default.
> > > > > > > > >> > Or, is it a seemingly random file type? What are the
> > > > > > > > >> > contents of your mime-types.xml file? If it's different
> > > > > > > > >> > than
> > > > > > > > >> > https://raw.githubusercontent.com/apache/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
> > > > > > > > >> > , can you try copying it over?
> > > > > > > > >> >
> > > > > > > > >> > I'm not sure I'll be able to replicate your error on my
> > > > > > > > >> > computer without a bit of difficulty. Do you think
> > > > > > > > >> > there is any way you could create a JUnit test case with
> > > > > > > > >> > the problem?
> > > > > > > >> >
> > > > > > > >> > Tyler
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > On Wed, Jan 21, 2015 at 1:26 PM, Mallder, Valerie <
> > > > > > > >> > Valerie.Mallder@jhuapl.edu>
> > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> > > Hi Tyler,
> > > > > > > >> > >
> > > > > > > > >> > > I have been looking into an issue that cropped up
> > > > > > > > >> > > in my OODT system when I upgraded to OODT 0.8. The
> > > > > > > > >> > > issue is, my AutoDetectProductCrawler, which is
> > > > > > > > >> > > launched from a PGETaskInstance, is unable to
> > > > > > > > >> > > determine the mime-type for my product files.  I am
> > > > > > > > >> > > using the same filemgr/etc/mime-types.xml file that I
> > > > > > > > >> > > was using with OODT 0.7, and I am using the same
> > > > > > > > >> > > oodt/extensions/policy/mime-extractor-map.xml file
> > > > > > > > >> > > that I was using with OODT 0.7, but now, in
> > > > > > > > >> > > MimeTypeRepo::getExtractorSpecsForFile,
> > > > > > > > >> > > the call to
> > > > > > > > >> > > this.mimeRepo.getMimeType(file) is returning the
> > > > > > > > >> > > wrong mime-types for all of my files, and so the
> > > > > > > > >> > > AutoDetectProductCrawler is telling me I have no
> > > > > > > > >> > > extractor specs for my files.
> > > > > > > >> > >
> > > > > > > > >> > > I noticed that you did some work on MimeTypeUtils for
> > > > > > > > >> > > OODT-630 in OODT 0.8. At first glance, it doesn't
> > > > > > > > >> > > look like any of this work would be directly
> > > > > > > > >> > > responsible. Can you think of anything that might be
> > > > > > > > >> > > causing this to happen? I don't know anything about
> > > > > > > > >> > > tika. Do I need to make any changes to my policy
> > > > > > > > >> > > files to remain compatible?
> > > > > > > > >> > > Just looking for clues on how to resolve this.  I
> > > > > > > > >> > > have verified by adding log messages throughout the
> > > > > > > > >> > > code that, prior to launching the
> > > > > > > > >> > > AutoDetectProductCrawler, all of the policy files
> > > > > > > > >> > > are read correctly.
> > > > > > > > >> > > The MimeExtractorConfigReader is reading the correct
> > > > > > > > >> > > mime-extractor-map.xml file, and it is calling
> > > > > > > > >> > > setMimeRepoFile with the correct mime-types.xml file,
> > > > > > > > >> > > and it is setting the correct extractor config file, etc.
> > > > > > > > >> > > But, once AutoDetectProductCrawler starts crawling, it
> > > > > > > > >> > > calls getExtractorSpecsForFile but determines the
> > > > > > > > >> > > wrong mime type and then can't find the extractor spec.
> > > > > > > >> > >
> > > > > > > >> > > Thanks,
> > > > > > > >> > > Val
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > Valerie A. Mallder
> > > > > > > >> > >
> > > > > > > >> > > New Horizons Deputy Mission System Engineer The Johns
> > > > > > > >> > > Hopkins University/Applied Physics Laboratory
> > > > > > > >> > > 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
> > > > > > > >> > > 240-228-7846 (Office) 410-504-2233 (Blackberry)
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
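
The custom glob definitions quoted in this thread can be pictured as a lookup from filename pattern to mime type, with a generic default returned when no pattern is registered. The following is a minimal sketch in plain Java, not OODT or Tika code: the class GlobMimeSketch and its methods are hypothetical names, only "*.ext" globs are handled, and the default type is passed in by the caller rather than sniffed from file contents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not the OODT MimeTypeUtils or Tika API.
class GlobMimeSketch {
    // Maps a filename suffix such as ".sfdu" to its mime type, in registration order.
    private final Map<String, String> suffixToType = new LinkedHashMap<>();

    // Registers a glob of the form "*.ext"; other glob shapes are out of scope here.
    public void addGlob(String glob, String mimeType) {
        if (!glob.startsWith("*.") || glob.length() < 3) {
            throw new IllegalArgumentException("only *.ext globs are supported: " + glob);
        }
        suffixToType.put(glob.substring(1), mimeType);
    }

    // Returns the first registered type whose suffix matches, else the fallback.
    public String detect(String fileName, String fallback) {
        for (Map.Entry<String, String> entry : suffixToType.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return fallback;
    }

    // Builds a registry holding the three project types quoted in the thread.
    public static GlobMimeSketch projectTypes() {
        GlobMimeSketch repo = new GlobMimeSketch();
        repo.addGlob("*.out", "product/fei-out");
        repo.addGlob("*.ecsv", "product/fei-ecsv");
        repo.addGlob("*.sfdu", "product/fei-sfdu");
        return repo;
    }

    public static void main(String[] args) {
        String fallback = "application/octet-stream";
        // With the custom globs loaded, the project type is returned.
        System.out.println(projectTypes().detect("sample.sfdu", fallback));
        // With an empty registry (custom file never loaded), only the fallback remains.
        System.out.println(new GlobMimeSketch().detect("sample.sfdu", fallback));
    }
}
```

The second call in main mirrors the symptom reported in the thread: when the custom definitions never reach the registry, every project file falls through to a generic type, and no extractor mapped to the product/fei-* types can be found.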

RE: FW: Tyler - I may need your help

Posted by Tyler Palsulich <tp...@gmail.com>.
You're welcome! I'm happy to help. If you get a chance, can you create a
wiki page with how you set up your deployment?

Thanks,
Tyler
On Feb 4, 2015 4:44 PM, "Mallder, Valerie" <Va...@jhuapl.edu>
wrote:

> Ok, thanks!
>
>
> Valerie A. Mallder
> New Horizons Deputy Mission System Engineer
> Johns Hopkins University/Applied Physics Laboratory
>
>
> > -----Original Message-----
> > From: Tyler Palsulich [mailto:tpalsulich@gmail.com]
> > Sent: Wednesday, February 04, 2015 4:40 PM
> > To: dev
> > Subject: RE: FW: Tyler - I may need your help
> >
> > Hi,
> >
> > Yes, removing the / was intended. Having it broke some of the http tests.
> >
> > Have a good day!
> > Tyler
> > On Feb 4, 2015 3:58 PM, "Mallder, Valerie" <Va...@jhuapl.edu>
> > wrote:
> >
> > > Hi Tyler,
> > >
> > > I am running back through all of the local changes that I had to make
> > > to OODT 0.8 just to make sure I had written them all down. And I
> > > noticed that the change to MimeTypeUtils.java is different now than it
> > > was on Jan 24 when you sent me the link below.  In the version of the
> > > > change that I looked at on Jan 24, line 64 had been changed by adding
> > > > a "/" at the front of the filename on that line. But, now that "/" is
> > > > gone (i.e., "/tika-mimetypes.xml" is now "tika-mimetypes.xml").  Can
> > > > you verify that removing the "/" was intended?
> > >
> > > Thanks,
> > > Val
> > >
> > >
> > >
> > > Valerie A. Mallder
> > > New Horizons Deputy Mission System Engineer Johns Hopkins
> > > University/Applied Physics Laboratory
> > >
>
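
The leading "/" discussed above matters because Java resolves resource names differently with and without it. The sketch below re-implements, as a plain string function, the name-resolution rule documented for java.lang.Class.getResource: a name starting with "/" is taken relative to the classpath root, and any other name is taken relative to the package of the class doing the lookup. Whether MimeTypeUtils reaches the file through this API is an assumption, as is the package name used in the example; ResourceNameSketch and absoluteName are hypothetical names.

```java
// Hypothetical illustration of the documented Class.getResource naming rule;
// it does not load anything and is not OODT or Tika code.
class ResourceNameSketch {
    // Returns the classpath-root-relative name that a lookup from a class in
    // packageName would search for, given the name passed to getResource.
    public static String absoluteName(String packageName, String name) {
        if (name.startsWith("/")) {
            // Leading slash: the rest of the name is used as-is from the classpath root.
            return name.substring(1);
        }
        if (packageName.isEmpty()) {
            // Classes in the default package have no prefix to add.
            return name;
        }
        // No leading slash: prefix the package, with '.' replaced by '/'.
        return packageName.replace('.', '/') + "/" + name;
    }

    public static void main(String[] args) {
        String pkg = "org.example.mime"; // assumed package, for illustration only
        System.out.println(absoluteName(pkg, "tika-mimetypes.xml"));
        System.out.println(absoluteName(pkg, "/tika-mimetypes.xml"));
    }
}
```

Under this rule the two spellings point at different locations, so a test that expects the file in one place can break when the other spelling is used.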

RE: FW: Tyler - I may need your help

Posted by "Mallder, Valerie" <Va...@jhuapl.edu>.
Ok, thanks!


Valerie A. Mallder
New Horizons Deputy Mission System Engineer
Johns Hopkins University/Applied Physics Laboratory


> -----Original Message-----
> From: Tyler Palsulich [mailto:tpalsulich@gmail.com]
> Sent: Wednesday, February 04, 2015 4:40 PM
> To: dev
> Subject: RE: FW: Tyler - I may need your help
> 
> Hi,
> 
> Yes, removing the / was intended. Having it broke some of the http tests.
> 
> Have a good day!
> Tyler
> On Feb 4, 2015 3:58 PM, "Mallder, Valerie" <Va...@jhuapl.edu>
> wrote:
> 
> > Hi Tyler,
> >
> > I am running back through all of the local changes that I had to make
> > to OODT 0.8 just to make sure I had written them all down. And I
> > noticed that the change to MimeTypeUtils.java is different now than it
> > was on Jan 24 when you sent me the link below.  In the version of the
> > > change that I looked at on Jan 24, line 64 had been changed by adding
> > > a "/" at the front of the filename on that line. But, now that "/" is
> > > gone (i.e., "/tika-mimetypes.xml" is now "tika-mimetypes.xml").  Can
> > > you verify that removing the "/" was intended?
> >
> > Thanks,
> > Val
> >
> >
> >
> > Valerie A. Mallder
> > New Horizons Deputy Mission System Engineer Johns Hopkins
> > University/Applied Physics Laboratory
> >
> > > -----Original Message-----
> > > From: Tyler Palsulich [mailto:tpalsulich@gmail.com]
> > > Sent: Saturday, January 24, 2015 11:02 AM
> > > To: dev
> > > Subject: Re: FW: Tyler - I may need your help
> > >
> > > Hi Val,
> > >
> > > > Please see the update for OODT-805 here
> > > > <https://github.com/apache/oodt/commit/a44c0e871f4db4af645f00533b00de1e91d5bcc3>
> > > > .
> > > >
> > > > Re: Radix, is MimeTypeUtils used in Radix? I don't see any
> > > > references to it.
> > >
> > > Tyler
> > >
> > > > On Fri, Jan 23, 2015 at 4:02 PM, Mallder, Valerie <
> > > > Valerie.Mallder@jhuapl.edu> wrote:
> > >
> > > > Hi Tyler,
> > > >
> > > > Yes, this fix did take care of my problem. Thanks so much!
> > > >
> > > > Chris, if you want to make a new OODT 0.8.1, be sure to also
> > > > include the fix for OODT-805 below in the radix installation.  My
> > > > system is back up and running now.
> > > >
> > > > Thanks,
> > > >
> > > > Val
> > > >
> > > >
> > > >
> > > >
> > > > Valerie A. Mallder
> > > > New Horizons Deputy Mission System Engineer Johns Hopkins
> > > > University/Applied Physics Laboratory
> > > >
> > > > > -----Original Message-----
> > > > > From: Tyler Palsulich [mailto:tpalsulich@gmail.com]
> > > > > Sent: Thursday, January 22, 2015 10:35 PM
> > > > > To: dev
> > > > > Subject: Re: FW: Tyler - I may need your help
> > > > >
> > > > > Hi Val,
> > > > >
> > > > > > Please see OODT-805 and
> > > > > > https://github.com/apache/oodt/commit/cf1220d4ac66ccefc8e510c62fb6b38cf529ffb2
> > > > > > for what I believe is the fix.
> > > > >
> > > > > Can you make the MimeTypeUtils changes locally or try out trunk?
> > > > >
> > > > > Let me know!
> > > > > Tyler
> > > > >
> > > > > On Thu, Jan 22, 2015 at 5:40 PM, Tyler Palsulich
> > > > > <tp...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Val,
> > > > > >
> > > > > > Yes, I think you've hit the nail on the head -- if Tika isn't
> > > > > > passed your updated mimetypes configuration file (with your
> > > > > > custom types), then those files will not be properly
> > > > > > identified. I'll look into this issue more tonight and
> > > > > > hopefully find a fix. :)
> > > > > >
> > > > > > > > by default tika only knows about xml files, text files,
> > > > > > > > application/octet-stream files.
> > > > > > I'm not sure what you mean by this? Tika knows about much more
> > > > > > than that, but is there an OODT config that overrides that?
> > > > > >
> > > > > > > > I'm a newbie with Java and I can't guarantee I would be able
> > > > > > > > to build a JUnit test program very easily. But I will
> > > > > > > > continue to investigate and see what I can do.
> > > > > > No worries! :) If you have time and want to try your hand at
> > > > > > it, the best way to learn is by looking at the existing tests,
> > > > > > like in
> > > > > > > https://github.com/apache/oodt/blob/trunk/metadata/src/test/org/apache/oodt/cas/metadata/util/TestMimeTypeUtils.java
> > > > > > > .
> > > > > >
> > > > > > Have a good night,
> > > > > > Tyler
> > > > > >
> > > > > > On Thu, Jan 22, 2015 at 2:22 PM, Mallder, Valerie <
> > > > > > Valerie.Mallder@jhuapl.edu> wrote:
> > > > > >
> > > > > >> Hi Tyler,
> > > > > >>
> > > > > > >> Can you tell me more about the tika-mimetypes.xml file? Is
> > > > > > >> this a new 'required' file?  I'm not 100% sure about this
> > > > > > >> yet, but it seems to me that, since MimeTypeUtils.java
> > > > > > >> instantiates Tika with the default constructor and never
> > > > > > >> explicitly tells Tika which mime-types file to use (even
> > > > > > >> though the correct mime-types.xml file is passed to the
> > > > > > >> MimeTypeUtils constructor from MimeExtractorRepo), there is
> > > > > > >> no place where the contents of my mime-types.xml file are
> > > > > > >> read and stored in Tika's MimeTypeRegistry, and by default
> > > > > > >> tika only knows about xml files, text files,
> > > > > > >> application/octet-stream files.
> > > > > >>
> > > > > > >> I will keep looking at this tomorrow and verify which file
> > > > > > >> is passed to Tika's MimeTypesFactory class, but I have to
> > > > > > >> head home now.
> > > > > >>
> > > > > >> Val
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> Valerie A. Mallder
> > > > > >> New Horizons Deputy Mission System Engineer Johns Hopkins
> > > > > >> University/Applied Physics Laboratory
> > > > > >>
> > > > > >>
> > > > > >> -----Original Message-----
> > > > > >> From: Mallder, Valerie
> > > > > >> Sent: Thursday, January 22, 2015 11:42 AM
> > > > > >> To: dev
> > > > > >> Subject: RE: Tyler - I may need your help
> > > > > >>
> > > > > >> Hi Tyler,
> > > > > >>
> > > > > > >> I have defined a few custom mime types in my
> > > > > > >> filemgr/etc/mime-types.xml file. The contents of my file
> > > > > > >> look exactly like the contents of
> > > > > > >> http://svn.apache.org/viewvc/oodt/tags/0.8/filemgr/src/main/resources/mime-types.xml
> > > > > > >> with the addition of project-specific mime-types.
> > > > > > >> The tika-mimetypes.xml file you pointed me to has ~2000
> > > > > > >> additional lines in it as compared to the
> > > > > > >> http://svn.apache.org/viewvc/oodt/tags/0.8/filemgr/src/main/resources/mime-types.xml
> > > > > > >> file and the
> > > > > > >> http://svn.apache.org/viewvc/oodt/tags/0.8/mvn/archetypes/radix/src/main/resources/archetype-resources/filemgr/src/main/resources/etc/mime-types.xml
> > > > > > >> file. So, it is definitely different than the one I've been
> > > > > > >> using. But, I copied it over and added my mime types to it,
> > > > > > >> and it didn't help.  The mime types it is returning are
> > > > > > >> 'reasonable' mime-types to return, they are just not the
> > > > > > >> mime-types that I defined them as.  For instance, I have
> > > > > > >> *.sfdu files and *.out files that contain binary data, and
> > > > > > >> tika says they are "application/octet-stream" files.  I also
> > > > > > >> have *.ecsv files that contain text, and tika says they are
> > > > > > >> "text/plain" files.
> > > > > >>
> > > > > > >> But here are the mime-types I defined for these files for my
> > > > > > >> project, and these are the mime-types that I have defined
> > > > > > >> extractors for.  None of these filename extensions "*.out,
> > > > > > >> *.ecsv, and *.sfdu" are defined elsewhere in the
> > > > > > >> mime-types.xml file.
> > > > > >>
> > > > > >> <mime-type type="product/fei-out">
> > > > > >>     <glob pattern="*.out"/>
> > > > > >> </mime-type>
> > > > > >>
> > > > > >> <mime-type type="product/fei-ecsv">
> > > > > >>     <glob pattern="*.ecsv"/>
> > > > > >> </mime-type>
> > > > > >>
> > > > > > >> <mime-type type="product/fei-sfdu">
> > > > > > >>     <glob pattern="*.sfdu"/>
> > > > > > >> </mime-type>
> > > > > >>
> > > > > >> I'm a newbie with Java and I can't guarantee I would be able
> > > > > >> to build a JUnit test program very easily. But I will
> > > > > >> continue to investigate and see what I can do.
> > > > > >>
> > > > > >> Thanks!
> > > > > >>
> > > > > >> Val
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> Valerie A. Mallder
> > > > > > >> New Horizons Deputy Mission System Engineer
> > > > > > >> Johns Hopkins University/Applied Physics Laboratory
> > > > > >>
> > > > > >>
> > > > > >> > -----Original Message-----
> > > > > >> > From: Tyler Palsulich [mailto:tpalsulich@gmail.com]
> > > > > >> > Sent: Wednesday, January 21, 2015 5:13 PM
> > > > > >> > To: dev
> > > > > >> > Subject: Re: Tyler - I may need your help
> > > > > >> >
> > > > > >> > Hi Val,
> > > > > >> >
> > > > > > >> > Hmm... Is there a particular (wrong) mime-type that keeps
> > > > > > >> > getting detected (like text/plain, or something)? I'm
> > > > > > >> > curious if the type is just returning a default. Or, is it
> > > > > > >> > a seemingly random file type? What are the contents of your
> > > > > > >> > mime-types.xml file? If it's different than
> > > > > > >> > <https://raw.githubusercontent.com/apache/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml>,
> > > > > > >> > can you try copying it over?
> > > > > >> >
> > > > > >> > I'm not sure I'll be able to replicate your error on my
> > > > > >> > computer without a bit of difficulty. Do you think there is
> > > > > >> > any way you could create a JUnit test case with the problem?
> > > > > >> >
> > > > > >> > Tyler
> > > > > >> >
> > > > > >> >
> > > > > >> > On Wed, Jan 21, 2015 at 1:26 PM, Mallder, Valerie <
> > > > > >> > Valerie.Mallder@jhuapl.edu>
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> > > Hi Tyler,
> > > > > >> > >
> > > > > > >> > > I have been looking into an issue that cropped up in my
> > > > > > >> > > OODT system when I upgraded to OODT 0.8. The issue is, my
> > > > > > >> > > AutoDetectProductCrawler, which is launched from a
> > > > > > >> > > PGETaskInstance, is unable to determine the mime-type for
> > > > > > >> > > my product files.  I am using the same
> > > > > > >> > > filemgr/etc/mime-types.xml file that I was using with
> > > > > > >> > > OODT 0.7, and I am using the same
> > > > > > >> > > oodt/extensions/policy/mime-extractor-map.xml file that I
> > > > > > >> > > was using with OODT 0.7, but now, in
> > > > > > >> > > MimeTypeRepo::getExtractorSpecsForFile, the call to
> > > > > > >> > > this.mimeRepo.getMimeType(file) is returning the wrong
> > > > > > >> > > mime-types for all of my files, and so the
> > > > > > >> > > AutoDetectProductCrawler is telling me I have no
> > > > > > >> > > extractor specs for my files.
> > > > > >> > >
> > > > > > >> > > I noticed that you did some work on MimeTypeUtils for
> > > > > > >> > > OODT-630 in OODT 0.8. At first glance, it doesn't look
> > > > > > >> > > like any of this work would be directly responsible. Can
> > > > > > >> > > you think of anything that might be causing this to
> > > > > > >> > > happen? I don't know anything about tika. Do I need to
> > > > > > >> > > make any changes to my policy files to remain
> > > > > > >> > > compatible? Just looking for clues on how to resolve
> > > > > > >> > > this.  I have verified by adding log messages throughout
> > > > > > >> > > the code that, prior to launching the
> > > > > > >> > > AutoDetectProductCrawler, all of the policy files are
> > > > > > >> > > read correctly. The MimeExtractorConfigReader is reading
> > > > > > >> > > the correct mime-extractor-map.xml file, and it is
> > > > > > >> > > calling setMimeRepoFile with the correct mime-types.xml
> > > > > > >> > > file, and it is setting the correct extractor config
> > > > > > >> > > file, etc. But, once AutoDetectProductCrawler starts
> > > > > > >> > > crawling, it tries to call getExtractorSpecsForFile but
> > > > > > >> > > determines the wrong mime type and then can't find the
> > > > > > >> > > extractor spec.
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > > Val
> > > > > >> > >
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > Valerie A. Mallder
> > > > > >> > >
> > > > > > >> > > New Horizons Deputy Mission System Engineer
> > > > > > >> > > The Johns Hopkins University/Applied Physics Laboratory
> > > > > >> > > 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
> > > > > >> > > 240-228-7846 (Office) 410-504-2233 (Blackberry)
> > > > > >> > >
> > > > > >> > >
> > > > > >>
> > > > > >
> > > > > >
> > > >
> >

RE: FW: Tyler - I may need your help

Posted by Tyler Palsulich <tp...@gmail.com>.
Hi,

Yes, removing the "/" was intended. With the leading slash in place, some of the HTTP tests broke.
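
For anyone following along who wonders why a single "/" matters: the thread does not say which lookup call line 64 of MimeTypeUtils.java uses, so the following is only a minimal sketch, assuming the file is found through a classpath resource lookup. It mirrors the name-resolution rule documented for Class.getResource(String): a name with a leading "/" is taken from the classpath root, while a name without one is resolved relative to the package of the class doing the lookup. The class ResourceNameDemo and the helper absoluteName are hypothetical names used for illustration; they are not part of OODT or Tika.

```java
public class ResourceNameDemo {

    /**
     * Hypothetical helper that mirrors the rule documented for
     * Class.getResource(String): a leading '/' means "absolute, from the
     * classpath root"; otherwise the name is resolved relative to the
     * package of the anchor class.
     */
    public static String absoluteName(Class<?> anchor, String name) {
        if (name.startsWith("/")) {
            // Absolute name: just drop the leading slash.
            return name.substring(1);
        }
        String className = anchor.getName();
        int lastDot = className.lastIndexOf('.');
        if (lastDot < 0) {
            // Anchor class lives in the unnamed package.
            return name;
        }
        // Relative name: prefix it with the package path of the anchor class.
        return className.substring(0, lastDot).replace('.', '/') + "/" + name;
    }

    public static void main(String[] args) {
        // String.class (package java.lang) is used only as a stand-in anchor.
        System.out.println(absoluteName(String.class, "/tika-mimetypes.xml"));
        // prints: tika-mimetypes.xml
        System.out.println(absoluteName(String.class, "tika-mimetypes.xml"));
        // prints: java/lang/tika-mimetypes.xml
    }
}
```

Note that ClassLoader.getResource(String), unlike Class.getResource(String), always treats the name as relative to the classpath root, so whether a leading slash helps or hurts depends on which of the two calls is in use.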

Have a good day!
Tyler
On Feb 4, 2015 3:58 PM, "Mallder, Valerie" <Va...@jhuapl.edu>
wrote:

> Hi Tyler,
>
> I am running back through all of the local changes that I had to make to
> OODT 0.8 just to make sure I had written them all down. And I noticed that
> the change to MimeTypeUtils.java is different now than it was on Jan 24
> when you sent me the link below.  In the version of the change that I
> looked at on Jan 24, line 64 had been changed by adding a "/" at the front
> of the filename on that line. But, now that "/" is gone (i.e.
> "/tika-mimetypes.xml" is now "tika-mimetypes.xml").  Can you verify that
> removing the "/" was intended?