Posted to user@oodt.apache.org by "Nguyen, Ricky" <rn...@chla.usc.edu> on 2011/11/19 01:52:25 UTC

crawler with unique action fails on first ingest

Hi,

I am trying to run a crawler using "--actionIds Unique". Since this is the first time I am ingesting a file into FileMgr, the user guide [1] says that the catalog dir MUST NOT exist so that Lucene can create it. However, the crawler fails with the error:

IOException when opening index directory: [/Users/rnguyen/vpicu/data/catalog] for search: Message: /Users/rnguyen/vpicu/data/catalog is not a directory

It seems the crawler is trying to search for a product (to determine its uniqueness), but the catalog hasn't been created yet. Since I have no catalog, I guess the workaround is to omit the "Unique" action.

But if I run the crawler as a daemon, it would be useful to keep "Unique" as an action. Any thoughts on the right course?
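The failure described above comes down to a uniqueness check that queries a catalog which does not exist yet. Below is a minimal Python sketch of a guard for that case. It is not OODT's code (the real Unique action is Java that queries the File Manager), and `is_unique` and the injected `search` callable are hypothetical stand-ins. It treats a catalog directory that does not exist yet as empty, so the first product is reported as unique instead of raising an error.

```python
from pathlib import Path
from typing import Callable

# Hypothetical type: search(catalog_dir, product_name) returns True if the
# product is already in the catalog. In OODT the real check is Java code that
# queries the File Manager; this only illustrates the control flow.
SearchFn = Callable[[Path, str], bool]


def is_unique(catalog_dir: Path, product_name: str, search: SearchFn) -> bool:
    """Return True if product_name has not been ingested yet.

    A catalog directory that does not exist yet cannot hold any products,
    so the product is reported as unique instead of the lookup failing.
    """
    if not catalog_dir.exists():
        # First ingest: the catalog (Lucene index) has not been created yet.
        return True
    return not search(catalog_dir, product_name)
```

Only the "does not exist" case is short-circuited. A path that exists but is not a usable index still reaches `search`, so genuine configuration errors are not silently hidden.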

Thanks,
Ricky

[1] http://oodt.apache.org/components/maven/filemgr/user/basic.html


---------------------------------------------------------------------
CONFIDENTIALITY NOTICE: This e-mail message, including any attachments, 
is for the sole use of the intended recipient(s) and may contain confidential
or legally privileged information. Any unauthorized review, use, disclosure
or distribution is prohibited. If you are not the intended recipient, please
contact the sender by reply e-mail and destroy all copies of this original message.  

---------------------------------------------------------------------


Re: crawler with unique action fails on first ingest

Posted by "Nguyen, Ricky" <rn...@chla.usc.edu>.
Thanks Chris! (and Paul if he ends up working on it :P )

On Nov 18, 2011, at 4:55 PM, Mattmann, Chris A (388J) wrote:

> Hey Ricky,
> 
> I've run into this a number of times myself, and Paul Ramirez and I were recently talking about it too. Paul even said he would try to fix it (ha! I'm signing him up for work :P). Actually, I'll just look at it myself.
> 
> In the meantime, the workaround is exactly the one you stated. Ingest a file; that gets you a catalog. Then, if you want, you can simply delete the file using fmquery | fmdel, and Unique works just fine after that.
> 
> Cheers,
> Chris


Re: crawler with unique action fails on first ingest

Posted by Sheryl John <sh...@gmail.com>.
Hi Tim,

fmquery and fmdel are shell alias commands for the file manager.
They were introduced a few months back. Check out OODT-306:
https://issues.apache.org/jira/browse/OODT-306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
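The `fmquery | fmdel` pipe suggests a query step that emits product identifiers and a delete step that consumes them. The Python sketch below models that pattern under an explicit assumption: one product ID per line. The real output format of the OODT aliases is not given in this thread, so check OODT-306 before relying on it. `read_product_ids`, `delete_all` and the `delete` callable are hypothetical names.

```python
from typing import Callable, Iterable, List

# Assumption (hypothetical): the query step prints one product ID per line,
# and the delete step reads those IDs from its standard input.


def read_product_ids(lines: Iterable[str]) -> List[str]:
    """Return the non-blank lines, stripped of surrounding whitespace."""
    return [line.strip() for line in lines if line.strip()]


def delete_all(lines: Iterable[str], delete: Callable[[str], None]) -> int:
    """Call delete() once per product ID read from lines; return the count."""
    ids = read_product_ids(lines)
    for product_id in ids:
        delete(product_id)
    return len(ids)
```

In a script, `delete_all(sys.stdin, client_delete)` would play the role of the right-hand side of the pipe, where `client_delete` is a hypothetical wrapper around your File Manager's delete call.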

-Sheryl

On Mon, Nov 21, 2011 at 9:23 AM, Stough, Timothy M (388F) <timothy.m.stough@jpl.nasa.gov> wrote:

> I ran into this too. My solution was to add an ingest of blah.txt (blatantly stolen from the quick-start) to my start-up scripts and just leave it in the catalog.
>
> What are fmquery and fmdel? I was wondering how to remove something from the catalog.
>
> Thanks,
> Tim.


-- 
-Sheryl

Re: crawler with unique action fails on first ingest

Posted by "Stough, Timothy M (388F)" <ti...@jpl.nasa.gov>.
I ran into this too. My solution was to add an ingest of blah.txt (blatantly stolen from the quick-start) to my start-up scripts and just leave it in the catalog.

What are fmquery and fmdel? I was wondering how to remove something from the catalog.

Thanks,
Tim.


On Nov 18, 2011, at 4:55 PM, Mattmann, Chris A (388J) wrote:

> Hey Ricky,
> 
> I've run into this a number of times myself, and Paul Ramirez and I were recently talking about it too. Paul even said he would try to fix it (ha! I'm signing him up for work :P). Actually, I'll just look at it myself.
> 
> In the meantime, the workaround is exactly the one you stated. Ingest a file; that gets you a catalog. Then, if you want, you can simply delete the file using fmquery | fmdel, and Unique works just fine after that.
> 
> Cheers,
> Chris

-----------------------------------------------------------------
Tim Stough
NASA/Caltech Jet Propulsion Lab
Senior System Architect
Data Understanding Group (Section 388)
818-393-5347 (office)
626-644-6574 (cell)
-----------------------------------------------------------------

Re: crawler with unique action fails on first ingest

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Ricky,

I've run into this a number of times myself, and Paul Ramirez and I were recently talking about it too. Paul even said he would try to fix it (ha! I'm signing him up for work :P). Actually, I'll just look at it myself.

In the meantime, the workaround is exactly the one you stated. Ingest a file; that gets you a catalog. Then, if you want, you can simply delete the file using fmquery | fmdel, and Unique works just fine after that.
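The workaround can be scripted as a one-time bootstrap step, for example at daemon start-up. What follows is a minimal Python sketch, not an OODT API: `bootstrap_catalog`, `ingest` and `delete` are hypothetical names, and `ingest`/`delete` stand in for whatever File Manager client commands you actually use. It relies only on the behaviour described in this thread: the first ingest creates the catalog, and Unique keeps working after the placeholder product is deleted.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical hooks standing in for real File Manager client calls.
IngestFn = Callable[[Path], str]  # ingest a file and return its product ID
DeleteFn = Callable[[str], None]  # delete a product by its ID


def bootstrap_catalog(
    catalog_dir: Path,
    seed_file: Path,
    ingest: IngestFn,
    delete: DeleteFn,
    keep_seed: bool = False,
) -> Optional[str]:
    """Ingest seed_file once so that the catalog exists before crawling.

    Returns the seed's product ID if a bootstrap ingest happened, else None.
    Safe to run on every start-up: it does nothing once the catalog exists.
    """
    if catalog_dir.exists():
        return None  # catalog already present; Unique can query it
    product_id = ingest(seed_file)  # the first ingest creates the catalog
    if not keep_seed:
        delete(product_id)  # drop the placeholder product; the catalog stays
    return product_id
```

Passing `keep_seed=True` corresponds to the variant Tim Stough describes in this thread: ingesting blah.txt at start-up and simply leaving it in the catalog.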

Cheers,
Chris

On Nov 18, 2011, at 4:52 PM, Nguyen, Ricky wrote:

> Hi,
> 
> I am trying to run a crawler using "--actionIds Unique". Since this is the first time I am ingesting a file into FileMgr, the user guide [1] says that the catalog dir MUST NOT exist so that Lucene can create it. However, the crawler fails with the error:
> 
> IOException when opening index directory: [/Users/rnguyen/vpicu/data/catalog] for search: Message: /Users/rnguyen/vpicu/data/catalog is not a directory
> 
> It seems the crawler is trying to search for a product (to determine its uniqueness), but the catalog hasn't been created yet. Since I have no catalog, I guess the workaround is to omit the "Unique" action.
> 
> But if I run the crawler as a daemon, it would be useful to keep "Unique" as an action. Any thoughts on the right course?
> 
> Thanks,
> Ricky
> 
> [1] http://oodt.apache.org/components/maven/filemgr/user/basic.html


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++