Posted to solr-user@lucene.apache.org by Bruno Mannina <bm...@free.fr> on 2012/06/12 09:06:29 UTC

Indexing Data option for subdirectories?

Dear All,

Is there a way to index data under sub-directories directly?

I have several files under sub-directories like:
/data/A/001/*.xml
/data/A/002/*.xml
/data/A/003/*.xml
/data/A/004/*.xml
...
/data/B/001/*.xml
...

/data/Z/999/*.xml

I would like to index directly with

*i.e. java -jar post.jar -R /data*

Is it possible?

thanks a lot,
Bruno

Re: Indexing Data option for subdirectories?

Posted by Erik Hatcher <er...@gmail.com>.
If they aren't Solr XML format, but you can write an XSLT to transform it to Solr XML, you can use this: <http://wiki.apache.org/solr/XsltUpdateRequestHandler>

	Erik


On Jun 12, 2012, at 15:20 , Jack Krupansky wrote:

> There isn't a recursion option for post.jar (I did check.)
> 
> Maybe your best bet is the "find" shell command. This may not be 100% correct, but something like:
> 
>   find /data -name '*.xml' -exec java -jar post.jar {}
> 
> This is assuming that these are pre-formatted Solr XML update files with "<doc>" and "<add>".
> 
> If they are not in Solr XML format and require translation, DIH with FileDataSource and FileListEntityProcessor, which supports recursion, may be the way to go:
> http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor
> 
> -- Jack Krupansky
> 


Re: Indexing Data option for subdirectories?

Posted by Gora Mohanty <go...@mimirtech.com>.
On 13 June 2012 00:50, Jack Krupansky <ja...@basetechnology.com> wrote:
> There isn't a recursion option for post.jar (I did check.)
>
> Maybe your best bet is the "find" shell command. This may not be 100%
> correct, but something like:
>
>   find /data -name '*.xml' -exec java -jar post.jar {}
[...]

The above should end with a "\;", i.e.,
  find /data -name '*.xml' -exec java -jar post.jar {} \;
You can handle multiple posts to Solr in parallel if
you couple this with xargs.

This also assumes a UNIX system, but on most other
systems you could write a script that handles the
recursion into sub-directories, and posts each file.
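
As a rough, platform-independent sketch of such a script (not something posted in this thread): the Python below walks a directory tree, collects every *.xml file, and runs "java -jar post.jar <file>" for each one, a few at a time. The function names (find_xml_files, post_file, post_tree) and the worker count are made up for illustration, and it assumes java is on the PATH, post.jar is in the current directory, and the files are already in Solr XML update format.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def find_xml_files(root):
    """Return every *.xml file under root, recursing into sub-directories."""
    return sorted(p for p in Path(root).rglob("*.xml") if p.is_file())


def post_file(path, post_jar="post.jar"):
    """Run `java -jar post.jar <file>` for one file and return its exit code.

    Assumption: post.jar sits in the current working directory.
    """
    return subprocess.run(["java", "-jar", post_jar, str(path)]).returncode


def post_tree(root, workers=4, poster=post_file):
    """Post every XML file under root, up to `workers` files at a time.

    Returns the list of files whose post exited with a non-zero code.
    """
    files = find_xml_files(root)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(poster, files))
    return [f for f, code in zip(files, codes) if code != 0]
```

Calling post_tree("/data") would then cover /data/A/001 through /data/Z/999 in one go and return the files that failed to post. Starting one JVM per file is slow for large trees, so treat this as a starting point only.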

Regards,
Gora

Re: Indexing Data option for subdirectories?

Posted by Jack Krupansky <ja...@basetechnology.com>.
There isn't a recursion option for post.jar (I did check.)

Maybe your best bet is the "find" shell command. This may not be 100% 
correct, but something like:

    find /data -name '*.xml' -exec java -jar post.jar {}

This is assuming that these are pre-formatted Solr XML update files with 
"<doc>" and "<add>".

If they are not in Solr XML format and require translation, DIH with 
FileDataSource and FileListEntityProcessor, which supports recursion, may be 
the way to go:
http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor

-- Jack Krupansky
