Posted to solr-user@lucene.apache.org by Bruno Mannina <bm...@free.fr> on 2012/06/12 09:06:29 UTC
Indexing Data option for subdirectories?
Dear All,
Is there a way to index data in sub-directories directly?
I have several files under sub-directories like:
/data/A/001/*.xml
/data/A/002/*.xml
/data/A/003/*.xml
/data/A/004/*.xml
...
/data/B/001/*.xml
...
/data/Z/999/*.xml
I would like to index everything directly with something like:
java -jar post.jar -R /data
Is it possible?
thanks a lot,
Bruno
Re: Indexing Data option for subdirectories?
Posted by Erik Hatcher <er...@gmail.com>.
If they aren't in Solr XML format, but you can write an XSLT to transform them to Solr XML, you can use this: <http://wiki.apache.org/solr/XsltUpdateRequestHandler>
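A minimal sketch of pairing the XSLT handler with a recursive walk of /data. It assumes (none of this is from the thread) that an XsltUpdateRequestHandler is registered at /update/xslt, Solr runs at http://localhost:8983/solr, and a hypothetical stylesheet conf/xslt/mytransform.xsl turns each source file into Solr `<add><doc>` XML:

```shell
#!/bin/sh
# Sketch only. Assumed, not confirmed by the thread: handler path /update/xslt,
# Solr URL below, and the hypothetical stylesheet name mytransform.xsl.
SOLR_URL=${SOLR_URL:-http://localhost:8983/solr}
XSL=${XSL:-mytransform.xsl}

post_xslt_tree() {
    # Walk the whole tree and send one HTTP request per XML file;
    # "tr" names the stylesheet the handler applies.
    # (File names containing newlines are not handled.)
    find "$1" -type f -name '*.xml' | while IFS= read -r f; do
        curl -s "$SOLR_URL/update/xslt?tr=$XSL" \
             -H 'Content-Type: text/xml' --data-binary "@$f" \
            || echo "failed: $f" >&2
    done
    # Commit once at the end instead of after every file.
    curl -s "$SOLR_URL/update" -H 'Content-Type: text/xml' \
         --data-binary '<commit/>'
}
```

Usage: `post_xslt_tree /data`. A shell loop is used instead of `find -exec curl ... @{}` because POSIX leaves it implementation-defined whether `{}` is substituted inside a larger argument such as `@{}`.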
Erik
On Jun 12, 2012, at 15:20 , Jack Krupansky wrote:
> There isn't a recursion option for post.jar (I did check.)
>
> Maybe your best bet is the "find" shell command. This may not be 100% correct, but something like:
>
> find /data -name '*.xml' -exec java -jar post.jar {}
>
> This is assuming that these are pre-formatted Solr XML update files with "<doc>" and "<add>".
>
> If they are not in Solr XML format and require translation, DIH with FileDataSource and FileListEntityProcessor, which supports recursion, may be the way to go:
> http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor
>
> -- Jack Krupansky
>
Re: Indexing Data option for subdirectories?
Posted by Gora Mohanty <go...@mimirtech.com>.
On 13 June 2012 00:50, Jack Krupansky <ja...@basetechnology.com> wrote:
> There isn't a recursion option for post.jar (I did check.)
>
> Maybe your best bet is the "find" shell command. This may not be 100%
> correct, but something like:
>
> find /data -name '*.xml' -exec java -jar post.jar {}
[...]
The above should end with a "\;", i.e.,
find /data -name '*.xml' -exec java -jar post.jar {} \;
You can handle multiple posts to Solr in parallel if
you couple this with xargs.
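A sketch of the xargs variant, assuming GNU or BSD find/xargs (`-print0`, `-0` and `-P` are extensions there) and post.jar in the current directory. post.jar accepts several file names per run, so batching also starts fewer JVMs; the batch size of 50 and 4 parallel jobs are arbitrary example values:

```shell
#!/bin/sh
# Sketch: post every *.xml file under a tree, several post.jar runs at once.
# Assumes GNU/BSD find and xargs; batch size and job count are examples.
post_tree() {
    dir=$1
    shift
    # Default command; pass another one to override it (e.g. for a dry run).
    [ "$#" -eq 0 ] && set -- java -jar post.jar
    # NUL-separated names survive spaces in paths; xargs hands each
    # process up to 50 files and keeps up to 4 processes running.
    find "$dir" -type f -name '*.xml' -print0 |
        xargs -0 -n 50 -P 4 "$@"
}
```

Usage: `post_tree /data` posts for real; `post_tree /data echo` only prints the batches that would be sent.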
This also assumes a UNIX system, but on most other
systems you could write a script that handles the
recursion into sub-directories, and posts each file.
Regards,
Gora
Re: Indexing Data option for subdirectories?
Posted by Jack Krupansky <ja...@basetechnology.com>.
There isn't a recursion option for post.jar (I did check.)
Maybe your best bet is the "find" shell command. This may not be 100%
correct, but something like:
find /data -name '*.xml' -exec java -jar post.jar {}
This is assuming that these are pre-formatted Solr XML update files with
"<doc>" and "<add>".
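One rough way to check that assumption before choosing between post.jar and a translation step is to list the files that contain no `<add>` start tag. This is a crude text match (sketch only, it does not parse XML), using only POSIX find and grep:

```shell
#!/bin/sh
# Sketch: list *.xml files under a tree that do NOT look like Solr XML
# update messages, i.e. files that probably need XSLT or DIH first.
# Heuristic only: it greps for an <add> start tag, it does not parse XML.
find_non_solr_xml() {
    # "! -exec ... \;" is true when grep finds no <add> tag in the file;
    # the patterns avoid matching tags such as <address>.
    find "$1" -type f -name '*.xml' \
        ! -exec grep -q -e '<add>' -e '<add[[:space:]]' -e '<add$' {} \; \
        -print
}
```

Empty output from `find_non_solr_xml /data` suggests every file can go straight to post.jar.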
If they are not in Solr XML format and require translation, DIH with
FileDataSource and FileListEntityProcessor, which supports recursion, may be
the way to go:
http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor
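A sketch of such a DIH configuration, written out by a shell function so it can be dropped into a core's conf directory. The nesting (a FileListEntityProcessor that only lists files, with an XPathEntityProcessor reading each `${files.fileAbsolutePath}`) follows the wiki pattern; the input layout `<record><id/><title/></record>`, the field names and the entity names are hypothetical and must be adapted:

```shell
#!/bin/sh
# Sketch: write a DIH data-config.xml that walks /data recursively.
# Hypothetical: record layout, field names (id, title) and entity names.
write_dih_config() {
    cat > "$1" <<'EOF'
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Outer entity only lists files; recursive="true" descends into
         /data/A/001, /data/B/001, ... -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/data" fileName=".*\.xml" recursive="true"
            rootEntity="false" dataSource="null">
      <!-- Inner entity reads each file and maps XPaths to Solr fields. -->
      <entity name="record" processor="XPathEntityProcessor"
              url="${files.fileAbsolutePath}" forEach="/record">
        <field column="id" xpath="/record/id"/>
        <field column="title" xpath="/record/title"/>
      </entity>
    </entity>
  </document>
</dataConfig>
EOF
}
```

After registering the DataImportHandler in solrconfig.xml with this file (assumed here to be at /dataimport), an import would be started with a request such as `http://localhost:8983/solr/dataimport?command=full-import`.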
-- Jack Krupansky