Posted to solr-user@lucene.apache.org by Chantal Ackermann <ch...@btelligent.de> on 2011/08/01 12:17:45 UTC
Re: Store complete XML record (DIH & XPathEntityProcessor)
Hi g,
OK, I understand your problem now. (Sorry for the late answer.)
I don't think PlainTextEntityProcessor can help you. It does not take a
regex. LineEntityProcessor does, but your record elements probably do not
each come on their own line, and you wouldn't want to depend on that
anyway.
I guess you would be best off writing your own entity processor - maybe
by extending XPathEntityProcessor if that gives you some advantage. You
can of course also implement your own importer using SolrJ and your
favourite XML parser framework - or any other programming language.
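As a rough illustration of the "own importer" route, here is a minimal sketch that uses only the JDK's built-in XML APIs to pull out each <record> element as a string. The class name RawRecordSplitter and the method splitRecords are made up for this example, and the SolrJ indexing step is deliberately left out. Note that the markup is re-serialized from the parsed DOM, so it may not be byte-identical to the original file (whitespace, attribute quoting and entity handling can differ).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Solr or DIH.
class RawRecordSplitter {

    /** Returns the re-serialized markup of every <record> element in the input. */
    static List<String> splitRecords(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // An identity transformer is used here purely as a DOM serializer.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // Assumption: records are not nested; getElementsByTagName would
        // also return <record> elements inside other <record> elements.
        NodeList records = doc.getElementsByTagName("record");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < records.getLength(); i++) {
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(records.item(i)), new StreamResult(out));
            result.add(out.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<xml><record><msg>first</msg></record>"
                   + "<record><msg>second</msg></record></xml>";
        for (String raw : splitRecords(xml)) {
            // In a real importer, each string would go into a stored field
            // of a Solr document alongside the fields extracted per record.
            System.out.println(raw);
        }
    }
}
```

This loads the whole file into memory, which is probably fine for moderately sized logs; for very large files a streaming parser would likely be a better fit.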
If you are looking for a config-only solution - I'm not sure that there
is one. Someone else might be able to comment on that?
Cheers,
Chantal
On Thu, 2011-07-28 at 19:17 +0200, solruser@9913 wrote:
> Thanks Chantal
> I am ok with the second call and I already tried using that. Unfortunately
> it reads the whole file into a field. My file looks like the example below:
> <xml >
> <record>
> ...
> </record>
>
> <record>
> ...
> </record>
>
> <record>
> ...
> </record>
>
> </xml>
>
> Now the XPATH does the 'for each /record' part. For each record I also need
> to store the raw log in there. If I use the PlainTextEntityProcessor then
> it gives me the whole file (from <xml> .. </xml> ) and not each of the
> <record> </record>
>
> Am I using the PlainTextEntityProcessor wrong?
>
> Thanks
> g
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Store-complete-XML-record-DIH-XPathEntityProcessor-tp3205524p3207203.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Store complete XML record (DIH & XPathEntityProcessor)
Posted by Michael Sokolov <so...@ifactory.com>.
On 8/1/2011 6:17 AM, Chantal Ackermann wrote:
> If you are looking for a config-only solution - I'm not sure that there
> is one. Someone else might be able to comment on that?
>
You might want to take a look at SOLR-2597; it has a patch for
XmlStripCharFilter, which strips tags from XML for indexing (like
HtmlStripCharFilter) and also lets you specify XML element names
to include/exclude. Not full XPath, but it might work for you? You would
have to compile the two Java files and place them in your Solr classpath,
since the patch has not been committed.
-Mike
Re: Store complete XML record (DIH & XPathEntityProcessor)
Posted by ka...@gmx.de.
Hi g, Hi Chantal
I had the same problem.
You can use XPathEntityProcessor, but you have to insert an XSL. The drawback is wasted performance; see:
http://lucene.472066.n3.nabble.com/DIH-Enhance-XPathRecordReader-to-deal-with-body-FLATTEN-true-and-body-h1-td2799005.html
Best regards
Karsten