Posted to solr-user@lucene.apache.org by kumar8anuj <ku...@gmail.com> on 2011/11/07 14:18:57 UTC

Re: TikaEntityProcessor not working?

I tried the same thing, but the problem still persists and my document is not
getting indexed. I am using Solr 3.4.0, which shipped with Tika 0.8. I replaced
the core and parser jars with the 0.6 versions, but the document is still not
getting indexed. Nothing related to this shows up in my logs. Please help.


--
View this message in context: http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-tp856965p3486898.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: TikaEntityProcessor not working?

Posted by kumar8anuj <ku...@gmail.com>.
Thanks for the reply, Gora. I tried Googling but didn't find anything on
this. I haven't tried the Tika mailing list yet; I will post this there
now. Thanks for the suggestion.



On Mon, Nov 21, 2011 at 9:10 PM, Gora Mohanty-3 [via Lucene] <
ml-node+s472066n3525046h49@n3.nabble.com> wrote:

> On Mon, Nov 21, 2011 at 8:45 PM, kumar8anuj <[hidden email]> wrote:
> > So where can I get some information on this issue? Can you please help?
>
> Have you tried simple things like searching Google, using the Tika
> site, and, failing these, asking on a Tika-specific mailing list? No
> offence, but you might do some basic homework yourself.
> * Tika: Not sure how well supported 0.6 is nowadays, but
>    http://tika.apache.org/0.6/gettingstarted.html seems to indicate
>    that version 3.6 of poi is needed. You should also consider
>    switching to a newer version of Tika.
> * If that does not work, please try joining a Tika mailing list, and
>   asking a more specific question there:
>   http://tika.apache.org/mail-lists.html
>
> Regards,
> Gora



-- 
“The more you are willing to accept responsibility for your actions, the
more credibility you will have”
Anuj Kumar
Ph. No.-09873721510



Re: TikaEntityProcessor not working?

Posted by Gora Mohanty <go...@mimirtech.com>.
On Mon, Nov 21, 2011 at 8:45 PM, kumar8anuj <ku...@gmail.com> wrote:
> So where can I get some information on this issue? Can you please help?

Have you tried simple things like searching Google, using the Tika
site, and, failing these, asking on a Tika-specific mailing list? No
offence, but you might do some basic homework yourself.
* Tika: Not sure how well supported 0.6 is nowadays, but
   http://tika.apache.org/0.6/gettingstarted.html seems to indicate
   that version 3.6 of poi is needed. You should also consider
   switching to a newer version of Tika.
* If that does not work, please try joining a Tika mailing list, and
  asking a more specific question there:
  http://tika.apache.org/mail-lists.html
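
Before comparing against the Tika 0.6 requirements, it may help to list which
POI and Tika jars are actually deployed. The following is only a rough sketch:
the class name JarVersionLister, its methods, and the file-name pattern are
illustrative assumptions (they are not part of Solr or Tika), and the lib
directory is whatever path you pass on the command line.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JarVersionLister {

    // Assumed naming scheme: poi-3.6.jar, poi-ooxml-3.7.jar, tika-core-0.6.jar, ...
    // Jars with qualifiers such as "-beta3" are deliberately not matched.
    private static final Pattern JAR_NAME =
            Pattern.compile("^((?:poi|tika)[a-z-]*?)-(\\d+(?:\\.\\d+)*)\\.jar$");

    /** Returns "artifact version" for a POI/Tika jar file name, or null otherwise. */
    static String describe(String fileName) {
        Matcher m = JAR_NAME.matcher(fileName);
        return m.matches() ? m.group(1) + " " + m.group(2) : null;
    }

    /** Lists the POI and Tika jars found directly inside libDir, sorted by name. */
    static List<String> poiAndTikaJars(File libDir) {
        List<String> found = new ArrayList<String>();
        String[] names = libDir.list();
        if (names == null) {
            return found; // not a directory, or not readable
        }
        Arrays.sort(names);
        for (String name : names) {
            String described = describe(name);
            if (described != null) {
                found.add(described);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Pass the directory that holds the webapp's jars; defaults to the current one.
        File libDir = new File(args.length > 0 ? args[0] : ".");
        for (String line : poiAndTikaJars(libDir)) {
            System.out.println(line);
        }
    }
}
```

If the POI line printed does not match the version the Tika release expects,
that mismatch is the first thing to fix.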

Regards,
Gora

Re: TikaEntityProcessor not working?

Posted by kumar8anuj <ku...@gmail.com>.
So where can I get some information on this issue? Can you please help?

On Mon, Nov 21, 2011 at 8:17 PM, Erick Erickson [via Lucene] <
ml-node+s472066n3524905h78@n3.nabble.com> wrote:

> Sorry, but I don't really have that info.
>
> Erick
>
> On Mon, Nov 21, 2011 at 9:37 AM, kumar8anuj <[hidden email]> wrote:
> > Erick,
> > I need your help on this. I am waiting for a resolution. Please help.




Re: TikaEntityProcessor not working?

Posted by Erick Erickson <er...@gmail.com>.
Sorry, but I don't really have that info.

Erick

On Mon, Nov 21, 2011 at 9:37 AM, kumar8anuj <ku...@gmail.com> wrote:
> Erick,
> I need your help on this. I am waiting for a resolution. Please help.

Re: TikaEntityProcessor not working?

Posted by kumar8anuj <ku...@gmail.com>.
Erick,
I need your help on this. I am waiting for a resolution. Please help.


Re: TikaEntityProcessor not working?

Posted by kumar8anuj <ku...@gmail.com>.
The earlier issue has been resolved, but I am now stuck on something else. Can
you tell me which POI jar version works with Tika 0.6? Currently I have
poi-3.7.jar. The error I am getting is this:

SEVERE: Exception while processing: js_logins document : SolrInputDocument[{id=id(1.0)={100984}, complete_mobile_number=complete_mobile_number(1.0)={+91 9600067575}, emailid=emailid(1.0)={vkryali@gmail.com}, full_name=full_name(1.0)={Venkat Ryali}}]:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
	at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
	at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: java.lang.NoSuchMethodError: org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(Lorg/openxmlformats/schemas/wordprocessingml/x2006/main/CTP;Lorg/apache/poi/xwpf/usermodel/XWPFDocument;)V
	at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.<init>(XWPFWordExtractorDecorator.java:163)
	at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator$MyXWPFParagraph.<init>(XWPFWordExtractorDecorator.java:161)
	at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.extractTableContent(XWPFWordExtractorDecorator.java:140)
	at org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator.buildXHTML(XWPFWordExtractorDecorator.java:91)
	at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:69)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:51)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
	at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:128)
	at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
	... 7 more
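
A NoSuchMethodError like the one above usually means the POI jar on the
classpath is not the version the Tika parser was compiled against. One way to
check is to ask the JVM where XWPFParagraph is loaded from and whether it has
the (CTP, XWPFDocument) constructor named in the trace. This is a minimal
sketch using only standard reflection; the class PoiClasspathCheck and its
method names are made up for illustration and are not part of Solr, Tika, or
POI.

```java
import java.lang.reflect.Constructor;
import java.security.CodeSource;

final class PoiClasspathCheck {

    /** Returns the jar or directory a class was loaded from, or a note if unknown. */
    static String locationOf(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown source)"
                               : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath: " + e + ")";
        }
    }

    /** True if className declares a constructor with exactly these parameter types. */
    static boolean hasConstructor(String className, String... paramTypeNames) {
        try {
            for (Constructor<?> c : Class.forName(className).getDeclaredConstructors()) {
                Class<?>[] params = c.getParameterTypes();
                if (params.length != paramTypeNames.length) {
                    continue;
                }
                boolean same = true;
                for (int i = 0; i < params.length; i++) {
                    same &= params[i].getName().equals(paramTypeNames[i]);
                }
                if (same) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class (or one of its dependencies) is missing: treat as "no such constructor".
            return false;
        }
    }

    public static void main(String[] args) {
        // Class and parameter names are taken from the stack trace above.
        String paragraph = "org.apache.poi.xwpf.usermodel.XWPFParagraph";
        System.out.println("XWPFParagraph loaded from: " + locationOf(paragraph));
        System.out.println("Has (CTP, XWPFDocument) constructor: "
                + hasConstructor(paragraph,
                        "org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP",
                        "org.apache.poi.xwpf.usermodel.XWPFDocument"));
    }
}
```

Run it with the same jars on the classpath that the Solr webapp uses. If the
second line prints false, the deployed POI jar does not provide the
constructor this Tika build calls, and the POI version needs to change.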



Re: TikaEntityProcessor not working?

Posted by Erick Erickson <er...@gmail.com>.
What's not clear is what you are doing to ensure that the files named in
your database are being read (from disk? from a shared filesystem
somewhere?), analyzed and sent to Solr.

So, somewhere you need to actually use the file name to pass on to
one of the processors that'll actually send the *contents* of that file
to Solr along with the columns.

For instance, you haven't included your DIH configuration, so we can't
tell if you're trying to do anything like that here.

Best
Erick

On Tue, Nov 8, 2011 at 7:15 AM, kumar8anuj <ku...@gmail.com> wrote:
> Erick, I configured the system the same way Brad did, but no document
> indexing happened and I did not even get any errors in the log. I then
> switched Tika to 0.6 and tried again, without success. So the table
> columns are getting indexed, but the document is not. Let me know if
> anything is unclear.

Re: TikaEntityProcessor not working?

Posted by kumar8anuj <ku...@gmail.com>.
Erick, I configured the system the same way Brad did, but no document
indexing happened and I did not even get any errors in the log. I then
switched Tika to 0.6 and tried again, without success. So the table
columns are getting indexed, but the document is not. Let me know if
anything is unclear.


Re: TikaEntityProcessor not working?

Posted by Erick Erickson <er...@gmail.com>.
You have to provide a lot more information about what you're doing. Are
you trying to use DIH? The extracting update request handler? What
do your config files look like?

Please review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Mon, Nov 7, 2011 at 8:18 AM, kumar8anuj <ku...@gmail.com> wrote:
> I tried the same thing, but the problem still persists and my document is not
> getting indexed. I am using Solr 3.4.0, which shipped with Tika 0.8. I replaced
> the core and parser jars with the 0.6 versions, but the document is still not
> getting indexed. Nothing related to this shows up in my logs. Please help.