Posted to user@tika.apache.org by Vineet Ghatge Hemantkumar <he...@usc.edu> on 2014/09/26 05:05:40 UTC

Apache Tika - JSON?

Hello all,

I was wondering if there is any built-in parser to help with conversion from
XHTML to JSON.

My research showed that there is one named org.apache.io.json, which has just
one method implemented. Also, I tried the GJSON library to do this, but it does
not seem to work with Tika. Any suggestions would be appreciated.

Regards,
Vineet

Re: Apache Tika - JSON?

Posted by Vineet Ghatge Hemantkumar <he...@usc.edu>.
Hi Timothy,

I am using JSON programmatically, and yes, we cannot do that by default. We
need a parser built; one would imagine that a SAX handler would suffice for
this. I think the community should consider building this parser.

On the other hand, I am not sure how useful the recursive metadata would
be. Can GSON be used for normal data formatting and text extraction? I am
guessing it's a no.

Regards,
Vineet

On Fri, Sep 26, 2014 at 5:40 AM, Allison, Timothy B. <ta...@mitre.org>
wrote:

>  I suspect, though, that what you want is not what I answered
> (sorry!)…namely entities mapped from xhtml to json.  For that, I don’t
> think we have anything available in Tika, but it wouldn’t be difficult
> (famous last words) to write a content handler to do that…
>
> We have integrated the GSON library to serialize/deserialize Metadata
> objects in tika-serialization.



-- 
*Vineet Ghatge*

RE: Apache Tika - JSON?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
I suspect, though, that what you want is not what I answered (sorry!)…namely entities mapped from xhtml to json.  For that, I don’t think we have anything available in Tika, but it wouldn’t be difficult (famous last words) to write a content handler to do that…

We have integrated the GSON library to serialize/deserialize Metadata objects in tika-serialization.
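
As a rough illustration of the "content handler" idea above, here is a minimal sketch of a SAX handler that maps XHTML elements to JSON. It is not part of Tika: the class name XhtmlToJsonHandler, the toJson helper, and the output shape ("tag", "attrs", "children") are all assumptions made up for this example. It uses only the JDK's SAX parser so that it runs on its own. Tika parsers report their XHTML output as SAX events to an org.xml.sax.ContentHandler, so a handler along these lines could presumably be handed to a Tika parser instead, but that wiring is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not a Tika class): turns SAX events into a JSON string.
class XhtmlToJsonHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    // One flag per open element: true once that element has emitted a child.
    private final Deque<Boolean> hasChild = new ArrayDeque<>();
    // Buffer for character data, since SAX may deliver text in several chunks.
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        flushText();
        comma();
        out.append("{\"tag\":").append(quote(qName)).append(",\"attrs\":{");
        for (int i = 0; i < atts.getLength(); i++) {
            if (i > 0) out.append(',');
            out.append(quote(atts.getQName(i))).append(':').append(quote(atts.getValue(i)));
        }
        out.append("},\"children\":[");
        hasChild.push(false);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        flushText();
        hasChild.pop();
        out.append("]}");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Emit buffered text as a JSON string child, skipping whitespace-only runs.
    private void flushText() {
        String s = text.toString().trim();
        text.setLength(0);
        if (!s.isEmpty()) {
            comma();
            out.append(quote(s));
        }
    }

    // Write a separating comma if the current parent already has a child.
    private void comma() {
        if (hasChild.isEmpty()) return;
        if (hasChild.pop()) out.append(',');
        hasChild.push(true);
    }

    // Minimal JSON string escaping.
    private static String quote(String s) {
        StringBuilder b = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': b.append("\\\""); break;
                case '\\': b.append("\\\\"); break;
                case '\n': b.append("\\n"); break;
                case '\r': b.append("\\r"); break;
                case '\t': b.append("\\t"); break;
                default:
                    if (c < 0x20) b.append(String.format("\\u%04x", (int) c));
                    else b.append(c);
            }
        }
        return b.append('"').toString();
    }

    String json() {
        return out.toString();
    }

    // Convenience entry point: parse a well-formed XHTML string with the JDK SAX parser.
    static String toJson(String xhtml) {
        try {
            XhtmlToJsonHandler h = new XhtmlToJsonHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), h);
            return h.json();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }
}
```

Text is trimmed and whitespace-only runs are dropped, which is a simplification; a real handler would need to decide how to treat mixed content and namespaces.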

From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Friday, September 26, 2014 6:54 AM
To: user@tika.apache.org
Subject: RE: Apache Tika - JSON?

The current json output option in the app and server only dumps metadata…as you probably know.

I plan to add a json version of the RecursiveParserWrapper (list of Metadata objects with one entry for content) to the app shortly.  Would that be of any use?

Are you using the app, the server, or calling Tika programmatically?



RE: Apache Tika - JSON?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
The current json output option in the app and server only dumps metadata…as you probably know.

I plan to add a json version of the RecursiveParserWrapper (list of Metadata objects with one entry for content) to the app shortly.  Would that be of any use?

Are you using the app, the server, or calling Tika programmatically?

