Posted to user@tika.apache.org by "Johnson, Jaya" <Ja...@moodys.com> on 2018/03/13 21:05:41 UTC

XBRL documents.

Can Tika parse XBRL documents? XBRL is a variant of XML.

Thanks.
-----------------------------------------

Moody's monitors email communications through its networks for regulatory compliance purposes and to protect its customers, employees and business and where allowed to do so by applicable law. The information contained in this e-mail message, and any attachment thereto, is confidential and may not be disclosed without our express permission. If you are not the intended recipient or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution or copying of this message, or any attachment thereto, in whole or in part, is strictly prohibited. If you have received this message in error, please immediately notify us by telephone, fax or e-mail and delete the message and all of its attachments. Every effort is made to keep our network free from viruses. You should, however, review this e-mail message, as well as any attachment thereto, for viruses. We take no responsibility and have no liability for any computer virus which may be transferred via this e-mail message.

-----------------------------------------

Re: XBRL documents.

Posted by Chris Mattmann <ma...@apache.org>.
Dear Jaya,

Thanks for reaching out. If we don’t have a parser yet, we could always add one.
I would be happy to show you how to do this, as would anyone in the community.
You can start here:

http://tika.apache.org/1.16/parser_guide.html

Thanks,

Chris

RE: XBRL documents.

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Tika's default handling of XML is to scrape out the text and ignore the element names and attributes, IIRC.  So, if that's the behavior you want, and your XBRL files are well-formed XML, you'll be good to go.
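
To make the text-only behavior concrete, here is a rough approximation using only the JDK's built-in SAX parser. It is not Tika's actual XMLParser code, just a sketch of the same idea: keep character data, drop element names and attributes. The class name TextOnlyXml and the XBRL fragment (two made-up us-gaap facts with no contexts or units defined, so not a valid XBRL instance) are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TextOnlyXml {
    // Returns the character content of the document with whitespace collapsed;
    // element names and attribute values are never copied to the output.
    static String extractText(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        StringBuilder out = new StringBuilder();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length); // text only
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                out.append(' '); // keep text from adjacent elements apart
            }
        });
        // Collapse runs of whitespace from indentation and the separators above.
        return out.toString().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) throws Exception {
        // Simplified, hypothetical XBRL-style fragment for illustration only.
        String xbrl = "<xbrli:xbrl xmlns:xbrli=\"http://www.xbrl.org/2003/instance\""
                + " xmlns:us-gaap=\"http://fasb.org/us-gaap/2017-01-31\">"
                + "<us-gaap:Revenues contextRef=\"FY2017\" unitRef=\"USD\" decimals=\"-6\">1000</us-gaap:Revenues>"
                + "<us-gaap:NetIncomeLoss contextRef=\"FY2017\" unitRef=\"USD\" decimals=\"-6\">250</us-gaap:NetIncomeLoss>"
                + "</xbrli:xbrl>";
        System.out.println(extractText(xbrl));
    }
}
```

Run as-is, it should print only the two numbers; the element names and the contextRef, unitRef and decimals attributes, which carry much of the meaning in XBRL, are gone.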

If they aren't standard XML, or if you want the node names and attributes, you may need to write your own parser, which should be straightforward[1].
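
If you do need the node names and attributes, the heart of a custom parser is a handler that keeps them. Below is a minimal sketch, again JDK SAX only, under the simplifying assumption that every leaf element is a fact worth reporting. XbrlFactExtractor and Fact are hypothetical names. A real Tika parser would implement the Parser interface described in the parser guide [1] and write XHTML events and Metadata rather than return a list; that wiring is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class XbrlFactExtractor {
    // One leaf element: its qualified name, its attributes, and its text value.
    static final class Fact {
        final String name;
        final Map<String, String> attributes;
        final String value;

        Fact(String name, Map<String, String> attributes, String value) {
            this.name = name;
            this.attributes = attributes;
            this.value = value;
        }

        @Override
        public String toString() {
            return name + " " + attributes + " = " + value;
        }
    }

    // Hypothetical helper: collects every leaf element, keeping names and attributes.
    static List<Fact> extractFacts(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        List<Fact> facts = new ArrayList<>();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            private String name; // most recently opened element, null once a child closes
            private Map<String, String> attrs;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                name = qName;
                attrs = new LinkedHashMap<>();
                for (int i = 0; i < atts.getLength(); i++) {
                    attrs.put(atts.getQName(i), atts.getValue(i));
                }
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Only an element closed with no child in between is a leaf;
                // containers such as the root element are skipped.
                if (qName.equals(name)) {
                    facts.add(new Fact(qName, attrs, text.toString().trim()));
                }
                name = null;
            }
        });
        return facts;
    }

    public static void main(String[] args) throws Exception {
        // Simplified, hypothetical XBRL-style fragment for illustration only.
        String xbrl = "<xbrli:xbrl xmlns:xbrli=\"http://www.xbrl.org/2003/instance\""
                + " xmlns:us-gaap=\"http://fasb.org/us-gaap/2017-01-31\">"
                + "<us-gaap:Revenues contextRef=\"FY2017\" unitRef=\"USD\" decimals=\"-6\">1000</us-gaap:Revenues>"
                + "</xbrli:xbrl>";
        for (Fact f : extractFacts(xbrl)) {
            System.out.println(f);
        }
    }
}
```

Treating leaves as facts is a shortcut: real XBRL instances also contain context and unit definitions whose leaves (for example, the dates inside a period) would land in the same list, so a fuller parser would probably filter on namespaces instead.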

The best way to see what Tika will do is to download tika-app[2], start up the GUI and drop in a file to see what you get.

[1] https://tika.apache.org/1.17/parser_guide.html
[2] http://www.apache.org/dyn/closer.cgi/tika/tika-app-1.17.jar
