Posted to dev@tika.apache.org by kevin slote <ks...@gmail.com> on 2014/04/10 15:35:19 UTC

tika-server returning ill-formed XML in 1.3

Hey everyone, I am using tika-server-1.3 to extract metadata from an image,
and instead of the usual comma-separated values it returns large blobs of
XML inside the CSV-style data.

Here is what the returned data looks like:

"Compression Lossless","true"
"Text TextEntry","keyword=XML:com.adobe.xmp, value=<x:xmpmeta
xmlns:x=""adobe:ns:meta/"" x:xmptk=""XMP Core 5.1.2"">
   <rdf:RDF xmlns:rdf=""http://www.w3.org/1999/02/22-rdf-syntax-ns#"">
      <rdf:Description rdf:about=""""
            xmlns:dc=""http://purl.org/dc/elements/1.1/"">
         <dc:subject>
            <rdf:Bag/>
         </dc:subject>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>, language=, compression=none"
"Chroma BlackIsZero","true"



Is this expected output or a bug? I have never seen the metadata look
like this.
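The doubled quotes in the sample are ordinary CSV escaping: inside a quoted CSV field (RFC 4180), each literal `"` is written as `""`, and line breaks inside the quotes belong to the value rather than starting a new record. The minimal Python sketch below assumes that the response is standard CSV and that the XMP packet is the `<x:xmpmeta>` to `</x:xmpmeta>` span inside the `Text TextEntry` value. It recovers the packet with the standard `csv` module and then parses it with `xml.etree.ElementTree`. `SAMPLE`, `parse_metadata`, and `extract_xmp` are hypothetical names used only for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The tika-server output quoted above, copied verbatim as a test input.
SAMPLE = '''"Compression Lossless","true"
"Text TextEntry","keyword=XML:com.adobe.xmp, value=<x:xmpmeta
xmlns:x=""adobe:ns:meta/"" x:xmptk=""XMP Core 5.1.2"">
   <rdf:RDF xmlns:rdf=""http://www.w3.org/1999/02/22-rdf-syntax-ns#"">
      <rdf:Description rdf:about=""""
            xmlns:dc=""http://purl.org/dc/elements/1.1/"">
         <dc:subject>
            <rdf:Bag/>
         </dc:subject>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>, language=, compression=none"
"Chroma BlackIsZero","true"
'''


def parse_metadata(csv_text):
    """Map each metadata key to its value, undoing the CSV quoting."""
    # csv.reader strips the surrounding quotes, turns each "" back into ",
    # and keeps newlines that occur inside a quoted field as part of the value.
    return {row[0]: row[1] for row in csv.reader(io.StringIO(csv_text)) if row}


def extract_xmp(entry):
    """Parse the <x:xmpmeta> packet embedded in a flattened TextEntry value."""
    # Assumption: the packet is the span from "<x:xmpmeta" to "</x:xmpmeta>";
    # the surrounding "keyword=..., value=..., language=..., compression=..."
    # text is not XML and is ignored.
    end_tag = "</x:xmpmeta>"
    start = entry.index("<x:xmpmeta")
    end = entry.index(end_tag) + len(end_tag)
    return ET.fromstring(entry[start:end])


if __name__ == "__main__":
    meta = parse_metadata(SAMPLE)
    xmp = extract_xmp(meta["Text TextEntry"])
    print(xmp.tag)  # prints {adobe:ns:meta/}xmpmeta
```

This sketch does not settle whether embedding the packet in the CSV is intended. It only shows that the XML comes back well-formed once the CSV quoting is undone.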

Re: tika-server returning ill-formed XML in 1.3

Posted by Sergey Beryozkin <sb...@gmail.com>.
Hi, can you open a JIRA issue, attach a test image (if possible), and
describe how the server is called? I don't think it is a server issue, but
the Tika experts can see what else might be wrong once you create a JIRA issue.

Cheers, Sergey
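Sergey also asks how the server is called. The minimal Python sketch below shows one such call. It assumes tika-server is running locally on port 9998 (its usual default) and returns metadata when the raw file is sent with HTTP `PUT` to `/meta`; check the endpoints of your tika-server version before relying on this. The `application/octet-stream` content type is also an assumption. `TIKA_META_URL`, `build_meta_request`, and `fetch_metadata_csv` are hypothetical names.

```python
import urllib.request

# Assumption: tika-server runs locally on port 9998 (its usual default) and
# returns a document's metadata when the raw file is PUT to /meta.
TIKA_META_URL = "http://localhost:9998/meta"


def build_meta_request(document, url=TIKA_META_URL):
    """Build, but do not send, a PUT request that uploads raw document bytes."""
    # An explicit Content-Type stops urllib from defaulting to
    # application/x-www-form-urlencoded for a request with a body.
    return urllib.request.Request(
        url,
        data=document,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


def fetch_metadata_csv(path, url=TIKA_META_URL):
    """Upload the file at `path` and return the server's response body as text."""
    with open(path, "rb") as f:
        request = build_meta_request(f.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

From a shell, `curl -T image.png http://localhost:9998/meta` should issue a similar PUT request. `image.png` is a placeholder file name.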