Posted to dev@tika.apache.org by Runomu <ce...@gmail.com> on 2014/11/12 19:22:31 UTC

Tika Api consumes given stream

I use the Apache Tika bundle dependency in a project to determine the MIME
types of files. Due to some constraints, we have to detect the type from an
InputStream. It is actually guaranteed that the given InputStream is marked
and reset. The Tika bundle includes the core and parser APIs and uses
POIFSContainerDetector, ZipContainerDetector, OggDetector, MimeTypes, and
Magic for detection. I have been debugging for three hours, and all of the
detectors mark and reset the stream after detection. This is how I call it:

TikaInputStream tis = null;
    try {
        TikaConfig config = new TikaConfig();
        tikaDetector = config.getDetector();
        tis =  TikaInputStream.get(in);
        MediaType mediaType = tikaDetector.detect(tis, new Metadata());

        if (mediaType != null) {
            String[] types = mediaType.toString().split(",");

            for (int i = 0; i < types.length; i++) {
                mimeTypes.add(new MimeType(types[i]));
            }
        }

    } catch (Exception e) {
        logger.error("Mime Type for given Stream could not be resolved: ", e);
    }

But the stream is still consumed. Does anyone know how to determine the MIME
types without consuming the stream?

--
View this message in context: http://lucene.472066.n3.nabble.com/Tika-Api-consumes-given-stream-tp4168960.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.

Re: Tika Api consumes given stream

Posted by Tyler Palsulich <tp...@gmail.com>.
Shot in the dark here, as I haven't tried this. But, have you tried using
mark/reset on the TikaInputStream? That should forward the requests on to
the underlying InputStream and hopefully work.
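
To make the mark/reset idea concrete, here is a minimal sketch that uses only
the JDK and does not call Tika at all. The class name MarkResetSketch and the
helper peek are made up for illustration. It marks a stream, reads a few
leading bytes, and resets it. It also shows one possible reason the original
stream looks consumed: a buffering wrapper can be rewound while the raw stream
underneath it has already been advanced. Whether that is what happens inside
TikaInputStream is an assumption I have not verified.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MarkResetSketch {

    /**
     * Hypothetical helper (not part of Tika): reads up to 'limit' leading
     * bytes from the stream and then rewinds it to where it started.
     */
    public static byte[] peek(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(limit); // remember the current position
        try {
            byte[] buf = new byte[limit];
            int total = 0;
            int n;
            // read() may return fewer bytes than requested, so loop
            while (total < limit
                    && (n = in.read(buf, total, limit - total)) != -1) {
                total += n;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.reset(); // rewind to the marked position
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 example".getBytes(StandardCharsets.US_ASCII);
        InputStream raw = new ByteArrayInputStream(data);
        InputStream buffered = new BufferedInputStream(raw);

        byte[] head = peek(buffered, 4);
        System.out.println(new String(head, StandardCharsets.US_ASCII));

        // The wrapper was rewound: its next byte is the first byte again.
        System.out.println((char) buffered.read());

        // The raw stream was advanced when the wrapper filled its buffer.
        System.out.println(raw.available() < data.length);
    }
}
```

If the same thing applies to the code in the question, continuing to read from
tis (the wrapper) after detection, rather than from the original in, may be
enough.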

Tyler
