Posted to user@tika.apache.org by Ben Turner <be...@pobox.com> on 2013/04/23 09:22:00 UTC

Re: Broken socket pipe when writing a PNG to Tika (server mode)

Hi Dave,

Apologies for coming back to this over a month later. We had worked around
(or simply not seen) the issue for a while, but as we ramp up our testing
it has come back.
I've investigated it from several angles today, and the problem seems to be
that SOME PNG files fail when parsed by Tika, but only when the -T or -t
switch is applied.

I am currently running Tika locally (under Java 1.6.0_26) using the
following command:

java -jar ~/software/tika/tika-app-1.3.jar -t -s -p 9100

I then run the following Ruby code (under Ruby 1.8.7-p371, although I
think it would behave the same on all releases):

#!/usr/bin/env ruby
require 'socket'

class FileStreamer
  attr_reader :filename

  def initialize(filename)
    @filename = filename
  end

  def do_it
    TCPSocket.open('127.0.0.1', 9100) do |socket|
      File.open(filename, 'rb') do |file|   # binary mode, so the PNG bytes are sent unchanged
        content = file.read
        socket.write(content)
        socket.close_write
        puts socket.read
      end
    end
  end
end

FileStreamer.new('./Pictures/test.png').do_it
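A variant of the client above that captures the reset instead of letting it escape, returning whatever output arrived before it, so a failing file can be logged and skipped while the root cause is investigated. This is a sketch only: stream_to_tika is a hypothetical helper name, the host and port defaults mirror the server command above, and it assumes Ruby 1.9.3 or later for File.binread.

```ruby
require 'socket'

# Hypothetical helper: stream the file at +path+ to a Tika server, then read
# its reply. Returns [output, error]: error is nil on a clean end of stream,
# or the Errno exception if the connection was reset or the pipe broke.
def stream_to_tika(path, host = '127.0.0.1', port = 9100)
  output = String.new # binary buffer for whatever the server sends back
  TCPSocket.open(host, port) do |socket|
    begin
      # May raise Errno::EPIPE or Errno::ECONNRESET if the server already hung up.
      socket.write(File.binread(path))
      socket.close_write # tell the server there is no more input
      loop { output << socket.readpartial(4096) }
    rescue EOFError
      return [output, nil] # normal end of the reply
    rescue Errno::ECONNRESET, Errno::EPIPE => e
      return [output, e]   # keep any partial reply alongside the error
    end
  end
end
```

For example, `output, error = stream_to_tika('./Pictures/test.png')` leaves error set to an Errno::ECONNRESET instance (rather than raising) if the server resets the connection. Note that readpartial blocks until the server either sends data or closes, so a server that never closes will hang this client.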

--> This then throws the following error:

Errno::ECONNRESET: (eval):19:in `read': Connection reset by peer
  from /home/bturner/.rbenv/versions/1.8.7-p371/lib/ruby/gems/1.8/gems/interactive_editor-0.0.10/lib/interactive_editor.rb:55:in `eval'
  from (eval):19:in `do_it'
  from (eval):15:in `open'
  from (eval):15:in `do_it'
  from (eval):14:in `open'
  from (eval):14:in `do_it'
  from (eval):26

The file that triggers this error can be downloaded from
http://imgur.com/r/quotesporn/hUGXn via the "Download Full Resolution"
link, or from this direct link: http://bit.ly/ZLT9Xs
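Before comparing behaviour across machines, it may be worth confirming that the downloaded file really is a PNG and not, say, a differently encoded image saved under a .png name. A minimal sketch that checks the standard 8-byte signature every PNG file starts with; png? and PNG_SIGNATURE are hypothetical names chosen for illustration.

```ruby
# The standard 8-byte signature at the start of every PNG file.
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Returns true only if the file at +path+ begins with the PNG signature.
# f.read(8) returns nil for an empty file, which compares as false.
def png?(path)
  File.open(path, 'rb') { |f| f.read(8) } == PNG_SIGNATURE
end

# Usage: ruby png_check.rb ./Pictures/test.png
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  puts png?(ARGV[0]) ? 'PNG signature found' : 'not a PNG'
end
```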

Our process tries to extract content only (not metadata) from every file
that is thrown at it. We realise this means PNG and JPEG files will return
nothing, but we're trying to handle all files the same way where possible,
as we can't be 100% sure of the file types before processing. Hence we use
the -t flag, and NOT the -m flag. Note that changing the -t flag to -m
causes the PNG to be processed correctly, with a blank return value. Also,
we have not seen this behaviour with JPEGs or other "no textual content"
formats so far.
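Since switching -t to -m changes the outcome for the same PNG, a side-by-side probe can make that comparison repeatable. A sketch, assuming two local servers started identically except for the output flag; the ports 9100/9101 and the helper names probe and compare_modes are illustrative assumptions, not part of Tika.

```ruby
require 'socket'

# Hypothetical setup: two Tika servers differing only in the output flag, e.g.
#   java -jar tika-app-1.3.jar -t -s -p 9100
#   java -jar tika-app-1.3.jar -m -s -p 9101
MODE_PORTS = { '-t' => 9100, '-m' => 9101 }

# Send one file to a server and describe the outcome instead of raising.
def probe(path, port, host = '127.0.0.1')
  TCPSocket.open(host, port) do |socket|
    socket.write(File.binread(path))
    socket.close_write
    "ok, #{socket.read.bytesize} bytes returned"
  end
rescue SystemCallError => e
  # Errno::ECONNRESET, Errno::EPIPE and Errno::ECONNREFUSED all land here.
  "failed: #{e.class}"
end

# One line per mode, e.g. for printing with puts.
def compare_modes(path)
  MODE_PORTS.map { |flag, port| "#{flag}: #{probe(path, port)}" }
end
```

With both servers running, `puts compare_modes('./Pictures/test.png')` prints one outcome line per flag.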

Thanks and regards,
Ben




On 13 March 2013 11:12, Dave Meikle <lo...@gmail.com> wrote:

> Hi Ben,
>
> On 12 Mar 2013, at 05:33, Ben Turner <be...@pobox.com> wrote:
>
> > * We then talk to it via ruby sockets (for non-rubyists, this streams a
> > document from the file system into our local tika server over a simple
> > socket):
> >
> > #!/usr/bin/env ruby
> > require 'socket'
> > TCPSocket.open('127.0.0.1', 12345) do |socket|
> >    File.open('/tmp/test.png', 'r') do |chunk|
> >      socket.write(chunk)
> >    end
> >    socket.close_write
> >    puts socket.read
> > end
>
> There is no known fault around this, so I tried this locally, and with a
> wee tweak to the Ruby code to use socket.write(chunk.read), it works for
> me with all document types. I also used -m on the server to make sure the
> PNG was being processed, and it dumps back the metadata.
>
> Is there anything else in the way over the network (firewall, IDS, etc)?
>
> Cheers,
> Dave
>
>
>
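Why the chunk.read tweak matters: the quoted snippet passes the File object itself to socket.write, and Ruby's IO#write converts any non-String argument with to_s, so what goes over the wire is a short description like "#<File:0x...>" rather than the PNG's bytes. A minimal sketch demonstrating this, using StringIO as a stand-in for the socket (an assumption for illustration; StringIO#write applies the same to_s conversion) and a small made-up file rather than a real image.

```ruby
require 'stringio'
require 'tempfile'

# A small stand-in file: a PNG signature plus some filler (not a real image).
tmp = Tempfile.new(['demo', '.png'])
tmp.binmode
tmp.write("\x89PNG\r\n\x1A\n".b + 'rest of file'.b)
tmp.close

# Like IO#write, StringIO#write calls to_s on any argument that is not a String.
buggy = StringIO.new(String.new)
File.open(tmp.path, 'rb') { |file| buggy.write(file) }      # writes file.to_s

fixed = StringIO.new(String.new)
File.open(tmp.path, 'rb') { |file| fixed.write(file.read) } # writes the bytes

puts buggy.string            # something like "#<File:0x...>"
puts fixed.string.bytesize   # prints 20
```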

Re: Broken socket pipe when writing a PNG to Tika (server mode)

Posted by Ben Turner <be...@pobox.com>.

Dave,

My environment is Ubuntu 10.10; my colleague has reproduced the problem on
Ubuntu 10.04 too.
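Since the failure appears on Ubuntu but not on Dave's MacBook, it may help to report the client environment the same way from each machine. A minimal sketch using Ruby's standard RbConfig module; client_environment is a hypothetical helper, and it describes only the Ruby client, not the Java version running Tika.

```ruby
require 'rbconfig'

# One-line summary of the Ruby client's environment for a bug report.
def client_environment
  [
    "host_os=#{RbConfig::CONFIG['host_os']}",
    "ruby=#{RUBY_VERSION}-p#{RUBY_PATCHLEVEL}",
    "platform=#{RUBY_PLATFORM}"
  ].join(' ')
end

puts client_environment
```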

Ben


On 1 May 2013 04:38, Dave Meikle <lo...@gmail.com> wrote:

> Hi Ben,
>
> On 23 Apr 2013, at 08:22, Ben Turner <be...@pobox.com> wrote:
>
> > Hi Dave,
> >
> > Apologies for coming back to this over a month later. We had worked
> > around (or simply not seen) the issue for a while, but as we ramp up our
> > testing it has come back.
> > I've investigated it from several angles today, and the problem seems to
> > be that SOME PNG files fail when parsed by Tika, but only when the -T or
> > -t switch is applied.
>
> No problem at all.
>
> > Errno::ECONNRESET: (eval):19:in `read': Connection reset by peer
> >       from /home/bturner/.rbenv/versions/1.8.7-p371/lib/ruby/gems/1.8/gems/interactive_editor-0.0.10/lib/interactive_editor.rb:55:in `eval'
> >       from (eval):19:in `do_it'
> >       from (eval):15:in `open'
> >       from (eval):15:in `do_it'
> >       from (eval):14:in `open'
> >       from (eval):14:in `do_it'
> >       from (eval):26
>
> Quick question - what operating system are you running on?
>
> I cannot get this to fail locally on my MacBook, even when performing a
> range of tests across different files, but a quick test on a Linux
> environment before jumping on the train seems to reproduce this issue.
>
> I will try to confirm this and narrow things down later on but would be
> interested in your environment too.
>
> Cheers,
> Dave
>
>

Re: Broken socket pipe when writing a PNG to Tika (server mode)

Posted by Dave Meikle <lo...@gmail.com>.

Hi Ben,

On 23 Apr 2013, at 08:22, Ben Turner <be...@pobox.com> wrote:

> Hi Dave,
> 
> Apologies for coming back to this over a month later. We had worked around (or simply not seen) the issue for a while, but as we ramp up our testing it has come back.
> I've investigated it from several angles today, and the problem seems to be that SOME PNG files fail when parsed by Tika, but only when the -T or -t switch is applied.

No problem at all.

> Errno::ECONNRESET: (eval):19:in `read': Connection reset by peer
> 	from /home/bturner/.rbenv/versions/1.8.7-p371/lib/ruby/gems/1.8/gems/interactive_editor-0.0.10/lib/interactive_editor.rb:55:in `eval'
> 	from (eval):19:in `do_it'
> 	from (eval):15:in `open'
> 	from (eval):15:in `do_it'
> 	from (eval):14:in `open'
> 	from (eval):14:in `do_it'
> 	from (eval):26

Quick question - what operating system are you running on?

I cannot get this to fail locally on my MacBook, even when performing a range of tests across different files, but a quick test on a Linux environment before jumping on the train seems to reproduce this issue.

I will try to confirm this and narrow things down later on but would be interested in your environment too.

Cheers,
Dave