Posted to user@tika.apache.org by Eric Pugh <ep...@opensourceconnections.com> on 2020/02/04 16:25:41 UTC

Anyone can share an example of Java code POSTing a file to Tika-Server?

Shockingly, I can’t find a nice example of Java code being used to do an HTTP POST of a file to Tika Server.

The curl command that I want to convert to Java code is:

curl -T /myfile.pdf http://localhost:9998/rmeta --header "X-Tika-OCRLanguage: eng" --header "X-Tika-PDFOcrStrategy: ocr_and_text_extraction" --header "X-Tika-OCRoutputType: hocr"

I’m not having much luck, so was wondering if anyone in the community had an example?

Eric



_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>	
This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.


Re: Anyone can share an example of Java code POSTing a file to Tika-Server?

Posted by John Patrick <nh...@gmail.com>.
This contains a POST example plus others, mainly with string content rather than binary content:

https://openjdk.java.net/groups/net/httpclient/recipes.html

This contains examples of reading files:

https://docs.oracle.com/javase/tutorial/essential/io/file.html


I’ll look at doing a pull request to the Tika documentation to include a basic example.
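
For reference, here is a minimal sketch of the curl command from the original question, using the JDK 11+ java.net.http client covered in the first link. Note that curl -T uploads with an HTTP PUT rather than a POST, so the sketch uses PUT as well. The class name TikaServerPutExample and the method buildRequest are hypothetical names chosen for illustration; the URL and headers are copied from the curl command. It is untested against a running Tika Server, so treat it as a starting point rather than a reference implementation.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class; not part of Tika or of the code linked in this thread.
final class TikaServerPutExample {

    // Builds the rough equivalent of:
    //   curl -T <file> <baseUrl>/rmeta --header "X-Tika-OCRLanguage: eng" ...
    // The file content is passed in as bytes and sent as the PUT body.
    static HttpRequest buildRequest(String baseUrl, byte[] fileBytes) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/rmeta"))
                // Headers copied verbatim from the curl command in the question.
                .header("X-Tika-OCRLanguage", "eng")
                .header("X-Tika-PDFOcrStrategy", "ocr_and_text_extraction")
                .header("X-Tika-OCRoutputType", "hocr")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(fileBytes))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumption: the file path is the first argument, defaulting to the path in the question.
        Path file = Path.of(args.length > 0 ? args[0] : "/myfile.pdf");
        HttpRequest request = buildRequest("http://localhost:9998", Files.readAllBytes(file));

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Reading the whole file into memory keeps the sketch short. For large files, HttpRequest.BodyPublishers.ofFile(Path) streams from disk instead, at the cost of a checked FileNotFoundException when the request is built.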


Re: Anyone can share an example of Java code POSTing a file to Tika-Server?

Posted by Tim Allison <ta...@apache.org>.
I updated the example to include putting the bytes (tika-server classic)
and sending the filepath in the header:

https://github.com/tballison/tika-addons/blob/master/tika-eval-solrj/src/main/java/org/tallison/tikaeval/example/TikaServerClient.java#L142

On Tue, Feb 4, 2020 at 8:37 AM Tim Allison <ta...@apache.org> wrote:

> Oops...that example requires you to use the non-secure settings and ships
> the file url to tika server to read from a fileshare.  I'll add an example
> with putting bytes later today.
>
>
> On Tue, Feb 4, 2020 at 8:33 AM Tim Allison <ta...@apache.org> wrote:
>
>> Try:
>> https://github.com/tballison/tika-addons/blob/master/tika-eval-solrj/src/main/java/org/tallison/tikaeval/example/TikaServerClient.java#L148

Re: Anyone can share an example of Java code POSTing a file to Tika-Server?

Posted by Tim Allison <ta...@apache.org>.
Oops...that example requires you to use the non-secure settings and ships
the file url to tika server to read from a fileshare.  I'll add an example
with putting bytes later today.


On Tue, Feb 4, 2020 at 8:33 AM Tim Allison <ta...@apache.org> wrote:

> Try:
> https://github.com/tballison/tika-addons/blob/master/tika-eval-solrj/src/main/java/org/tallison/tikaeval/example/TikaServerClient.java#L148

Re: Anyone can share an example of Java code POSTing a file to Tika-Server?

Posted by Tim Allison <ta...@apache.org>.
Try:
https://github.com/tballison/tika-addons/blob/master/tika-eval-solrj/src/main/java/org/tallison/tikaeval/example/TikaServerClient.java#L148
