Posted to dev@tika.apache.org by "Fatih Pazarbasi (Jira)" <ji...@apache.org> on 2021/08/13 06:22:00 UTC
[jira] [Created] (TIKA-3523) A replacement for enableFileUrl or Support for Google Cloud
Fatih Pazarbasi created TIKA-3523:
-------------------------------------
Summary: A replacement for enableFileUrl or Support for Google Cloud
Key: TIKA-3523
URL: https://issues.apache.org/jira/browse/TIKA-3523
Project: Tika
Issue Type: Wish
Components: tika-server
Affects Versions: 2.0.0
Reporter: Fatih Pazarbasi
Hello,
I have a setup where users upload their files to a cloud bucket, and I forward the fileUrl to Tika to run OCR on them in a serverless cloud instance. I do it this way so that users never contact the Tika Server directly and I keep a copy of what they sent for processing. They also have no use for the unprocessed response.
Now that enableFileUrl has been removed, I have to download the files from the cloud bucket to the backend instance and then PUT them to the server's /tika endpoint myself.
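The download-and-forward workaround described above could look roughly like the following. This is a minimal sketch, not a tested integration: `buildTikaPutRequest`, `extractText`, `tikaBaseUrl` and the injectable `fetchImpl` parameter are hypothetical names chosen for illustration, and the global `fetch` assumes Node 18 or newer.

```javascript
// Sketch only: tikaBaseUrl, buildTikaPutRequest, extractText and fetchImpl
// are hypothetical names; global fetch assumes Node 18 or newer.

// Build the PUT request that sends already-downloaded bytes to /tika.
function buildTikaPutRequest(tikaBaseUrl, fileBytes) {
  return {
    url: `${tikaBaseUrl.replace(/\/+$/, '')}/tika`, // tolerate a trailing slash
    options: {
      method: 'PUT',
      headers: { Accept: 'text/plain', 'User-Agent': 'Firebase Functions' },
      body: fileBytes,
    },
  };
}

// Download the uploaded file from the bucket, then forward it to Tika.
async function extractText(fileUrl, tikaBaseUrl, fetchImpl = fetch) {
  const download = await fetchImpl(fileUrl);
  if (!download.ok) {
    throw new Error(`Download failed: HTTP ${download.status}`);
  }
  const fileBytes = Buffer.from(await download.arrayBuffer());

  const { url, options } = buildTikaPutRequest(tikaBaseUrl, fileBytes);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Tika request failed: HTTP ${response.status}`);
  }
  return response.text(); // extracted plain text
}
```

The extra round trip through the backend instance is exactly what the old fileUrl header avoided, which is why a replacement would be welcome.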
I tried the following config.xml to work around the situation, but in vain.
For the made-up URL https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf:
{code:xml}
<fetchers>
  <fetcher class="org.apache.tika.pipes.fetcher.fs.FileSystemFetcher">
    <params>
      <name>fsf</name>
      <basePath>https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o</basePath>
    </params>
  </fetcher>
</fetchers>
<emitters>
  <emitter class="org.apache.tika.pipes.emitter.fs.FileSystemEmitter">
    <params>
      <name>fse</name>
      <basePath>gs://abcd-efgh.appspot.com/users</basePath>
    </params>
  </emitter>
</emitters>
<server>
  <params>
    <enableUnsecureFeatures>true</enableUnsecureFeatures>
  </params>
</server>
<pipes>
  <params>
    <tikaConfig>/path/to/tika-config.xml</tikaConfig>
  </params>
</pipes>
{code}
{code:javascript}
headers: {
  Accept: 'text/plain',
  'User-Agent': 'Firebase Functions',
  fetcherName: 'fsf',
  fetchKey: 'somefilethatdoesnotexist.pdf',
},
{code}
gs:// Google Storage bucket paths are not supported either. I have all the necessary permissions, but that didn't help.
In the golden times of 1.2x I was simply using:
{code:javascript}
headers: {
  Accept: 'text/plain',
  'User-Agent': 'Firebase Functions',
  fileUrl: 'https://firebasestorage.googleapis.com/v0/b/abcd-efgh.appspot.com/o/somefilethatdoesnotexist.pdf',
},
{code}
Am I missing something?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)