Posted to dev@tika.apache.org by GitBox <gi...@apache.org> on 2021/05/14 02:40:12 UTC

[GitHub] [tika] lewismc opened a new pull request #444: TIKA-3403 Create example for Transcription

lewismc opened a new pull request #444:
URL: https://github.com/apache/tika/pull/444


   This pull request addresses https://issues.apache.org/jira/browse/TIKA-3403
   In addition to implementing the example file, it proposes the following improvements:
   * minor upgrade of the AWS libraries to `1.11.1018`
   * adds a new configuration option for the AWS transcriber that lets the client write to a specific region, cf. `transcribe.REGION`
   * makes use of [SelectObjectContentRequest](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/SelectObjectContentRequest.html), which filters the contents of an Amazon S3 object (the transcription) based on a simple Structured Query Language (SQL) statement. In the request, along with the SQL expression, we specify JSON as the data serialization format of the object. Amazon S3 uses this to parse the object data into records and returns only the records that match the specified SQL expression. In our case this means we ONLY return the transcription text. This dramatically (by orders of magnitude) reduces the amount of data we egress from S3 to the client.
   * the implementation will now automatically create the bucket (to store the transcription) if one does not already exist. This is merely a utility feature.
   * introduces a LOT of exception handling and checks, which will assist the client in debugging errors/anomalies.
   * Reformatted GoogleTranslator.java with 4-space indents.
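
For readers unfamiliar with S3 Select, the request described above can be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: it assumes the AWS SDK for Java 1.11.x (`aws-java-sdk-s3`) is on the classpath, and the class name `TranscriptSelectSketch`, the method names, and the SQL expression (which assumes the usual Amazon Transcribe output layout with a `results.transcripts` array) are hypothetical.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.CompressionType;
import com.amazonaws.services.s3.model.ExpressionType;
import com.amazonaws.services.s3.model.InputSerialization;
import com.amazonaws.services.s3.model.JSONInput;
import com.amazonaws.services.s3.model.JSONOutput;
import com.amazonaws.services.s3.model.JSONType;
import com.amazonaws.services.s3.model.OutputSerialization;
import com.amazonaws.services.s3.model.SelectObjectContentRequest;
import com.amazonaws.services.s3.model.SelectObjectContentResult;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch class; not part of Tika.
public class TranscriptSelectSketch {

    // Assumption: the transcription JSON keeps its text under
    // results.transcripts[0].transcript.
    static final String EXPRESSION =
            "SELECT s.results.transcripts[0].transcript FROM S3Object s";

    // Builds an S3 Select request that reads the object as a single JSON
    // document and returns only the fields named in EXPRESSION, as JSON.
    static SelectObjectContentRequest buildRequest(String bucket, String key) {
        InputSerialization input = new InputSerialization()
                .withJson(new JSONInput().withType(JSONType.DOCUMENT))
                .withCompressionType(CompressionType.NONE);
        OutputSerialization output = new OutputSerialization()
                .withJson(new JSONOutput());
        return new SelectObjectContentRequest()
                .withBucketName(bucket)
                .withKey(key)
                .withExpressionType(ExpressionType.SQL)
                .withExpression(EXPRESSION)
                .withInputSerialization(input)
                .withOutputSerialization(output);
    }

    // Creates the bucket only when it does not already exist.
    static void ensureBucket(AmazonS3 s3, String bucket) {
        if (!s3.doesBucketExistV2(bucket)) {
            s3.createBucket(bucket);
        }
    }

    // Runs the query and returns the matching records as a UTF-8 string
    // (JSON records, not plain text; the caller still has to unwrap them).
    static String fetchTranscriptJson(AmazonS3 s3, String bucket, String key)
            throws IOException {
        try (SelectObjectContentResult result =
                     s3.selectObjectContent(buildRequest(bucket, key));
             InputStream records = result.getPayload().getRecordsInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = records.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Because the filtering happens on the S3 side, only the selected field crosses the network, which is where the reduction in egress comes from. Building the request is separate from executing it, so the request can be inspected without any AWS credentials.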
   
   Thanks.
   
   CC @rohan2810 FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tika] rohan2810 commented on pull request #444: TIKA-3403 Create example for Transcription

Posted by GitBox <gi...@apache.org>.
rohan2810 commented on pull request #444:
URL: https://github.com/apache/tika/pull/444#issuecomment-840971142


   Sure @lewismc 
   @phantuanminh @abehara2 @nprate2





[GitHub] [tika] tballison merged pull request #444: TIKA-3403 Create example for Transcription

Posted by GitBox <gi...@apache.org>.
tballison merged pull request #444:
URL: https://github.com/apache/tika/pull/444


   





[GitHub] [tika] lewismc commented on pull request #444: TIKA-3403 Create example for Transcription

Posted by GitBox <gi...@apache.org>.
lewismc commented on pull request #444:
URL: https://github.com/apache/tika/pull/444#issuecomment-840967172


   @rohan2810 can you please tag the rest of the HackIllinois crew? Thanks

