Posted to dev@tika.apache.org by GitBox <gi...@apache.org> on 2021/02/26 05:25:51 UTC

[GitHub] [tika] lewismc opened a new pull request #406: WIP: TIKA-94 Speech recognition

lewismc opened a new pull request #406:
URL: https://github.com/apache/tika/pull/406


   This is a WIP for the work we are doing as part of the HackIllinois program.
   We will be adding to this, and I will be making comments in here.
   Great work so far, team.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tika] rohan2810 commented on a change in pull request #406: WIP: TIKA-94 Speech recognition

Posted by GitBox <gi...@apache.org>.
rohan2810 commented on a change in pull request #406:
URL: https://github.com/apache/tika/pull/406#discussion_r583421600



##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,90 @@
+/*

Review comment:
       I was trying to follow a similar pattern to that of the Translate module.
   The interface for translating is called "Translator".







[GitHub] [tika] rohan2810 commented on a change in pull request #406: WIP: [TIKA-94] Speech-to-text transcription

Posted by GitBox <gi...@apache.org>.
rohan2810 commented on a change in pull request #406:
URL: https://github.com/apache/tika/pull/406#discussion_r587924916



##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika 2.1
+ */
+public interface Transcriber {
+    /**
+     * Transcribe the given file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException   If an I/O exception of some sort has occurred.
+     * @since 2.1
+     */
+    public String transcribe(String filePath) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the file and the source language.
+     *
+     * @param filePath       The path of the file to be transcribed.
+     * @param sourceLanguage The language code for the language used in the input media file.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException   If an I/O exception of some sort has occurred.
+     * @since 2.1
+     */
+    public String transcribe(String filePath, String sourceLanguage) throws TikaException, IOException;

Review comment:
       Yes, I have taken this into consideration and pushed changes accordingly.







[GitHub] [tika] lewismc commented on a change in pull request #406: WIP: [TIKA-94] Speech-to-text transcription

Posted by GitBox <gi...@apache.org>.
lewismc commented on a change in pull request #406:
URL: https://github.com/apache/tika/pull/406#discussion_r586751972



##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+/**
+ * Interface for Transcriber services.
+ *

Review comment:
       Please add `@see <a href="https://issues.apache.org/jira/browse/TIKA-94">TIKA-94</a>`







[GitHub] [tika] lewismc edited a comment on pull request #406: [TIKA-94] Speech-to-text transcription

Posted by GitBox <gi...@apache.org>.
lewismc edited a comment on pull request #406:
URL: https://github.com/apache/tika/pull/406#issuecomment-791632511


   I feel that this patch is ready for testing by the community, so I removed the `WIP` prefix.





[GitHub] [tika] lewismc commented on a change in pull request #406: WIP: TIKA-94 Speech recognition

Posted by GitBox <gi...@apache.org>.
lewismc commented on a change in pull request #406:
URL: https://github.com/apache/tika/pull/406#discussion_r586093006



##########
File path: tika-transcribe/pom.xml
##########
@@ -0,0 +1,144 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>org.apache.tika</groupId>
+        <artifactId>tika-parent</artifactId>
+        <version>2.0.0-SNAPSHOT</version>
+        <relativePath>../tika-parent/pom.xml</relativePath>
+    </parent>
+
+    <artifactId>tika-transcribe</artifactId>
+    <packaging>bundle</packaging>
+    <name>Apache Tika transcribe</name>
+    <url>http://tika.apache.org/</url>
+    <!--TODO use latest aws version or the one defined in the tika-parent-->
+    <dependencies>
+        <dependency>
+            <groupId>org.apache.tika</groupId>
+            <artifactId>tika-core</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>com.amazonaws</groupId>
+            <artifactId>aws-java-sdk-transcribe</artifactId>
+            <version>${aws.version}</version>
+        </dependency>

Review comment:
       ```
   +            <exclusions>
   +                <exclusion>
   +                    <groupId>commons-logging</groupId>
   +                    <artifactId>commons-logging</artifactId>
   +                </exclusion>
   +            </exclusions>
   ```







[GitHub] [tika] abehara2 commented on pull request #406: WIP: TIKA-94 Speech recognition

Posted by GitBox <gi...@apache.org>.
abehara2 commented on pull request #406:
URL: https://github.com/apache/tika/pull/406#issuecomment-787403113


   **Current issues we are working on**
   - Figure out how to instantiate and connect to the AWS Transcribe and S3 web services
   - Video-to-audio conversion
   
   **What we have done so far**
   - Implemented speech-to-text through the transcribe service and speech interface
   - Made the speech interface independent of AWS through automatic key generation using UUID
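
The UUID-based key generation mentioned in the list above could look roughly like the following. This is a minimal sketch and not the code from the pull request: the class name `TranscriptionKeys` and the method `newKey` are hypothetical, and only `java.util.UUID` from the JDK is used.

```java
import java.util.UUID;

// Hypothetical helper: produces backend-neutral keys for transcription requests.
final class TranscriptionKeys {
    private TranscriptionKeys() {
    }

    /** Returns a random key that does not depend on any AWS-specific naming scheme. */
    static String newKey() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Each call yields a fresh key that a caller can later use for lookup.
        System.out.println("transcription key: " + newKey());
    }
}
```

Because the key is generated locally, the same interface could in principle be reused with a backend other than AWS.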
   





[GitHub] [tika] lewismc commented on a change in pull request #406: WIP: [TIKA-94] Speech-to-text transcription

Posted by GitBox <gi...@apache.org>.
lewismc commented on a change in pull request #406:
URL: https://github.com/apache/tika/pull/406#discussion_r586749522



##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika 2.1
+ */
+public interface Transcriber {
+    /**
+     * Transcribe the given file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException   If an I/O exception of some sort has occurred.
+     * @since 2.1
+     */
+    public String transcribe(String filePath) throws TikaException, IOException;

Review comment:
       Please change this from `transcribe(String filePath)` to `transcribe(InputStream is)`







[GitHub] [tika] lewismc closed pull request #406: [TIKA-94] Speech-to-text transcription

Posted by GitBox <gi...@apache.org>.
lewismc closed pull request #406:
URL: https://github.com/apache/tika/pull/406


   





[GitHub] [tika] lewismc commented on a change in pull request #406: WIP: [TIKA-94] Speech-to-text transcription

Posted by GitBox <gi...@apache.org>.
lewismc commented on a change in pull request #406:
URL: https://github.com/apache/tika/pull/406#discussion_r586751972



##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+/**
+ * Interface for Transcriber services.
+ *

Review comment:
       Please add `@see https://issues.apache.org/jira/browse/TIKA-94`







[GitHub] [tika] lewismc commented on a change in pull request #406: WIP: [TIKA-94] Speech-to-text transcription

Posted by GitBox <gi...@apache.org>.
lewismc commented on a change in pull request #406:
URL: https://github.com/apache/tika/pull/406#discussion_r586749707



##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika 2.1
+ */
+public interface Transcriber {
+    /**
+     * Transcribe the given file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException   If an I/O exception of some sort has occurred.
+     * @since 2.1
+     */
+    public String transcribe(String filePath) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the file and the source language.
+     *
+     * @param filePath       The path of the file to be transcribed.
+     * @param sourceLanguage The language code for the language used in the input media file.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException   If an I/O exception of some sort has occurred.
+     * @since 2.1
+     */
+    public String transcribe(String filePath, String sourceLanguage) throws TikaException, IOException;

Review comment:
       Please change this from `transcribe(String filePath, String sourceLanguage)` to `transcribe(InputStream is, String sourceLanguage)`
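
One likely motivation for the `InputStream` signature is that a caller can then pass in-memory data, a file, or any other source without the interface knowing about paths. The sketch below illustrates that idea only. The names `StreamTranscriberDemo`, `StreamTranscriber`, and `ByteCountingTranscriber` are hypothetical, the toy implementation just counts bytes instead of contacting a service, `"und"` is a placeholder for an unspecified language, and `TikaException` is left out so that the example compiles with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class StreamTranscriberDemo {

    /** Hypothetical stand-in for the stream-based interface requested in the review. */
    interface StreamTranscriber {
        String transcribe(InputStream is) throws IOException;

        String transcribe(InputStream is, String sourceLanguage) throws IOException;
    }

    /** Toy implementation: reports the language and the number of bytes it read. */
    static final class ByteCountingTranscriber implements StreamTranscriber {
        @Override
        public String transcribe(InputStream is) throws IOException {
            // "und" is used here only as a placeholder for "language not given".
            return transcribe(is, "und");
        }

        @Override
        public String transcribe(InputStream is, String sourceLanguage) throws IOException {
            long count = 0;
            byte[] buffer = new byte[8192];
            int n;
            while ((n = is.read(buffer)) != -1) {
                count += n;
            }
            return sourceLanguage + ":" + count;
        }
    }

    public static void main(String[] args) throws IOException {
        StreamTranscriber transcriber = new ByteCountingTranscriber();

        // In-memory source: no file path is involved at all.
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        try (InputStream is = new ByteArrayInputStream(data)) {
            System.out.println(transcriber.transcribe(is, "en-US"));
        }

        // File source: a caller that has a path simply opens a stream on it.
        Path tmp = Files.createTempFile("audio", ".bin");
        try {
            Files.write(tmp, new byte[] {1, 2, 3});
            try (InputStream is = Files.newInputStream(tmp)) {
                System.out.println(transcriber.transcribe(is));
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```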







[GitHub] [tika] lewismc commented on a change in pull request #406: WIP: TIKA-94 Speech recognition

Posted by GitBox <gi...@apache.org>.
lewismc commented on a change in pull request #406:
URL: https://github.com/apache/tika/pull/406#discussion_r583392435



##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,90 @@
+/*

Review comment:
       Please change the name of the interface from `Transcriber.java` to `Transcribe.java`
   Why?
   The interface doesn't do the transcribing; the implementation does.

##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+import com.amazonaws.services.transcribe.model.LanguageCode;
+
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika TODO

Review comment:
       Excellent. Thank you for adding this. We will populate it when we complete the pull request.

##########
File path: tika-core/pom.xml
##########
@@ -84,6 +84,12 @@
       <artifactId>junit</artifactId>
       <scope>test</scope>
     </dependency>
+      <dependency>

Review comment:
       Please push this into the `tika-translate` module

##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+import com.amazonaws.services.transcribe.model.LanguageCode;
+
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika TODO
+ */
+public interface Transcriber {
+    /**
+     * @return

Review comment:
       First, we need a description of the interface. This is REALLY important.
   Next, we add parameters.
   Then we add `@throws`.
   Then `@return`.
   
   This method signature needs to change. It is too tightly coupled to the AWS Transcribe input. Please model the interface on the `tika-translate` API.
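
As an illustration of the documentation order requested above (description, parameters, `@throws`, then `@return`) and of an interface that mentions no AWS types, here is a hedged sketch. It is not the interface from the pull request. `DocumentedTranscriberSketch`, `TranscriptionException` (a stand-in for `TikaException`), and `EchoTranscriber` are hypothetical names, and the `isAvailable()` method is borrowed from the shape of Tika's `Translator` interface on the assumption that a transcriber would want the same check.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class DocumentedTranscriberSketch {

    /** Hypothetical stand-in for org.apache.tika.exception.TikaException. */
    static class TranscriptionException extends Exception {
        TranscriptionException(String message) {
            super(message);
        }
    }

    /**
     * Converts spoken audio into text. The contract names no backend-specific
     * types, so any speech-to-text service could implement it.
     */
    interface Transcriber {
        /**
         * Transcribes the audio read from the given stream.
         *
         * @param is             audio to transcribe; the caller is responsible for closing it
         * @param sourceLanguage language code of the speech, for example "en-US"
         * @throws TranscriptionException if the backend cannot transcribe the input
         * @throws IOException            if the stream cannot be read
         * @return the transcribed text
         */
        String transcribe(InputStream is, String sourceLanguage)
                throws TranscriptionException, IOException;

        /**
         * @return true if the backend is configured and ready to use
         */
        boolean isAvailable();
    }

    /** Toy backend that "transcribes" by decoding the bytes as UTF-8 text. */
    static final class EchoTranscriber implements Transcriber {
        @Override
        public String transcribe(InputStream is, String sourceLanguage)
                throws TranscriptionException, IOException {
            if (sourceLanguage == null || sourceLanguage.isEmpty()) {
                throw new TranscriptionException("sourceLanguage is required");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = is.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }

        @Override
        public boolean isAvailable() {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        Transcriber transcriber = new EchoTranscriber();
        byte[] fakeAudio = "hello world".getBytes(StandardCharsets.UTF_8);
        if (transcriber.isAvailable()) {
            System.out.println(transcriber.transcribe(new ByteArrayInputStream(fakeAudio), "en-US"));
        }
    }
}
```

The tag order follows the review comment. The more common Javadoc convention places `@return` before `@throws`, so a project style check may expect that order instead.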

##########
File path: tika-transcribe/src/main/resources/org/apache/tika/transcribe/transcribe/transcribe.amazon.properties
##########
@@ -0,0 +1,18 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+transcribe.AWS_ACCESS_KEY=dummy_key
+transcribe.AWS_SECRET_KEY=dummy_key
+transcribe.BUCKET_NAME=dummy_name

Review comment:
       I feel that we need to push more out of the interface and into the implementation. The same goes for pushing more backend-specific method parameters into this config file.
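
A rough sketch of what moving backend-specific settings into the properties file might look like is shown below. The three `transcribe.*` keys come from the file under review. The `transcribe.REGION` key, its `us-east-1` default, and the class name `TranscribeConfigSketch` are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical holder for backend settings, so they need not appear as method parameters.
final class TranscribeConfigSketch {
    final String accessKey;
    final String secretKey;
    final String bucketName;
    final String region; // hypothetical extra setting, not in the reviewed file

    private TranscribeConfigSketch(Properties p) {
        this.accessKey = require(p, "transcribe.AWS_ACCESS_KEY");
        this.secretKey = require(p, "transcribe.AWS_SECRET_KEY");
        this.bucketName = require(p, "transcribe.BUCKET_NAME");
        // Optional key with a default, to show how further settings could be added.
        this.region = p.getProperty("transcribe.REGION", "us-east-1");
    }

    /** Fails fast when a mandatory key is absent or blank. */
    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required property: " + key);
        }
        return value.trim();
    }

    static TranscribeConfigSketch load(Reader reader) throws IOException {
        Properties p = new Properties();
        p.load(reader);
        return new TranscribeConfigSketch(p);
    }

    public static void main(String[] args) throws IOException {
        String text = "transcribe.AWS_ACCESS_KEY=dummy_key\n"
                + "transcribe.AWS_SECRET_KEY=dummy_key\n"
                + "transcribe.BUCKET_NAME=dummy_name\n";
        TranscribeConfigSketch config = load(new StringReader(text));
        System.out.println(config.bucketName + " in " + config.region);
    }
}
```

With settings collected in one place, the interface methods only need the input and the source language, and each backend reads whatever else it requires from its own file.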

##########
File path: tika-transcribe/src/main/java/org/apache/tika/transcribe/transcribe/AmazonTranscribe.java
##########
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe.transcribe;
+import java.io.File;
+
+import com.amazonaws.services.transcribe.model.*;
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.transcribe.Transcriber;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.amazonaws.services.s3.AmazonS3;
+import com.amazonaws.services.s3.model.PutObjectRequest;
+import com.amazonaws.services.transcribe.AmazonTranscribeAsync;
+
+import java.io.IOException;
+import java.util.Properties;
+
+
+public class AmazonTranscribe implements Transcriber {
+
+    private AmazonTranscribeAsync amazonTranscribe;
+
+    private AmazonS3 amazonS3;
+
+    private static final Logger LOG = LoggerFactory.getLogger(AmazonTranscribe.class);
+
+    private String bucketName;
+
+    private boolean isAvailable; // Flag for whether or not translation is available.
+
+    private String clientId;
+
+    private String clientSecret;  // Keys used for the API calls.
+
+//    private HashSet<String> validSourceLanguages = new HashSet<>(Arrays.asList("en-US", "en-GB", "es-US", "fr-CA", "fr-FR", "en-AU",

Review comment:
       Is this not available from the AWS Java API? Otherwise, this is difficult to maintain.
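
The maintenance concern can be illustrated with an enum-driven lookup in place of a hand-written `HashSet` of strings. In the sketch below, `SourceLanguage` is a hypothetical local enum that stands in for an SDK-provided one, such as the `LanguageCode` type imported elsewhere in this diff (assuming that type is an ordinary Java enum). `LanguageLookupSketch` and `parse` are likewise illustrative names.

```java
import java.util.Arrays;
import java.util.Optional;

final class LanguageLookupSketch {

    /** Hypothetical stand-in for an enum of language codes supplied by the SDK. */
    enum SourceLanguage {
        EN_US("en-US"),
        EN_GB("en-GB"),
        FR_FR("fr-FR");

        private final String code;

        SourceLanguage(String code) {
            this.code = code;
        }

        String code() {
            return code;
        }
    }

    /** Resolves a language code by scanning the enum, so no separate list has to be kept in sync. */
    static Optional<SourceLanguage> parse(String code) {
        return Arrays.stream(SourceLanguage.values())
                .filter(language -> language.code().equalsIgnoreCase(code))
                .findFirst();
    }

    public static void main(String[] args) {
        System.out.println("en-GB supported: " + parse("en-GB").isPresent());
        System.out.println("xx-XX supported: " + parse("xx-XX").isPresent());
    }
}
```

If the set of valid languages is derived from the SDK's own enum in this way, upgrading the SDK version would update the accepted codes without further code changes.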

##########
File path: tika-transcribe/src/main/java/org/apache/tika/transcribe/transcribe/AmazonTranscribe.java
##########
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe.transcribe;
+import java.io.File;
+
+import com.amazonaws.services.transcribe.model.*;
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.transcribe.Transcriber;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.amazonaws.services.s3.AmazonS3;
+import com.amazonaws.services.s3.model.PutObjectRequest;
+import com.amazonaws.services.transcribe.AmazonTranscribeAsync;
+
+import java.io.IOException;
+import java.util.Properties;
+
+
+public class AmazonTranscribe implements Transcriber {
+
+    private AmazonTranscribeAsync amazonTranscribe;
+
+    private AmazonS3 amazonS3;
+
+    private static final Logger LOG = LoggerFactory.getLogger(AmazonTranscribe.class);
+
+    private String bucketName;
+
+    private boolean isAvailable; // Flag for whether or not translation is available.
+
+    private String clientId;
+
+    private String clientSecret;  // Keys used for the API calls.
+
+//    private HashSet<String> validSourceLanguages = new HashSet<>(Arrays.asList("en-US", "en-GB", "es-US", "fr-CA", "fr-FR", "en-AU",
+//            "it-IT", "de-DE", "pt-BR", "ja-JP", "ko-KR"));  // Valid inputs to StartStreamTranscription for language of source file (audio)
+
+    public AmazonTranscribe() {
+        this.isAvailable = true;
+        Properties config = new Properties();
+        try {
+            config.load(AmazonTranscribe.class
+                    .getResourceAsStream(
+                            "transcribe.amazon.properties"));
+            this.clientId = config.getProperty("transcribe.AWS_ACCESS_KEY");
+            this.clientSecret = config.getProperty("transcribe.AWS_SECRET_KEY");
+            this.bucketName = config.getProperty("transcribe.BUCKET_NAME");
+
+        } catch (Exception e) {
+            LOG.warn("Exception reading config file", e);
+            isAvailable = false;
+        }
+    }
+
+    
+    /**
+     * Audio to text function without language specification
+     * @param fileName
+     * @return Transcribed text
+     * @throws TikaException
+     * @throws IOException
+     */
+    @Override
+    public void startTranscribeAudio(String fileName, String jobName) throws TikaException, IOException {
+        if (!isAvailable())
+            return;
+        StartTranscriptionJobRequest startTranscriptionJobRequest = new StartTranscriptionJobRequest();
+        Media media = new Media();
+        media.setMediaFileUri(amazonS3.getUrl(bucketName, fileName).toString());

Review comment:
       What about source language?

##########
File path: tika-transcribe/src/test/java/org/apache/tika/transcibe/transcibe/AmazonTranscribeGuessLanguageTest.java
##########
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.transcibe.transcibe;
+
+import org.apache.tika.transcribe.transcribe.AmazonTranscribe;
+import org.junit.Before;
+import org.junit.Test;
+
+import static junit.framework.TestCase.assertNotNull;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+
+public class AmazonTranscribeGuessLanguageTest {
+    AmazonTranscribe transcriber;

Review comment:
       This should be
   ```
   Transcribe transcriber;
   ```
   

##########
File path: tika-transcribe/src/main/java/org/apache/tika/transcribe/transcribe/AmazonTranscribe.java
##########
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe.transcribe;
+import java.io.File;
+
+import com.amazonaws.services.transcribe.model.*;
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.transcribe.Transcriber;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.amazonaws.services.s3.AmazonS3;
+import com.amazonaws.services.s3.model.PutObjectRequest;
+import com.amazonaws.services.transcribe.AmazonTranscribeAsync;
+
+import java.io.IOException;
+import java.util.Properties;
+
+
+public class AmazonTranscribe implements Transcriber {
+
+    private AmazonTranscribeAsync amazonTranscribe;
+
+    private AmazonS3 amazonS3;
+
+    private static final Logger LOG = LoggerFactory.getLogger(AmazonTranscribe.class);
+
+    private String bucketName;
+
+    private boolean isAvailable; // Flag for whether or not translation is available.
+
+    private String clientId;
+
+    private String clientSecret;  // Keys used for the API calls.
+
+//    private HashSet<String> validSourceLanguages = new HashSet<>(Arrays.asList("en-US", "en-GB", "es-US", "fr-CA", "fr-FR", "en-AU",
+//            "it-IT", "de-DE", "pt-BR", "ja-JP", "ko-KR"));  // Valid inputs to StartStreamTranscription for language of source file (audio)
+
+    public AmazonTranscribe() {
+        this.isAvailable = true;
+        Properties config = new Properties();
+        try {
+            config.load(AmazonTranscribe.class
+                    .getResourceAsStream(
+                            "transcribe.amazon.properties"));
+            this.clientId = config.getProperty("transcribe.AWS_ACCESS_KEY");
+            this.clientSecret = config.getProperty("transcribe.AWS_SECRET_KEY");
+            this.bucketName = config.getProperty("transcribe.BUCKET_NAME");
+
+        } catch (Exception e) {
+            LOG.warn("Exception reading config file", e);
+            isAvailable = false;
+        }
+    }
+
+    
+    /**
+     * Audio to text function without language specification
+     * @param fileName
+     * @return Transcribed text
+     * @throws TikaException
+     * @throws IOException
+     */
+    @Override
+    public void startTranscribeAudio(String fileName, String jobName) throws TikaException, IOException {
+        if (!isAvailable())
+            return;
+        StartTranscriptionJobRequest startTranscriptionJobRequest = new StartTranscriptionJobRequest();
+        Media media = new Media();
+        media.setMediaFileUri(amazonS3.getUrl(bucketName, fileName).toString());
+        startTranscriptionJobRequest.withMedia(media)
+                .withOutputBucketName(this.bucketName)
+                .setTranscriptionJobName(jobName);
+        amazonTranscribe.startTranscriptionJob(startTranscriptionJobRequest);
+    }
+
+    /**
+     * Audio to text function with language specification
+     * @param fileName
+     * @param sourceLanguage
+     * @return Transcribed text
+     * @throws TikaException
+     * @throws IOException
+     */
+    @Override
+    public void startTranscribeAudio(String fileName, LanguageCode sourceLanguage, String jobName) throws TikaException, IOException {
+        if (!isAvailable())
+			return;
+        StartTranscriptionJobRequest startTranscriptionJobRequest = new StartTranscriptionJobRequest();
+        Media media = new Media();
+        media.setMediaFileUri(amazonS3.getUrl(bucketName, fileName).toString());
+        startTranscriptionJobRequest.withMedia(media)
+                .withLanguageCode(sourceLanguage)
+                .withOutputBucketName(this.bucketName)
+                .setTranscriptionJobName(jobName);
+        amazonTranscribe.startTranscriptionJob(startTranscriptionJobRequest);
+    }
+
+    @Override
+    public void startTranscribeVideo(String fileName, String jobName) throws TikaException, IOException {
+        if (!isAvailable())
+            return;
+        //TODO
+
+    }
+
+    /**
+     * Audio to text function with language specification
+     * @param fileName
+     * @param sourceLanguage
+     * @return Transcribed text
+     * @throws TikaException
+     * @throws IOException
+     */
+    @Override
+    public void startTranscribeVideo(String fileName, LanguageCode sourceLanguage, String jobName) throws TikaException, IOException {
+        if (!isAvailable())
+            return;
+        //boolean validSourceLanguageFlag = validSourceLanguages.contains(sourceLanguage); // Checks if sourceLanguage in validSourceLanguages O(1) lookup time
+
+        //if (!validSourceLanguageFlag) { // Throws TikaException if the input sourceLanguage is not present in validSourceLanguages
+        //    throw new TikaException("Provided Source Language is Not Valid. Run without language parameter or please select one of: " +
+        //           "en-US, en-GB, es-US, fr-CA, fr-FR, en-AU, it-IT, de-DE, pt-BR, ja-JP, ko-KR"); }
+        //TODO
+
+    }
+
+    /**
+     * @return Valid AWS Credentials
+     */
+	public boolean isAvailable() {
+		return this.isAvailable;
+	}
+
+    /** Gets Transcriptioni result from AWS S3 bucket given bucketNamee and key
+     * @param key
+     * @return
+     */
+    @Override
+    public String getTranscriptResult(String key) {
+        TranscriptionJob transcriptionJob = retrieveObjectWhenJobCompleted(key);
+        if (transcriptionJob != null && !TranscriptionJobStatus.FAILED.equals(transcriptionJob.getTranscriptionJobStatus())) {
+            return amazonS3.getObjectAsString(this.bucketName, key + ".json");
+        } else
+            return null;
+    }
+
+    /**
+     * Private helper function to get object from s3
+     * @param key
+     * @return
+     */
+    private TranscriptionJob retrieveObjectWhenJobCompleted(String key) {
+        GetTranscriptionJobRequest getTranscriptionJobRequest = new GetTranscriptionJobRequest();
+        getTranscriptionJobRequest.setTranscriptionJobName(key);
+
+        while (true) {
+            GetTranscriptionJobResult innerResult = amazonTranscribe.getTranscriptionJob(getTranscriptionJobRequest);
+            String status = innerResult.getTranscriptionJob().getTranscriptionJobStatus();
+            if (TranscriptionJobStatus.COMPLETED.name().equals(status) ||
+                    TranscriptionJobStatus.FAILED.name().equals(status)) {
+                return innerResult.getTranscriptionJob();
+            }
+        }
+    }
+
+    /**
+     * Call this method in order to upload a file to the Amazon S3 bucket.
+     * @param bucketName
+     * @param fileName
+     * @param fullFileName
+     */
+    @Override
+    public void uploadFileToBucket(String bucketName, String fileName, String fullFileName) {

Review comment:
       This needs to exist in the AWS implementation but NOT in the `Transcriber` interface.

##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+import com.amazonaws.services.transcribe.model.LanguageCode;
+
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika TODO
+ */
+public interface Transcriber {
+    /**
+     * @return
+     * @param fileName
+     * @param jobName
+     * @throws TikaException       When there is an error translating.
+     * @throws java.io.IOException
+     * @since TODO
+     */
+    public void startTranscribeAudio(String fileName, String jobName) throws TikaException, IOException;
+
+    /**
+     * @return
+     * @param fileName
+     * @param sourceLanguage
+     * @param jobName
+     * @throws TikaException       When there is an error translating.
+     * @throws java.io.IOException
+     * @since TODO
+     */
+    public void startTranscribeAudio(String fileName, LanguageCode sourceLanguage, String jobName) throws TikaException, IOException;
+
+    /**
+     * @return
+     * @param fileName
+     * @param jobName
+     * @throws TikaException       When there is an error translating.
+     * @throws java.io.IOException
+     * @since TODO
+     */
+    public void startTranscribeVideo(String fileName, String jobName) throws TikaException, IOException;
+
+    /**
+     * @return
+     * @param fileName
+     * @param jobName
+     * @param sourceLanguage
+     * @throws TikaException       When there is an error translating.
+     * @throws java.io.IOException
+     * @since TODO
+     */
+    public void startTranscribeVideo(String fileName, LanguageCode sourceLanguage, String jobName) throws TikaException, IOException;
+
+    /**
+     * Gets transcription result from S3
+     * @param key
+     * @return
+     */
+    public String getTranscriptResult(String key);
+
+    /**
+     * Upload file to s3
+     * @param bucketName
+     * @param fileName
+     * @param filePath
+     */
+    public void uploadFileToBucket(String bucketName, String fileName, String filePath);

Review comment:
       This should never be in an interface. This is WAY too AWS-specific.
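
A minimal sketch of the separation this comment asks for, under stated assumptions: `NeutralTranscriber` and `InMemoryTranscriber` are hypothetical names that appear neither in this PR nor in Tika, and the in-memory map only stands in for a storage bucket. Checked exceptions are omitted to keep the example short. The upload step stays public on the implementation but is absent from the interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical provider-neutral contract: no bucket names, no vendor SDK types.
interface NeutralTranscriber {
    String transcribe(String filePath, String languageTag);

    boolean isAvailable();
}

// Hypothetical stand-in for a cloud-backed implementation.
class InMemoryTranscriber implements NeutralTranscriber {
    // Plays the role of the provider's storage; keys are file paths.
    private final Map<String, String> uploaded = new HashMap<>();

    // Provider-specific step: available on the class, not on the interface.
    void uploadFileToBucket(String key, String cannedTranscript) {
        uploaded.put(key, cannedTranscript);
    }

    @Override
    public String transcribe(String filePath, String languageTag) {
        String text = uploaded.get(filePath);
        if (text == null) {
            throw new IllegalStateException("Nothing uploaded for " + filePath);
        }
        return text;
    }

    @Override
    public boolean isAvailable() {
        return true;
    }
}
```

Callers that only hold a `NeutralTranscriber` reference cannot reach `uploadFileToBucket`, which is the point of keeping it out of the shared contract.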

##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+

Review comment:
       Remove whitespace

##########
File path: tika-transcribe/src/main/java/org/apache/tika/transcribe/transcribe/AmazonTranscribe.java
##########
@@ -0,0 +1,193 @@
+/*

Review comment:
       Please look at the package naming here...
   ```
   tika-transcribe/src/main/java/org/apache/tika/transcribe/transcribe/
   ```
   should be
   ```
   tika-transcribe/src/main/java/org/apache/tika/transcribe
   ```
   
   

##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+import com.amazonaws.services.transcribe.model.LanguageCode;
+
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika TODO
+ */
+public interface Transcriber {
+    /**
+     * @return
+     * @param fileName

Review comment:
       Also, what about the language implementation the transcription service should work on?
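
One possible way to handle the source language without the `com.amazonaws...LanguageCode` import shown in the hunk above is to pass a plain language tag. The sketch below is an assumption, not something the PR or the reviewer proposes: `LanguageTags` is a hypothetical helper, and it relies only on `java.util.Locale` from the JDK.

```java
import java.util.Locale;

// Hypothetical helper, not part of the PR: keeps vendor enums out of the interface.
final class LanguageTags {
    private LanguageTags() {
    }

    // Normalizes a BCP 47 language tag and rejects input with no language subtag.
    static String normalize(String tag) {
        Locale locale = Locale.forLanguageTag(tag);
        if (locale.getLanguage().isEmpty()) {
            throw new IllegalArgumentException("Not a usable language tag: " + tag);
        }
        return locale.toLanguageTag();
    }
}
```

An implementation such as the AWS one could then map the normalized tag to its own enum internally.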

##########
File path: tika-transcribe/pom.xml
##########
@@ -0,0 +1,144 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+
+    <parent>
+        <groupId>org.apache.tika</groupId>
+        <artifactId>tika-parent</artifactId>
+        <version>2.0.0-SNAPSHOT</version>
+        <relativePath>../tika-parent/pom.xml</relativePath>
+    </parent>
+
+    <artifactId>tika-transcribe</artifactId>
+    <packaging>bundle</packaging>
+    <name>Apache Tika transcribe</name>
+    <url>http://tika.apache.org/</url>
+    <!--TODO use latest aws version or the one defined in the tika-parent-->

Review comment:
       Defining the AWS version in `tika-parent` is fine, but not in `tika-core`.

##########
File path: tika-transcribe/src/main/java/org/apache/tika/transcribe/transcribe/AmazonTranscribe.java
##########
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe.transcribe;
+import java.io.File;
+
+import com.amazonaws.services.transcribe.model.*;

Review comment:
       Please order all `import`s alphabetically

##########
File path: tika-transcribe/src/test/java/org/apache/tika/transcibe/transcibe/AmazonTranscribeGuessLanguageTest.java
##########
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.transcibe.transcibe;
+
+import org.apache.tika.transcribe.transcribe.AmazonTranscribe;
+import org.junit.Before;
+import org.junit.Test;
+
+import static junit.framework.TestCase.assertNotNull;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+
+public class AmazonTranscribeGuessLanguageTest {
+    AmazonTranscribe transcriber;
+
+    @Before
+    public void setUp() {
+        transcriber = new AmazonTranscribe();
+    }
+
+    @Test
+    public void AmazonTranscribeGuessLanguageAudioShortTest() {
+        String expected = "where is the bus stop? where is the bus stop?";
+        //TODO: "expected" should be changed to reflect the contents of ShortAudioSample.mp3
+        /*
+        URL res = getClass().getClassLoader().getResource("ShortAudioSample.mp3");
+        File file = Paths.get(res.toURI()).toFile();
+        String absolutePath = file.getAbsolutePath();
+        Necessary to get the correct file path from our test resource folder? */
+        //TODO: is the above commented block necessary to obtain the proper filepath for a file located in the tika-translate/test/resources directory?
+
+        String audioFilePath = "src/test/resources/ShortAudioSample.mp3";
+        String result = null;
+
+        if (transcriber.isAvailable()) {
+            try {
+                result = transcriber.transcribeAudio(audioFilePath);
+                assertNotNull(result);
+                assertEquals("Result: [" + result
+                                + "]: not equal to expected: [" + expected + "]",
+                        expected, result);
+            } catch (Exception e) {
+                e.printStackTrace();
+                fail(e.getMessage());
+            }
+        }
+    }
+
+    @Test
+    public void AmazonTranscribeGuessLanguageAudioLongTest() {
+        String expected = "where is the bus stop? where is the bus stop?";
+        //TODO: "expected" should be changed to reflect the contents of LongAudioSample.mp3
+        String audioFilePath = "src/test/resources/LongAudioSample.mp3";

Review comment:
       Where is this file?

##########
File path: tika-transcribe/src/main/java/org/apache/tika/transcribe/transcribe/AmazonTranscribe.java
##########
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe.transcribe;
+import java.io.File;
+
+import com.amazonaws.services.transcribe.model.*;
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.transcribe.Transcriber;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.amazonaws.services.s3.AmazonS3;
+import com.amazonaws.services.s3.model.PutObjectRequest;
+import com.amazonaws.services.transcribe.AmazonTranscribeAsync;
+
+import java.io.IOException;
+import java.util.Properties;
+
+
+public class AmazonTranscribe implements Transcriber {
+
+    private AmazonTranscribeAsync amazonTranscribe;
+
+    private AmazonS3 amazonS3;
+
+    private static final Logger LOG = LoggerFactory.getLogger(AmazonTranscribe.class);
+
+    private String bucketName;
+
+    private boolean isAvailable; // Flag for whether or not translation is available.
+
+    private String clientId;
+
+    private String clientSecret;  // Keys used for the API calls.
+
+//    private HashSet<String> validSourceLanguages = new HashSet<>(Arrays.asList("en-US", "en-GB", "es-US", "fr-CA", "fr-FR", "en-AU",
+//            "it-IT", "de-DE", "pt-BR", "ja-JP", "ko-KR"));  // Valid inputs to StartStreamTranscription for language of source file (audio)
+
+    public AmazonTranscribe() {
+        this.isAvailable = true;
+        Properties config = new Properties();
+        try {
+            config.load(AmazonTranscribe.class
+                    .getResourceAsStream(
+                            "transcribe.amazon.properties"));
+            this.clientId = config.getProperty("transcribe.AWS_ACCESS_KEY");
+            this.clientSecret = config.getProperty("transcribe.AWS_SECRET_KEY");
+            this.bucketName = config.getProperty("transcribe.BUCKET_NAME");
+
+        } catch (Exception e) {
+            LOG.warn("Exception reading config file", e);
+            isAvailable = false;
+        }
+    }
+
+    
+    /**
+     * Audio to text function without language specification
+     * @param fileName
+     * @return Transcribed text

Review comment:
       Please populate all of this Javadoc based upon the guidance I provided above. 

##########
File path: tika-transcribe/src/test/java/org/apache/tika/transcibe/transcibe/AmazonTranscribeGuessLanguageTest.java
##########
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.transcibe.transcibe;
+
+import org.apache.tika.transcribe.transcribe.AmazonTranscribe;
+import org.junit.Before;
+import org.junit.Test;
+
+import static junit.framework.TestCase.assertNotNull;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+
+public class AmazonTranscribeGuessLanguageTest {
+    AmazonTranscribe transcriber;
+
+    @Before
+    public void setUp() {
+        transcriber = new AmazonTranscribe();
+    }
+
+    @Test
+    public void AmazonTranscribeGuessLanguageAudioShortTest() {
+        String expected = "where is the bus stop? where is the bus stop?";
+        //TODO: "expected" should be changed to reflect the contents of ShortAudioSample.mp3
+        /*
+        URL res = getClass().getClassLoader().getResource("ShortAudioSample.mp3");
+        File file = Paths.get(res.toURI()).toFile();
+        String absolutePath = file.getAbsolutePath();
+        Necessary to get the correct file path from our test resource folder? */
+        //TODO: is the above commented block necessary to obtain the proper filepath for a file located in the tika-translate/test/resources directory?
+
+        String audioFilePath = "src/test/resources/ShortAudioSample.mp3";

Review comment:
       Where is this file?
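
The commented-out block in the hunk above asks whether a classpath lookup is needed to find the sample. A hedged sketch of such a lookup follows; `TestResources` is a hypothetical helper that is not part of the PR. A lookup of this kind does not depend on the working directory, and a `null` result makes it obvious that the file was never added.

```java
import java.net.URL;

// Hypothetical helper, not part of the PR.
final class TestResources {
    private TestResources() {
    }

    // Returns the classpath URL of a test resource, or null if it is not on the classpath.
    static URL find(String name) {
        return TestResources.class.getClassLoader().getResource(name);
    }
}
```

A test could check the result for `null` up front and fail, or skip, with a clear message instead of passing a relative path to the transcriber.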

##########
File path: tika-transcribe/src/test/java/org/apache/tika/transcibe/transcibe/AmazonTranscribeGuessLanguageTest.java
##########
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.transcibe.transcibe;
+
+import org.apache.tika.transcribe.transcribe.AmazonTranscribe;
+import org.junit.Before;
+import org.junit.Test;
+
+import static junit.framework.TestCase.assertNotNull;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+
+public class AmazonTranscribeGuessLanguageTest {
+    AmazonTranscribe transcriber;
+
+    @Before
+    public void setUp() {
+        transcriber = new AmazonTranscribe();
+    }
+
+    @Test
+    public void AmazonTranscribeGuessLanguageAudioShortTest() {
+        String expected = "where is the bus stop? where is the bus stop?";
+        //TODO: "expected" should be changed to reflect the contents of ShortAudioSample.mp3
+        /*
+        URL res = getClass().getClassLoader().getResource("ShortAudioSample.mp3");
+        File file = Paths.get(res.toURI()).toFile();
+        String absolutePath = file.getAbsolutePath();
+        Necessary to get the correct file path from our test resource folder? */
+        //TODO: is the above commented block necessary to obtain the proper filepath for a file located in the tika-translate/test/resources directory?
+
+        String audioFilePath = "src/test/resources/ShortAudioSample.mp3";
+        String result = null;
+
+        if (transcriber.isAvailable()) {
+            try {
+                result = transcriber.transcribeAudio(audioFilePath);
+                assertNotNull(result);
+                assertEquals("Result: [" + result
+                                + "]: not equal to expected: [" + expected + "]",
+                        expected, result);
+            } catch (Exception e) {
+                e.printStackTrace();
+                fail(e.getMessage());
+            }
+        }
+    }
+
+    @Test
+    public void AmazonTranscribeGuessLanguageAudioLongTest() {
+        String expected = "where is the bus stop? where is the bus stop?";
+        //TODO: "expected" should be changed to reflect the contents of LongAudioSample.mp3
+        String audioFilePath = "src/test/resources/LongAudioSample.mp3";
+        String result = null;
+
+        if (transcriber.isAvailable()) {
+            try {
+                result = transcriber.transcribeAudio(audioFilePath);
+                assertNotNull(result);
+                assertEquals("Result: [" + result
+                                + "]: not equal to expected: [" + expected + "]",
+                        expected, result);
+            } catch (Exception e) {
+                e.printStackTrace();
+                fail(e.getMessage());
+            }
+        }
+    }
+
+    @Test
+    public void AmazonTranscribeGuessLanguageShortVideoTest() {
+        String expected = "where is the bus stop? where is the bus stop?";
+        //TODO: "expected" should be changed to reflect the contents of ShortVideoSample.mp4
+        String videoFilePath = "src/test/resources/ShortVideoSample.mp4";

Review comment:
       Where is this file?

##########
File path: tika-transcribe/src/test/java/org/apache/tika/transcibe/transcibe/AmazonTranscribeGuessLanguageTest.java
##########
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.transcibe.transcibe;
+
+import org.apache.tika.transcribe.transcribe.AmazonTranscribe;
+import org.junit.Before;
+import org.junit.Test;
+
+import static junit.framework.TestCase.assertNotNull;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+
+public class AmazonTranscribeGuessLanguageTest {
+    AmazonTranscribe transcriber;
+
+    @Before
+    public void setUp() {
+        transcriber = new AmazonTranscribe();
+    }
+
+    @Test
+    public void AmazonTranscribeGuessLanguageAudioShortTest() {
+        String expected = "where is the bus stop? where is the bus stop?";
+        //TODO: "expected" should be changed to reflect the contents of ShortAudioSample.mp3
+        /*
+        URL res = getClass().getClassLoader().getResource("ShortAudioSample.mp3");
+        File file = Paths.get(res.toURI()).toFile();
+        String absolutePath = file.getAbsolutePath();
+        Necessary to get the correct file path from our test resource folder? */
+        //TODO: is the above commented block necessary to obtain the proper filepath for a file located in the tika-translate/test/resources directory?
+
+        String audioFilePath = "src/test/resources/ShortAudioSample.mp3";
+        String result = null;
+
+        if (transcriber.isAvailable()) {
+            try {
+                result = transcriber.transcribeAudio(audioFilePath);
+                assertNotNull(result);
+                assertEquals("Result: [" + result
+                                + "]: not equal to expected: [" + expected + "]",
+                        expected, result);
+            } catch (Exception e) {
+                e.printStackTrace();
+                fail(e.getMessage());
+            }
+        }
+    }
+
+    @Test
+    public void AmazonTranscribeGuessLanguageAudioLongTest() {
+        String expected = "where is the bus stop? where is the bus stop?";
+        //TODO: "expected" should be changed to reflect the contents of LongAudioSample.mp3
+        String audioFilePath = "src/test/resources/LongAudioSample.mp3";
+        String result = null;
+
+        if (transcriber.isAvailable()) {
+            try {
+                result = transcriber.transcribeAudio(audioFilePath);
+                assertNotNull(result);
+                assertEquals("Result: [" + result
+                                + "]: not equal to expected: [" + expected + "]",
+                        expected, result);
+            } catch (Exception e) {
+                e.printStackTrace();
+                fail(e.getMessage());
+            }
+        }
+    }
+
+    @Test
+    public void AmazonTranscribeGuessLanguageShortVideoTest() {
+        String expected = "where is the bus stop? where is the bus stop?";
+        //TODO: "expected" should be changed to reflect the contents of ShortVideoSample.mp4
+        String videoFilePath = "src/test/resources/ShortVideoSample.mp4";
+        String result = null;
+
+        if (transcriber.isAvailable()) {
+            try {
+                result = transcriber.transcribeVideo(videoFilePath);
+                assertNotNull(result);
+                assertEquals("Result: [" + result
+                                + "]: not equal to expected: [" + expected + "]",
+                        expected, result);
+            } catch (Exception e) {
+                e.printStackTrace();
+                fail(e.getMessage());
+            }
+        }
+    }
+
+    @Test
+    public void AmazonTranscribeGuessLanguageLongVideoTest() {
+        String expected = "hello sir";
+        //TODO: "expected" should be changed to reflect the contents of LongVideoSample.mp4
+        String videoFilePath = "src/test/resources/LongVideoSample.mp4";

Review comment:
       ?







[GitHub] [tika] lewismc commented on pull request #406: [TIKA-94] Speech-to-text transcription

Posted by GitBox <gi...@apache.org>.
lewismc commented on pull request #406:
URL: https://github.com/apache/tika/pull/406#issuecomment-791632511


   I feel that this patch is ready for testing by the community. 





[GitHub] [tika] phantuanminh commented on a change in pull request #406: WIP: TIKA-94 Speech recognition

Posted by GitBox <gi...@apache.org>.
phantuanminh commented on a change in pull request #406:
URL: https://github.com/apache/tika/pull/406#discussion_r584266199



##########
File path: tika-transcribe/src/test/java/org/apache/tika/transcibe/transcibe/AmazonTranscribeGuessLanguageTest.java
##########
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.transcibe.transcibe;
+
+import org.apache.tika.transcribe.transcribe.AmazonTranscribe;
+import org.junit.Before;
+import org.junit.Test;
+
+import static junit.framework.TestCase.assertNotNull;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+
+public class AmazonTranscribeGuessLanguageTest {
+    AmazonTranscribe transcriber;
+
+    @Before
+    public void setUp() {
+        transcriber = new AmazonTranscribe();
+    }
+
+    @Test
+    public void AmazonTranscribeGuessLanguageAudioShortTest() {
+        String expected = "where is the bus stop? where is the bus stop?";
+        //TODO: "expected" should be changed to reflect the contents of ShortAudioSample.mp3
+        /*
+        URL res = getClass().getClassLoader().getResource("ShortAudioSample.mp3");
+        File file = Paths.get(res.toURI()).toFile();
+        String absolutePath = file.getAbsolutePath();
+        Necessary to get the correct file path from our test resource folder? */
+        //TODO: is the above commented block necessary to obtain the proper filepath for a file located in the tika-translate/test/resources directory?
+
+        String audioFilePath = "src/test/resources/ShortAudioSample.mp3";
+        String result = null;
+
+        if (transcriber.isAvailable()) {
+            try {
+                result = transcriber.transcribeAudio(audioFilePath);
+                assertNotNull(result);
+                assertEquals("Result: [" + result
+                                + "]: not equal to expected: [" + expected + "]",
+                        expected, result);
+            } catch (Exception e) {
+                e.printStackTrace();
+                fail(e.getMessage());
+            }
+        }
+    }
+
+    @Test
+    public void AmazonTranscribeGuessLanguageAudioLongTest() {
+        String expected = "where is the bus stop? where is the bus stop?";
+        //TODO: "expected" should be changed to reflect the contents of LongAudioSample.mp3
+        String audioFilePath = "src/test/resources/LongAudioSample.mp3";

Review comment:
       We created a resources folder in the test folder to store the test files.
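
   On the TODO in the quoted test (whether the commented-out block is needed to find the file): a path such as `src/test/resources/...` only resolves when the tests run from the module directory, whereas a class-loader lookup does not depend on the working directory. The following is a minimal sketch of that lookup, not code from the patch. The class name `TestResources` and the method `find` are hypothetical, and it assumes the resource is a plain file on the classpath (a resource packed inside a jar would not convert to a `Path` this way).

```java
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, not part of the pull request.
final class TestResources {
    private TestResources() {}

    /**
     * Looks up a test resource by name through the class loader.
     * Returns an empty Optional when the resource is not on the classpath.
     */
    static Optional<Path> find(String name) {
        URL url = TestResources.class.getClassLoader().getResource(name);
        if (url == null) {
            return Optional.empty();
        }
        try {
            // Works for file: URLs, i.e. resources unpacked on disk.
            return Optional.of(Paths.get(url.toURI()));
        } catch (URISyntaxException e) {
            throw new IllegalStateException("Malformed resource URL: " + url, e);
        }
    }
}
```

   A test could then call `TestResources.find("ShortAudioSample.mp3")` and pass the resulting absolute path to the transcriber, failing or skipping with a clear message when the Optional is empty.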







[GitHub] [tika] lewismc commented on a change in pull request #406: WIP: TIKA-94 Speech recognition

Posted by GitBox <gi...@apache.org>.
lewismc commented on a change in pull request #406:
URL: https://github.com/apache/tika/pull/406#discussion_r585064537



##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika TODO
+ */
+public interface Transcriber {
+    /**
+     * Transcribe the given audio file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException

Review comment:
       Please complete

##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika TODO
+ */
+public interface Transcriber {
+    /**
+     * Transcribe the given audio file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException
+     * @since TODO
+     */
+
+    public String startTranscribeAudio(String filePath) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the audio file and the source language.
+     *
+     * @param filePath       The path of the file to be transcribed.
+     * @param sourceLanguage The language code for the language used in the input media file.
+     * @return key for transcription lookup
+     * @throws TikaException       When there is an error transcribing.
+     * @throws java.io.IOException
+     * @since TODO
+     */
+    public String startTranscribeAudio(String filePath, String sourceLanguage) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the video file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException       When there is an error transcribing.
+     * @throws java.io.IOException

Review comment:
       Complete

##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika TODO
+ */
+public interface Transcriber {
+    /**
+     * Transcribe the given audio file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException
+     * @since TODO
+     */
+
+    public String startTranscribeAudio(String filePath) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the audio file and the source language.
+     *
+     * @param filePath       The path of the file to be transcribed.
+     * @param sourceLanguage The language code for the language used in the input media file.
+     * @return key for transcription lookup
+     * @throws TikaException       When there is an error transcribing.
+     * @throws java.io.IOException

Review comment:
       Please complete

##########
File path: tika-transcribe/src/main/java/org/apache/tika/transcribe/AmazonTranscribe.java
##########
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import com.amazonaws.services.s3.AmazonS3;
+import com.amazonaws.services.s3.model.PutObjectRequest;
+import com.amazonaws.services.transcribe.AmazonTranscribeAsync;
+import com.amazonaws.services.transcribe.model.*;
+import org.apache.tika.exception.TikaException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Properties;
+import java.util.UUID;
+
+
+public class AmazonTranscribe implements Transcriber {
+
+    public static final String PROPERTIES_FILE = "transcribe.amazon.properties";
+    public static final String ID_PROPERTY = "transcribe.AWS_ACCESS_KEY";
+    public static final String SECRET_PROPERTY = "transcribe.AWS_SECRET_KEY";
+    public static final String DEFAULT_ID = "dummy-id";
+    public static final String DEFAULT_SECRET = "dummy-secret";
+    public static final String DEFAULT_BUCKET = "dummy-bucket";
+    public static final String BUCKET_NAME = "transcribe.BUCKET_NAME";
+
+    private static final Logger LOG = LoggerFactory.getLogger(AmazonTranscribe.class);
+    private AmazonTranscribeAsync amazonTranscribe;
+    private AmazonS3 amazonS3;
+    private String bucketName;
+    private boolean isAvailable; // Flag for whether or not translation is available.
+    private String clientId;
+    private String clientSecret;  // Keys used for the API calls.
+//    private HashSet<String> validSourceLanguages = new HashSet<>(Arrays.asList("en-US", "en-GB", "es-US", "fr-CA", "fr-FR", "en-AU",

Review comment:
       Remove

##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika TODO
+ */
+public interface Transcriber {
+    /**
+     * Transcribe the given audio file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException
+     * @since TODO

Review comment:
       2.1

##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika TODO
+ */
+public interface Transcriber {
+    /**
+     * Transcribe the given audio file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException
+     * @since TODO
+     */
+
+    public String startTranscribeAudio(String filePath) throws TikaException, IOException;

Review comment:
       Please change all method names from `startTranscribe...` to simply `transcribe...`.
   Please also propagate this change throughout the `tika-transcribe` module. 
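
   For illustration, the interface could look roughly like this after the rename, with the `@since` tags set to 2.1 as requested in the other comments. This is a sketch under assumptions, not the final patch: `TikaException` is replaced by a local stand-in so the example compiles on its own, the wording of the `@throws IOException` description is assumed, and `EchoTranscriber` is a hypothetical stub that only exists to show the renamed methods being called.

```java
import java.io.IOException;

final class TranscriberRenameSketch {
    private TranscriberRenameSketch() {}

    /** Stand-in for org.apache.tika.exception.TikaException so the sketch compiles alone. */
    static class TikaException extends Exception {
        TikaException(String message) {
            super(message);
        }
    }

    /** The interface under review with the "start" prefix dropped from every method. */
    interface Transcriber {
        /**
         * Transcribe the given audio file.
         *
         * @param filePath The path of the file to be transcribed.
         * @return key for transcription lookup
         * @throws TikaException When there is an error transcribing.
         * @throws IOException   If the file cannot be read (assumed wording).
         * @since 2.1
         */
        String transcribeAudio(String filePath) throws TikaException, IOException;

        /** As above, with the language code of the input media supplied by the caller. */
        String transcribeAudio(String filePath, String sourceLanguage)
                throws TikaException, IOException;

        /** Video counterpart of {@link #transcribeAudio(String)}. */
        String transcribeVideo(String filePath) throws TikaException, IOException;

        /** Video counterpart of {@link #transcribeAudio(String, String)}. */
        String transcribeVideo(String filePath, String sourceLanguage)
                throws TikaException, IOException;

        /** @return true if this Transcriber is probably able to transcribe right now. */
        boolean isAvailable();
    }

    /** Hypothetical stub, not part of the patch: it only echoes its arguments. */
    static final class EchoTranscriber implements Transcriber {
        @Override
        public String transcribeAudio(String filePath) {
            return "audio:" + filePath;
        }

        @Override
        public String transcribeAudio(String filePath, String sourceLanguage) {
            return "audio:" + filePath + ":" + sourceLanguage;
        }

        @Override
        public String transcribeVideo(String filePath) {
            return "video:" + filePath;
        }

        @Override
        public String transcribeVideo(String filePath, String sourceLanguage) {
            return "video:" + filePath + ":" + sourceLanguage;
        }

        @Override
        public boolean isAvailable() {
            return true;
        }
    }
}
```

   Callers in `tika-transcribe` and the tests would then use `transcriber.transcribeAudio(path)` and `transcriber.transcribeVideo(path)`, which is what the quoted test classes already call.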

##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika TODO
+ */
+public interface Transcriber {
+    /**
+     * Transcribe the given audio file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException
+     * @since TODO
+     */
+
+    public String startTranscribeAudio(String filePath) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the audio file and the source language.
+     *
+     * @param filePath       The path of the file to be transcribed.
+     * @param sourceLanguage The language code for the language used in the input media file.
+     * @return key for transcription lookup
+     * @throws TikaException       When there is an error transcribing.
+     * @throws java.io.IOException
+     * @since TODO
+     */
+    public String startTranscribeAudio(String filePath, String sourceLanguage) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the video file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException       When there is an error transcribing.
+     * @throws java.io.IOException
+     * @since TODO
+     */
+    public String startTranscribeVideo(String filePath) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the video file and the source language.
+     *
+     * @param filePath       The path of the file to be transcribed.
+     * @param sourceLanguage The language code for the language used in the input media file.
+     * @return key for transcription lookup
+     * @throws TikaException       When there is an error transcribing.
+     * @throws java.io.IOException
+     * @since TODO

Review comment:
       2.1

##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika TODO
+ */
+public interface Transcriber {
+    /**
+     * Transcribe the given audio file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException
+     * @since TODO
+     */
+
+    public String startTranscribeAudio(String filePath) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the audio file and the source language.
+     *
+     * @param filePath       The path of the file to be transcribed.
+     * @param sourceLanguage The language code for the language used in the input media file.
+     * @return key for transcription lookup
+     * @throws TikaException       When there is an error transcribing.
+     * @throws java.io.IOException
+     * @since TODO

Review comment:
       2.1

##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika TODO
+ */
+public interface Transcriber {
+    /**
+     * Transcribe the given audio file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException
+     * @since TODO
+     */
+
+    public String startTranscribeAudio(String filePath) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the audio file and the source language.
+     *
+     * @param filePath       The path of the file to be transcribed.
+     * @param sourceLanguage The language code for the language used in the input media file.
+     * @return key for transcription lookup
+     * @throws TikaException       When there is an error transcribing.
+     * @throws java.io.IOException
+     * @since TODO
+     */
+    public String startTranscribeAudio(String filePath, String sourceLanguage) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the video file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException       When there is an error transcribing.
+     * @throws java.io.IOException
+     * @since TODO

Review comment:
       2.1

##########
File path: tika-transcribe/src/test/java/org/apache/tika/transcribe/AmazonTranscribeSimpleTest.java
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.tika.transcribe;
+
+import org.apache.tika.transcribe.AmazonTranscribe;
+import org.junit.Before;
+import org.junit.Ignore;
+import org.junit.Test;
+
+import static org.junit.Assert.*;
+
+//TODO: Check whether the expected Strings are correct (does it include punctuation? case?)
+//TODO: Consider testing longer audio and video file, is there any points doing that?

Review comment:
       Please address both of these issues by implementing unit tests.
   One for punctuation and one for a longer file. 
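
   Until it is confirmed whether the service's output includes punctuation and capitalisation, one possible way to keep the expected strings from being brittle is to compare normalised text. The helper below is only a sketch of that idea. `TranscriptNormalizer` and `normalize` are hypothetical names, and the choice to ignore case and ASCII punctuation is an assumption, not something the patch does. A dedicated punctuation test would still compare the raw strings.

```java
import java.util.Locale;

// Hypothetical test helper, not part of the pull request.
final class TranscriptNormalizer {
    private TranscriptNormalizer() {}

    /** Lower-cases the transcript, drops ASCII punctuation and collapses whitespace. */
    static String normalize(String transcript) {
        String lower = transcript.toLowerCase(Locale.ROOT);
        // Replace punctuation with a space so adjacent words never merge.
        String withoutPunctuation = lower.replaceAll("\\p{Punct}", " ");
        return withoutPunctuation.trim().replaceAll("\\s+", " ");
    }
}
```

   With this, `normalize("Where is the bus stop? Where is the bus stop?")` and `normalize("where is the bus stop where is the bus stop")` compare equal, so the content check and the punctuation check can live in separate tests.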

##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika TODO
+ */
+public interface Transcriber {
+    /**
+     * Transcribe the given audio file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException
+     * @since TODO
+     */
+
+    public String startTranscribeAudio(String filePath) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the audio file and the source language.
+     *
+     * @param filePath       The path of the file to be transcribed.
+     * @param sourceLanguage The language code for the language used in the input media file.
+     * @return key for transcription lookup
+     * @throws TikaException       When there is an error transcribing.
+     * @throws java.io.IOException
+     * @since TODO
+     */
+    public String startTranscribeAudio(String filePath, String sourceLanguage) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the video file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException       When there is an error transcribing.
+     * @throws java.io.IOException
+     * @since TODO
+     */
+    public String startTranscribeVideo(String filePath) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the video file and the source language.
+     *
+     * @param filePath       The path of the file to be transcribed.
+     * @param sourceLanguage The language code for the language used in the input media file.
+     * @return key for transcription lookup
+     * @throws TikaException       When there is an error transcribing.
+     * @throws java.io.IOException
+     * @since TODO
+     */
+    public String startTranscribeVideo(String filePath, String sourceLanguage) throws TikaException, IOException;
+
+    /**
+     * @return true if this Transcriber is probably able to translate right now.
+     * @since Tika TODO

Review comment:
       2.1

##########
File path: tika-transcribe/src/main/java/org/apache/tika/transcribe/AmazonTranscribe.java
##########
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import com.amazonaws.services.s3.AmazonS3;
+import com.amazonaws.services.s3.model.PutObjectRequest;
+import com.amazonaws.services.transcribe.AmazonTranscribeAsync;
+import com.amazonaws.services.transcribe.model.*;
+import org.apache.tika.exception.TikaException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Properties;
+import java.util.UUID;
+
+
+public class AmazonTranscribe implements Transcriber {

Review comment:
       Add description about required configuration and any specific characteristics for this Transcriber
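
   As a starting point for that description, the constants in the quoted class suggest three settings read from `transcribe.amazon.properties`: an access key, a secret key and an S3 bucket name, each with a dummy default. The sketch below shows how those keys might be read and checked. The property keys and defaults are copied from the diff, but the class `TranscribeConfigSketch`, the method `isConfigured`, and the rule that dummy values mean "not configured" are assumptions made for illustration.

```java
import java.util.Properties;

// Hypothetical illustration of the configuration the class Javadoc should describe.
final class TranscribeConfigSketch {
    // Keys and defaults as they appear in the quoted AmazonTranscribe class.
    static final String ID_PROPERTY = "transcribe.AWS_ACCESS_KEY";
    static final String SECRET_PROPERTY = "transcribe.AWS_SECRET_KEY";
    static final String BUCKET_NAME = "transcribe.BUCKET_NAME";
    static final String DEFAULT_ID = "dummy-id";
    static final String DEFAULT_SECRET = "dummy-secret";
    static final String DEFAULT_BUCKET = "dummy-bucket";

    final String clientId;
    final String clientSecret;
    final String bucketName;

    TranscribeConfigSketch(Properties props) {
        // Fall back to the dummy defaults when a key is missing.
        this.clientId = props.getProperty(ID_PROPERTY, DEFAULT_ID);
        this.clientSecret = props.getProperty(SECRET_PROPERTY, DEFAULT_SECRET);
        this.bucketName = props.getProperty(BUCKET_NAME, DEFAULT_BUCKET);
    }

    /** Assumed rule: the transcriber is usable only when no dummy default remains. */
    boolean isConfigured() {
        return !DEFAULT_ID.equals(clientId)
                && !DEFAULT_SECRET.equals(clientSecret)
                && !DEFAULT_BUCKET.equals(bucketName);
    }
}
```

   The class-level Javadoc could then state which of these keys are mandatory, where the properties file must live, and that media is uploaded to the configured bucket before transcription.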

##########
File path: tika-transcribe/src/main/java/org/apache/tika/transcribe/AmazonTranscribe.java
##########
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import com.amazonaws.services.s3.AmazonS3;
+import com.amazonaws.services.s3.model.PutObjectRequest;
+import com.amazonaws.services.transcribe.AmazonTranscribeAsync;
+import com.amazonaws.services.transcribe.model.*;

Review comment:
       Never use wildcard imports. Always explicitly declare each import you need. 

##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+import com.amazonaws.services.transcribe.model.LanguageCode;
+
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika TODO

Review comment:
       Also please describe the interface. 

##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+import com.amazonaws.services.transcribe.model.LanguageCode;
+
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika TODO

Review comment:
       Please just add 2.1

##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika TODO
+ */
+public interface Transcriber {
+    /**
+     * Transcribe the given audio file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException
+     * @since TODO
+     */
+
+    public String startTranscribeAudio(String filePath) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the audio file and the source language.
+     *
+     * @param filePath       The path of the file to be transcribed.
+     * @param sourceLanguage The language code for the language used in the input media file.
+     * @return key for transcription lookup
+     * @throws TikaException       When there is an error transcribing.
+     * @throws java.io.IOException
+     * @since TODO
+     */
+    public String startTranscribeAudio(String filePath, String sourceLanguage) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the video file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException       When there is an error transcribing.
+     * @throws java.io.IOException
+     * @since TODO
+     */
+    public String startTranscribeVideo(String filePath) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the video file and the source language.
+     *
+     * @param filePath       The path of the file to be transcribed.
+     * @param sourceLanguage The language code for the language used in the input media file.
+     * @return key for transcription lookup
+     * @throws TikaException       When there is an error transcribing.
+     * @throws java.io.IOException

Review comment:
       Please complete this Javadoc; the @throws entry has no description.







[GitHub] [tika] lewismc merged pull request #406: [TIKA-94] Speech-to-text transcription

Posted by GitBox <gi...@apache.org>.
lewismc merged pull request #406:
URL: https://github.com/apache/tika/pull/406


   





[GitHub] [tika] abehara2 commented on a change in pull request #406: WIP: [TIKA-94] Speech-to-text transcription

Posted by GitBox <gi...@apache.org>.
abehara2 commented on a change in pull request #406:
URL: https://github.com/apache/tika/pull/406#discussion_r587923974



##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.transcribe;
+
+import org.apache.tika.exception.TikaException;
+
+import java.io.IOException;
+
+/**
+ * Interface for Transcriber services.
+ *
+ * @since Tika 2.1
+ */
+public interface Transcriber {
+    /**
+     * Transcribe the given file.
+     *
+     * @param filePath The path of the file to be transcribed.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException   If an I/O exception of some sort has occurred.
+     * @since 2.1
+     */
+    public String transcribe(String filePath) throws TikaException, IOException;
+
+    /**
+     * Transcribe the given the file and the source language.
+     *
+     * @param filePath       The path of the file to be transcribed.
+     * @param sourceLanguage The language code for the language used in the input media file.
+     * @return key for transcription lookup
+     * @throws TikaException When there is an error transcribing.
+     * @throws IOException   If an I/O exception of some sort has occurred.
+     * @since 2.1
+     */
+    public String transcribe(String filePath, String sourceLanguage) throws TikaException, IOException;

Review comment:
       Just a comment on this: the uploadFiletoBucket function requires a java.io.File, so if the input arrives as an InputStream you may have to alter the corresponding function to read the stream and convert it to a File first.
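
One way to bridge the two types is to spool the stream to a temporary file before uploading. The following is only a minimal sketch using the JDK; the class name StreamToFile, the method toTempFile and the "tika-transcribe-" prefix are hypothetical and are not part of this pull request.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not taken from the pull request.
class StreamToFile {

    /**
     * Copies the whole stream into a new temporary file and returns it.
     * The caller owns the file and should delete it when done.
     */
    static File toTempFile(InputStream in, String suffix) throws IOException {
        Path tmp = Files.createTempFile("tika-transcribe-", suffix);
        try {
            // createTempFile already created an empty file, so overwrite it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            // Do not leave a partial file behind if the copy fails.
            Files.deleteIfExists(tmp);
            throw e;
        }
        return tmp.toFile();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(data)) {
            File f = toTempFile(in, ".txt");
            try {
                System.out.println(f.getName() + " holds " + f.length() + " bytes");
            } finally {
                // Clean up the temporary file once it is no longer needed.
                Files.deleteIfExists(f.toPath());
            }
        }
    }
}
```

The resulting File could then be handed to a method that expects one, such as the uploadFiletoBucket function mentioned above. Within Tika, TikaInputStream.getFile() provides similar spooling and may be the more natural choice.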







[GitHub] [tika] lewismc closed pull request #406: [TIKA-94] Speech-to-text transcription

Posted by GitBox <gi...@apache.org>.
lewismc closed pull request #406:
URL: https://github.com/apache/tika/pull/406


   





[GitHub] [tika] rohan2810 commented on a change in pull request #406: WIP: TIKA-94 Speech recognition

Posted by GitBox <gi...@apache.org>.
rohan2810 commented on a change in pull request #406:
URL: https://github.com/apache/tika/pull/406#discussion_r583426359



##########
File path: tika-core/pom.xml
##########
@@ -84,6 +84,12 @@
       <artifactId>junit</artifactId>
       <scope>test</scope>
     </dependency>
+      <dependency>

Review comment:
       Do you mean tika-transcribe?







[GitHub] [tika] lewismc commented on pull request #406: [TIKA-94] Speech-to-text transcription

Posted by GitBox <gi...@apache.org>.
lewismc commented on pull request #406:
URL: https://github.com/apache/tika/pull/406#issuecomment-831595625


   @tballison I know you and I spoke about refactoring this as a simple parser interface...
   I would like to merge it for the time being, and I can begin work on the refactoring in a separate ticket.





[GitHub] [tika] rohan2810 commented on a change in pull request #406: WIP: TIKA-94 Speech recognition

Posted by GitBox <gi...@apache.org>.
rohan2810 commented on a change in pull request #406:
URL: https://github.com/apache/tika/pull/406#discussion_r583421600



##########
File path: tika-core/src/main/java/org/apache/tika/transcribe/Transcriber.java
##########
@@ -0,0 +1,90 @@
+/*

Review comment:
       I was trying to follow a similar pattern to that of the Translator.
   The interface for translating is called "Translator".



