Posted to issues@beam.apache.org by "Sergei Lilichenko (Jira)" <ji...@apache.org> on 2020/08/12 20:22:00 UTC

[jira] [Created] (BEAM-10692) Change org.apache.beam.sdk.extensions.ml.CloudVision to associate the AnnotateImageResponses with the image data used for the annotation

Sergei Lilichenko created BEAM-10692:
----------------------------------------

             Summary: Change org.apache.beam.sdk.extensions.ml.CloudVision to associate the AnnotateImageResponses with the image data used for the annotation
                 Key: BEAM-10692
                 URL: https://issues.apache.org/jira/browse/BEAM-10692
             Project: Beam
          Issue Type: New Feature
          Components: extensions-java-gcp
    Affects Versions: 2.22.0
            Reporter: Sergei Lilichenko


There is a problem with the design of the CloudVision transform. For GCS URIs, it takes a PCollection<String> as input and outputs a PCollection<List<AnnotateImageResponse>>, so there is no way to associate the responses with the original file URIs. An [ImageAnnotationContext|https://cloud.google.com/vision/docs/reference/rest/v1/AnnotateImageResponse#ImageAnnotationContext] is returned as part of the response, but its "uri" field is empty for the majority of annotations (it appears to be populated only for file annotations, not for image annotations).

One approach is to return KV<String, List<AnnotateImageResponse>> for images referenced by URI, where the key is the GCS URI. For images passed as bytes, the caller would pass an id of any type with the image, and the transform would return KV<IDTYPE, List<AnnotateImageResponse>>.
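
The keying idea can be sketched without any Beam or Cloud Vision dependencies. This is only an illustration under stated assumptions, not the transform's implementation: KeyedAnnotationSketch and pairByPosition are hypothetical names, plain strings stand in for AnnotateImageResponse objects, and the sketch assumes the batch call returns exactly one response per request, in request order. For brevity it pairs one response with each key rather than a List.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Beam SDK.
final class KeyedAnnotationSketch {

  // Pairs each caller-supplied key (a GCS URI or an id of any type) with the
  // response at the same position. ASSUMPTION: the batch call returns one
  // response per request, in request order.
  static <K, R> List<Map.Entry<K, R>> pairByPosition(List<K> keys, List<R> responses) {
    if (keys.size() != responses.size()) {
      throw new IllegalArgumentException(
          "Expected " + keys.size() + " responses but got " + responses.size());
    }
    List<Map.Entry<K, R>> keyed = new ArrayList<>(keys.size());
    for (int i = 0; i < keys.size(); i++) {
      keyed.add(new SimpleImmutableEntry<>(keys.get(i), responses.get(i)));
    }
    return keyed;
  }

  public static void main(String[] args) {
    // Made-up URIs; strings stand in for AnnotateImageResponse objects.
    List<String> uris = Arrays.asList("gs://bucket/a.jpg", "gs://bucket/b.jpg");
    List<String> responses = Arrays.asList("response for a", "response for b");
    for (Map.Entry<String, String> entry : pairByPosition(uris, responses)) {
      System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
  }
}
```

In a Beam pipeline the same pairing would presumably happen inside the transform, where the batched requests and their keys are both available, with Map.Entry replaced by KV. Failing fast on a size mismatch avoids silently attaching a response to the wrong image.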



--
This message was sent by Atlassian Jira
(v8.3.4#803005)