Posted to issues@beam.apache.org by "Sergei Lilichenko (Jira)" <ji...@apache.org> on 2020/08/12 20:22:00 UTC
[jira] [Created] (BEAM-10692) Change
org.apache.beam.sdk.extensions.ml.CloudVision to associate the
AnnotateImageResponses with the image data used for the annotation
Sergei Lilichenko created BEAM-10692:
----------------------------------------
Summary: Change org.apache.beam.sdk.extensions.ml.CloudVision to associate the AnnotateImageResponses with the image data used for the annotation
Key: BEAM-10692
URL: https://issues.apache.org/jira/browse/BEAM-10692
Project: Beam
Issue Type: New Feature
Components: extensions-java-gcp
Affects Versions: 2.22.0
Reporter: Sergei Lilichenko
The current design of the CloudVision transform has a problem. For GCS URIs it takes a PCollection<String> as input and outputs a PCollection<List<AnnotateImageResponse>>, which leaves no way to associate the responses with the original file URIs. An [ImageAnnotationContext|https://cloud.google.com/vision/docs/reference/rest/v1/AnnotateImageResponse#ImageAnnotationContext] is returned as part of the response, but its "uri" field is empty for the majority of annotations (it appears to be populated only for file annotations, not for image annotations).
One approach is to return KV<String, List<AnnotateImageResponse>> for images referenced by URI, where the key is the GCS URI. For images passed as bytes, the caller would pass an id of any type along with the data, and the transform would return KV<IDTYPE, List<AnnotateImageResponse>>.
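To make the proposed output shape concrete, here is a minimal sketch in plain Java, not the actual Beam transform. Everything in it is an assumption for illustration: the class KeyedAnnotationSketch and its methods are hypothetical, Map.Entry stands in for Beam's KV, the type parameter R stands in for AnnotateImageResponse, and the annotator function stands in for the call to the Vision API. It also ignores any request batching the real transform may do.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch only: Map.Entry plays the role of KV, and R plays the
// role of AnnotateImageResponse. None of these names exist in Beam.
public class KeyedAnnotationSketch {

    // Images referenced by GCS URI: the URI itself becomes the key, so every
    // list of responses stays attached to the file it was computed from.
    public static <R> List<Map.Entry<String, List<R>>> annotateUris(
            List<String> uris, Function<String, List<R>> annotator) {
        List<Map.Entry<String, List<R>>> out = new ArrayList<>();
        for (String uri : uris) {
            out.add(new SimpleImmutableEntry<>(uri, annotator.apply(uri)));
        }
        return out;
    }

    // Images passed as raw bytes: the caller supplies an id of any type K,
    // and that id is carried through to the output unchanged.
    public static <K, R> List<Map.Entry<K, List<R>>> annotateBytes(
            List<Map.Entry<K, byte[]>> images, Function<byte[], List<R>> annotator) {
        List<Map.Entry<K, List<R>>> out = new ArrayList<>();
        for (Map.Entry<K, byte[]> image : images) {
            out.add(new SimpleImmutableEntry<>(
                    image.getKey(), annotator.apply(image.getValue())));
        }
        return out;
    }

    public static void main(String[] args) {
        // Fake annotator standing in for the Vision API call.
        Function<String, List<String>> fakeAnnotator =
                uri -> Arrays.asList("response-for:" + uri);
        List<String> uris = Arrays.asList("gs://bucket/a.jpg", "gs://bucket/b.jpg");
        for (Map.Entry<String, List<String>> e : annotateUris(uris, fakeAnnotator)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

In a real pipeline the same idea would presumably be expressed with KV elements flowing through the PCollection, so that downstream steps can join the responses back to the source images by key.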
--
This message was sent by Atlassian Jira
(v8.3.4#803005)