Posted to commits@beam.apache.org by "Frank Yellin (JIRA)" <ji...@apache.org> on 2018/05/05 17:23:00 UTC

[jira] [Commented] (BEAM-4186) Need to be able to set QuerySplitter in DatastoreIO.v1()

    [ https://issues.apache.org/jira/browse/BEAM-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464843#comment-16464843 ] 

Frank Yellin commented on BEAM-4186:
------------------------------------

I've just looked at 

[https://github.com/GoogleCloudPlatform/appengine-mapreduce/blob/master/java/src/main/java/com/google/appengine/tools/mapreduce/inputs/DatastoreShardStrategy.java]

It seems that the strategy used by the AppEngine mapreduce library is to make the QuerySplitter smarter. If the Query has an inequality in it, it uses one splitter; if it doesn't, it uses a different one.
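
As a rough illustration of that dispatch, here is a minimal, self-contained Java sketch. Every type in it (SimpleQuery, Splitter, SplitterChooser) is a hypothetical stand-in invented for this example; none of it is the Datastore client, Beam, or appengine-mapreduce API, and the two strategies are placeholders that only label the shards they would produce.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query; only records what the chooser needs.
final class SimpleQuery {
    final String kind;
    final boolean hasInequalityFilter;

    SimpleQuery(String kind, boolean hasInequalityFilter) {
        this.kind = kind;
        this.hasInequalityFilter = hasInequalityFilter;
    }
}

// Hypothetical splitter contract: turn one query into numSplits shard labels.
interface Splitter {
    List<String> split(SimpleQuery query, int numSplits);
}

final class SplitterChooser {
    // Placeholder strategy for queries without an inequality filter.
    static final Splitter DEFAULT = (q, n) -> describe("default", q, n);

    // Placeholder strategy for queries that contain an inequality filter.
    static final Splitter INEQUALITY_AWARE = (q, n) -> describe("inequality-aware", q, n);

    // The "smarter splitter" idea: pick the strategy by inspecting the query.
    static Splitter choose(SimpleQuery query) {
        return query.hasInequalityFilter ? INEQUALITY_AWARE : DEFAULT;
    }

    // Builds one label per shard so the chosen strategy is visible in the output.
    private static List<String> describe(String strategy, SimpleQuery q, int numSplits) {
        List<String> shards = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            shards.add(strategy + ":" + q.kind + ":" + i);
        }
        return shards;
    }
}
```

With this shape the caller never names a splitter; it calls SplitterChooser.choose(query).split(query, n) and the decision stays inside the splitting code.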

That is probably the best option to make everyone happy. Rather than letting users supply their own QuerySplitter, just make QuerySplitter better.

That effectively moves this problem from apache_beam to Datastore. I will close this bug.

> Need to be able to set QuerySplitter in DatastoreIO.v1()
> --------------------------------------------------------
>
>                 Key: BEAM-4186
>                 URL: https://issues.apache.org/jira/browse/BEAM-4186
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-gcp
>    Affects Versions: 2.4.0
>            Reporter: Frank Yellin
>            Assignee: Frank Yellin
>            Priority: Minor
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> I want to add a method
>       withQuerySplitter(QuerySplitter querySplitter)
> to DatastoreV1.Reader.  The implementation is fairly straightforward, except for enforcing the requirement that the query splitter must be Serializable for this to work.
>  
>  
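
One way the Serializable requirement mentioned in the description could be enforced at compile time is sketched below. This is an assumption-laden illustration, not the proposed patch: ShardSplitter, SerializableShardSplitter, CappedSplitter and ReaderConfig are hypothetical names standing in for the real QuerySplitter and reader classes, and the splitter contract is reduced to a single method so the example stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a splitter contract (not the Datastore QuerySplitter).
interface ShardSplitter {
    int shardCount(int requested);
}

// Combining the contract with Serializable lets the compiler reject
// splitters that could not be shipped along with the reader.
interface SerializableShardSplitter extends ShardSplitter, Serializable {}

// Example splitter: never returns more than a fixed number of shards.
final class CappedSplitter implements SerializableShardSplitter {
    private static final long serialVersionUID = 1L;
    private final int max;

    CappedSplitter(int max) {
        this.max = max;
    }

    @Override
    public int shardCount(int requested) {
        return Math.min(requested, max);
    }
}

// Hypothetical reader configuration that carries the splitter as a field.
final class ReaderConfig implements Serializable {
    private static final long serialVersionUID = 1L;
    private final SerializableShardSplitter splitter;

    ReaderConfig(SerializableShardSplitter splitter) {
        this.splitter = splitter;
    }

    // Mirrors the shape of a withQuerySplitter-style builder method.
    ReaderConfig withSplitter(SerializableShardSplitter newSplitter) {
        return new ReaderConfig(newSplitter);
    }

    SerializableShardSplitter splitter() {
        return splitter;
    }

    // Serializes and deserializes the config to show the splitter survives the trip.
    static ReaderConfig roundTrip(ReaderConfig config) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(config);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ReaderConfig) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("config is not serializable", e);
        }
    }
}
```

Accepting only the combined interface in withSplitter moves the check from run time (a failure when the configuration is serialized) to compile time, at the cost of requiring callers to wrap any existing non-serializable splitter.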



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)