Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 18:40:43 UTC

[GitHub] [beam] kennknowles opened a new issue, #18496: AppliedPTransform is used as a key in hashmaps but PTransform is not hashable/equality-comparable

kennknowles opened a new issue, #18496:
URL: https://github.com/apache/beam/issues/18496

   There are plenty of places in runners-core where a Map or BiMap is keyed by an AppliedPTransform.
   
   However, PTransform does not advertise that it is required to implement equals/hashCode, and some transforms can't implement them properly anyway: for example, transforms that capture a ValueProvider, which is itself not hashable or equality-comparable. I'm surprised that things aren't already very broken because of this.
   
   Fundamentally, I don't see why we should ever compare two PTransforms for equality.
   
   I looked at the code and wondered "can AppliedPTransform simply be identity-hashable?", but right now the answer is no, because we can create more than one AppliedPTransform object for the same transform applied to the same thing.
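   
   To make the difference concrete, here is a minimal sketch in plain Java. FakeTransform, FakeApplied and KeyLookupDemo are hypothetical stand-ins invented for illustration, not Beam classes, and the sketch assumes the applied wrapper has value-based equality over its fields. It shows that value-based keys only match while the very same transform instance is reused, and that identity-based keys stop matching as soon as a second wrapper is built for the same application.
   
   ```java
   import java.util.HashMap;
   import java.util.IdentityHashMap;
   import java.util.Map;
   import java.util.Objects;

   // Hypothetical stand-in for a PTransform that does not override equals/hashCode.
   final class FakeTransform {
     final String label;

     FakeTransform(String label) {
       this.label = label;
     }
   }

   // Hypothetical stand-in for an applied transform with value-based equality.
   final class FakeApplied {
     final String fullName;
     final FakeTransform transform;

     FakeApplied(String fullName, FakeTransform transform) {
       this.fullName = fullName;
       this.transform = transform;
     }

     @Override
     public boolean equals(Object o) {
       if (this == o) {
         return true;
       }
       if (!(o instanceof FakeApplied)) {
         return false;
       }
       FakeApplied that = (FakeApplied) o;
       // transform.equals falls back to Object.equals, i.e. reference identity.
       return fullName.equals(that.fullName) && transform.equals(that.transform);
     }

     @Override
     public int hashCode() {
       return Objects.hash(fullName, transform);
     }
   }

   final class KeyLookupDemo {
     // Stores under one wrapper, probes with a second wrapper, using value-based keys.
     static boolean foundInValueMap(FakeTransform stored, FakeTransform probe) {
       Map<FakeApplied, String> map = new HashMap<>();
       map.put(new FakeApplied("step", stored), "state");
       return map.containsKey(new FakeApplied("step", probe));
     }

     // Same experiment with identity-based keys: the two wrappers are distinct objects.
     static boolean foundInIdentityMap(FakeTransform transform) {
       Map<FakeApplied, String> map = new IdentityHashMap<>();
       map.put(new FakeApplied("step", transform), "state");
       return map.containsKey(new FakeApplied("step", transform));
     }

     public static void main(String[] args) {
       FakeTransform t = new FakeTransform("ParDo");
       System.out.println(foundInValueMap(t, t));                          // true
       System.out.println(foundInValueMap(t, new FakeTransform("ParDo"))); // false
       System.out.println(foundInIdentityMap(t));                          // false
     }
   }
   ```
   
   Under these assumptions, the value-based map works only by accident of instance reuse, and the identity-based map is only safe if exactly one wrapper object exists per application.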
   
   Fixing that does not look easy, but it is definitely possible. Ideally TransformHierarchy.Node would just know its AppliedPTransform; however, a Node can be constructed before a Pipeline exists. I suppose there must be some way to propagate a Pipeline into Node.finishSpecifying() (which should be called exactly once on the Node, and this should be enforced), have finishSpecifying() return the AppliedPTransform, and have the caller use that instead of potentially calling .toAppliedPTransform() repeatedly on the same Node.
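   
   A rough sketch of that idea, again in plain Java with hypothetical stand-ins (SketchPipeline, SketchApplied, SketchNode and getApplied() are invented here and do not reflect the real TransformHierarchy.Node API). It assumes the node creates its applied transform exactly once, when the pipeline is handed in, so that identity-based keys become safe.
   
   ```java
   import java.util.IdentityHashMap;
   import java.util.Map;

   // Hypothetical stand-in for Pipeline.
   final class SketchPipeline {}

   // Hypothetical stand-in for AppliedPTransform; no equals/hashCode override,
   // so instances compare by identity.
   final class SketchApplied {
     final String fullName;
     final SketchPipeline pipeline;

     SketchApplied(String fullName, SketchPipeline pipeline) {
       this.fullName = fullName;
       this.pipeline = pipeline;
     }
   }

   // Hypothetical stand-in for TransformHierarchy.Node.
   final class SketchNode {
     private final String fullName;
     private SketchApplied applied; // null until finishSpecifying has run

     SketchNode(String fullName) {
       this.fullName = fullName;
     }

     // Creates the single applied transform for this node; a second call is rejected.
     SketchApplied finishSpecifying(SketchPipeline pipeline) {
       if (applied != null) {
         throw new IllegalStateException("finishSpecifying already called on " + fullName);
       }
       applied = new SketchApplied(fullName, pipeline);
       return applied;
     }

     // Returns the instance created by finishSpecifying instead of building a new one.
     SketchApplied getApplied() {
       if (applied == null) {
         throw new IllegalStateException("finishSpecifying not yet called on " + fullName);
       }
       return applied;
     }
   }

   final class FinishSpecifyingDemo {
     public static void main(String[] args) {
       SketchNode node = new SketchNode("Read/ParDo");
       SketchApplied applied = node.finishSpecifying(new SketchPipeline());

       // Identity-based keys are safe because only one instance exists per node.
       Map<SketchApplied, String> state = new IdentityHashMap<>();
       state.put(applied, "scheduled");
       System.out.println(state.get(node.getApplied())); // prints "scheduled"
     }
   }
   ```
   
   Whether the real Node can receive a Pipeline at that point is exactly the open question above; the sketch only shows the shape of the contract.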
   
   [~kenn] is on vacation but perhaps [~tgroh] can help with this meanwhile?
   
   CC: [~reuvenlax]
   
   Imported from Jira [BEAM-2699](https://issues.apache.org/jira/browse/BEAM-2699). Original Jira may contain additional context.
   Reported by: jkff.

