Posted to dev@datafu.apache.org by GitBox <gi...@apache.org> on 2022/04/14 00:20:26 UTC

[GitHub] [datafu] petrpulc opened a new pull request, #21: Keep the unmatched single records in joinWithRange

petrpulc opened a new pull request, #21:
URL: https://github.com/apache/datafu/pull/21

   The filter at the end effectively turns the join into an inner join: it removes the records from `singleDf` that have no matching range, because `range_start` and `range_end` are null for those records.
   
   There may be additional places where the filtering needs to change; I only took this one function into my own project.
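A minimal pure-Python model of the behavior described above, with no Spark involved. It assumes, as a simplification, that `joinWithRange` buckets values by integer division, does a left outer join on the bucket, and then filters with `single BETWEEN range_start AND range_end`. The helper names, `FACTOR`, and the sample rows are hypothetical. Under SQL three-valued logic the comparison against NULL range columns yields NULL, which a filter treats like false, so the unmatched record disappears.

```python
from typing import Dict, List, Optional

# Hypothetical row shape: column name -> int value, or None for SQL NULL.
Row = Dict[str, Optional[int]]

FACTOR = 10  # hypothetical bucket width, a stand-in for the real decrease factor


def explode_range_buckets(ranges: List[Row]) -> List[Row]:
    """Copy each range into every coarse bucket it overlaps."""
    out: List[Row] = []
    for r in ranges:
        for b in range(r["range_start"] // FACTOR, r["range_end"] // FACTOR + 1):
            out.append({**r, "bucket": b})
    return out


def left_join_on_bucket(singles: List[Row], ranges: List[Row]) -> List[Row]:
    """Left outer equi-join on the bucket; unmatched singles get NULL range columns."""
    exploded = explode_range_buckets(ranges)
    out: List[Row] = []
    for s in singles:
        hits = [r for r in exploded if r["bucket"] == s["single"] // FACTOR]
        if hits:
            out.extend({**s, "range_start": r["range_start"], "range_end": r["range_end"]} for r in hits)
        else:
            out.append({**s, "range_start": None, "range_end": None})
    return out


def between(value: int, start: Optional[int], end: Optional[int]) -> Optional[bool]:
    """SQL three-valued logic: a comparison with NULL yields NULL (None), not False."""
    if start is None or end is None:
        return None
    return start <= value <= end


def filter_like_current(rows: List[Row]) -> List[Row]:
    # WHERE single BETWEEN range_start AND range_end: a NULL result drops the row.
    return [r for r in rows if between(r["single"], r["range_start"], r["range_end"]) is True]


def filter_keeping_unmatched(rows: List[Row]) -> List[Row]:
    # Also keep rows whose range columns are NULL (singles with no candidate bucket).
    return [r for r in rows
            if r["range_start"] is None or between(r["single"], r["range_start"], r["range_end"]) is True]


singles: List[Row] = [{"single": 5}, {"single": 42}]
ranges: List[Row] = [{"range_start": 1, "range_end": 10}]
joined = left_join_on_bucket(singles, ranges)
print([r["single"] for r in filter_like_current(joined)])       # [5]
print([r["single"] for r in filter_keeping_unmatched(joined)])  # [5, 42]
```

The NULL-tolerant filter only rescues records that had no candidate bucket at all; a record that shares a bucket with a range that does not contain it is still dropped.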


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@datafu.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [datafu] petrpulc closed pull request #21: Keep the unmatched single records in joinWithRange

Posted by GitBox <gi...@apache.org>.
petrpulc closed pull request #21: Keep the unmatched single records in joinWithRange
URL: https://github.com/apache/datafu/pull/21


[GitHub] [datafu] eyala commented on pull request #21: Keep the unmatched single records in joinWithRange

Posted by GitBox <gi...@apache.org>.
eyala commented on PR #21:
URL: https://github.com/apache/datafu/pull/21#issuecomment-1099986584

   I think you're correct in your analysis - this does make the join basically an inner join. There are two issues that need to be addressed before we can merge this, one theoretical and one practical.
   
   1) I think making it a working left outer join is a good idea, but I don't want to change the existing behavior. What do you think about adding a parameter like the Spark join type? (@uzadude, what do you think?) Then inner join can remain the default.
   
   2) In order to merge it, we would need you to modify/add a test so that your code can be checked automatically. (this is crucial for keeping our project maintainable)


[GitHub] [datafu] petrpulc commented on pull request #21: Keep the unmatched single records in joinWithRange

Posted by GitBox <gi...@apache.org>.
petrpulc commented on PR #21:
URL: https://github.com/apache/datafu/pull/21#issuecomment-1111000052

   Hi, I agree with your suggestions. The initial change set was just meant to spark (pun intended) the discussion and to serve as a brain dump in case someone wanted to pick up the issue sooner than I could. Now I have a bit of time and will happily make the requested changes.


[GitHub] [datafu] eyala commented on pull request #21: Keep the unmatched single records in joinWithRange

Posted by GitBox <gi...@apache.org>.
eyala commented on PR #21:
URL: https://github.com/apache/datafu/pull/21#issuecomment-1167388998

   If you want to submit just your test cases, I've made [a JIRA issue for generic test improvements](https://issues.apache.org/jira/browse/DATAFU-164).


[GitHub] [datafu] petrpulc commented on pull request #21: Keep the unmatched single records in joinWithRange

Posted by GitBox <gi...@apache.org>.
petrpulc commented on PR #21:
URL: https://github.com/apache/datafu/pull/21#issuecomment-1111101928

   Well, during testing I found a pretty serious issue: if a record falls into `decreased_range_single` but the interval from `range_start` to `range_end` does not contain `single`, I would need to replace the joined data with nulls (and deduplicate), which makes this a significantly harder operation than I anticipated.
   
   I will mark the PR as closed for now, but I will push all of my current changes to the branch so we can return to them later if we need to.


[GitHub] [datafu] eyala commented on pull request #21: Keep the unmatched single records in joinWithRange

Posted by GitBox <gi...@apache.org>.
eyala commented on PR #21:
URL: https://github.com/apache/datafu/pull/21#issuecomment-1127673751

   I think you're right. I would say that it's still worth doing, but if there are multiple records with the same "key" (the column provided as _single_), I don't see how the records without a range can be safely deduplicated.
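One way to see the deduplication problem is the simplified pure-Python model below; it is not the DataFu implementation. It assumes integer-division bucketing, and `range_join`, `join_type`, `FACTOR`, and the generated `row_id` are hypothetical names. Giving every single record its own id, rather than relying on the _single_ column, keeps records with duplicate keys apart, so a record whose only bucket hits are false positives can be collapsed to exactly one NULL-padded row. `join_type` defaults to `"inner"` to mirror the backward-compatible default suggested in the thread.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

FACTOR = 10  # hypothetical bucket width, a stand-in for the real decrease factor


def range_join(singles: List[Row], ranges: List[Row], join_type: str = "inner") -> List[Row]:
    """Simplified bucketed range join (not DataFu code).

    join_type is a hypothetical parameter: "inner" (default) or "left_outer".
    """
    if join_type not in ("inner", "left_outer"):
        raise ValueError(f"unsupported join_type: {join_type}")
    # Index every range under each coarse bucket it overlaps.
    by_bucket: Dict[int, List[Row]] = {}
    for r in ranges:
        for b in range(r["range_start"] // FACTOR, r["range_end"] // FACTOR + 1):
            by_bucket.setdefault(b, []).append(r)
    out: List[Row] = []
    # enumerate() plays the role of a unique row id, so records that share
    # the same `single` value are never merged with each other.
    for row_id, s in enumerate(singles):
        candidates = by_bucket.get(s["single"] // FACTOR, [])
        # Bucket equality is only a coarse pre-filter; keep true containment matches.
        matches = [r for r in candidates if r["range_start"] <= s["single"] <= r["range_end"]]
        for r in matches:
            out.append({**s, "row_id": row_id, **r})
        if not matches and join_type == "left_outer":
            # No real match (no candidate, or only false-positive bucket hits):
            # emit exactly one NULL-padded row for this record.
            out.append({**s, "row_id": row_id, "range_start": None, "range_end": None})
    return out


singles = [{"single": 5}, {"single": 12}, {"single": 12}, {"single": 42}]
ranges = [{"range_start": 1, "range_end": 10}]
inner = range_join(singles, ranges)
left = range_join(singles, ranges, join_type="left_outer")
print([r["single"] for r in inner])  # [5]
print([(r["row_id"], r["single"], r["range_start"]) for r in left])
# [(0, 5, 1), (1, 12, None), (2, 12, None), (3, 42, None)]
```

Both records with `single` = 12 share bucket 1 with the range [1, 10] but are not contained in it; each still yields its own NULL-padded row. In Spark, the per-record decision made inside the loop would need a grouping or window over such an id after the join (Spark's `monotonically_increasing_id` can generate one), which is the extra cost discussed above.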


[GitHub] [datafu] uzadude commented on pull request #21: Keep the unmatched single records in joinWithRange

Posted by GitBox <gi...@apache.org>.
uzadude commented on PR #21:
URL: https://github.com/apache/datafu/pull/21#issuecomment-1100003021

   Sure, let's add a `joinType` parameter like in the skew join methods, and keep it backward compatible.

