Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/12/16 17:53:10 UTC

[GitHub] [accumulo] ctubbsii commented on pull request #1259: update du command to use hdfs iterator

ctubbsii commented on pull request #1259:
URL: https://github.com/apache/accumulo/pull/1259#issuecomment-996045874


   > I am not sure exactly what the first and 3rd numbers are saying or how useful they are
   
   The first and third numbers are the space used by files that belong only to that table and not to the other one, whereas the middle line is the space used by files the two tables have in common. That seems very useful. For example, it shows how much space you would recover by deleting the second table: you might not save much, because a lot of its data is in files shared with the first table.
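
To make the three-line breakdown concrete, here is a minimal sketch of the idea, not Accumulo's actual implementation: group each file's size by the exact set of tables that reference it. The function name `shared_usage`, the table names, the file names, and the sizes are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def shared_usage(table_files, file_sizes):
    """Sum file sizes per exact set of tables referencing each file.

    table_files: dict mapping table name -> list of file names (hypothetical input)
    file_sizes:  dict mapping file name -> size in bytes (hypothetical input)
    """
    # First pass: record which tables reference each file.
    tables_by_file = defaultdict(set)
    for table, files in table_files.items():
        for name in files:
            tables_by_file[name].add(table)

    # Second pass: add each file's size to the bucket for its set of tables.
    usage = defaultdict(int)
    for name, tables in tables_by_file.items():
        usage[frozenset(tables)] += file_sizes[name]
    return dict(usage)


# Made-up example: two tables that share one large file.
table_files = {
    "ci": ["f1.rf", "f2.rf"],
    "ci2": ["f2.rf", "f3.rf"],
}
file_sizes = {"f1.rf": 100, "f2.rf": 900, "f3.rf": 50}

usage = shared_usage(table_files, file_sizes)
for tables, size in sorted(usage.items(), key=lambda kv: sorted(kv[0])):
    print(size, sorted(tables))
```

With this made-up data, the bucket for `ci2` alone holds only 50 bytes, so deleting `ci2` would recover very little: the 900-byte file is still referenced by `ci`.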
   
   > 2021-12-16T12:14:36,461 [Shell.audit] INFO : root@uno> du ci  -h
   >    36.37G [ci2, ci3]
   > This is telling the user it has files shared with those other tables. We could add an option to print shared file paths as well.
   
   I think that's a little confusing. Does it mean that all of its data is shared with the second table, or just that it has some data in common? The current view is much clearer about that, and doesn't require us to do any more work than necessary to evaluate the files for the table that was asked about.
   
   In any case, changing the behavior of this command seems out of scope for this PR. I think the idea here was simply to optimize the performance of collecting the file sizes. If that can be done, that'd be great, but I don't think we should be changing the behavior at the same time.

