Posted to dev@flume.apache.org by "Jason Kushmaul (JIRA)" <ji...@apache.org> on 2016/10/05 12:55:20 UTC

[jira] [Comment Edited] (FLUME-2994) flume-taildir-source: support for windows

    [ https://issues.apache.org/jira/browse/FLUME-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15548622#comment-15548622 ] 

Jason Kushmaul edited comment on FLUME-2994 at 10/5/16 12:54 PM:
-----------------------------------------------------------------

Uniqueness of FileKey.hashCode:
Can you be more specific about why it might not be unique enough?  My hope was that, with this hashCode override, the value would be unique within the filesystem.
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/windows/classes/sun/nio/ch/FileKey.java
{noformat}
    public int hashCode() {
        return (int)(dwVolumeSerialNumber ^ (dwVolumeSerialNumber >>> 32)) +
               (int)(nFileIndexHigh ^ (nFileIndexHigh >>> 32)) +
               (int)(nFileIndexLow ^ (nFileIndexHigh >>> 32));
    }
{noformat}
I'm not defending it; I simply can't tell you how unique it will be, so I was hoping you could do the opposite and tell me how unique it will not be.  What I can tell you is that the value was the same from run to run, and that it was different for each of the very small number of files I tested.
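
To make the question concrete, here is a minimal sketch that applies the same arithmetic as the quoted hashCode to plain long values.  The class name FileKeyHashDemo and the sample values are made up for illustration; they do not come from the JDK or from the patch.  Because the result is a sum of three folded terms squeezed into a single int, two different (high, low) index pairs can produce the same value.  Whether that ever happens for real files on one volume is exactly what a measurement would have to show.

```java
final class FileKeyHashDemo {

    // Same arithmetic as the quoted hashCode, applied to plain long arguments
    // instead of the private fields of sun.nio.ch.FileKey.
    static int hash(long volumeSerial, long indexHigh, long indexLow) {
        return (int) (volumeSerial ^ (volumeSerial >>> 32))
             + (int) (indexHigh ^ (indexHigh >>> 32))
             + (int) (indexLow ^ (indexHigh >>> 32));
    }

    public static void main(String[] args) {
        long volume = 0x1234ABCDL; // arbitrary sample value

        // Two different (high, low) pairs whose terms add up to the same int.
        int a = hash(volume, 1L, 2L);
        int b = hash(volume, 2L, 1L);

        System.out.println(a == b); // prints true
    }
}
```

The sample pairs are chosen to collide and say nothing about how often real file indexes collide.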

I think a measurement is warranted now: I will provide some data on how unique it is.  If you have any suggestions, please let me know and I'll be sure to include them.  Otherwise I'll start with what I have in mind right now, which is to generate a configurable number of files and then check fileKey.hashCode on each of them for uniqueness.  Crude, but I think it will prove it worthy (or not).
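
A rough sketch of that check could look like the following.  Everything here is an assumption for illustration: the class and method names (HashUniquenessCheck, countDistinctHashes, fileKeyHash) are hypothetical, and the hash function is passed in so that the Windows-specific FileKey lookup could be plugged in where it is available.  The stand-in hasher uses the portable BasicFileAttributes.fileKey(), which the API allows to be null, so it fails loudly in that case rather than guessing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToIntFunction;

final class HashUniquenessCheck {

    // Creates fileCount empty files in a fresh temp directory, hashes each one
    // with the supplied function, and returns the number of distinct hashes.
    // All files exist at the same time, so no file identity is reused mid-run.
    static int countDistinctHashes(int fileCount, ToIntFunction<Path> hasher) throws IOException {
        Path dir = Files.createTempDirectory("filekey-check");
        List<Path> created = new ArrayList<>();
        Set<Integer> hashes = new HashSet<>();
        try {
            for (int i = 0; i < fileCount; i++) {
                Path file = Files.createFile(dir.resolve("file-" + i + ".log"));
                created.add(file);
                hashes.add(hasher.applyAsInt(file));
            }
            return hashes.size();
        } finally {
            // Clean up the generated files and the directory.
            for (Path file : created) {
                Files.deleteIfExists(file);
            }
            Files.deleteIfExists(dir);
        }
    }

    // Stand-in hasher (assumption): hash code of the portable fileKey() attribute.
    static int fileKeyHash(Path file) {
        try {
            Object key = Files.readAttributes(file, BasicFileAttributes.class).fileKey();
            if (key == null) {
                throw new IllegalStateException("no file key available for " + file);
            }
            return key.hashCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        int fileCount = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        int distinct = countDistinctHashes(fileCount, HashUniquenessCheck::fileKeyHash);
        System.out.println(distinct + " distinct hash codes for " + fileCount + " files");
    }
}
```

If the number of distinct hash codes equals the number of files, there were no collisions in that run; anything lower means at least two files shared a hash.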

tailFiles Map:
The only place FileKey is used is to get an inode-like value on Windows, so I don't think we should use it in the tailFiles map: that would spread a Windows workaround object into the rest of the code rather than keeping it contained in that single function.  (Did I misread what you were asking?)
I would continue to use Long in the tailFiles map because, on Unix, the inode is the primary way to identify a file other than its path (and the path can change if the file is "mv"d).
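
A minimal sketch of that containment, under stated assumptions: InodeLookup, isWindows and windowsFileId are hypothetical names, windowsFileId is only a placeholder for the FileKey-based lookup (it is not the code from the patch), and the map holds a String where the real tailFiles map holds its own value type.  The point is only that callers see a long on every platform and the OS switch lives in one function.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class InodeLookup {

    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    // Hypothetical placeholder for the Windows-specific lookup; not shown here.
    static long windowsFileId(File file) throws IOException {
        throw new UnsupportedOperationException("Windows lookup not shown in this sketch");
    }

    // The only place that knows about the platform difference.
    static long getInode(File file) throws IOException {
        if (isWindows(System.getProperty("os.name"))) {
            return windowsFileId(file);
        }
        return (Long) Files.getAttribute(file.toPath(), "unix:ino");
    }

    public static void main(String[] args) throws IOException {
        // Simplified stand-in for the tailFiles map: keyed by Long on every OS.
        Map<Long, String> tailFiles = new HashMap<>();

        File file = File.createTempFile("taildir", ".log");
        try {
            tailFiles.put(getInode(file), file.getPath());

            // Rename the file ("mv") and look it up again by its inode.
            File renamed = new File(file.getPath() + ".1");
            if (file.renameTo(renamed)) {
                file = renamed;
            }
            System.out.println("still tracked: " + tailFiles.containsKey(getInode(file)));
        } finally {
            file.delete();
        }
    }
}
```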
  


> flume-taildir-source: support for windows
> -----------------------------------------
>
>                 Key: FLUME-2994
>                 URL: https://issues.apache.org/jira/browse/FLUME-2994
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources, Windows
>    Affects Versions: v1.7.0
>            Reporter: Jason Kushmaul
>            Assignee: Jason Kushmaul
>            Priority: Trivial
>             Fix For: v1.7.0
>
>         Attachments: FLUME-2994-2.patch, taildir-mac.conf, taildir-win8.1.conf
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of flume-taildir-source does not support Windows.
> The only reason for this, from what I can see, is a simple call to Files.getAttribute(file.toPath(), "unix:ino");
> I've tested an equivalent for Windows (which of course does not work on non-Windows systems).  With an OS switch we should be able to identify a file independently of its file name on either system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)