Posted to pr@jena.apache.org by GitBox <gi...@apache.org> on 2020/11/25 10:44:07 UTC

[GitHub] [jena] afs opened a new pull request #871: JENA-2003: Handle file URIs with URI scheme name

afs opened a new pull request #871:
URL: https://github.com/apache/jena/pull/871


   This fixes the problems with "file:".
   
   Before a "file:" name is passed to the OS, the scheme prefix has always been stripped and the rest converted to an OS file name. However, other code also uses the filename handling from Apache Commons IO. One example is `RDFLanguages.filenameToLang`, which takes a resource name (a URI or a filename) and uses its extension to guess the RDF syntax. It is used for file names, and also for URIs when there is no content negotiation.
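   As an illustration of why extension handling matters here, the sketch below guesses a syntax name from the extension of a file name or URI. It is a hypothetical stand-in, not Jena's `RDFLanguages` code: the class `SyntaxGuess`, the method `guessSyntax` and the small extension table are assumptions for this example only. The point it shows is that a name such as `file:data.ttl` only needs the last path segment and the last dot, so the ":" of the scheme name should not make it fail.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch: guess an RDF syntax name from the extension of a
 * file name or URI. Not Jena's RDFLanguages implementation; the table
 * below is an assumed, illustrative subset.
 */
final class SyntaxGuess {
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "ttl", "Turtle",
            "nt", "N-Triples",
            "rdf", "RDF/XML",
            "jsonld", "JSON-LD");

    /** Returns the syntax name, or null if there is no recognised extension. */
    static String guessSyntax(String resourceName) {
        // Look only at the last path segment so dots in directory names are ignored.
        int lastSeparator = Math.max(resourceName.lastIndexOf('/'), resourceName.lastIndexOf('\\'));
        int dot = resourceName.lastIndexOf('.');
        if (dot <= lastSeparator) {
            return null; // no dot in the final segment: no extension
        }
        String ext = resourceName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(ext);
    }
}
```

   With this sketch, `guessSyntax("file:data.ttl")` and `guessSyntax("http://example.org/dir/data.ttl")` both give `"Turtle"`; the scheme colon plays no part in the decision.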
   
   The fix, in IO.java around line 100 (the rest of the PR is naming and cosmetic changes), is to extract the code from Apache Commons IO and remove the check for ":" in the trailing component of the filename ([NTFS ADS](https://en.wikipedia.org/wiki/NTFS#Alternate_data_streams_(ADS))). Windows uses ":" in two places, the drive letter and ADS, and this clashes with URI scheme names, which also use ":".
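   A minimal sketch of the extracted logic, assuming it follows the Commons IO `FilenameUtils` structure shown in the diff (`indexOfLastSeparator`, `indexOfExtension`) with the ADS colon check left out. The class name `ExtensionSketch` is hypothetical, and this mirrors rather than reproduces the code in the PR.

```java
/**
 * Hypothetical sketch of extension extraction in the style of Commons IO
 * FilenameUtils, without the NTFS ADS ':' check, so that names carrying a
 * URI scheme such as "file:" are accepted.
 */
final class ExtensionSketch {
    private static final int NOT_FOUND = -1;
    private static final char EXTENSION_SEPARATOR = '.';
    private static final char UNIX_SEPARATOR = '/';
    private static final char WINDOWS_SEPARATOR = '\\';

    static int indexOfLastSeparator(String fileName) {
        if (fileName == null) {
            return NOT_FOUND;
        }
        return Math.max(fileName.lastIndexOf(UNIX_SEPARATOR), fileName.lastIndexOf(WINDOWS_SEPARATOR));
    }

    static int indexOfExtension(String fileName) {
        if (fileName == null) {
            return NOT_FOUND;
        }
        // Deliberately no check for ':' here. The commented-out block in the diff
        // rejects a colon in the final name component on Windows, which would
        // also reject a scheme-prefixed name such as "file:data.ttl".
        int extensionPos = fileName.lastIndexOf(EXTENSION_SEPARATOR);
        int lastSeparator = indexOfLastSeparator(fileName);
        // A dot before the last separator belongs to a directory name, not the file.
        return lastSeparator > extensionPos ? NOT_FOUND : extensionPos;
    }

    /** The extension without the dot, "" if there is none, or null for a null name. */
    static String getExtension(String fileName) {
        if (fileName == null) {
            return null;
        }
        int index = indexOfExtension(fileName);
        return index == NOT_FOUND ? "" : fileName.substring(index + 1);
    }
}
```

   For example, `getExtension("file:data.ttl")` is `"ttl"` and `getExtension("archive.ttl.gz")` is `"gz"` (only the last extension is returned), while `getExtension("dir.v2/README")` is `""`.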
   
   The fix takes the safest approach: it copies the code and restores the old behaviour in a low-risk way. Further refactoring can be done after the 3.17.0 release to tidy this up.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@jena.apache.org
For additional commands, e-mail: pr-help@jena.apache.org


[GitHub] [jena] kinow commented on a change in pull request #871: JENA-2003: Handle file URIs with URI scheme name

Posted by GitBox <gi...@apache.org>.
kinow commented on a change in pull request #871:
URL: https://github.com/apache/jena/pull/871#discussion_r530282869



##########
File path: jena-base/src/main/java/org/apache/jena/atlas/io/IO.java
##########
@@ -85,24 +88,75 @@ static public InputStream openFileEx(String filename) throws IOException, FileNo
             filename = IRILib.decodeHex(filename);
         }
         InputStream in = new FileInputStream(filename);
-        String ext = FilenameUtils.getExtension(filename);
+        String ext = getExtension(filename);
         switch ( ext ) {
             case "":        return in;
-            case "gz":      return new GZIPInputStream(in);
-            case "bz2":     return new BZip2CompressorInputStream(in);
-            case "sz":      return new SnappyCompressorInputStream(in);
+            case ext_gz:    return new GZIPInputStream(in);
+            case ext_bz2:   return new BZip2CompressorInputStream(in);
+            case ext_sz:    return new SnappyCompressorInputStream(in);
         }
         return in;
     }
 
-    private static String[] extensions = { "gz", "bz2", "sz" };
+    // ---- Extracted from Apache CommonsIO : FilenameUtils (2.8.0) because of the drive letter handling.
+    private static final int NOT_FOUND = -1;
+    private static final String EMPTY_STRING = "";
+    private static final String EXTENSION_SEPARATOR = ".";
+    private static final char UNIX_SEPARATOR = '/';
+    private static final char WINDOWS_SEPARATOR = '\\';
+
+    private static int indexOfLastSeparator(final String fileName) {
+        if (fileName == null) {
+            return NOT_FOUND;
+        }
+        final int lastUnixPos = fileName.lastIndexOf(UNIX_SEPARATOR);
+        final int lastWindowsPos = fileName.lastIndexOf(WINDOWS_SEPARATOR);
+        return Math.max(lastUnixPos, lastWindowsPos);
+    }
 
-    /** The filename without any compression extension, or the original filename.
-     *  It tests for compression types handled by {@link #openFileEx}.
+    private static int indexOfExtension(final String fileName) throws IllegalArgumentException {
+        if (fileName == null) {
+            return NOT_FOUND;
+        }
+//        if (isSystemWindows()) {
+//            // Special handling for NTFS ADS: Don't accept colon in the fileName.
+//            final int offset = fileName.indexOf(':', getAdsCriticalOffset(fileName));
+//            if (offset != -1) {
+//                throw new IllegalArgumentException("NTFS ADS separator (':') in file name is forbidden.");
+//            }
+//        }

Review comment:
       :+1: 
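   The first hunk of the diff keeps `openFileEx`'s choice of decompressing stream by file extension, switching the string literals to constants (`ext_gz`, `ext_bz2`, `ext_sz`). The sketch below illustrates that dispatch under stated assumptions: the class `OpenSketch` and constant `EXT_GZ` are hypothetical, only gzip is handled because it is in the JDK (the PR's bz2 and sz cases use Apache Commons Compress), and any other extension returns the plain stream.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical sketch: pick a decompressing stream from the file extension. */
final class OpenSketch {
    // A String used as a case label must be a compile-time constant.
    static final String EXT_GZ = "gz";

    static InputStream open(String filename) throws IOException {
        InputStream in = new FileInputStream(filename);
        switch (extension(filename)) {
            case EXT_GZ:
                try {
                    return new GZIPInputStream(in);
                } catch (IOException e) {
                    in.close(); // avoid leaking the file handle on a bad gzip header
                    throw e;
                }
            default:
                return in; // no (handled) compression extension: plain stream
        }
    }

    // Minimal extension extraction: text after the last dot in the final segment.
    private static String extension(String filename) {
        int sep = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
        int dot = filename.lastIndexOf('.');
        return dot > sep ? filename.substring(dot + 1) : "";
    }
}
```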






[GitHub] [jena] afs merged pull request #871: JENA-2003: Handle file URIs with URI scheme name

Posted by GitBox <gi...@apache.org>.
afs merged pull request #871:
URL: https://github.com/apache/jena/pull/871


   

