You are viewing a plain text version of this content. The canonical link for it is here.
Posted to pr@cassandra.apache.org by GitBox <gi...@apache.org> on 2023/01/11 10:29:04 UTC

[GitHub] [cassandra] Claudenw opened a new pull request, #2081: Fixes ordering in StandaloneUpgrader

Claudenw opened a new pull request, #2081:
URL: https://github.com/apache/cassandra/pull/2081

   Fixed issue where sstables were not updated in order by sorting on SSTableId
   
   patch by Claude Warren; reviewed by <Reviewers> for CASSANDRA-18143
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] jacek-lewandowski commented on a diff in pull request #2081: Fixes ordering in StandaloneUpgrader

Posted by GitBox <gi...@apache.org>.
jacek-lewandowski commented on code in PR #2081:
URL: https://github.com/apache/cassandra/pull/2081#discussion_r1069386615


##########
test/unit/org/apache/cassandra/io/sstable/LegacySSTableTest.java:
##########
@@ -686,4 +689,77 @@ private static void copyFile(File cfDir, File file) throws IOException
                 }
         }
     }
+
+    /**
+     * Finds the first point at which the HashSet inserts a generation before a previous
+     * generation in the set. This is used in testing to verfy the sequence of tables where
+     * the generation order is significant.
+     * @param legacyVersion the legacy version to work with.
+     * @return the generation that causes of the inversion.
+     */
+    public static int findInversion(String legacyVersion) {
+        String ksname = "legacy_tables";
+        String cfname = String.format("legacy_%s_multiple", legacyVersion);;
+        SSTableFormat.Type formatType = SSTableFormat.Type.BIG;
+        int generation = 0;
+        SequenceBasedSSTableId id = new SequenceBasedSSTableId(generation);
+        File directory = new File( System.getProperty("java.io.tmpdir"));
+        Version version = formatType.info.getVersion(legacyVersion);
+        Descriptor descriptor = new Descriptor( version,  directory,  ksname,  cfname, id, formatType);
+        HashSet<Descriptor> set = new HashSet<>();
+        set.add( descriptor );
+        Object[] arry = set.toArray();

Review Comment:
   I was thinking about something like this:
   
   ```java
       static <T> T findFirstUnordered(java.util.function.Function<T, T> provider)
       {
           HashSet<T> set = new HashSet<>(2);
           T first = null;
           while (true)
           {
               T second = provider.apply(first);
               if (first != null)
               {
                   set.add(first);
                   set.add(second);
                   if (set.iterator().next() != first)
                       return second;
               }
               first = second;
           }
       }
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] jacek-lewandowski commented on a diff in pull request #2081: Fixes ordering in StandaloneUpgrader

Posted by GitBox <gi...@apache.org>.
jacek-lewandowski commented on code in PR #2081:
URL: https://github.com/apache/cassandra/pull/2081#discussion_r1069386615


##########
test/unit/org/apache/cassandra/io/sstable/LegacySSTableTest.java:
##########
@@ -686,4 +689,77 @@ private static void copyFile(File cfDir, File file) throws IOException
                 }
         }
     }
+
+    /**
+     * Finds the first point at which the HashSet inserts a generation before a previous
+     * generation in the set. This is used in testing to verfy the sequence of tables where
+     * the generation order is significant.
+     * @param legacyVersion the legacy version to work with.
+     * @return the generation that causes of the inversion.
+     */
+    public static int findInversion(String legacyVersion) {
+        String ksname = "legacy_tables";
+        String cfname = String.format("legacy_%s_multiple", legacyVersion);;
+        SSTableFormat.Type formatType = SSTableFormat.Type.BIG;
+        int generation = 0;
+        SequenceBasedSSTableId id = new SequenceBasedSSTableId(generation);
+        File directory = new File( System.getProperty("java.io.tmpdir"));
+        Version version = formatType.info.getVersion(legacyVersion);
+        Descriptor descriptor = new Descriptor( version,  directory,  ksname,  cfname, id, formatType);
+        HashSet<Descriptor> set = new HashSet<>();
+        set.add( descriptor );
+        Object[] arry = set.toArray();

Review Comment:
   I was thinking about something like this:
   
   ```java
       static <T> T findFirstUnordered(java.util.function.Function<T, T> provider)
       {
           HashSet<T> set = new HashSet<>(2);
           T first = null;
           while (true)
           {
               set.clear();
               T second = provider.apply(first);
               if (first != null)
               {
                   set.add(first);
                   set.add(second);
                   if (set.iterator().next() != first)
                       return second;
               }
               first = second;
           }
       }
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] jacek-lewandowski commented on a diff in pull request #2081: Fixes ordering in StandaloneUpgrader

Posted by GitBox <gi...@apache.org>.
jacek-lewandowski commented on code in PR #2081:
URL: https://github.com/apache/cassandra/pull/2081#discussion_r1069237541


##########
test/unit/org/apache/cassandra/io/sstable/LegacySSTableTest.java:
##########
@@ -686,4 +689,77 @@ private static void copyFile(File cfDir, File file) throws IOException
                 }
         }
     }
+
+    /**
+     * Finds the first point at which the HashSet inserts a generation before a previous
+     * generation in the set. This is used in testing to verfy the sequence of tables where
+     * the generation order is significant.
+     * @param legacyVersion the legacy version to work with.
+     * @return the generation that causes of the inversion.
+     */
+    public static int findInversion(String legacyVersion) {
+        String ksname = "legacy_tables";
+        String cfname = String.format("legacy_%s_multiple", legacyVersion);;
+        SSTableFormat.Type formatType = SSTableFormat.Type.BIG;
+        int generation = 0;
+        SequenceBasedSSTableId id = new SequenceBasedSSTableId(generation);
+        File directory = new File( System.getProperty("java.io.tmpdir"));
+        Version version = formatType.info.getVersion(legacyVersion);
+        Descriptor descriptor = new Descriptor( version,  directory,  ksname,  cfname, id, formatType);
+        HashSet<Descriptor> set = new HashSet<>();
+        set.add( descriptor );
+        Object[] arry = set.toArray();

Review Comment:
   I suppose it should be a generic method somewhere in the utility class. Something which take a function, for example `<T> T generateUntilUnordered(Collection<T> collection, Function<T, T> generator)` where `generator` accepts previously generated value, and initially `null`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] Claudenw commented on a diff in pull request #2081: Fixes ordering in StandaloneUpgrader

Posted by GitBox <gi...@apache.org>.
Claudenw commented on code in PR #2081:
URL: https://github.com/apache/cassandra/pull/2081#discussion_r1069371364


##########
test/unit/org/apache/cassandra/io/sstable/LegacySSTableTest.java:
##########
@@ -686,4 +689,77 @@ private static void copyFile(File cfDir, File file) throws IOException
                 }
         }
     }
+
+    /**
+     * Finds the first point at which the HashSet inserts a generation before a previous
+     * generation in the set. This is used in testing to verfy the sequence of tables where
+     * the generation order is significant.
+     * @param legacyVersion the legacy version to work with.
+     * @return the generation that causes of the inversion.
+     */
+    public static int findInversion(String legacyVersion) {
+        String ksname = "legacy_tables";
+        String cfname = String.format("legacy_%s_multiple", legacyVersion);;
+        SSTableFormat.Type formatType = SSTableFormat.Type.BIG;
+        int generation = 0;
+        SequenceBasedSSTableId id = new SequenceBasedSSTableId(generation);
+        File directory = new File( System.getProperty("java.io.tmpdir"));
+        Version version = formatType.info.getVersion(legacyVersion);
+        Descriptor descriptor = new Descriptor( version,  directory,  ksname,  cfname, id, formatType);
+        HashSet<Descriptor> set = new HashSet<>();
+        set.add( descriptor );
+        Object[] arry = set.toArray();

Review Comment:
   I don't think this will work here.  The issue is that the HashSet wraps the values of the hash long before Hash collides. The method literally has to insert into the Hash table and then verify that the last item in the Hash table was the one inserted.  I have simplified the code so that it only retains two Descriptors at a time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] jacek-lewandowski commented on a diff in pull request #2081: Fixes ordering in StandaloneUpgrader

Posted by GitBox <gi...@apache.org>.
jacek-lewandowski commented on code in PR #2081:
URL: https://github.com/apache/cassandra/pull/2081#discussion_r1069243661


##########
test/unit/org/apache/cassandra/tools/StandaloneUpgraderOnSStablesTest.java:
##########
@@ -138,22 +138,58 @@ public void testUpgrade() throws Throwable
         Keyspace.open("legacy_tables").getColumnFamilyStore("legacy_" + legacyId + "_simple").loadNewSSTables();
     }
 
+    @Test
+    public void testUpgradeSequence() throws Throwable
+    {
+        int startGeneration = LegacySSTableTest.generateMultipleTables( legacyId );
+
+        String tableName = "legacy_" + legacyId + "_multiple";
+
+        List<String> origFiles = getSStableFiles("legacy_tables", tableName);
+        // verify that the files are not returned in the correct order.
+        Set<String> treeSet = new TreeSet<>();
+        treeSet.addAll( origFiles );
+        assertNotEquals( treeSet.toArray(), origFiles.toArray());
+
+        ToolResult tool = ToolRunner.invokeClass(StandaloneUpgrader.class,
+                                                 "legacy_tables",
+                                                 tableName);
+        System.out.println( tool.getStdout() );
+        Assertions.assertThat(tool.getStdout()).contains("Found 3 sstables that need upgrading.");
+        Assertions.assertThat(tool.getStdout()).contains("legacy_tables/" + tableName);
+        int[] loc = new int[3];
+        for (int i=0;i<3;i++) {
+            String fileName = String.format( "%s-%s-big-Data.db", legacyId, startGeneration+i);
+            Assertions.assertThat(tool.getStdout()).contains(fileName);
+            loc[i] = tool.getStdout().lastIndexOf(fileName);
+        }
+        Assert.assertTrue( "Files processed out of sequence", loc[0] < loc[1] );

Review Comment:
   Use `Assertions.assertThat(loc).isSorted` or something like that from assertj



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] jacek-lewandowski commented on a diff in pull request #2081: Fixes ordering in StandaloneUpgrader

Posted by GitBox <gi...@apache.org>.
jacek-lewandowski commented on code in PR #2081:
URL: https://github.com/apache/cassandra/pull/2081#discussion_r1069226909


##########
test/unit/org/apache/cassandra/io/sstable/LegacySSTableTest.java:
##########
@@ -686,4 +689,77 @@ private static void copyFile(File cfDir, File file) throws IOException
                 }
         }
     }
+
+    /**
+     * Finds the first point at which the HashSet inserts a generation before a previous
+     * generation in the set. This is used in testing to verfy the sequence of tables where
+     * the generation order is significant.
+     * @param legacyVersion the legacy version to work with.
+     * @return the generation that causes of the inversion.
+     */
+    public static int findInversion(String legacyVersion) {
+        String ksname = "legacy_tables";
+        String cfname = String.format("legacy_%s_multiple", legacyVersion);;
+        SSTableFormat.Type formatType = SSTableFormat.Type.BIG;
+        int generation = 0;
+        SequenceBasedSSTableId id = new SequenceBasedSSTableId(generation);
+        File directory = new File( System.getProperty("java.io.tmpdir"));
+        Version version = formatType.info.getVersion(legacyVersion);
+        Descriptor descriptor = new Descriptor( version,  directory,  ksname,  cfname, id, formatType);
+        HashSet<Descriptor> set = new HashSet<>();
+        set.add( descriptor );
+        Object[] arry = set.toArray();
+        do
+        {
+            id = new SequenceBasedSSTableId(++generation);
+            descriptor = new Descriptor(version , directory , ksname , cfname , id , formatType);
+            set.add(descriptor);
+            arry = set.toArray();
+        }  while( descriptor.equals(arry[arry.length-1]));
+        return generation;
+    }
+
+    /**
+     * Generates 3 sstables generations in legacy_x_multiple and adds it to the column store.
+     * Tables are guaranteed to not be returned in generation order when iterated via the
+     * {@code Directories.SSTableLister} class.
+     * @param legacyVersion the version to create tables for
+     * @throws IOException on IO error.
+     */
+    public static int generateMultipleTables(String legacyVersion) throws IOException {
+
+        int start = findInversion(legacyVersion) - 1;
+        String tableName = String.format( "legacy_%s_multiple", legacyVersion);
+
+        File sourceDir = getTableDir(legacyVersion, String.format( "legacy_%s_simple", legacyVersion));
+
+        logger.info("creating legacy table {}", tableName);
+
+        QueryProcessor.executeInternal(String.format("CREATE TABLE legacy_tables.%s (pk text PRIMARY KEY, val text)", tableName));
+        ColumnFamilyStore cfs = Keyspace.open("legacy_tables").getColumnFamilyStore(tableName);
+
+        List<File> cfDirs = cfs.getDirectories().getCFDirectories();
+        File cfDir = cfDirs.get(0);
+
+        for (int i=0;i<3;i++) {
+            for (File file : sourceDir.tryList())
+            {
+                byte[] buf = new byte[65536];
+                if (file.isFile())
+                {
+                    String[] fileNameParts = file.name().split("-");
+                    String targetName = String.format( "%s-%s-%s-%s", legacyVersion, start+i, fileNameParts[2], fileNameParts[3]);
+                    logger.info("creating legacy sstable {}", targetName);
+                    File target = new File(cfDir, targetName);
+                    int rd;
+                    try (FileInputStreamPlus is = new FileInputStreamPlus(file);

Review Comment:
   I believe there is some `FileUtils.copy` method



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org