You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tinkerpop.apache.org by GitBox <gi...@apache.org> on 2021/12/16 14:58:05 UTC

[GitHub] [tinkerpop] mikepersonick opened a new pull request #1523: Added an `AnonymizingTypeTranslator` for use with `GroovyTranslator` …

mikepersonick opened a new pull request #1523:
URL: https://github.com/apache/tinkerpop/pull/1523


   …which strips PII (anonymizes any String, Numeric, Date, Timestamp, or UUID data).
   
     https://issues.apache.org/jira/browse/TINKERPOP-2666


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] mikepersonick commented on a change in pull request #1523: Added an `AnonymizingTypeTranslator` for use with `GroovyTranslator` …

Posted by GitBox <gi...@apache.org>.
mikepersonick commented on a change in pull request #1523:
URL: https://github.com/apache/tinkerpop/pull/1523#discussion_r772666243



##########
File path: gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/translator/AnonymizingTypeTranslatorTest.java
##########
@@ -0,0 +1,88 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing,
+ *  software distributed under the License is distributed on an
+ *  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *  KIND, either express or implied.  See the License for the
+ *  specific language governing permissions and limitations
+ *  under the License.
+ */
+
+package org.apache.tinkerpop.gremlin.process.traversal.translator;
+
+import org.apache.tinkerpop.gremlin.language.grammar.GremlinQueryParser;
+import org.apache.tinkerpop.gremlin.language.grammar.NoOpTerminalVisitor;
+import org.apache.tinkerpop.gremlin.process.traversal.Bytecode;
+import org.apache.tinkerpop.gremlin.process.traversal.Translator;
+import org.javatuples.Pair;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.Arrays;
+
+public class AnonymizingTypeTranslatorTest {
+
+    private void testAnonymize(final String query, final String expected) {
+        try {
+            final Bytecode bc = (Bytecode) GremlinQueryParser.parse(query, new NoOpTerminalVisitor());
+            final Translator.ScriptTranslator translator = GroovyTranslator.of("g", new AnonymizingTypeTranslator());

Review comment:
       The test wouldn't work, but the test uses "g". Should be fine. For production code you would set up the GroovyTranslator differently.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] divijvaidya commented on a change in pull request #1523: Added an `AnonymizingTypeTranslator` for use with `GroovyTranslator` …

Posted by GitBox <gi...@apache.org>.
divijvaidya commented on a change in pull request #1523:
URL: https://github.com/apache/tinkerpop/pull/1523#discussion_r773158574



##########
File path: gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/translator/AnonymizingTypeTranslatorTest.java
##########
@@ -0,0 +1,88 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing,
+ *  software distributed under the License is distributed on an
+ *  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *  KIND, either express or implied.  See the License for the
+ *  specific language governing permissions and limitations
+ *  under the License.
+ */
+
+package org.apache.tinkerpop.gremlin.process.traversal.translator;
+
+import org.apache.tinkerpop.gremlin.language.grammar.GremlinQueryParser;
+import org.apache.tinkerpop.gremlin.language.grammar.NoOpTerminalVisitor;
+import org.apache.tinkerpop.gremlin.process.traversal.Bytecode;
+import org.apache.tinkerpop.gremlin.process.traversal.Translator;
+import org.javatuples.Pair;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.Arrays;
+
+public class AnonymizingTypeTranslatorTest {
+
+    private void testAnonymize(final String query, final String expected) {
+        try {
+            final Bytecode bc = (Bytecode) GremlinQueryParser.parse(query, new NoOpTerminalVisitor());
+            final Translator.ScriptTranslator translator = GroovyTranslator.of("g", new AnonymizingTypeTranslator());

Review comment:
       Ah. You are right. My bad. I didn't see that this is specific to a test.

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/translator/AnonymizingTypeTranslator.java
##########
@@ -0,0 +1,170 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing,
+ *  software distributed under the License is distributed on an
+ *  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *  KIND, either express or implied.  See the License for the
+ *  specific language governing permissions and limitations
+ *  under the License.
+ */
+
+package org.apache.tinkerpop.gremlin.process.traversal.translator;
+
+import org.apache.tinkerpop.gremlin.process.traversal.*;
+import org.apache.tinkerpop.gremlin.structure.Edge;
+import org.apache.tinkerpop.gremlin.structure.Vertex;
+import org.apache.tinkerpop.gremlin.structure.VertexProperty;
+
+import java.sql.Timestamp;
+import java.util.*;
+
+
+/**
+ * This Translator will translate {@link Bytecode} into a representation that has been stripped of any user data
+ * (anonymized). A default anonymizer is provided, but can be replaced with a custom anonymizer as needed. The
+ * default anonymizer replaces any String, Numeric, Date, Timestamp, or UUID with a type-based token. Identical values
+ * will receive the same token (e.g. if "foo" is assigned "string0" then all occurrences of "foo" will be replaced
+ * with "string0").
+ */
+public class AnonymizingTypeTranslator extends GroovyTranslator.DefaultTypeTranslator {
+
+    /**
+     * Customizable anonymizer interface.
+     */
+    public interface Anonymizer {
+        /**
+         * Return an anonymized token for the supplied object.
+         *
+         * @param obj a {@link Traversal} object of one of the following types: String, Long, Double, FLoat, Integer,
+         *            Class, TImestamp, Date, UUID, {@link Vertex}, {@link Edge}, {@link VertexProperty}
+         * @return    an anonymized version of the supplied object
+         */
+        Object anonymize(Object obj);
+    }
+
+    /**
+     * This default implementation keeps a map from type (Java Class) to another map from instances to anonymized
+     * token.
+     */
+    public static class DefaultAnonymizer implements Anonymizer {
+        /*
+         * Map<ClassName, Map<Object, AnonymizedValue>>
+         */
+        private final Map<String, Map<Object, String>> simpleNameToObjectCache = new HashMap<>();

Review comment:
       Yes I understand that. I was wondering if there is a possibility where the class does not go out of scope through the lifecycle of JVM but seems like we create a new instance of anonymizer every time and it goes out of scope quite quickly, so my point is not valid. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] divijvaidya commented on pull request #1523: Added an `AnonymizingTypeTranslator` for use with `GroovyTranslator` …

Posted by GitBox <gi...@apache.org>.
divijvaidya commented on pull request #1523:
URL: https://github.com/apache/tinkerpop/pull/1523#issuecomment-998810635


   Vote +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] divijvaidya commented on a change in pull request #1523: Added an `AnonymizingTypeTranslator` for use with `GroovyTranslator` …

Posted by GitBox <gi...@apache.org>.
divijvaidya commented on a change in pull request #1523:
URL: https://github.com/apache/tinkerpop/pull/1523#discussion_r772308198



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/translator/AnonymizingTypeTranslator.java
##########
@@ -0,0 +1,170 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing,
+ *  software distributed under the License is distributed on an
+ *  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *  KIND, either express or implied.  See the License for the
+ *  specific language governing permissions and limitations
+ *  under the License.
+ */
+
+package org.apache.tinkerpop.gremlin.process.traversal.translator;
+
+import org.apache.tinkerpop.gremlin.process.traversal.*;

Review comment:
       Please expand all imports to match the convention of rest of the code base.

##########
File path: gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/translator/AnonymizingTypeTranslatorTest.java
##########
@@ -0,0 +1,88 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing,
+ *  software distributed under the License is distributed on an
+ *  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *  KIND, either express or implied.  See the License for the
+ *  specific language governing permissions and limitations
+ *  under the License.
+ */
+
+package org.apache.tinkerpop.gremlin.process.traversal.translator;
+
+import org.apache.tinkerpop.gremlin.language.grammar.GremlinQueryParser;
+import org.apache.tinkerpop.gremlin.language.grammar.NoOpTerminalVisitor;
+import org.apache.tinkerpop.gremlin.process.traversal.Bytecode;
+import org.apache.tinkerpop.gremlin.process.traversal.Translator;
+import org.javatuples.Pair;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.Arrays;
+
+public class AnonymizingTypeTranslatorTest {
+
+    private void testAnonymize(final String query, final String expected) {
+        try {
+            final Bytecode bc = (Bytecode) GremlinQueryParser.parse(query, new NoOpTerminalVisitor());
+            final Translator.ScriptTranslator translator = GroovyTranslator.of("g", new AnonymizingTypeTranslator());
+            final String converted = translator.translate(bc).getScript();
+            Assert.assertEquals(expected, converted);
+        } catch (Exception ex) {
+            throw ex instanceof RuntimeException ? (RuntimeException) ex : new RuntimeException(ex);
+        }
+    }
+
+    @Test
+    public void testAnonymize() {

Review comment:
       1. Please add tests for all data types. Some of the missing ones are Date, Timestamp, boolean, UUID.
   2. Please add a test with as() such as `as("ssn")` and with `aggregate('allSSNs')`. This would make it very clear to the reader that ALL strings are anonymised irrespective of whether they are user input on data OR user input for labels.
   3. Please add a test using Gremlin constants such as Order.Asc or Cardinality.Single. These constants should not be anonymized. This can be achieved by not anonymizing any reserved string in Gremlin.

##########
File path: gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/translator/AnonymizingTypeTranslatorTest.java
##########
@@ -0,0 +1,88 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing,
+ *  software distributed under the License is distributed on an
+ *  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *  KIND, either express or implied.  See the License for the
+ *  specific language governing permissions and limitations
+ *  under the License.
+ */
+
+package org.apache.tinkerpop.gremlin.process.traversal.translator;
+
+import org.apache.tinkerpop.gremlin.language.grammar.GremlinQueryParser;
+import org.apache.tinkerpop.gremlin.language.grammar.NoOpTerminalVisitor;
+import org.apache.tinkerpop.gremlin.process.traversal.Bytecode;
+import org.apache.tinkerpop.gremlin.process.traversal.Translator;
+import org.javatuples.Pair;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.Arrays;
+
+public class AnonymizingTypeTranslatorTest {
+
+    private void testAnonymize(final String query, final String expected) {
+        try {
+            final Bytecode bc = (Bytecode) GremlinQueryParser.parse(query, new NoOpTerminalVisitor());
+            final Translator.ScriptTranslator translator = GroovyTranslator.of("g", new AnonymizingTypeTranslator());

Review comment:
       would this work if the `GraphTraversalSourceAlias` is not "g" (which is not necessary in Gremlin language)?

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/translator/AnonymizingTypeTranslator.java
##########
@@ -0,0 +1,170 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing,
+ *  software distributed under the License is distributed on an
+ *  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *  KIND, either express or implied.  See the License for the
+ *  specific language governing permissions and limitations
+ *  under the License.
+ */
+
+package org.apache.tinkerpop.gremlin.process.traversal.translator;
+
+import org.apache.tinkerpop.gremlin.process.traversal.*;
+import org.apache.tinkerpop.gremlin.structure.Edge;
+import org.apache.tinkerpop.gremlin.structure.Vertex;
+import org.apache.tinkerpop.gremlin.structure.VertexProperty;
+
+import java.sql.Timestamp;
+import java.util.*;
+
+
+/**
+ * This Translator will translate {@link Bytecode} into a representation that has been stripped of any user data
+ * (anonymized). A default anonymizer is provided, but can be replaced with a custom anonymizer as needed. The
+ * default anonymizer replaces any String, Numeric, Date, Timestamp, or UUID with a type-based token. Identical values
+ * will receive the same token (e.g. if "foo" is assigned "string0" then all occurrences of "foo" will be replaced
+ * with "string0").
+ */
+public class AnonymizingTypeTranslator extends GroovyTranslator.DefaultTypeTranslator {
+
+    /**
+     * Customizable anonymizer interface.
+     */
+    public interface Anonymizer {
+        /**
+         * Return an anonymized token for the supplied object.
+         *
+         * @param obj a {@link Traversal} object of one of the following types: String, Long, Double, FLoat, Integer,
+         *            Class, TImestamp, Date, UUID, {@link Vertex}, {@link Edge}, {@link VertexProperty}
+         * @return    an anonymized version of the supplied object
+         */
+        Object anonymize(Object obj);
+    }
+
+    /**
+     * This default implementation keeps a map from type (Java Class) to another map from instances to anonymized
+     * token.
+     */
+    public static class DefaultAnonymizer implements Anonymizer {
+        /*
+         * Map<ClassName, Map<Object, AnonymizedValue>>
+         */
+        private final Map<String, Map<Object, String>> simpleNameToObjectCache = new HashMap<>();

Review comment:
       Correct me if I am wrong here but this map will not be garbage collected until JVM is destroyed (since it is allocated on static initialization of this class). With large number of queries, this will eat up heap memory quickly. 
   
   Is it possible to use weak reference here or clear the map or move this class outside (non-inner)?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] mikepersonick commented on a change in pull request #1523: Added an `AnonymizingTypeTranslator` for use with `GroovyTranslator` …

Posted by GitBox <gi...@apache.org>.
mikepersonick commented on a change in pull request #1523:
URL: https://github.com/apache/tinkerpop/pull/1523#discussion_r772666897



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/translator/AnonymizingTypeTranslator.java
##########
@@ -0,0 +1,170 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing,
+ *  software distributed under the License is distributed on an
+ *  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *  KIND, either express or implied.  See the License for the
+ *  specific language governing permissions and limitations
+ *  under the License.
+ */
+
+package org.apache.tinkerpop.gremlin.process.traversal.translator;
+
+import org.apache.tinkerpop.gremlin.process.traversal.*;
+import org.apache.tinkerpop.gremlin.structure.Edge;
+import org.apache.tinkerpop.gremlin.structure.Vertex;
+import org.apache.tinkerpop.gremlin.structure.VertexProperty;
+
+import java.sql.Timestamp;
+import java.util.*;
+
+
+/**
+ * This Translator will translate {@link Bytecode} into a representation that has been stripped of any user data
+ * (anonymized). A default anonymizer is provided, but can be replaced with a custom anonymizer as needed. The
+ * default anonymizer replaces any String, Numeric, Date, Timestamp, or UUID with a type-based token. Identical values
+ * will receive the same token (e.g. if "foo" is assigned "string0" then all occurrences of "foo" will be replaced
+ * with "string0").
+ */
+public class AnonymizingTypeTranslator extends GroovyTranslator.DefaultTypeTranslator {
+
+    /**
+     * Customizable anonymizer interface.
+     */
+    public interface Anonymizer {
+        /**
+         * Return an anonymized token for the supplied object.
+         *
+         * @param obj a {@link Traversal} object of one of the following types: String, Long, Double, FLoat, Integer,
+         *            Class, TImestamp, Date, UUID, {@link Vertex}, {@link Edge}, {@link VertexProperty}
+         * @return    an anonymized version of the supplied object
+         */
+        Object anonymize(Object obj);
+    }
+
+    /**
+     * This default implementation keeps a map from type (Java Class) to another map from instances to anonymized
+     * token.
+     */
+    public static class DefaultAnonymizer implements Anonymizer {
+        /*
+         * Map<ClassName, Map<Object, AnonymizedValue>>
+         */
+        private final Map<String, Map<Object, String>> simpleNameToObjectCache = new HashMap<>();

Review comment:
       The map is not static, just the class. So the cache will be garbage collected when the Anonymizer goes out of scope (right after translation).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette merged pull request #1523: Added an `AnonymizingTypeTranslator` for use with `GroovyTranslator` …

Posted by GitBox <gi...@apache.org>.
spmallette merged pull request #1523:
URL: https://github.com/apache/tinkerpop/pull/1523


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org