You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@systemml.apache.org by GitBox <gi...@apache.org> on 2020/06/10 07:51:04 UTC

[GitHub] [systemml] Baunsgaard commented on a change in pull request #971: [WIP] - Add protobuf support to write and read FrameBlocks to HDFS

Baunsgaard commented on a change in pull request #971:
URL: https://github.com/apache/systemml/pull/971#discussion_r437914794



##########
File path: src/main/java/org/apache/sysds/parser/dml/DmlBaseListener.java
##########
@@ -1,4 +1,4 @@
-// Generated from org\apache\sysds\parser\dml\Dml.g4 by ANTLR 4.5.3
+// Generated from org/apache/sysds/parser/dml/Dml.g4 by ANTLR 4.5.3

Review comment:
       These files change because they are generated on your system not windows.
   If you revert the change on you branch in a new commit, and thereafter look inside our git ignore.
   then it will show how to ignore these files for future commits :).
   
   Same comment for all `src/main/java/org/apache/sysds/parser/dml/XXX.java` files.

##########
File path: pom.xml
##########
@@ -1021,6 +1021,25 @@
 			<artifactId>maven-gpg-plugin</artifactId>
 			<version>1.6</version>
 		</dependency>
+
+		<!-- https://github.com/protocolbuffers/protobuf/tree/master/java -->
+		<dependency>
+			<groupId>com.google.protobuf</groupId>
+			<artifactId>protobuf-java</artifactId>
+			<version>3.11.0</version>

Review comment:
       It is not 100% updated on their GitHub.
   cold you try the 3.12.X out?
   <https://mvnrepository.com/artifact/com.google.protobuf/protobuf-java>
   
   

##########
File path: src/main/java/org/apache/sysds/runtime/io/FrameReaderFactory.java
##########
@@ -58,6 +58,9 @@ public static FrameReader createFrameReader( FileFormat fmt, FileFormatPropertie
 				else
 					reader = new FrameReaderBinaryBlock();
 				break;
+			case PROTO:
+				reader = new FrameReaderProto();
+				break;

Review comment:
       I know it is out of scope, but could you check out if a parallel readers is possible, if so you don't have to add it but just write a TODO specifying it is, or if not write not possible and reason?

##########
File path: src/test/java/org/apache/sysds/test/functions/io/proto/FrameReaderWriterProtoTest.java
##########
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.test.functions.io.proto;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.Random;
+
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.sysds.common.Types;
+import org.apache.sysds.runtime.io.FrameReader;
+import org.apache.sysds.runtime.io.FrameReaderFactory;
+import org.apache.sysds.runtime.io.FrameWriter;
+import org.apache.sysds.runtime.io.FrameWriterFactory;
+import org.apache.sysds.runtime.matrix.data.FrameBlock;
+import org.apache.sysds.runtime.util.DataConverter;
+import org.apache.sysds.test.TestUtils;
+import org.apache.wink.json4j.JSONException;
+import org.junit.Test;
+
+public class FrameReaderWriterProtoTest
+{
+	private static final String FILENAME_SINGLE = "target/testTemp/functions/data/FrameReaderWriterProtoTest/testFrameBlock.proto";
+
+	@Test
+	public void testWriteReadFrameBlockSingleSingleFromHDFS() throws IOException, JSONException {
+		testWriteReadFrameBlockComplete(1,1, 4669201);
+	}

Review comment:
       Since these tests are the same as in `i0/json/freameReaderWriterJSONLTest`.
   You could make it into a class that extends and overwrites the json FrameReaderWriter, or another super class for both of them. 

##########
File path: src/main/java/org/apache/sysds/protobuf/SysdsProtos.java
##########
@@ -0,0 +1,2985 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+// Generated by the protocol buffer compiler.  DO NOT EDIT!
+// source: src/main/resources/protobuf/Frame.proto
+
+package org.apache.sysds.protobuf;
+
+public final class SysdsProtos {
+  private SysdsProtos() {}

Review comment:
       - If you change indentation to Tabs that would be great I don't know if it is possible while it is auto generated.. 
   - If you write the Generated in the first line above license then we have the same format as the DML files auto-generated
   - Also we have a code-style file in `/dev/docs/CodeStyle_eclipse.xml`, that could be applied again if possible

##########
File path: src/main/resources/protobuf/Frame.proto
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+syntax = "proto3";
+package sysds;
+option java_package = "org.apache.sysds.protobuf";
+option java_outer_classname = "SysdsProtos";
+option java_multiple_files = false;
+
+message Frame {
+    repeated Row rows = 1;
+}
+
+message Row {
+    repeated string column_names = 1;
+    repeated string column_data = 2;
+    repeated Schema column_schema = 3;
+}
+
+message Schema {
+    enum ValueType {
+        FP32 = 0;
+        FP64 = 1;
+        INT32 = 2;
+        INT64 = 3;
+        BOOLEAN = 4;
+        STRING = 5;
+        UNKNOWN = 6;

Review comment:
       Is this list somewhat the same as in `src/main/python/systemds/onnx_systemds/onnx_helper.py`? 
   or rather is there some predetermined specific schema needed, and why is this specific one used here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org