Posted to commits@inlong.apache.org by GitBox <gi...@apache.org> on 2022/05/26 14:15:48 UTC

[GitHub] [incubator-inlong] Greedyu opened a new pull request, #4404: [INLONG-4397][Agent] Supports collect of full data for file type

Greedyu opened a new pull request, #4404:
URL: https://github.com/apache/incubator-inlong/pull/4404

   Supports collecting full data for the file type.
   
   Fixes #4397
   ![image](https://user-images.githubusercontent.com/20356765/170505833-10d12b2c-1e8a-437e-bafd-17c2e304b5de.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@inlong.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-inlong] dockerzhang commented on pull request #4404: [INLONG-4397][Agent] Supports collect of full data for file type

Posted by GitBox <gi...@apache.org>.
dockerzhang commented on PR #4404:
URL: https://github.com/apache/incubator-inlong/pull/4404#issuecomment-1143378790

   @EMsnap @pocozh PTAL, thanks.


[GitHub] [incubator-inlong] healchow merged pull request #4404: [INLONG-4397][Agent] Supports collect of full data for file type

Posted by GitBox <gi...@apache.org>.
healchow merged PR #4404:
URL: https://github.com/apache/incubator-inlong/pull/4404


[GitHub] [incubator-inlong] healchow commented on a diff in pull request #4404: [INLONG-4397][Agent] Supports collect of full data for file type

Posted by GitBox <gi...@apache.org>.
healchow commented on code in PR #4404:
URL: https://github.com/apache/incubator-inlong/pull/4404#discussion_r891174361


##########
inlong-agent/agent-common/src/main/java/org/apache/inlong/agent/constant/FileCollectType.java:
##########
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.inlong.agent.constant;
+
+/**
+ * Data source synchronization type

Review Comment:
   I suggest changing this comment to `Collect type of the File data`.
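A minimal sketch of how the renamed constants class might look with the suggested Javadoc. The thread only shows `FileCollectType.FULL` being passed to `triggerProfile.set(...)`, so the String type, the literal values, and the `INCREMENT` constant are assumptions for illustration.

```java
/**
 * Collect type of the File data.
 *
 * Sketch only: the constant values below are assumed, not taken from the PR.
 */
final class FileCollectType {

    /** Assumed meaning: collect matching files that already exist, plus new ones. */
    static final String FULL = "FULL";

    /** Hypothetical counterpart: collect only files that appear after the job starts. */
    static final String INCREMENT = "INCREMENT";

    private FileCollectType() {
        // constants holder, not meant to be instantiated
    }
}
```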



##########
inlong-agent/agent-plugins/src/main/java/org/apache/inlong/agent/plugin/sources/TextFileSource.java:
##########
@@ -75,6 +75,7 @@ public List<Reader> split(JobProfile jobConf) {
         for (File file : allFiles) {
             int seekPosition = jobConf.getInt(file.getAbsolutePath() + POSITION_SUFFIX, 0);
             LOGGER.info("read from history position {} with job profile {}", seekPosition, jobConf.getInstanceId());
+            LOGGER.info("file absolute path: {}", file.getAbsolutePath());

Review Comment:
   Please use one line to print log info.
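A minimal sketch of the reviewer's suggestion: fold the path into the existing statement so each file yields one log line. `TextFileSource` logs with SLF4J-style `{}` placeholders (shown in the comment); here `String.format` stands in so the sketch runs on the JDK alone, and `readLogMessage` is a hypothetical helper.

```java
import java.io.File;

final class SplitLogSketch {

    // With the logger used in TextFileSource, the single call would look like:
    // LOGGER.info("read from history position {} with job profile {}, file absolute path: {}",
    //         seekPosition, jobConf.getInstanceId(), file.getAbsolutePath());
    // String.format stands in for the {} placeholders so this runs without SLF4J.
    static String readLogMessage(int seekPosition, String instanceId, File file) {
        return String.format("read from history position %d with job profile %s, file absolute path: %s",
                seekPosition, instanceId, file.getAbsolutePath());
    }

    public static void main(String[] args) {
        System.out.println(readLogMessage(0, "job_1", new File("/tmp/test/a.log")));
    }
}
```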



[GitHub] [incubator-inlong] dockerzhang commented on a diff in pull request #4404: [INLONG-4397][Agent] Supports collect of full data for file type

Posted by GitBox <gi...@apache.org>.
dockerzhang commented on code in PR #4404:
URL: https://github.com/apache/incubator-inlong/pull/4404#discussion_r884741585


##########
inlong-agent/agent-common/src/main/java/org/apache/inlong/agent/constant/JobConstants.java:
##########
@@ -48,6 +48,7 @@ public class JobConstants extends CommonConstants {
     public static final String JOB_FILE_TIME_OFFSET = "job.fileJob.timeOffset";
     public static final String JOB_FILE_MAX_WAIT = "job.fileJob.file.max.wait";
     public static final String JOB_CYCLE_UNIT = "job.fileJob.cycleUnit";
+    public static final String JOB_FILE_SYNC_TYPE = "job.fileJob.syncType";

Review Comment:
   `collect` is better than `sync`
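A sketch of the renamed key, assuming the configuration string follows the `job.fileJob.*` pattern of its neighbours. The value `job.fileJob.collectType`, the `INCREMENT` default, and the `collectType` helper are all hypothetical; a plain `Map` stands in for `JobProfile`.

```java
import java.util.Collections;
import java.util.Map;

final class CollectTypeConfigSketch {

    // Renamed from JOB_FILE_SYNC_TYPE per review; the key string is an assumption.
    static final String JOB_FILE_COLLECT_TYPE = "job.fileJob.collectType";

    // Hypothetical default used when the job profile does not set a collect type.
    static final String DEFAULT_COLLECT_TYPE = "INCREMENT";

    /** Reads the collect type from a flat key/value profile, falling back to the default. */
    static String collectType(Map<String, String> profile) {
        return profile.getOrDefault(JOB_FILE_COLLECT_TYPE, DEFAULT_COLLECT_TYPE);
    }

    public static void main(String[] args) {
        System.out.println(collectType(Collections.emptyMap())); // INCREMENT
        System.out.println(collectType(Collections.singletonMap(JOB_FILE_COLLECT_TYPE, "FULL"))); // FULL
    }
}
```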



##########
inlong-agent/agent-common/src/main/java/org/apache/inlong/agent/constant/SyncTypeConstants.java:
##########
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.inlong.agent.constant;
+
+/**
+ * Data source synchronization type
+ */
+public class SyncTypeConstants {

Review Comment:
   SyncTypeConstants
   ->
   FileCollectType



[GitHub] [incubator-inlong] Greedyu commented on a diff in pull request #4404: [INLONG-4397][Agent] Supports collect of full data for file type

Posted by GitBox <gi...@apache.org>.
Greedyu commented on code in PR #4404:
URL: https://github.com/apache/incubator-inlong/pull/4404#discussion_r884873706


##########
inlong-agent/agent-common/src/main/java/org/apache/inlong/agent/constant/SyncTypeConstants.java:
##########
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.inlong.agent.constant;
+
+/**
+ * Data source synchronization type
+ */
+public class SyncTypeConstants {

Review Comment:
   Resolved.



[GitHub] [incubator-inlong] Greedyu commented on a diff in pull request #4404: [INLONG-4397][Agent] Supports collect of full data for file type

Posted by GitBox <gi...@apache.org>.
Greedyu commented on code in PR #4404:
URL: https://github.com/apache/incubator-inlong/pull/4404#discussion_r891010627


##########
inlong-agent/agent-plugins/src/test/java/org/apache/inlong/agent/plugin/TestFileAgent.java:
##########
@@ -144,6 +146,21 @@ public void testOneJobOnly() throws Exception {
         Assert.assertTrue(checkOnlyOneJob());
     }
 
+    // @Test
+    public void testOneJobFullPath() throws Exception {
+        String jsonString = TestUtils.getTestTriggerProfile();
+        TriggerProfile triggerProfile = TriggerProfile.parseJsonStr(jsonString);
+        triggerProfile.set(JOB_DIR_FILTER_PATTERN,
+                Paths.get(getClass().getClassLoader().getResource("test").toURI()).toString());
+        triggerProfile.set(JOB_FILE_MAX_WAIT, "-1");
+        triggerProfile.set(JOB_FILE_COLLECT_TYPE, FileCollectType.FULL);
+        TriggerManager triggerManager = agent.getManager().getTriggerManager();
+        triggerManager.addTrigger(triggerProfile);
+        while (true) {

Review Comment:
   The test has been modified to create the files in code, so it now exercises the newly added full mode.



[GitHub] [incubator-inlong] Greedyu commented on a diff in pull request #4404: [INLONG-4397][Agent] Supports collect of full data for file type

Posted by GitBox <gi...@apache.org>.
Greedyu commented on code in PR #4404:
URL: https://github.com/apache/incubator-inlong/pull/4404#discussion_r890199227


##########
inlong-agent/agent-plugins/src/test/java/org/apache/inlong/agent/plugin/TestFileAgent.java:
##########
@@ -144,6 +146,21 @@ public void testOneJobOnly() throws Exception {
         Assert.assertTrue(checkOnlyOneJob());
     }
 
+    // @Test
+    public void testOneJobFullPath() throws Exception {
+        String jsonString = TestUtils.getTestTriggerProfile();
+        TriggerProfile triggerProfile = TriggerProfile.parseJsonStr(jsonString);
+        triggerProfile.set(JOB_DIR_FILTER_PATTERN,
+                Paths.get(getClass().getClassLoader().getResource("test").toURI()).toString());
+        triggerProfile.set(JOB_FILE_MAX_WAIT, "-1");
+        triggerProfile.set(JOB_FILE_COLLECT_TYPE, FileCollectType.FULL);
+        TriggerManager triggerManager = agent.getManager().getTriggerManager();
+        triggerManager.addTrigger(triggerProfile);
+        while (true) {

Review Comment:
   In full mode, you can add files to the target directory after the test starts running to check whether incremental collection works correctly.
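A self-contained sketch of the check described above: one file exists when the job starts and another is added afterwards. The selection rule is an assumption about the two modes (FULL picks up both files, an incremental mode only the new one); `filesToCollect` and `snapshot` are hypothetical helpers, not the agent's API, and a snapshot of existing paths stands in for whatever state the agent actually tracks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FullModeSelectionSketch {

    /** Regular files present in dir right now (stands in for "files seen at job start"). */
    static Set<Path> snapshot(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile).collect(Collectors.toSet());
        }
    }

    /** Assumed rule: FULL takes every file; other types take only files absent at start. */
    static List<Path> filesToCollect(Path dir, String collectType, Set<Path> existingAtStart)
            throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> "FULL".equals(collectType) || !existingAtStart.contains(p))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("full-mode");
        Files.write(dir.resolve("a.log"), Collections.singletonList("existing line"));
        Set<Path> atStart = snapshot(dir); // the job "starts" here
        Files.write(dir.resolve("b.log"), Collections.singletonList("added line"));
        System.out.println("FULL: " + filesToCollect(dir, "FULL", atStart).size()); // 2
        System.out.println("INCREMENT: " + filesToCollect(dir, "INCREMENT", atStart).size()); // 1
    }
}
```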



[GitHub] [incubator-inlong] Greedyu commented on a diff in pull request #4404: [INLONG-4397][Agent] Supports collect of full data for file type

Posted by GitBox <gi...@apache.org>.
Greedyu commented on code in PR #4404:
URL: https://github.com/apache/incubator-inlong/pull/4404#discussion_r891006437


##########
inlong-agent/agent-plugins/src/test/java/org/apache/inlong/agent/plugin/TestFileAgent.java:
##########
@@ -144,6 +146,19 @@ public void testOneJobOnly() throws Exception {
         Assert.assertTrue(checkOnlyOneJob());
     }
 
+    // @Test
+    public void testOneJobFullPath() throws Exception {

Review Comment:
   The method submits a trigger, after which all of the files are read in full.



[GitHub] [incubator-inlong] Greedyu commented on a diff in pull request #4404: [INLONG-4397][Agent] Supports collect of full data for file type

Posted by GitBox <gi...@apache.org>.
Greedyu commented on code in PR #4404:
URL: https://github.com/apache/incubator-inlong/pull/4404#discussion_r891185102


##########
inlong-agent/agent-plugins/src/main/java/org/apache/inlong/agent/plugin/sources/TextFileSource.java:
##########
@@ -75,6 +75,7 @@ public List<Reader> split(JobProfile jobConf) {
         for (File file : allFiles) {
             int seekPosition = jobConf.getInt(file.getAbsolutePath() + POSITION_SUFFIX, 0);
             LOGGER.info("read from history position {} with job profile {}", seekPosition, jobConf.getInstanceId());
+            LOGGER.info("file absolute path: {}", file.getAbsolutePath());

Review Comment:
   Resolved.



##########
inlong-agent/agent-common/src/main/java/org/apache/inlong/agent/constant/FileCollectType.java:
##########
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.inlong.agent.constant;
+
+/**
+ * Data source synchronization type

Review Comment:
   Resolved.



[GitHub] [incubator-inlong] dockerzhang commented on pull request #4404: [INLONG-4397][Agent] Supports collect of full data for file type

Posted by GitBox <gi...@apache.org>.
dockerzhang commented on PR #4404:
URL: https://github.com/apache/incubator-inlong/pull/4404#issuecomment-1141061001

   @Greedyu I think it's better to add a configuration to limit the number of files.


[GitHub] [incubator-inlong] Greedyu commented on pull request #4404: [INLONG-4397][Agent] Supports collect of full data for file type

Posted by GitBox <gi...@apache.org>.
Greedyu commented on PR #4404:
URL: https://github.com/apache/incubator-inlong/pull/4404#issuecomment-1141210578

   > @Greedyu I think it's better to add a configuration to limit the file number.
   The file-number limit check is handled in TextFileSource.split(), where the files are read.
   ![image](https://user-images.githubusercontent.com/20356765/171010331-fbcba965-97c3-4611-858c-4b046fd7e650.png)
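A minimal sketch of a file-number cap applied before readers are created, in the spirit of the check described above. `limitFiles` and the convention that a non-positive limit disables the cap are hypothetical; the real setting name and its handling in `TextFileSource.split()` are not shown in this thread.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class FileLimitSketch {

    /** Returns at most maxFiles entries in iteration order; maxFiles <= 0 disables the cap. */
    static List<File> limitFiles(Collection<File> allFiles, int maxFiles) {
        List<File> selected = new ArrayList<>();
        for (File file : allFiles) {
            if (maxFiles > 0 && selected.size() >= maxFiles) {
                break; // remaining files are not turned into readers in this split
            }
            selected.add(file);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<File> files = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            files.add(new File("/data/log/app-" + i + ".log"));
        }
        System.out.println(limitFiles(files, 3).size()); // 3
        System.out.println(limitFiles(files, 0).size()); // 5
    }
}
```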
   


[GitHub] [incubator-inlong] EMsnap commented on a diff in pull request #4404: [INLONG-4397][Agent] Supports collect of full data for file type

Posted by GitBox <gi...@apache.org>.
EMsnap commented on code in PR #4404:
URL: https://github.com/apache/incubator-inlong/pull/4404#discussion_r890715299


##########
inlong-agent/agent-plugins/src/test/java/org/apache/inlong/agent/plugin/TestFileAgent.java:
##########
@@ -144,6 +146,19 @@ public void testOneJobOnly() throws Exception {
         Assert.assertTrue(checkOnlyOneJob());
     }
 
+    // @Test
+    public void testOneJobFullPath() throws Exception {

Review Comment:
   This test does not actually exercise full mode.



[GitHub] [incubator-inlong] EMsnap commented on a diff in pull request #4404: [INLONG-4397][Agent] Supports collect of full data for file type

Posted by GitBox <gi...@apache.org>.
EMsnap commented on code in PR #4404:
URL: https://github.com/apache/incubator-inlong/pull/4404#discussion_r889748137


##########
inlong-agent/agent-plugins/src/test/java/org/apache/inlong/agent/plugin/TestFileAgent.java:
##########
@@ -144,6 +146,21 @@ public void testOneJobOnly() throws Exception {
         Assert.assertTrue(checkOnlyOneJob());
     }
 
+    // @Test
+    public void testOneJobFullPath() throws Exception {
+        String jsonString = TestUtils.getTestTriggerProfile();
+        TriggerProfile triggerProfile = TriggerProfile.parseJsonStr(jsonString);
+        triggerProfile.set(JOB_DIR_FILTER_PATTERN,
+                Paths.get(getClass().getClassLoader().getResource("test").toURI()).toString());
+        triggerProfile.set(JOB_FILE_MAX_WAIT, "-1");
+        triggerProfile.set(JOB_FILE_COLLECT_TYPE, FileCollectType.FULL);
+        TriggerManager triggerManager = agent.getManager().getTriggerManager();
+        triggerManager.addTrigger(triggerProfile);
+        while (true) {

Review Comment:
   What's the purpose of this code?



[GitHub] [incubator-inlong] Greedyu commented on a diff in pull request #4404: [INLONG-4397][Agent] Supports collect of full data for file type

Posted by GitBox <gi...@apache.org>.
Greedyu commented on code in PR #4404:
URL: https://github.com/apache/incubator-inlong/pull/4404#discussion_r884873475


##########
inlong-agent/agent-common/src/main/java/org/apache/inlong/agent/constant/JobConstants.java:
##########
@@ -48,6 +48,7 @@ public class JobConstants extends CommonConstants {
     public static final String JOB_FILE_TIME_OFFSET = "job.fileJob.timeOffset";
     public static final String JOB_FILE_MAX_WAIT = "job.fileJob.file.max.wait";
     public static final String JOB_CYCLE_UNIT = "job.fileJob.cycleUnit";
+    public static final String JOB_FILE_SYNC_TYPE = "job.fileJob.syncType";

Review Comment:
   Resolved.



[GitHub] [incubator-inlong] EMsnap commented on a diff in pull request #4404: [INLONG-4397][Agent] Supports collect of full data for file type

Posted by GitBox <gi...@apache.org>.
EMsnap commented on code in PR #4404:
URL: https://github.com/apache/incubator-inlong/pull/4404#discussion_r890708191


##########
inlong-agent/agent-plugins/src/test/java/org/apache/inlong/agent/plugin/TestFileAgent.java:
##########
@@ -144,6 +146,21 @@ public void testOneJobOnly() throws Exception {
         Assert.assertTrue(checkOnlyOneJob());
     }
 
+    // @Test
+    public void testOneJobFullPath() throws Exception {
+        String jsonString = TestUtils.getTestTriggerProfile();
+        TriggerProfile triggerProfile = TriggerProfile.parseJsonStr(jsonString);
+        triggerProfile.set(JOB_DIR_FILTER_PATTERN,
+                Paths.get(getClass().getClassLoader().getResource("test").toURI()).toString());
+        triggerProfile.set(JOB_FILE_MAX_WAIT, "-1");
+        triggerProfile.set(JOB_FILE_COLLECT_TYPE, FileCollectType.FULL);
+        TriggerManager triggerManager = agent.getManager().getTriggerManager();
+        triggerManager.addTrigger(triggerProfile);
+        while (true) {

Review Comment:
   Running an infinite loop here is not appropriate.
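One way to address this is to replace `while (true)` with a bounded poll that fails the test after a timeout. A minimal sketch; `await` is a hypothetical helper, and in the test the condition would be whatever signals that the expected data was read.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

final class AwaitSketch {

    /**
     * Polls the condition every interval until it holds or the timeout elapses.
     * Returns true if the condition became true in time, false otherwise.
     */
    static boolean await(BooleanSupplier condition, Duration timeout, Duration interval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (deadline - System.nanoTime() > 0) { // overflow-safe nanoTime comparison
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(interval.toMillis());
        }
        return condition.getAsBoolean(); // final check once the deadline has passed
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger polls = new AtomicInteger();
        // Becomes true on the third poll, well within the timeout.
        System.out.println(await(() -> polls.incrementAndGet() >= 3,
                Duration.ofSeconds(5), Duration.ofMillis(10))); // true
        // Never becomes true, so the call returns false after about 50 ms instead of hanging.
        System.out.println(await(() -> false, Duration.ofMillis(50), Duration.ofMillis(10))); // false
    }
}
```

In `testOneJobFullPath`, the loop could then become something like `Assert.assertTrue(await(() -> readCount() >= expected, Duration.ofSeconds(30), Duration.ofMillis(200)))`, where `readCount()` and `expected` are placeholders for however the test observes collected data.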


