Posted to dev@storm.apache.org by HeartSaVioR <gi...@git.apache.org> on 2016/10/21 14:18:03 UTC

[GitHub] storm pull request #1742: STORM-2170 Add built-in socket datasource to runti...

GitHub user HeartSaVioR opened a pull request:

    https://github.com/apache/storm/pull/1742

    STORM-2170 Add built-in socket datasource to runtime

    * Add Socket datasource (input/output) in storm-sql-runtime module
      * for testing purposes only; no guarantees
    * scheme: 'socket'
    
    Since the Socket datasource lives in storm-sql-runtime, it is registered with DataSourcesProvider by default.
    So the SQL statements below don't need any external artifacts and can be executed with the following command:
    
    ```
    ./bin/storm sql apache_log_error_filtering_socket.sql apache_log_error_filtering_socket
    ```
    
    ```
    CREATE EXTERNAL TABLE APACHE_LOGS (ID INT PRIMARY KEY, REMOTE_IP VARCHAR, REQUEST_URL VARCHAR, REQUEST_METHOD VARCHAR, STATUS VARCHAR, REQUEST_HEADER_USER_AGENT VARCHAR, TIME_RECEIVED_UTC_ISOFORMAT VARCHAR, TIME_US DOUBLE) LOCATION 'socket://localhost:8889'
    CREATE EXTERNAL TABLE APACHE_ERROR_LOGS (ID INT PRIMARY KEY, REMOTE_IP VARCHAR, REQUEST_URL VARCHAR, REQUEST_METHOD VARCHAR, STATUS INT, REQUEST_HEADER_USER_AGENT VARCHAR, TIME_RECEIVED_UTC_ISOFORMAT VARCHAR, TIME_ELAPSED_MS INT) LOCATION 'socket://localhost:8890'
    INSERT INTO APACHE_ERROR_LOGS SELECT ID, REMOTE_IP, REQUEST_URL, REQUEST_METHOD, CAST(STATUS AS INT) AS STATUS_INT, REQUEST_HEADER_USER_AGENT, TIME_RECEIVED_UTC_ISOFORMAT, (TIME_US / 1000) AS TIME_ELAPSED_MS FROM APACHE_LOGS WHERE (CAST(STATUS AS INT) / 100) >= 4
    ```
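The filter and the time conversion in the `INSERT` statement above can be sketched in plain Java as follows. This is only an illustration of the arithmetic, not code from the PR: `ApacheLogFilterSketch`, `isErrorStatus`, and `elapsedMillis` are hypothetical names. The sketch assumes that `STATUS` always holds a numeric string and that the DOUBLE result is truncated when stored in the INT column; the thread does not say how Storm SQL performs that coercion.

```java
// Hypothetical helper mirroring the WHERE clause and the TIME_US conversion
// of the INSERT statement; it is not part of storm-sql.
final class ApacheLogFilterSketch {

    private ApacheLogFilterSketch() {
    }

    // Mirrors: WHERE (CAST(STATUS AS INT) / 100) >= 4
    // With integer division this keeps 4xx and 5xx responses. With
    // non-integer division the outcome is the same, because
    // code / 100 >= 4 holds exactly when code >= 400.
    static boolean isErrorStatus(String status) {
        int code = Integer.parseInt(status.trim());
        return code / 100 >= 4;
    }

    // Mirrors: (TIME_US / 1000) AS TIME_ELAPSED_MS
    // Assumption: the fractional part is dropped when the value is
    // written to the INT column.
    static int elapsedMillis(double timeUs) {
        return (int) (timeUs / 1000.0);
    }

    public static void main(String[] args) {
        String[] statuses = {"200", "302", "404", "500"};
        for (String status : statuses) {
            System.out.println(status + " selected: " + isErrorStatus(status));
        }
        System.out.println("2500.0 us in ms: " + elapsedMillis(2500.0));
    }
}
```

Under these assumptions, only the rows with status 404 and 500 would reach `APACHE_ERROR_LOGS`.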
    
    
    Manually tested by opening TCP servers with netcat.
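Because the datasource connects to the given host and port rather than listening, a manual test needs something listening on ports 8889 and 8890. A rough Java stand-in for the netcat listener might look like the sketch below. It is not part of the PR: `LineServerSketch` and its methods are hypothetical, the sample record is made up, and the assumption that the input side reads one JSON object per line, keyed by column name, is not confirmed in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a netcat listener; not part of storm-sql.
final class LineServerSketch {

    private LineServerSketch() {
    }

    // Accepts a single client and writes each line to it, then closes the connection.
    static Thread serveOnce(ServerSocket server, List<String> lines) {
        Thread worker = new Thread(() -> {
            try (Socket client = server.accept();
                 PrintWriter out = new PrintWriter(new OutputStreamWriter(
                         client.getOutputStream(), StandardCharsets.UTF_8))) {
                for (String line : lines) {
                    out.print(line);
                    out.print('\n');
                }
                out.flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        worker.setDaemon(true);
        worker.start();
        return worker;
    }

    // Connects as a client and reads lines until the server closes the connection.
    static List<String> readAll(String host, int port) {
        List<String> received = new ArrayList<>();
        try (Socket socket = new Socket(host, port)) {
            socket.setSoTimeout(5000); // fail instead of hanging if nothing arrives
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.UTF_8));
            String line;
            while ((line = in.readLine()) != null) {
                received.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return received;
    }

    // Self-check: serve the lines on an ephemeral loopback port and read them back.
    static List<String> roundTrip(List<String> lines) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            serveOnce(server, lines);
            return readAll(loopback.getHostAddress(), server.getLocalPort());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8889;
        // Made-up sample record using the APACHE_LOGS column names.
        String sample = "{\"ID\": 1, \"REMOTE_IP\": \"127.0.0.1\", \"REQUEST_URL\": \"/missing\", "
                + "\"REQUEST_METHOD\": \"GET\", \"STATUS\": \"404\", "
                + "\"REQUEST_HEADER_USER_AGENT\": \"curl\", "
                + "\"TIME_RECEIVED_UTC_ISOFORMAT\": \"2016-10-21T12:57:43Z\", \"TIME_US\": 2500.0}";
        try (ServerSocket server = new ServerSocket(port)) {
            serveOnce(server, Arrays.asList(sample)).join();
        }
    }
}
```

Running `main` with no arguments would serve the sample record once on port 8889; the output side on port 8890 would still need a separate listener that prints whatever it receives.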

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HeartSaVioR/storm STORM-2170

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1742.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1742
    
----
commit 885415eb64e1efc0f9a498a2380d3a67d680bc11
Author: Jungtaek Lim <ka...@gmail.com>
Date:   2016-10-21T12:57:43Z

    STORM-2170 Add built-in socket datasource to runtime
    
    * Add Socket datasource (input/output) in storm-sql-runtime module
      * for testing purposes only; no guarantees
    * scheme: 'socket'

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request #1742: STORM-2170 [Storm SQL] Add built-in socket datasou...

Posted by vesense <gi...@git.apache.org>.
Github user vesense commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1742#discussion_r88447536
  
    --- Diff: external/sql/storm-sql-runtime/src/jvm/org/apache/storm/sql/runtime/datasource/socket/SocketDataSourcesProvider.java ---
    @@ -0,0 +1,94 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.storm.sql.runtime.datasource.socket;
    +
    +import com.google.common.collect.Lists;
    +import org.apache.storm.sql.runtime.DataSource;
    +import org.apache.storm.sql.runtime.DataSourcesProvider;
    +import org.apache.storm.sql.runtime.FieldInfo;
    +import org.apache.storm.sql.runtime.FieldNameExtractor;
    +import org.apache.storm.sql.runtime.ISqlTridentDataSource;
    +import org.apache.storm.sql.runtime.SimpleSqlTridentConsumer;
    +import org.apache.storm.sql.runtime.datasource.socket.trident.SocketState;
    +import org.apache.storm.sql.runtime.datasource.socket.trident.SocketStateUpdater;
    +import org.apache.storm.sql.runtime.datasource.socket.trident.TridentSocketSpout;
    +import org.apache.storm.sql.runtime.serde.json.JsonSerializer;
    +import org.apache.storm.trident.spout.ITridentDataSource;
    +import org.apache.storm.trident.state.StateFactory;
    +import org.apache.storm.trident.state.StateUpdater;
    +
    +import java.net.URI;
    +import java.util.List;
    +
    +/**
    + * Create a Socket data source based on the URI and properties. The URI has the format of
    + * socket://[host]:[port]. Both of host and port are mandatory.
    + *
    + * Note that it connects to given host and port, and receive the message if it's used for input source,
    + * and send the message if it's used for output data source.
    + */
    +public class SocketDataSourcesProvider implements DataSourcesProvider {
    +    @Override
    +    public String scheme() {
    +        return "socket";
    +    }
    +
    +    private static class SocketTridentDataSource implements ISqlTridentDataSource {
    +
    +        private final List<String> fieldNames;
    +        private final String host;
    +        private final int port;
    +
    +        SocketTridentDataSource(List<FieldInfo> fields, String host, int port) {
    +            this.fieldNames = Lists.transform(fields, new FieldNameExtractor());
    --- End diff --
    
    Maybe we need an upmerge. Use `FieldInfoUtils.getFieldNames` as the replacement.
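For reference, the `socket://[host]:[port]` format described in the Javadoc of the diff above, with both parts mandatory, could be validated with `java.net.URI` roughly as follows. This is a minimal sketch under stated assumptions, not the PR's implementation: `SocketUriSketch` and `parse` are hypothetical names, and the error messages are illustrative.

```java
import java.net.URI;

// Hypothetical holder for a parsed socket location; not part of storm-sql.
final class SocketUriSketch {
    final String host;
    final int port;

    private SocketUriSketch(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Parses a location such as socket://localhost:8889 and rejects
    // anything that lacks the scheme, the host, or the port.
    static SocketUriSketch parse(String location) {
        URI uri = URI.create(location);
        if (!"socket".equals(uri.getScheme())) {
            throw new IllegalArgumentException("Expected scheme 'socket': " + location);
        }
        String host = uri.getHost(); // null when the URI has no host
        int port = uri.getPort();    // -1 when the URI has no port
        if (host == null) {
            throw new IllegalArgumentException("Host is mandatory: " + location);
        }
        if (port == -1) {
            throw new IllegalArgumentException("Port is mandatory: " + location);
        }
        return new SocketUriSketch(host, port);
    }

    public static void main(String[] args) {
        SocketUriSketch endpoint = parse("socket://localhost:8889");
        System.out.println(endpoint.host + ":" + endpoint.port);
    }
}
```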



[GitHub] storm pull request #1742: STORM-2170 [Storm SQL] Add built-in socket datasou...

Posted by vesense <gi...@git.apache.org>.
Github user vesense commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1742#discussion_r88449665
  
    --- Diff: external/sql/storm-sql-runtime/src/jvm/org/apache/storm/sql/runtime/datasource/socket/SocketDataSourcesProvider.java ---
    @@ -0,0 +1,94 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.storm.sql.runtime.datasource.socket;
    +
    +import com.google.common.collect.Lists;
    +import org.apache.storm.sql.runtime.DataSource;
    +import org.apache.storm.sql.runtime.DataSourcesProvider;
    +import org.apache.storm.sql.runtime.FieldInfo;
    +import org.apache.storm.sql.runtime.FieldNameExtractor;
    +import org.apache.storm.sql.runtime.ISqlTridentDataSource;
    +import org.apache.storm.sql.runtime.SimpleSqlTridentConsumer;
    +import org.apache.storm.sql.runtime.datasource.socket.trident.SocketState;
    +import org.apache.storm.sql.runtime.datasource.socket.trident.SocketStateUpdater;
    +import org.apache.storm.sql.runtime.datasource.socket.trident.TridentSocketSpout;
    +import org.apache.storm.sql.runtime.serde.json.JsonSerializer;
    +import org.apache.storm.trident.spout.ITridentDataSource;
    +import org.apache.storm.trident.state.StateFactory;
    +import org.apache.storm.trident.state.StateUpdater;
    +
    +import java.net.URI;
    +import java.util.List;
    +
    +/**
    + * Create a Socket data source based on the URI and properties. The URI has the format of
    + * socket://[host]:[port]. Both of host and port are mandatory.
    + *
    + * Note that it connects to given host and port, and receive the message if it's used for input source,
    + * and send the message if it's used for output data source.
    + */
    +public class SocketDataSourcesProvider implements DataSourcesProvider {
    +    @Override
    +    public String scheme() {
    +        return "socket";
    +    }
    +
    +    private static class SocketTridentDataSource implements ISqlTridentDataSource {
    +
    +        private final List<String> fieldNames;
    +        private final String host;
    +        private final int port;
    +
    +        SocketTridentDataSource(List<FieldInfo> fields, String host, int port) {
    +            this.fieldNames = Lists.transform(fields, new FieldNameExtractor());
    +            this.host = host;
    +            this.port = port;
    +        }
    +
    +        @Override
    +        public ITridentDataSource getProducer() {
    +            return new TridentSocketSpout(fieldNames, host, port);
    +        }
    +
    +        @Override
    +        public SqlTridentConsumer getConsumer() {
    +            StateFactory stateFactory = new SocketState.Factory(host, port);
    +            StateUpdater<SocketState> stateUpdater = new SocketStateUpdater(new JsonSerializer(fieldNames));
    --- End diff --
    
    Should we use `SerdeUtils.getSerializer` to cover the common cases, or do we only need to handle the `Json` format?



[GitHub] storm issue #1742: STORM-2170 [Storm SQL] Add built-in socket datasource to ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1742
  
    @vesense Yes, I plan to update #1777 after merging this.



[GitHub] storm issue #1742: STORM-2170 [Storm SQL] Add built-in socket datasource to ...

Posted by vesense <gi...@git.apache.org>.
Github user vesense commented on the issue:

    https://github.com/apache/storm/pull/1742
  
    +1



[GitHub] storm issue #1742: STORM-2170 [Storm SQL] Add built-in socket datasource to ...

Posted by vesense <gi...@git.apache.org>.
Github user vesense commented on the issue:

    https://github.com/apache/storm/pull/1742
  
    OK. nice :smile: 



[GitHub] storm pull request #1742: STORM-2170 [Storm SQL] Add built-in socket datasou...

Posted by vesense <gi...@git.apache.org>.
Github user vesense commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1742#discussion_r88448768
  
    --- Diff: external/sql/storm-sql-runtime/src/resources/META-INF/services/org.apache.storm.sql.runtime.DataSourcesProvider ---
    @@ -0,0 +1,32 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +# http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +# contributor license agreements.  See the NOTICE file distributed with
    --- End diff --
    
    Duplicate license header.



[GitHub] storm issue #1742: STORM-2170 [Storm SQL] Add built-in socket datasource to ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1742
  
    Also updated the documentation.



[GitHub] storm pull request #1742: STORM-2170 [Storm SQL] Add built-in socket datasou...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/storm/pull/1742



[GitHub] storm issue #1742: STORM-2170 [Storm SQL] Add built-in socket datasource to ...

Posted by vesense <gi...@git.apache.org>.
Github user vesense commented on the issue:

    https://github.com/apache/storm/pull/1742
  
    Thanks @HeartSaVioR. LGTM, +1. I just left several comments.



[GitHub] storm issue #1742: STORM-2170 [Storm SQL] Add built-in socket datasource to ...

Posted by HeartSaVioR <gi...@git.apache.org>.
Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/1742
  
    @vesense Thanks for reviewing. I addressed all of your comments.



[GitHub] storm issue #1742: STORM-2170 [Storm SQL] Add built-in socket datasource to ...

Posted by vesense <gi...@git.apache.org>.
Github user vesense commented on the issue:

    https://github.com/apache/storm/pull/1742
  
    @HeartSaVioR It would be better to add a `how to use` document.

