Posted to user@beam.apache.org by Dmitry Demeshchuk <dm...@postmates.com> on 2018/01/23 22:48:09 UTC

Reading message attributes in PubSub source in Python

Hi list,

My understanding is that ReadStringsFromPubSub
<https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/pubsub.py#L42>
doesn't provide any way of getting the message metadata (attributes, publish
timestamp, etc.). Looking further at the code
<https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/pubsub.py#L181>
suggests that most of the PubSub functionality lives inside Dataflow.
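For context on what "message metadata" means here: each Pub/Sub message carries a base64-encoded payload plus an `attributes` map, a `messageId` and a `publishTime`, and a string-only reader keeps just the payload. The following is a minimal stdlib-only sketch, assuming the JSON body of a Pub/Sub REST `projects.subscriptions.pull` response, that decodes all of those fields. `PubsubRecord`, `parse_pull_response` and `parse_publish_time` are hypothetical helpers written for illustration. They are not part of the Beam SDK, and fetching the response (HTTP, authentication) is out of scope.

```python
import base64
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class PubsubRecord:
    """Hypothetical record: the payload plus the metadata a string-only read drops."""
    data: bytes
    attributes: Dict[str, str] = field(default_factory=dict)
    message_id: str = ""
    publish_time: Optional[datetime] = None


def parse_publish_time(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp such as '2018-01-23T22:48:09.123Z'.

    Pub/Sub may report more than microsecond precision; Python's datetime
    only holds microseconds, so extra fractional digits are truncated.
    """
    if not value.endswith("Z"):
        raise ValueError(f"expected a UTC timestamp ending in 'Z': {value!r}")
    body = value[:-1]
    if "." in body:
        base, frac = body.split(".", 1)
        # Right-pad to at least 6 digits, then keep exactly 6 (microseconds).
        micros = int((frac + "000000")[:6])
    else:
        base, micros = body, 0
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def parse_pull_response(body: str) -> List[PubsubRecord]:
    """Decode the JSON body of a Pub/Sub REST subscriptions.pull response.

    Assumed shape (hypothetical helper, not a Beam API):
    {"receivedMessages": [{"ackId": ..., "message": {"data": <base64>,
      "attributes": {...}, "messageId": ..., "publishTime": ...}}]}
    If there are no messages the key may be missing, so it defaults to [].
    """
    records = []
    for received in json.loads(body).get("receivedMessages", []):
        msg = received["message"]
        publish_time = msg.get("publishTime")
        records.append(
            PubsubRecord(
                # "data" is base64-encoded; it can be absent for attribute-only messages.
                data=base64.b64decode(msg.get("data", "")),
                attributes=dict(msg.get("attributes", {})),
                message_id=msg.get("messageId", ""),
                publish_time=parse_publish_time(publish_time) if publish_time else None,
            )
        )
    return records
```

A real source built this way would also have to acknowledge each message by its `ackId` after processing; that step is left out of the sketch.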

Hence, some questions:

1. Is my understanding of the current state of things correct?

2. Is there any API I can piggyback on to write my own PubSub source? My
guess would be that I can use NativeSource, but is that really so?

3. If the answer to both of the above is "no", is there any idea when this
will be officially supported?

What I'm doing right now is meant only for small things, so I probably
don't mind switching to Java for this specific task. Just trying to make
sure there's no better way.

Thanks!

-- 
Best regards,
Dmitry Demeshchuk.

Re: Reading message attributes in PubSub source in Python

Posted by Dmitry Demeshchuk <dm...@postmates.com>.
Thanks for the explanation, Ahmet!

I'll stick to the Java SDK for now, then.

On Tue, Jan 23, 2018 at 4:01 PM, Ahmet Altay <al...@google.com> wrote:



-- 
Best regards,
Dmitry Demeshchuk.

Re: Reading message attributes in PubSub source in Python

Posted by Ahmet Altay <al...@google.com>.
On Tue, Jan 23, 2018 at 2:48 PM, Dmitry Demeshchuk <dm...@postmates.com>
wrote:

> Hi list,
>
> My understanding is that ReadStringsFromPubSub
> <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/pubsub.py#L42> doesn't
> provide any way of getting the message metadata (attributes, publish
> timestamp, etc). Looking further suggests
> <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/pubsub.py#L181>
> that the majority of the PubSub functionality is inside Dataflow.
>
> Hence, some questions:
>
> 1. Is my understanding of the current state of things correct?
>

This is correct.


>
> 2. Is there any API I can piggyback on to write my own PubSub source? My
> guess would be that I can use NativeSource, but is that really so?
>

No, unfortunately, not yet. Splittable DoFn (SDF) for Python will be this API.


>
> 3. If the answer to both of the above is "no", is there any idea when this
> will be officially supported?
>

There is no ETA. I am _hoping_ that in two releases we will have implemented an
improved PubSub source that can do (1), i.e. expose the message metadata. (2),
an API for writing your own source, can happen after that.


>
> What I'm doing right now is meant only for small things, so I probably
> don't mind switching to Java for this specific task. Just trying to make
> sure there's no better way.
>

If you have a production need, I would recommend using Java. Otherwise, stay
tuned.

