Posted to dev@streampipes.apache.org by Johannes Tex <te...@apache.org> on 2020/02/14 16:30:17 UTC

Image Labeling

Hi,

Philipp started to extend the data lake sink to store images [STREAMPIPES-75].
I have now started to create an image labeler that allows users to label images in the data lake [STREAMPIPES-78]. The labels will be stored in the COCO annotation format [1]. After labeling, the images can be used to train a neural network (NN).

The main features that the labeler should support:
- Labeling with bounding boxes
- Labeling with polygons

Do you have additional features that should also be supported?

Johannes


[1] http://cocodataset.org/#format-data
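
As a rough illustration of what a stored label could look like, here is a minimal sketch of a COCO-style annotation file containing one polygon label and its bounding box. The field names follow the COCO object detection format; the file name, ids, category, and the two helper functions are made up for this example and are not part of StreamPipes.

```python
import json


def bbox_from_polygon(polygon):
    """Return the COCO-style [x, y, width, height] box enclosing a flat polygon."""
    xs = polygon[0::2]
    ys = polygon[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


def polygon_area(polygon):
    """Area of a flat [x1, y1, x2, y2, ...] polygon via the shoelace formula."""
    points = list(zip(polygon[0::2], polygon[1::2]))
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical example data: a rectangular polygon drawn by the user.
polygon = [10, 10, 60, 10, 60, 40, 10, 40]

coco = {
    "images": [
        {"id": 1, "file_name": "image_0001.jpg", "width": 640, "height": 480}
    ],
    "categories": [{"id": 1, "name": "workpiece", "supercategory": "object"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": bbox_from_polygon(polygon),  # [x, y, width, height]
            "segmentation": [polygon],           # list of flat polygons
            "area": polygon_area(polygon),
            "iscrowd": 0,
        }
    ],
}

print(json.dumps(coco, indent=2))
```

A pure bounding-box label would use the same structure with only the "bbox" field filled in by the labeler.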



Re: RE: Image Labeling

Posted by Johannes Tex <te...@apache.org>.
Hi

I created a Wiki page [1].
Everyone is invited to contribute :)

[1] https://cwiki.apache.org/confluence/display/STREAMPIPES/Generic+Data+Store

Johannes


On 2020/02/24 22:54:14, "Dominik Riemer" <ri...@apache.org> wrote: 
> Hi Johannes,
> 
> +1 for having both REST and streaming interfaces to store images. We can probably start with REST and add streaming interfaces later.
> I think the generic data store will be part of the general platform API service, where we can integrate all endpoints that will be required by external services (e.g., registered streams/processors/sinks, historical data, images).
> What do you think: should we create a wiki page to collect all requirements and design the endpoints we are going to need?
> 
> Dominik 

RE: Image Labeling

Posted by Dominik Riemer <ri...@apache.org>.
Hi Johannes,

+1 for having both REST and streaming interfaces to store images. We can probably start with REST and add streaming interfaces later.
I think the generic data store will be part of the general platform API service, where we can integrate all endpoints that will be required by external services (e.g., registered streams/processors/sinks, historical data, images).
What do you think: should we create a wiki page to collect all requirements and design the endpoints we are going to need?

Dominik 

-----Original Message-----
From: Johannes Tex <te...@apache.org> 
Sent: Sunday, February 23, 2020 11:35 AM
To: dev@streampipes.apache.org
Subject: Re: Image Labeling 

Hi,

I think a simple generic data store API that supports the CRUD operations would be great.

I think all CRUD operations should be available as a REST API, and the CREATE operation could additionally be offered via a messaging protocol (Kafka, MQTT), which would be used, e.g., by the Data-Lake-Sink to store the images. Or do you think that synchronous communication via REST is fast enough?
In addition to reading entire files, the READ operation should have a stream interface that can be used directly by the adapters, for example.

Johannes




Re: Image Labeling

Posted by Johannes Tex <te...@apache.org>.
Hi,

I think a simple generic data store API that supports the CRUD operations would be great.

I think all CRUD operations should be available as a REST API, and the CREATE operation could additionally be offered via a messaging protocol (Kafka, MQTT), which would be used, e.g., by the Data-Lake-Sink to store the images. Or do you think that synchronous communication via REST is fast enough?
In addition to reading entire files, the READ operation should have a stream interface that can be used directly by the adapters, for example.
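
To make the idea more concrete, here is a minimal in-memory sketch of such a store with the four CRUD operations plus a chunked read that stands in for the stream interface. The class name, method names, and chunk size are assumptions for illustration only; a real service would expose these operations via REST (and possibly Kafka/MQTT for CREATE) and use persistent storage.

```python
import uuid
from typing import Dict, Iterator, Optional


class InMemoryDataStore:
    """Hypothetical generic data store; keeps binary items in a dict."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def create(self, data: bytes) -> str:
        """Store a new item and return its generated id."""
        item_id = str(uuid.uuid4())
        self._items[item_id] = data
        return item_id

    def read(self, item_id: str) -> Optional[bytes]:
        """Return the whole item, or None if the id is unknown."""
        return self._items.get(item_id)

    def read_stream(self, item_id: str, chunk_size: int = 4096) -> Iterator[bytes]:
        """Yield the item in chunks, e.g. for consumption by an adapter."""
        data = self._items[item_id]  # raises KeyError for unknown ids
        for start in range(0, len(data), chunk_size):
            yield data[start:start + chunk_size]

    def update(self, item_id: str, data: bytes) -> bool:
        """Replace an existing item; return False if the id is unknown."""
        if item_id not in self._items:
            return False
        self._items[item_id] = data
        return True

    def delete(self, item_id: str) -> bool:
        """Remove an item; return False if the id is unknown."""
        return self._items.pop(item_id, None) is not None


store = InMemoryDataStore()
image_id = store.create(b"\x89PNG-fake-image-bytes")
chunks = list(store.read_stream(image_id, chunk_size=8))
print(len(chunks), "chunks,", len(store.read(image_id)), "bytes")
```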

Johannes


On 2020/02/21 18:49:58, Philipp Zehnder <ze...@apache.org> wrote: 
> Hi,
> 
> I also think we should store it either in a file in the same directory as the image, or in CouchDB.
> For now, I am not sure which is the better solution. The only requirement is that once a user downloads the data, the labels should be provided in a COCO JSON file, but this is possible with both options.
> 
> Since we now have multiple locations where we store data, we should probably start a discussion on how to store application data within StreamPipes.
> It might make sense to have an internal (or external) API for components and other services.
> What do you think about that? What kind of features would such an API need?
> 
> Philipp

Re: Image Labeling

Posted by Philipp Zehnder <ze...@apache.org>.
Hi,

I also think we should store it either in a file in the same directory as the image, or in CouchDB.
For now, I am not sure which is the better solution. The only requirement is that once a user downloads the data, the labels should be provided in a COCO JSON file, but this is possible with both options.

Since we now have multiple locations where we store data, we should probably start a discussion on how to store application data within StreamPipes.
It might make sense to have an internal (or external) API for components and other services.
What do you think about that? What kind of features would such an API need?

Philipp

> On 19. Feb 2020, at 22:00, Johannes Tex <te...@apache.org> wrote:
> 
> Hi,
> 
> I'll start with @Dominik's question: the first intention was for it to be part of the Data Explorer, with toggling between simple exploring and labeling. @Philipp opened an issue [STREAMPIPES-79] to refactor the Data Explorer; maybe in this context we could extend the Data Explorer with these two modes?
> To display images, for example, we need almost the same mechanism as for image labeling, except for the labeling itself. We also need to extend the data lake API for images, which leads to @Philipp's question.
> 
> The data lake API currently supports only data that can be aggregated (numeric data). For image labeling and viewing, we need to extend the API. My proposal would be to create a paging API for images to receive the next, e.g., 10 images. It could look like this: "/datalake/<index>/<timestamp>/<page>". What do you think? As part of this necessary extension, we can also create the API to save the annotations.
> 
> I see three different options to save the annotations:
> * Influx -> save the annotation directly with the data point
>    - the COCO file needs to be created when exporting
>    - an extra place is needed to save the (image) labels/categories
>    - the data point needs to be 'manipulated', which is not possible in Influx (only delete and create a new one)
> * File
>    - a file needs to be handled
> * CouchDB
>    - file generation is needed
> My proposal is to use CouchDB to store the annotations.
> 
> Johannes



Re: Image Labeling

Posted by Johannes Tex <te...@apache.org>.
Hi,

I'll start with @Dominik's question: the first intention was for it to be part of the Data Explorer, with toggling between simple exploring and labeling. @Philipp opened an issue [STREAMPIPES-79] to refactor the Data Explorer; maybe in this context we could extend the Data Explorer with these two modes?
To display images, for example, we need almost the same mechanism as for image labeling, except for the labeling itself. We also need to extend the data lake API for images, which leads to @Philipp's question.

The data lake API currently supports only data that can be aggregated (numeric data). For image labeling and viewing, we need to extend the API. My proposal would be to create a paging API for images to receive the next, e.g., 10 images. It could look like this: "/datalake/<index>/<timestamp>/<page>". What do you think? As part of this necessary extension, we can also create the API to save the annotations.
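
The paging idea could behave roughly like the sketch below: given a start timestamp and a page number, return the next block of images. This only models the paging logic in memory; the ImageRef type, the function name, the 0-based page numbering, and the default page size of 10 are assumptions, not an existing StreamPipes API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ImageRef:
    timestamp: int  # timestamp of the data point, e.g. epoch milliseconds
    image_id: str   # hypothetical reference to the stored image


def get_image_page(images: List[ImageRef], start_timestamp: int,
                   page: int, page_size: int = 10) -> List[ImageRef]:
    """Return page `page` (0-based) of the images at or after start_timestamp."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    candidates = sorted(
        (img for img in images if img.timestamp >= start_timestamp),
        key=lambda img: img.timestamp,
    )
    offset = page * page_size
    return candidates[offset:offset + page_size]


# 25 fake images with increasing timestamps.
images = [ImageRef(timestamp=1000 + i, image_id=f"img-{i}") for i in range(25)]

first_page = get_image_page(images, start_timestamp=1005, page=0)
second_page = get_image_page(images, start_timestamp=1005, page=1)
print([img.image_id for img in first_page])
print([img.image_id for img in second_page])
```

An empty list signals that there are no further pages, so the client can stop requesting.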

I see three different options to save the annotations:
* Influx -> save the annotation directly with the data point
    - the COCO file needs to be created when exporting
    - an extra place is needed to save the (image) labels/categories
    - the data point needs to be 'manipulated', which is not possible in Influx (only delete and create a new one)
* File
    - a file needs to be handled
* CouchDB
    - file generation is needed
My proposal is to use CouchDB to store the annotations.
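
For the CouchDB option, the file generation step could look roughly like this sketch: one JSON document per image holds its labels, and an export function assembles a single COCO structure from those documents. The document layout (file_name, width, height, labels) and the function name are assumptions for illustration; no CouchDB client is used here, the documents are plain dictionaries.

```python
import json
from typing import Dict, List


def export_coco(image_docs: List[Dict], categories: List[str]) -> Dict:
    """Build a COCO-style dict from per-image annotation documents."""
    category_ids = {name: idx for idx, name in enumerate(categories, start=1)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [
            {"id": cid, "name": name} for name, cid in category_ids.items()
        ],
    }
    annotation_id = 1
    for image_id, doc in enumerate(image_docs, start=1):
        coco["images"].append({
            "id": image_id,
            "file_name": doc["file_name"],
            "width": doc["width"],
            "height": doc["height"],
        })
        for label in doc["labels"]:
            x, y, w, h = label["bbox"]
            coco["annotations"].append({
                "id": annotation_id,
                "image_id": image_id,
                "category_id": category_ids[label["category"]],
                "bbox": [x, y, w, h],
                "area": w * h,
                "iscrowd": 0,
            })
            annotation_id += 1
    return coco


# Hypothetical documents as they might be stored, one per image.
docs = [
    {
        "_id": "img-0001",
        "file_name": "image_0001.jpg",
        "width": 640,
        "height": 480,
        "labels": [
            {"category": "workpiece", "bbox": [10, 10, 50, 30]},
            {"category": "defect", "bbox": [100, 120, 20, 20]},
        ],
    },
]

coco_file = export_coco(docs, categories=["workpiece", "defect"])
print(json.dumps(coco_file, indent=2))
```

Keeping the categories as a separate list mirrors the point above that labels/categories need their own place, independent of the individual images.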

Johannes


On 2020/02/17 21:12:38, Philipp Zehnder <ze...@apache.org> wrote: 
> Hi Johannes,
> 
> As for the API, do you think we can extend the dataset API, or should we create a separate REST API for image annotation?
> 
> Where do you plan to store the COCO annotation information? In files or in a DB?
> 
> Philipp

Re: Image Labeling

Posted by Philipp Zehnder <ze...@apache.org>.
Hi Johannes,

As for the API, do you think we can extend the dataset API, or should we create a separate REST API for image annotation?

Where do you plan to store the COCO annotation information? In files or in a DB?

Philipp

> On 16. Feb 2020, at 19:51, Dominik Riemer <ri...@apache.org> wrote:
> 
> Hi Johannes,
> sounds good!
> I think bounding boxes and polygons are totally fine for the first prototype.
> 
> How do you plan to integrate the labeling tool? Will it be part of the data explorer, or do you plan to add a new component?
> 
> Dominik



Re: Image Labeling

Posted by Dominik Riemer <ri...@apache.org>.
Hi Johannes,
sounds good!
I think bounding boxes and polygons are totally fine for the first prototype.

How do you plan to integrate the labeling tool? Will it be part of the data explorer, or do you plan to add a new component?

Dominik

On 2020/02/14 16:30:17, Johannes Tex <te...@apache.org> wrote: 
> Hi,
> 
> Philipp started to extend the data lake sink to store images [STREAMPIPES-75].
> I have now started to create an image labeler that allows users to label images in the data lake [STREAMPIPES-78]. The labels will be stored in the COCO annotation format [1]. After labeling, the images can be used to train a neural network (NN).
> 
> The main features that the labeler should support:
> - Labeling with bounding boxes
> - Labeling with polygons
> 
> Do you have additional features that should also be supported?
> 
> Johannes
> 
> 
> [1] http://cocodataset.org/#format-data