Posted to dev@streampipes.apache.org by Philipp Zehnder <ze...@apache.org> on 2020/02/18 17:27:36 UTC

STREAMPIPES-75: Extend data lake sink to store images

Hi all,

I have finished the implementation that stores images as files instead of base64 strings in InfluxDB.

For the first version, I mounted a local volume and stored the images in a folder within this volume.
I think this is a good starting point because the images reside on the same host as the sink.
The open question is how users can access those images. I would suggest extending the data lake REST API for that.
For this to work, the backend must mount the same volume as the container running the data lake sink.
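To illustrate the idea, here is a minimal, self-contained sketch (plain JDK only; the volume path, class and method names are hypothetical, not the actual sink implementation) of decoding an incoming base64 image payload, writing it below a mounted volume, and keeping only the relative file path for InfluxDB instead of the blob:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class ImageFileStore {
    // Hypothetical mount point shared by the sink container and the backend.
    private final Path volumeRoot;

    public ImageFileStore(Path volumeRoot) {
        this.volumeRoot = volumeRoot;
    }

    // Decodes the base64 payload and writes it below the volume; returns the
    // relative path that would be stored in InfluxDB instead of the blob.
    public String store(String measurement, String imageId, String base64Payload) throws Exception {
        byte[] bytes = Base64.getDecoder().decode(base64Payload);
        Path dir = volumeRoot.resolve(measurement);
        Files.createDirectories(dir);
        Path file = dir.resolve(imageId + ".png");
        Files.write(file, bytes);
        return volumeRoot.relativize(file).toString();
    }

    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("sp-images");
        ImageFileStore store = new ImageFileStore(root);
        String ref = store.store("camera-stream", "frame-0001",
                Base64.getEncoder().encodeToString(new byte[]{1, 2, 3}));
        System.out.println(ref);                       // relative path kept in InfluxDB
        System.out.println(Files.size(root.resolve(ref))); // stored bytes on the volume
    }
}
```

A backend mounting the same volume could then resolve that relative path against its own mount point to serve the image.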

Does anyone have an alternative solution?

@Dominik, you already implemented a StreamPipes-internal file storage. Could we use that for the images as well, or would the write frequency be too high?

@all: What about HDFS? We could set up HDFS for files, similar to InfluxDB, as a shared service between multiple containers.


Philipp

Re: STREAMPIPES-75: Extend data lake sink to store images

Posted by Philipp Zehnder <ze...@apache.org>.
Hi Johannes,

yes, this is a very good idea. We should refactor the file adapters to store their files in the service Dominik described.
I created an issue for that: STREAMPIPES-80: Use internal file service in file adapters.

Philipp


> On 19. Feb 2020, at 20:59, Johannes Tex <te...@apache.org> wrote:
> 
> Hi,
> 
> I also think a service for file handling would be a good solution. 
> 
> At the moment, we also use files for the Adapters, which are stored in the Worker.
> Maybe this would be another use case for a file service?
> 
> Johannes 
> 
> On 2020/02/19 06:58:11, Dominik Riemer <ri...@fzi.de> wrote: 
>> Hi Philipp,
>> 
>> yes, I think it makes sense to have a single service for handling files.
>> When writing the CSVMetadataEnrichment component for Chris, I started adding simple file management to the backend and also extended the SDK with methods for retrieving files from the backend (see CsvMetadataEnrichmentController and FileServingResource in the backend).
>> 
>> We could extend this, isolate the file management into an individual microservice, and add a simple API in front of it that can be used by all services that need to store or retrieve files (e.g., also for the included assets of pipeline elements, which could be documentation, icons, or ML models).
>> 
>> Concerning HDFS: in my opinion, this might be an option, but as we don't store very large amounts of data yet, it would probably be overkill here (one more distributed system to manage).
>> 
>> Dominik
>> 
>> -----Original Message-----
>> From: Philipp Zehnder <ze...@apache.org> 
>> Sent: Tuesday, February 18, 2020 6:28 PM
>> To: dev@streampipes.apache.org
>> Subject: STREAMPIPES-75: Extend data lake sink to store images
>> 
>> Hi all,
>> 
>> I have finished the implementation that stores images as files instead of base64 strings in InfluxDB.
>> 
>> For the first version, I mounted a local volume and stored the images in a folder within this volume.
>> I think this is a good starting point because the images reside on the same host as the sink.
>> The open question is how users can access those images. I would suggest extending the data lake REST API for that.
>> For this to work, the backend must mount the same volume as the container running the data lake sink.
>> 
>> Does anyone have an alternative solution?
>> 
>> @Dominik, you already implemented a StreamPipes-internal file storage. Could we use that for the images as well, or would the write frequency be too high?
>> 
>> @all: What about HDFS? We could set up HDFS for files, similar to InfluxDB, as a shared service between multiple containers.
>> 
>> 
>> Philipp
>> 



Re: RE: STREAMPIPES-75: Extend data lake sink to store images

Posted by Johannes Tex <te...@apache.org>.
Hi,

I also think a service for file handling would be a good solution. 

At the moment, we also use files for the Adapters, which are stored in the Worker.
Maybe this would be another use case for a file service?

Johannes 

On 2020/02/19 06:58:11, Dominik Riemer <ri...@fzi.de> wrote: 
> Hi Philipp,
> 
> yes, I think it makes sense to have a single service for handling files.
> When writing the CSVMetadataEnrichment component for Chris, I started adding simple file management to the backend and also extended the SDK with methods for retrieving files from the backend (see CsvMetadataEnrichmentController and FileServingResource in the backend).
> 
> We could extend this, isolate the file management into an individual microservice, and add a simple API in front of it that can be used by all services that need to store or retrieve files (e.g., also for the included assets of pipeline elements, which could be documentation, icons, or ML models).
> 
> Concerning HDFS: in my opinion, this might be an option, but as we don't store very large amounts of data yet, it would probably be overkill here (one more distributed system to manage).
> 
> Dominik
> 
> -----Original Message-----
> From: Philipp Zehnder <ze...@apache.org> 
> Sent: Tuesday, February 18, 2020 6:28 PM
> To: dev@streampipes.apache.org
> Subject: STREAMPIPES-75: Extend data lake sink to store images
> 
> Hi all,
> 
> I have finished the implementation that stores images as files instead of base64 strings in InfluxDB.
> 
> For the first version, I mounted a local volume and stored the images in a folder within this volume.
> I think this is a good starting point because the images reside on the same host as the sink.
> The open question is how users can access those images. I would suggest extending the data lake REST API for that.
> For this to work, the backend must mount the same volume as the container running the data lake sink.
> 
> Does anyone have an alternative solution?
> 
> @Dominik, you already implemented a StreamPipes-internal file storage. Could we use that for the images as well, or would the write frequency be too high?
> 
> @all: What about HDFS? We could set up HDFS for files, similar to InfluxDB, as a shared service between multiple containers.
> 
> 
> Philipp
> 

RE: STREAMPIPES-75: Extend data lake sink to store images

Posted by Dominik Riemer <ri...@fzi.de>.
Hi Philipp,

yes, I think it makes sense to have a single service for handling files.
When writing the CSVMetadataEnrichment component for Chris, I started adding simple file management to the backend and also extended the SDK with methods for retrieving files from the backend (see CsvMetadataEnrichmentController and FileServingResource in the backend).

We could extend this, isolate the file management into an individual microservice, and add a simple API in front of it that can be used by all services that need to store or retrieve files (e.g., also for the included assets of pipeline elements, which could be documentation, icons, or ML models).
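As a rough illustration of such an API (a sketch using only the JDK's built-in HTTP server and client; the endpoint path, storage root, and class name are hypothetical, and a real service would also handle uploads and authentication), a minimal file-serving endpoint could look like this:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileServiceSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical storage root; the real service would use its own storage backend.
        Path root = Files.createTempDirectory("sp-files");
        Files.write(root.resolve("doc.txt"), "hello".getBytes());

        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        // GET /files/{name} streams the stored file back to the caller.
        server.createContext("/files/", exchange -> {
            String name = exchange.getRequestURI().getPath().substring("/files/".length());
            Path file = root.resolve(name).normalize();
            if (!file.startsWith(root) || !Files.exists(file)) { // reject path traversal / unknown files
                exchange.sendResponseHeaders(404, -1);
                exchange.close();
                return;
            }
            byte[] body = Files.readAllBytes(file);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();

        // Any pipeline element or worker could then retrieve a file over HTTP:
        int port = server.getAddress().getPort();
        HttpResponse<String> resp = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create("http://localhost:" + port + "/files/doc.txt")).build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.body());
        server.stop(0);
    }
}
```

Putting one such API in front of the storage keeps consumers decoupled from where the files physically live, whether that is a mounted volume or something else.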

Concerning HDFS: in my opinion, this might be an option, but as we don't store very large amounts of data yet, it would probably be overkill here (one more distributed system to manage).

Dominik

-----Original Message-----
From: Philipp Zehnder <ze...@apache.org> 
Sent: Tuesday, February 18, 2020 6:28 PM
To: dev@streampipes.apache.org
Subject: STREAMPIPES-75: Extend data lake sink to store images

Hi all,

I have finished the implementation that stores images as files instead of base64 strings in InfluxDB.

For the first version, I mounted a local volume and stored the images in a folder within this volume.
I think this is a good starting point because the images reside on the same host as the sink.
The open question is how users can access those images. I would suggest extending the data lake REST API for that.
For this to work, the backend must mount the same volume as the container running the data lake sink.

Does anyone have an alternative solution?

@Dominik, you already implemented a StreamPipes-internal file storage. Could we use that for the images as well, or would the write frequency be too high?

@all: What about HDFS? We could set up HDFS for files, similar to InfluxDB, as a shared service between multiple containers.


Philipp