Posted to dev@seatunnel.apache.org by Guangdong Liu <li...@gmail.com> on 2023/06/05 08:38:11 UTC

[DISCUSS] Support Vector Database Connector

Hi SeaTunnel devs,
      AIGC (AI-Generated Content) technology is currently very popular and
is used in many scenarios, and the vector database is an indispensable core
component in some of the mainstream ones. The following is the architecture
diagram of Chroma as a vector database.

[image: image.png]
      I found that there is currently no solution for importing data from
other sources into a vector database, but Apache SeaTunnel is very well
suited to this: we can add vector database connectors to SeaTunnel. Some of
the current mainstream open-source vector databases are Weaviate, Qdrant,
Milvus, and Chroma. I very much hope that the community can try
SeaTunnel + AIGC.

--

Best Regards

------------

Liugddx
liugddx@gmail.com

Re: [DISCUSS] Support Vector Database Connector

Posted by Leonard(Lifeng Nie) <ni...@apache.org>.
Good job

On Wed, Jun 14, 2023 at 09:33, Guangdong Liu <li...@gmail.com> wrote:

> Hi, I have added a Milvus connector for SeaTunnel; here is the PR link:
> https://github.com/apache/seatunnel/pull/4885. I would love to spread the
> word about this usage in the Milvus community. PTAL, thanks.
> --
>
> Best Regards
>
> ------------
>
> Liugddx
> liugddx@gmail.com


-- 
Warm Regards,

Leonard(LiFeng Nie)

Re: [DISCUSS] Support Vector Database Connector

Posted by Guangdong Liu <li...@gmail.com>.
Hi, I have added a Milvus connector for SeaTunnel; here is the PR link:
https://github.com/apache/seatunnel/pull/4885. I would love to spread the
word about this usage in the Milvus community. PTAL, thanks.
--

Best Regards

------------

Liugddx
liugddx@gmail.com


On Sat, Jun 10, 2023 at 17:54, David Zollo <da...@gmail.com> wrote:

> Good job, looking forward to your demo.
>
>
>
> Best Regards
>
> ---------------
> Apache DolphinScheduler PMC Chair & Apache SeaTunnel PMC member
> David
> Linkedin: https://www.linkedin.com/in/davidzollo
> Twitter: @WorkflowEasy
> ---------------

Re: [DISCUSS] Support Vector Database Connector

Posted by David Zollo <da...@gmail.com>.
Good job, looking forward to your demo.



Best Regards

---------------
Apache DolphinScheduler PMC Chair & Apache SeaTunnel PMC member
David
Linkedin: https://www.linkedin.com/in/davidzollo
Twitter: @WorkflowEasy
---------------

On Thu, Jun 8, 2023 at 12:23 AM Guangdong Liu <li...@gmail.com> wrote:
>
> Thanks, David and Jun, for your comments.
> I have developed connectors for Milvus and tested them. The PR is
> https://github.com/apache/seatunnel/pull/4885.
> I created a demo that searches book titles by semantics rather than by
> keywords. I imported the data from
> https://www.kaggle.com/datasets/jealousleopard/goodreadsbooks into Milvus,
> used OpenAI's https://platform.openai.com/docs/api-reference/embeddings
> API to vectorize the book titles, and then entered prompts to search.
> I will share it soon and would like to get feedback from the community.
> --
>
> Best Regards
>
> ------------
>
> Liugddx
> liugddx@gmail.com

Re: [DISCUSS] Support Vector Database Connector

Posted by Guangdong Liu <li...@gmail.com>.
Thanks, David and Jun, for your comments.
I have developed connectors for Milvus and tested them. The PR is
https://github.com/apache/seatunnel/pull/4885.
I created a demo that searches book titles by semantics rather than by
keywords. I imported the data from
https://www.kaggle.com/datasets/jealousleopard/goodreadsbooks into Milvus,
used OpenAI's https://platform.openai.com/docs/api-reference/embeddings
API to vectorize the book titles, and then entered prompts to search.
I will share it soon and would like to get feedback from the community.
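
The search step described above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the demo's actual code:
`embed` is a hypothetical stand-in for the OpenAI embeddings call (it counts
character trigrams, so it captures surface similarity rather than real
semantics), the in-memory dict stands in for a Milvus collection, and the
three titles are sample values rather than rows taken from the Kaggle dataset.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: character-trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, index: dict, top_k: int = 3) -> list:
    """Brute-force nearest-neighbour search; a vector database does this at scale."""
    q = embed(query)
    scored = [(cosine(q, vec), title) for title, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Sample titles (assumed for illustration only).
titles = [
    "The Hobbit",
    "A Brief History of Time",
    "The Art of Computer Programming",
]

# "Import" step: vectorize every title once and store it next to the title.
index = {title: embed(title) for title in titles}

# "Search" step: vectorize the query and rank the stored vectors by similarity.
results = search("history of time", index, top_k=1)
print(results[0][1])
```

With a real embedding model in place of `embed`, the same two steps (vectorize
on import, vectorize the query and rank by similarity) are what make the
search semantic rather than keyword-based.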
--

Best Regards

------------

Liugddx
liugddx@gmail.com


On Wed, Jun 7, 2023 at 12:07, David Zollo <da...@gmail.com> wrote:

> Hi Guangdong,
> Thanks for the good suggestion. Supporting AIGC is indeed a
> promising direction for SeaTunnel.
> Anybody interested in these projects is welcome to contribute code.
>
>
>
> Best Regards
>
> ---------------
> Apache DolphinScheduler PMC Chair & Apache SeaTunnel PMC member
> David
> Linkedin: https://www.linkedin.com/in/davidzollo
> Twitter: @WorkflowEasy
> ---------------

Re: [DISCUSS] Support Vector Database Connector

Posted by David Zollo <da...@gmail.com>.
Hi Guangdong,
Thanks for the good suggestion. Supporting AIGC is indeed a
promising direction for SeaTunnel.
Anybody interested in these projects is welcome to contribute code.



Best Regards

---------------
Apache DolphinScheduler PMC Chair & Apache SeaTunnel PMC member
David
Linkedin: https://www.linkedin.com/in/davidzollo
Twitter: @WorkflowEasy
---------------

On Mon, Jun 5, 2023 at 4:45 PM JUN GAO <ga...@apache.org> wrote:
>
> Good idea.
>
> --
>
> Best Regards
>
> ------------
>
> EricJoy2048
> gaojun2048@gmail.com

Re: [DISCUSS] Support Vector Database Connector

Posted by JUN GAO <ga...@apache.org>.
Good idea.

On Mon, Jun 5, 2023 at 16:38, Guangdong Liu <li...@gmail.com> wrote:

> Hi SeaTunnel devs,
>       AIGC (AI-Generated Content) technology is currently very popular and
> is used in many scenarios, and the vector database is an indispensable core
> component in some of the mainstream ones. The following is the architecture
> diagram of Chroma as a vector database.
>
> [image: image.png]
>       I found that there is currently no solution for importing data from
> other sources into a vector database, but Apache SeaTunnel is very well
> suited to this: we can add vector database connectors to SeaTunnel. Some of
> the current mainstream open-source vector databases are Weaviate, Qdrant,
> Milvus, and Chroma. I very much hope that the community can try
> SeaTunnel + AIGC.
>
> --
>
> Best Regards
>
> ------------
>
> Liugddx
> liugddx@gmail.com
>


-- 

Best Regards

------------

EricJoy2048
gaojun2048@gmail.com