Posted to dev@tika.apache.org by Tim Allison <ta...@apache.org> on 2021/11/09 19:00:51 UTC

Proposed topics for next Tika meetups?

All,
   Many thanks to those who attended today.  It was great to e-meet
old friends and users from around the world.  Many thanks to Lewis
McGibbney for getting the ball rolling on these.
   Let's use this thread to discuss possible topics and scheduling for
the next meetups?

Question 1: Pace...one a month or so?

Question 2: Topics?
a) tika-pipes hands-on workshop
b) get to know the users -- 5 minute go-around the room "this is how
we use it; these are our pain points"
c) ???

  Again, thank you!

           Best,

                  Tim

Re: Proposed topics for next Tika meetups?

Posted by Eric Pugh <ep...@opensourceconnections.com>.
As far as question 1 goes, anything more than once is infinitely better ;-). I’d be happy with quarterly, as that would be a lot more than what we do today!

As far as question 2 goes, I think you could do an agenda of:

Get to Know the Users
Presentation
Open Discussion

You have tika-pipes, which I was interested in.   I’d love to learn how many folks use Tika in Solr as well and discuss what the future of Tika in Solr is.

> On Nov 9, 2021, at 2:00 PM, Tim Allison <ta...@apache.org> wrote:
> 
> All,
>   Many thanks to those who attended today.  It was great to e-meet
> old friends and users from around the world.  Many thanks to Lewis
> McGibbney for getting the ball rolling on these.
>   Let's use this thread to discuss possible topics and scheduling for
> the next meetups?
> 
> Question 1: Pace...one a month or so?
> 
> Question 2: Topics?
> a) tika-pipes hands-on workshop
> b) get to know the users -- 5 minute go-around the room "this is how
> we use it; these are our pain points"
> c) ???
> 
>  Again, thank you!
> 
>           Best,
> 
>                  Tim

Re: Proposed topics for next Tika meetups?

Posted by Tim Allison <ta...@apache.org>.
I've gone with a virtual, hands-on workshop for tika-pipes for Dec 2
at noon (EST).  I'll try to shorten the content (a bit) so there will
be more time for chatting, but this will be similar to the initial
meeting.

https://www.meetup.com/apache-tika-community/events/282123231/

On Tue, Nov 16, 2021 at 11:21 AM Tim Allison <ta...@apache.org> wrote:
>
> Arky,
>   This sounds great.  Are you using the new language models I added to
> Tika 2.x?  Those include the languages you mention and a couple more
> you requested earlier (?).
>
>  Cheers,
>
>          Tim

Re: Proposed topics for next Tika meetups?

Posted by Tim Allison <ta...@apache.org>.
Arky,
  This sounds great.  Are you using the new language models I added to
Tika 2.x?  Those include the languages you mention and a couple more
you requested earlier (?).

 Cheers,

         Tim

On Wed, Nov 10, 2021 at 2:13 PM Arky <hi...@gmail.com> wrote:
>
> Hi,
>
> My personal interest is to get Tika to work better with Southern and
> Southeast Asian languages.
>
> Any conversation on how we could contribute corpora to help train the
> models for languages like Burmese, Thai, Khmer and Vietnamese would be
> great.
>
>
> Apart from general introductions, I would be happy to give a use case of
> how downstream projects use Tika for their work to ingest and extract
> data from multi-lingual documents.
>
> Cheers
>
> --arky

Re: Proposed topics for next Tika meetups?

Posted by Arky <hi...@gmail.com>.
Hi,

My personal interest is to get Tika to work better with Southern and 
Southeast Asian languages.

Any conversation on how we could contribute corpora to help train the 
models for languages like Burmese, Thai, Khmer and Vietnamese would be 
great.


Apart from general introductions, I would be happy to give a use case of
how downstream projects use Tika for their work to ingest and extract
data from multi-lingual documents.

Cheers

--arky





On 11/11/21 1:18 AM, Tim Allison wrote:
> But seriously... how about a hands-on workshop on tika-pipes for the
> first week of December (focus on fileshare to Solr)?  We can follow
> Eric's recommendation of having a brief go-around the room to introduce
> each other and then a smaller actual tutorial.
> 
> Was the day of week/time of day ok?  I realize that TWTh can be heavy
> meeting days for some, but I also know that folks take MF off. :D


Re: Proposed topics for next Tika meetups?

Posted by Tim Allison <ta...@apache.org>.
But seriously... how about a hands-on workshop on tika-pipes for the
first week of December (focus on fileshare to Solr)?  We can follow
Eric's recommendation of having a brief go-around the room to introduce
each other and then a smaller actual tutorial.

Was the day of week/time of day ok?  I realize that TWTh can be heavy
meeting days for some, but I also know that folks take MF off. :D

On Tue, Nov 9, 2021 at 3:53 PM Tim Allison <ta...@apache.org> wrote:
>
> Will sign up Ken for next week....kidding.  Yes, that sounds great
> when you're ready!

Re: Proposed topics for next Tika meetups?

Posted by Tim Allison <ta...@apache.org>.
Will sign up Ken for next week....kidding.  Yes, that sounds great
when you're ready!

On Tue, Nov 9, 2021 at 3:16 PM Ken Krugler <kk...@transpac.com> wrote:
>
> Hi Tim,
>
> Maybe how to embed Tika in a scalable processing framework (Flink, Spark, AWS Lambda???) to process a large corpus in parallel?
>
> — Ken

Re: Proposed topics for next Tika meetups?

Posted by Ken Krugler <kk...@transpac.com>.
Hi Tim,

Maybe how to embed Tika in a scalable processing framework (Flink, Spark, AWS Lambda???) to process a large corpus in parallel?

— Ken

> On Nov 9, 2021, at 11:00 AM, Tim Allison <ta...@apache.org> wrote:
> 
> All,
>   Many thanks to those who attended today.  It was great to e-meet
> old friends and users from around the world.  Many thanks to Lewis
> McGibbney for getting the ball rolling on these.
>   Let's use this thread to discuss possible topics and scheduling for
> the next meetups?
> 
> Question 1: Pace...one a month or so?
> 
> Question 2: Topics?
> a) tika-pipes hands-on workshop
> b) get to know the users -- 5 minute go-around the room "this is how
> we use it; these are our pain points"
> c) ???
> 
>  Again, thank you!
> 
>           Best,
> 
>                  Tim

--------------------------
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch