Posted to dev@manifoldcf.apache.org by Karl Wright <da...@gmail.com> on 2018/09/24 14:43:43 UTC

[VOTE] Release Apache ManifoldCF 2.11, RC3

Please vote on whether to release ManifoldCF 2.11, RC3.  This release
contains a number of fixes/improvements/additions, described in the
CHANGES.txt file.  In addition, it includes Tika 1.19, which has a number
of fixes for classpath issues specifically requested by ManifoldCF.

This completely fixes a SolrJ related problem with the Solr Connector found
in RC3.  All tests pass.

The release artifact can be found at:

https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.11

There is also a tag at:

https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.11-RC3

Thanks again,
Karl Wright

[CANCEL] [VOTE] Release Apache ManifoldCF 2.11, RC3

Posted by Karl Wright <da...@gmail.com>.
Looking at how RequestWriter is wired in, I need to make a different hack
to get the content streams to use multipart consistently.  So withdrawing
this candidate too.

On Mon, Sep 24, 2018 at 12:13 PM Karl Wright <da...@gmail.com> wrote:

> Hi Julien,
>
> This has nothing to do with the new Tika.
>
> It is not normal; it means that UpdateRequests are not being sent as
> multipart form posts.  It's going to require work from the Solr team to fix
> this problem, however, because everything I do to work around the issue
> nonetheless seems to fail. :-(
>
> I'm having a back-and-forth with Paul Noble right now.  I'll update
> accordingly when I know more.
>
> Karl

Re: [VOTE] Release Apache ManifoldCF 2.11, RC3

Posted by Karl Wright <da...@gmail.com>.
Excellent!! I'll create a new RC.

Thanks again,
Karl


On Tue, Sep 25, 2018 at 8:13 AM Julien Massiera <
julien.massiera@francelabs.com> wrote:

> This new fix seems to work. Ingestions and deletions are working and the
> image file with huge metadata is indexed!
>
> Julien

Re: [VOTE] Release Apache ManifoldCF 2.11, RC3

Posted by Julien Massiera <ju...@francelabs.com>.
This new fix seems to work. Ingestions and deletions are working and the
image file with huge metadata is indexed!

Julien


On 25/09/2018 13:59, Karl Wright wrote:
> I've committed a hack to trunk.  It has been tested for Solr Cell
> documents, deletions, and for tika-connector-extracted documents that don't
> have a lot of metadata.  I'm asking Julien to test it with his specific
> image that has lots of metadata to see if the pathway for that case works
> properly.  If it does, I'll spin another RC.
>
> Long term, since I'm a Lucene/Solr committer, I think I'm going to have to
> take SolrJ under my wing if we expect it to work for ManifoldCF.  I don't
> have a lot of time to do stuff like this anymore but clearly neither does
> the Solr team.
>
> Karl

-- 
Julien MASSIERA
Director of Product Development
France Labs – The Search Experts
Join us at the Enterprise Search & Discovery Summit in Washington DC
www.francelabs.com


Re: [VOTE] Release Apache ManifoldCF 2.11, RC3

Posted by Karl Wright <da...@gmail.com>.
I've committed a hack to trunk.  It has been tested for Solr Cell
documents, deletions, and for tika-connector-extracted documents that don't
have a lot of metadata.  I'm asking Julien to test it with his specific
image that has lots of metadata to see if the pathway for that case works
properly.  If it does, I'll spin another RC.

Long term, since I'm a Lucene/Solr committer, I think I'm going to have to
take SolrJ under my wing if we expect it to work for ManifoldCF.  I don't
have a lot of time to do stuff like this anymore but clearly neither does
the Solr team.

Karl


On Tue, Sep 25, 2018 at 6:14 AM Karl Wright <da...@gmail.com> wrote:

> The back-and-forth is not going well.  Mr. Noble needs to be convinced
> that metadata longer than 4096 characters is a valid use case for Solr.
> In fact it seems like the Solr folks have deliberately been trying to get
> rid of support for multipart posts for a while, because they don't see the
> need for them.  I'm still hoping to convince them otherwise but I'm not
> getting a positive feel.
>
> I'm still trying to figure out if multipart posts have any fundamental
> conflict with their RequestWriter architecture.  If not I can perhaps
> override the RequestWriter implementation and add multipart support that
> way.  But it's not going to be a quick process by any means.

Re: [VOTE] Release Apache ManifoldCF 2.11, RC3

Posted by Karl Wright <da...@gmail.com>.
The back-and-forth is not going well.  Mr. Noble needs to be convinced
that metadata longer than 4096 characters is a valid use case for Solr.
In fact it seems like the Solr folks have deliberately been trying to get
rid of support for multipart posts for a while, because they don't see the
need for them.  I'm still hoping to convince them otherwise but I'm not
getting a positive feel.

I'm still trying to figure out if multipart posts have any fundamental
conflict with their RequestWriter architecture.  If not I can perhaps
override the RequestWriter implementation and add multipart support that
way.  But it's not going to be a quick process by any means.


On Mon, Sep 24, 2018 at 12:13 PM Karl Wright <da...@gmail.com> wrote:

> Hi Julien,
>
> This has nothing to do with the new Tika.
>
> It is not normal; it means that UpdateRequests are not being sent as
> multipart form posts.  It's going to require work from the Solr team to fix
> this problem, however, because everything I do to work around the issue
> nonetheless seems to fail. :-(
>
> I'm having a back-and-forth with Paul Noble right now.  I'll update
> accordingly when I know more.
>
> Karl

Re: [VOTE] Release Apache ManifoldCF 2.11, RC3

Posted by Karl Wright <da...@gmail.com>.
Hi Julien,

This has nothing to do with the new Tika.

It is not normal; it means that UpdateRequests are not being sent as
multipart form posts.  It's going to require work from the Solr team to fix
this problem, however, because everything I do to work around the issue
nonetheless seems to fail. :-(
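
For illustration only, here is a minimal sketch in plain Java of what "sent as a multipart form post" means: each field travels as a part of the request body, so the request URI stays short no matter how large the values are. This is not the ManifoldCF connector or SolrJ code. The class name, the boundary string, and the `literal.RedTRC` parameter name are assumptions made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class MultipartSketch {

    // Writes a string into the buffer as UTF-8 bytes.
    private static void write(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    /**
     * Builds a multipart/form-data body with one text part per field.
     * Assumes field names contain no quotes or line breaks and that no
     * value contains the boundary string.
     */
    static byte[] buildBody(String boundary, Map<String, String> fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            write(out, "--" + boundary + "\r\n");
            write(out, "Content-Disposition: form-data; name=\"" + field.getKey() + "\"\r\n");
            write(out, "Content-Type: text/plain; charset=UTF-8\r\n\r\n");
            write(out, field.getValue() + "\r\n");
        }
        write(out, "--" + boundary + "--\r\n"); // closing delimiter
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        // Hypothetical stand-in for a large metadata value.
        fields.put("literal.RedTRC", "0,".repeat(1024));
        byte[] body = buildBody("----sketch-boundary", fields);
        System.out.println("Content-Type: multipart/form-data; boundary=----sketch-boundary");
        System.out.println("Body bytes: " + body.length);
    }
}
```

The size of the metadata affects only the body length, not the URI, which is why the "URI is too large" error should not occur when requests really are sent this way.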

I'm having a back-and-forth with Paul Noble right now.  I'll update
accordingly when I know more.

Karl


On Mon, Sep 24, 2018 at 11:33 AM Julien Massiera <
julien.massiera@francelabs.com> wrote:

> After testing it, it is a +1 for me
>
> However, I found an interesting new issue that shows up with the new Tika
> version. I had a jpg file for which some metadata were not extracted
> before, such as RedTRC, BlueTRC and GreenTRC, which contain
> approximately 2048 bytes of data each. As the metadata are passed to
> Solr through the URI, I get the following error: URI is too large >8192
>
> Do we consider it a "normal issue", or is it worth checking the
> metadata length before sending the ingest request?

Re: [VOTE] Release Apache ManifoldCF 2.11, RC3

Posted by Julien Massiera <ju...@francelabs.com>.
After testing it, it is a +1 for me

However, I found an interesting new issue that shows up with the new Tika
version. I had a jpg file for which some metadata were not extracted
before, such as RedTRC, BlueTRC and GreenTRC, which contain
approximately 2048 bytes of data each. As the metadata are passed to
Solr through the URI, I get the following error: URI is too large >8192

Do we consider it a "normal issue", or is it worth checking the
metadata length before sending the ingest request?
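
A length check of the kind asked about could look roughly like the sketch below. It is plain Java, not ManifoldCF code. The 8192 limit is taken from the error message above and in practice depends on server configuration. The class name, the `literal.*` parameter names, the base path, and the sample values are assumptions for illustration. The sketch counts the URL-encoded length, since percent-encoding can make a value considerably longer than its raw size.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class UriLengthCheck {

    // Limit taken from the error message; the real limit is server-dependent.
    static final int MAX_URI_LENGTH = 8192;

    /** Length of the query string after URL-encoding every name and value. */
    static int encodedQueryLength(Map<String, String> metadata) {
        int length = 0;
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            if (length > 0) {
                length++; // '&' between pairs
            }
            length += URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8).length();
            length++; // '=' between name and value
            length += URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8).length();
        }
        return length;
    }

    /** True if base path + '?' + encoded query stays within the limit. */
    static boolean fitsInUri(String basePath, Map<String, String> metadata) {
        return basePath.length() + 1 + encodedQueryLength(metadata) <= MAX_URI_LENGTH;
    }

    public static void main(String[] args) {
        // Hypothetical values of roughly 2048 characters each; every comma
        // becomes "%2C" once encoded, so the encoded form is longer.
        String curve = "12,".repeat(683);
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("literal.RedTRC", curve);
        metadata.put("literal.GreenTRC", curve);
        metadata.put("literal.BlueTRC", curve);

        System.out.println("Encoded query length: " + encodedQueryLength(metadata));
        System.out.println("Fits in URI: " + fitsInUri("/solr/collection1/update/extract", metadata));
    }
}
```

A check like this only detects the problem. Whether to truncate, drop, or move the oversized fields into the request body is a separate decision.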


On 24/09/2018 16:43, Karl Wright wrote:
> Please vote on whether to release ManifoldCF 2.11, RC3.  This release
> contains a number of fixes/improvements/additions, described in the
> CHANGES.txt file.  In addition, it includes Tika 1.19, which has a number
> of fixes for classpath issues specifically requested by ManifoldCF.
>
> This completely fixes a SolrJ related problem with the Solr Connector found
> in RC3.  All tests pass.
>
> The release artifact can be found at:
>
> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.11
>
> There is also a tag at:
>
> https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.11-RC3
>
> Thanks again,
> Karl Wright
>

-- 
Julien MASSIERA
Director of Product Development
France Labs – The Search Experts
Join us at the Enterprise Search & Discovery Summit in Washington DC
www.francelabs.com