Posted to user@manifoldcf.apache.org by TC Tobin-Campbell <TC...@epic.com> on 2013/05/16 18:21:20 UTC

ManifoldCF and Kerberos/Basic Authentication

Hi there,
I'm trying to connect ManifoldCF to an internal wiki at my company. The ManifoldCF wiki connector supplies username and password fields for the wiki API; at my company, however, a username and password are also required to connect to the Apache server running the wiki site, and once that authentication takes place, those credentials are passed on to the wiki API.

So, essentially, I need a way to have ManifoldCF pass my Windows credentials on when making its connection. Using the API login fields does not work.

We use the Kerberos Module for Apache<http://modauthkerb.sourceforge.net/index.html> (AuthType Kerberos).  My understanding, based on that linked documentation, is that this module does use Basic Auth to communicate with the browser.

Is there anything we can do to make ManifoldCF authenticate in this scenario?

Thanks,


TC Tobin-Campbell | Technical Services | Willow | Epic  | (608) 271-9000

Sherlock<https://sherlock.epic.com/> (Issue tracking)
Analyst Toolkits<https://sites.epic.com/epiclib/epicdoc/Pages/analyst/default.aspx> (Common setup and support tasks)
Report Repository<https://documentation.epic.com/DataHandbook/Reports/ReportSearch> (Epic reports documentation)
Nova<https://nova.epic.com/Login/GetOrg.aspx?returnUrl=%2fdefault.aspx> (Release note management)
Galaxy<https://documentation.epic.com/OnlineDoc/Documents.aspx> (Epic documentation)


RE: ManifoldCF and Kerberos/Basic Authentication

Posted by TC Tobin-Campbell <TC...@epic.com>.
Yeah, I just noticed I have Java issues, which is where my problems are coming from. Thanks.

TC Tobin-Campbell | Technical Services | Willow | Epic  | (608) 271-9000

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Friday, May 31, 2013 10:02 AM
To: user@manifoldcf.apache.org
Subject: Re: ManifoldCF and Kerberos/Basic Authentication

You can look at the how-to-build-and-deploy page, but basically this is what you need to do:

ant make-core-deps
ant build
... then you will find everything the same as you are familiar with under the dist directory.
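For example - assuming the trunk build mirrors the binary release layout, which is worth verifying against your own build output - the single-process example would then be started with:

cd dist/example
java -jar start.jar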

Karl

On Fri, May 31, 2013 at 10:59 AM, TC Tobin-Campbell <TC...@epic.com> wrote:
Hey Karl,
This is probably a dumb question, so consider yourself warned.

I got the trunk downloaded and built, but now can't figure out how I'm supposed to run ManifoldCF from it. In the prebuilt downloads, there are example directories where I can go in and just click the start.jar file and it all kicks off fine. Not in the trunk. Any suggestions?

TC Tobin-Campbell | Technical Services | Willow | Epic | (608) 271-9000

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Friday, May 24, 2013 12:50 PM

To: user@manifoldcf.apache.org
Subject: Re: ManifoldCF and Kerberos/Basic Authentication

I had a second so I finished this.  Trunk now has support for basic auth.  You enter the credentials on the server tab underneath the API credentials.  Please give it a try and let me know if it works for you.

Karl

On Fri, May 24, 2013 at 11:28 AM, Karl Wright <da...@gmail.com> wrote:
CONNECTORS-692.  I will probably look at this over the weekend.
Karl

On Fri, May 24, 2013 at 11:26 AM, Karl Wright <da...@gmail.com> wrote:
Hi TC,
Unless I'm very much mistaken, there are no Apache Kerberos session cookies being used on your site, so it should be a straightforward matter to send basic auth credentials to your Apache mod-auth-kerb module for all pages during crawling.
I'll create a ticket for this.

Karl

On Fri, May 24, 2013 at 11:14 AM, TC Tobin-Campbell <TC...@epic.com> wrote:
Hi Karl,
Here's what I know so far.

Our module is configured to use two auth methods: Negotiate and Basic.  In most cases, we use Negotiate, but I'm guessing you'd prefer Basic.

Here's an example header.

GET / HTTP/1.1
Host: wiki.epic.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: wooTracker=QOMVLXDIC6OGOUXMGST1O54HYW573NNC; .EPICASPXAUTHQA=FA94C945F613DACB9341384EBB1C28C52CFC52558E606FC2F880DD5BA811BE7E94301C7A0A1990FAC2E119AABB8591EC975059A2B8169BEA9FC525D0577F3C0EC56DC29C28880D23E0790AD890024FB57A338981606774259656B6971556645B095778115ADFE6B9B434970869C4B546A59A61B2CDEF0C0A5B23E80BB1D1E3D3D567E4C113D9E7B32D137FDEE65E51AC7B3DF5A04F9767FA7C8723140AC274E2695D939C716D9B49CCF0F1D79967CE902781BC8CB5A253E3FB39896021ABB4F2FCA01D0E138E00A8176EB2ECE5B0204597C21969C8F501A9EDE4D27694E699777BB179CD329748B3341A4BBF3085C447E2B55BE97E27D23E415C23F1A53A33A15551D9AE6B5CF255C3B8ECE038A481B8291A8EC46F0EA8730C3658DABC5BE7557C6659321677D8F4586CA79D6D5CCCB1C5687F9077A6CD96487EAEF417A1411C2F62BE6FF57DD1F515B16406CF4B0B9460EFB9BCB46F8F7E47FCB8E8CE4FAE2EB92F20DECEF2BBF1D95C80597BE935A031CD158593EFA2E446FA6FAFDD2B4E691CD8569B7D60DAD4378EBD6A138E23F0F616FD01443647D9A6F852AEF773A69580390496748241739C0DDF2791B1C2143B7E9E976754056B70EB846DAE1D7018CC40026F862ABF613D89C8D31B2C468B81D0C18C37697E8BA5D415F8DFCA37AF2935AAD0238ED6F652E24062849EC8E0C4651C4FB8BB9DD11BE4F8639AD690C791868B8E94ADB626C9B1BD8E334F675E664A03DC; wiki_pensieve_session=j1pcf1746js1442m7p92hag9g1; wiki_pensieveUserID=5; wiki_pensieveUserName=Lziobro; wiki_pensieveToken=********************be3a3a990a8a
Connection: keep-alive
Authorization: Basic bHppb**************xMjM0   <-I've censored this line so you cannot get my password
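For reference, the censored value is just the base64 encoding of "username:password".  A minimal Java (8+) sketch with placeholder credentials - not the real ones above:

>>>>>>
// Illustration only: how a Basic Authorization header value is formed.
// "someuser" and "somepassword" are placeholders.
import java.util.Base64;

public class BasicAuthValue
{
  public static void main(String[] args)
  {
    String credentials = "someuser:somepassword";
    String encoded = Base64.getEncoder().encodeToString(credentials.getBytes());
    System.out.println("Authorization: Basic " + encoded);
  }
}
<<<<<<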

If I'm understanding you correctly, there's no way to accomplish this currently? Or, is there some workaround we could implement?

TC Tobin-Campbell | Technical Services | Willow | Epic | (608) 271-9000

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Thursday, May 16, 2013 12:05 PM
To: user@manifoldcf.apache.org
Subject: Re: ManifoldCF and Kerberos/Basic Authentication

Hi TC,

Apparently mod-auth-kerb can be configured in a number of different ways.  But if yours will work with basic auth, we can just transmit the credentials each time.  It will be relatively slow, because mod-auth-kerb will then need to talk to the KDC on each page fetch, but it should work.  Better yet would be if Apache set a browser cookie containing your tickets, which it knew how to interpret if returned - but I don't see any Google evidence that mod-auth-kerb is capable of that.  Either of these two approaches we could readily implement.
FWIW, the standard way to work with Kerberos is for you to actually have tickets already kinit'd and installed on your machine.  Your browser then picks up those tickets and transmits them to the Wiki server (I presume in a header that mod-auth-kerb knows about), and the KDC does not need to be involved.  But initializing that kind of ticket store, and managing the associated kinit requests when necessary, are beyond the scope of any connector we've done so far, so if we had to go that way, it would effectively make this proposal a Research Project.
What would be great to know in advance is exactly how your browser interacts with your Apache server.  Are you familiar with the process of getting a packet dump?  You'd use a tool like tcpdump (Unix) or Wireshark (Windows) to capture the packet traffic between a browser session and your Apache server, to see exactly what is happening.  Start by shutting down all your browser windows, so there is no in-memory state; then start the capture and browse to a part of the wiki that is secured by mod-auth-kerb.  We'd want to see whether cookies get set, or whether any special headers get transmitted by your browser (other than the standard Basic Auth "Authorization" headers).  If the exchange is protected by SSL, you'll have to use Firefox with the LiveHeaders plugin to see what is going on instead.
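For example, a capture invocation along these lines would do - the interface name is a placeholder for your environment:

tcpdump -i eth0 -s 0 -w wiki-capture.pcap host wiki.epic.com and tcp port 80

The resulting wiki-capture.pcap can then be opened in Wireshark for inspection.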
Please let me know what you find.
Karl


On Thu, May 16, 2013 at 12:37 PM, Karl Wright <da...@gmail.com> wrote:
Hi TC,
Thanks, this is a big help in understanding your setup.
I don't know enough about exactly *how* mod-auth-kerb uses Basic Auth to communicate with the browser, and whether it expects the browser to cache the resulting tickets (in cookies?)  I will have to do some research and get back to you on that.
Basically, security for a Wiki is usually handled by the Wiki itself, but since you've put additional auth in front of it by going through mod-auth-kerb, it's something that the Wiki connector would have to understand (and emulate your browser) in order to implement.  So it likely does not support this right now.  It may be relatively easy to do or it may be a challenge - we'll see.  I would also be somewhat concerned that it may not be possible to actually reach the API URLs through Apache; that would make everything moot if it were true.  Could you confirm that you can visit API URLs through your Apache setup?
Karl

On Thu, May 16, 2013 at 12:21 PM, TC Tobin-Campbell <TC...@epic.com> wrote:
Hi there,
I'm trying to connect ManifoldCF to an internal wiki at my company. The ManifoldCF wiki connector supplies username and password fields for the wiki API; at my company, however, a username and password are also required to connect to the Apache server running the wiki site, and once that authentication takes place, those credentials are passed on to the wiki API.

So, essentially, I need a way to have ManifoldCF pass my Windows credentials on when making its connection. Using the API login fields does not work.

We use the Kerberos Module for Apache<http://modauthkerb.sourceforge.net/index.html> (AuthType Kerberos).  My understanding, based on that linked documentation, is that this module does use Basic Auth to communicate with the browser.

Is there anything we can do to make ManifoldCF authenticate in this scenario?

Thanks,


TC Tobin-Campbell | Technical Services | Willow | Epic | (608) 271-9000

Sherlock<https://sherlock.epic.com/> (Issue tracking)
Analyst Toolkits<https://sites.epic.com/epiclib/epicdoc/Pages/analyst/default.aspx> (Common setup and support tasks)
Report Repository<https://documentation.epic.com/DataHandbook/Reports/ReportSearch> (Epic reports documentation)
Nova<https://nova.epic.com/Login/GetOrg.aspx?returnUrl=%2fdefault.aspx> (Release note management)
Galaxy<https://documentation.epic.com/OnlineDoc/Documents.aspx> (Epic documentation)








RE: ManifoldCF and Kerberos/Basic Authentication

Posted by TC Tobin-Campbell <TC...@epic.com>.
I added "." for both the allowed file extensions and the allowed MIME types. Still no luck. Is there something else I should be looking at in the job, or somewhere else?

[inline screenshot omitted from the plain text version]

DEBUG 2013-06-13 09:55:21,794 (Worker thread '44') - WEB: Decided not to ingest 'http://wiki/main/EpicSearch/Test' because it did not match ingestability criteria


TC Tobin-Campbell | Technical Services | Willow | Epic  | (608) 271-9000

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Thursday, June 13, 2013 8:47 AM
To: user@manifoldcf.apache.org
Subject: Re: ManifoldCF and Kerberos/Basic Authentication

Hi TC,
Please read the comments in the ticket.  You will need to change your ElasticSearch extension list in order for it to accept documents with no extension.  To do that, you need to add a new extension of "." to your extension list.

Karl

On Thu, Jun 13, 2013 at 9:43 AM, TC Tobin-Campbell <TC...@epic.com> wrote:
Hey Karl,
I updated my working copy, rebuilt using ant, and tried again. I'm still not getting anything to post to Elasticsearch.

[inline screenshot omitted from the plain text version]

I did notice this line in the logfile.

DEBUG 2013-06-13 08:25:36,976 (Worker thread '3') - WEB: Decided not to ingest 'http://wiki/main/EpicSearch/Test' because it did not match ingestability criteria

I was poking around in my setup, and still am not seeing anything configured incorrectly. Any other thoughts?

TC Tobin-Campbell | Technical Services | Willow | Epic | (608) 271-9000

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Friday, June 07, 2013 12:29 PM
To: user@manifoldcf.apache.org
Subject: Re: ManifoldCF and Kerberos/Basic Authentication

Fix checked into trunk.
Karl

On Fri, Jun 7, 2013 at 12:42 PM, Karl Wright <da...@gmail.com> wrote:
I created the ticket: CONNECTORS-707.

On Fri, Jun 7, 2013 at 12:16 PM, Karl Wright <da...@gmail.com> wrote:
I looked at the ElasticSearch connector, and it's going to treat these extensions as being "" (empty string).  So your list of allowed extensions will have to include "" if such documents are to be ingested.
Checking now to see if in fact you can just add a blank line to the list of extensions to get this to happen... it looks like you can't:

>>>>>>
      while ((line = br.readLine()) != null)
      {
        line = line.trim();
        if (line.length() > 0)   // blank lines are skipped, so "" can never be added
          set.add(line);
      }
<<<<<<
So, the ElasticSearch connector in its infinite wisdom excludes all documents that have no extension.  Hmm.
Can you open a ticket for this problem?  I'm not quite sure yet how to address it, but clearly this needs to be fixed.
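One way the parsing could tolerate this - a sketch of the idea only, not the actual patch - is to treat a lone "." entry as standing for the empty extension:

>>>>>>
// Sketch: map a "." entry to "" so extension-less documents can match.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

public class ExtensionListSketch
{
  public static Set<String> parseExtensions(String text)
    throws IOException
  {
    Set<String> set = new HashSet<String>();
    BufferedReader br = new BufferedReader(new StringReader(text));
    String line;
    while ((line = br.readLine()) != null)
    {
      line = line.trim();
      if (line.equals("."))
        set.add("");            // "." stands in for "no extension"
      else if (line.length() > 0)
        set.add(line);
    }
    return set;
  }

  public static void main(String[] args) throws IOException
  {
    // Contains "html" and the empty string.
    System.out.println(parseExtensions("html\n.\n"));
  }
}
<<<<<<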

Karl

On Fri, Jun 7, 2013 at 12:07 PM, Karl Wright <da...@gmail.com> wrote:
The extension of a document comes from the URL.  So the URLs listed in your previous mail don't appear to have any extension at all.
The code here, from the web connector, rejects documents for various reasons but does not log which one:

>>>>>>
    if (cache.getResponseCode(documentIdentifier) != 200)      // fetch must have returned 200
      return false;

    if (activities.checkLengthIndexable(cache.getDataLength(documentIdentifier)) == false)      // output connection's length limit
      return false;

    if (activities.checkURLIndexable(documentIdentifier) == false)      // output connection's URL criteria (this is where the extension check happens)
      return false;

    if (filter.isDocumentIndexable(documentIdentifier) == false)      // web connector's own document filter
      return false;

<<<<<<
All you would see if any one of these conditions failed would be:

          if (Logging.connectors.isDebugEnabled())
            Logging.connectors.debug("WEB: Decided not to ingest '"+documentIdentifier+"' because it did not match ingestability criteria");
Do you see that in the log?
Also, bear in mind that since the crawler is incremental, you may need to kick it to make it retry all this so you get debugging output.  You can click the "reingest all" link on your output connection to make that happen...
Karl

On Fri, Jun 7, 2013 at 11:52 AM, TC Tobin-Campbell <TC...@epic.com> wrote:
I took a look at the output connection, and didn't see anything in there that looked like it would cause any issues. I'm including all of the default MIME types and file extensions. This should just be HTML, I would think.
[inline screenshot omitted from the plain text version]

Here's what I'm seeing in the DEBUG output. It seems like we are starting the extraction, but then just aren't doing anything with it?? Seems weird.

DEBUG 2013-06-07 10:40:27,888 (Worker thread '24') - WEB: Waiting to start getting a connection to http://10.8.159.161:80
DEBUG 2013-06-07 10:40:27,888 (Worker thread '24') - WEB: Attempting to get connection to http://10.8.159.161:80 (0 ms)
DEBUG 2013-06-07 10:40:27,888 (Worker thread '24') - WEB: Successfully got connection to http://10.8.159.161:80 (0 ms)
DEBUG 2013-06-07 10:40:27,889 (Worker thread '20') - WEB: Waiting to start getting a connection to http://10.8.159.161:80
DEBUG 2013-06-07 10:40:27,889 (Worker thread '20') - WEB: Attempting to get connection to http://10.8.159.161:80 (0 ms)
DEBUG 2013-06-07 10:40:27,889 (Worker thread '20') - WEB: Successfully got connection to http://10.8.159.161:80 (0 ms)
DEBUG 2013-06-07 10:40:27,893 (Worker thread '20') - WEB: Waiting for an HttpClient object
DEBUG 2013-06-07 10:40:27,893 (Worker thread '20') - WEB: For http://wiki/main/EpicSearch/Test, discovered matching authentication credentials
DEBUG 2013-06-07 10:40:27,893 (Worker thread '20') - WEB: For http://wiki/main/EpicSearch/Test, setting virtual host to wiki
DEBUG 2013-06-07 10:40:27,893 (Worker thread '20') - WEB: Got an HttpClient object after 0 ms.
DEBUG 2013-06-07 10:40:27,893 (Worker thread '20') - WEB: Get method for '/main/EpicSearch/Test'
DEBUG 2013-06-07 10:40:27,896 (Worker thread '24') - WEB: Waiting for an HttpClient object
DEBUG 2013-06-07 10:40:27,896 (Worker thread '24') - WEB: For http://wiki.epic.com/main/EpicSearch/Test, discovered matching authentication credentials
DEBUG 2013-06-07 10:40:27,896 (Worker thread '24') - WEB: For http://wiki.epic.com/main/EpicSearch/Test, setting virtual host to wiki.epic.com
DEBUG 2013-06-07 10:40:27,896 (Worker thread '24') - WEB: Got an HttpClient object after 0 ms.
DEBUG 2013-06-07 10:40:27,896 (Worker thread '24') - WEB: Get method for '/main/EpicSearch/Test'
WARN 2013-06-07 10:40:27,900 (Thread-2185) - NEGOTIATE authentication error: Invalid name provided (Mechanism level: Could not load configuration file C:\Windows\krb5.ini (The system cannot find the file specified))
WARN 2013-06-07 10:40:27,900 (Thread-2188) - NEGOTIATE authentication error: Invalid name provided (Mechanism level: Could not load configuration file C:\Windows\krb5.ini (The system cannot find the file specified))
DEBUG 2013-06-07 10:40:28,378 (Thread-2185) - WEB: Performing a read wait on bin 'wiki' of 128 ms.
DEBUG 2013-06-07 10:40:28,506 (Thread-2185) - WEB: Performing a read wait on bin 'wiki' of 50 ms.
DEBUG 2013-06-07 10:40:28,556 (Thread-2185) - WEB: Performing a read wait on bin 'wiki' of 64 ms.
DEBUG 2013-06-07 10:40:28,613 (Thread-2188) - WEB: Performing a read wait on bin 'wiki.epic.com' of 126 ms.
DEBUG 2013-06-07 10:40:28,620 (Thread-2185) - WEB: Performing a read wait on bin 'wiki' of 47 ms.
INFO 2013-06-07 10:40:28,682 (Worker thread '20') - WEB: FETCH URL|http://wiki/main/EpicSearch/Test|1370619627893+787|200|14438|
DEBUG 2013-06-07 10:40:28,682 (Worker thread '20') - WEB: Document 'http://wiki/main/EpicSearch/Test' is text, with encoding 'utf-8'; link extraction starting

Followed by lots of these, which seems appropriate:
DEBUG 2013-06-07 10:40:28,683 (Worker thread '20') - WEB: Url 'http://wiki/mediawiki/main/index.php?action=edit&title=EpicSearch/Test' is illegal because no include patterns match it
DEBUG 2013-06-07 10:40:28,683 (Worker thread '20') - WEB: In html document 'http://wiki/main/EpicSearch/Test', found an unincluded URL '/mediawiki/main/index.php?title=EpicSearch/Test&action=edit'
DEBUG 2013-06-07 10:40:28,683 (Worker thread '20') - WEB: Url 'http://wiki/mediawiki/main/index.php?action=edit&title=EpicSearch/Test' is illegal because no include patterns match it
DEBUG 2013-06-07 10:40:28,683 (Worker thread '20') - WEB: In html document 'http://wiki/main/EpicSearch/Test', found an unincluded URL '/mediawiki/main/index.php?title=EpicSearch/Test&action=edit'
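Side note: the NEGOTIATE warnings above mean the JVM has no Kerberos client configuration at C:\Windows\krb5.ini.  With basic auth that file is not needed, but if Negotiate were ever wanted, a minimal illustrative krb5.ini would look like this - the realm and KDC names are placeholders:

[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
    }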

TC Tobin-Campbell | Technical Services | Willow | Epic | (608) 271-9000

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Friday, June 07, 2013 9:49 AM

To: user@manifoldcf.apache.org
Subject: Re: ManifoldCF and Kerberos/Basic Authentication

Hi TC,
The fact that the fetch is successful means that the URL is included (and not excluded).  The fact that it doesn't mention a robots exclusion means that robots.txt is happy with it.  But it could well be that:
(a) the mimetype is one that your ElasticSearch connection is excluding;
(b) the extension is one that your ElasticSearch connection is excluding.
I would check your output connection, and if that doesn't help, turn on connector debugging (in properties.xml, set the property "org.apache.manifoldcf.connectors" to "DEBUG").  Then you will see output that describes the consideration process the web connector goes through for each document.
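Assuming the stock properties.xml layout, that is a line like

<property name="org.apache.manifoldcf.connectors" value="DEBUG"/>

inside the top-level configuration element.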
Karl

On Fri, Jun 7, 2013 at 10:43 AM, TC Tobin-Campbell <TC...@epic.com> wrote:
Apologies for the delay here, Karl. I was able to get this up and running, and the authentication is working. Thanks for getting that in so quickly!

I do have a new issue, though. I have an output connection to Elasticsearch set up for this job.

I can see that the crawler is in fact crawling the wiki, and the fetches are all working great. However, it doesn't seem to be attempting to send the pages to the index.

[inline screenshot omitted from the plain text version]

I'm not seeing anything in the elasticsearch logs, so it appears we're just not sending anything to Elasticsearch. Could this be related to the change you made? Or is this a completely separate problem?
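A quick way to double-check that from the Elasticsearch side - host, port, and index name below are placeholders for your setup:

curl http://localhost:9200/your-index/_count

A document count of 0 there would confirm that nothing is reaching the index.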

TC Tobin-Campbell | Technical Services | Willow | Epic | (608) 271-9000

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Friday, May 24, 2013 12:50 PM

To: user@manifoldcf.apache.org
Subject: Re: ManifoldCF and Kerberos/Basic Authentication

I had a second so I finished this.  Trunk now has support for basic auth.  You enter the credentials on the server tab underneath the API credentials.  Please give it a try and let me know if it works for you.

Karl

On Fri, May 24, 2013 at 11:28 AM, Karl Wright <da...@gmail.com> wrote:
CONNECTORS-692.  I will probably look at this over the weekend.
Karl

On Fri, May 24, 2013 at 11:26 AM, Karl Wright <da...@gmail.com> wrote:
Hi TC,
Unless I'm very much mistaken, there are no Apache Kerberos session cookies being used on your site, so it should be a straightforward matter to send basic auth credentials to your Apache mod-auth-kerb module for all pages during crawling.
I'll create a ticket for this.

Karl

On Fri, May 24, 2013 at 11:14 AM, TC Tobin-Campbell <TC...@epic.com> wrote:
Hi Karl,
Here's what I know so far.

Our module is configured to use two auth methods: Negotiate and Basic.  In most cases, we use Negotiate, but I'm guessing you'd prefer Basic.

Here's an example header.

GET / HTTP/1.1
Host: wiki.epic.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: wooTracker=QOMVLXDIC6OGOUXMGST1O54HYW573NNC; .EPICASPXAUTHQA=FA94C945F613DACB9341384EBB1C28C52CFC52558E606FC2F880DD5BA811BE7E94301C7A0A1990FAC2E119AABB8591EC975059A2B8169BEA9FC525D0577F3C0EC56DC29C28880D23E0790AD890024FB57A338981606774259656B6971556645B095778115ADFE6B9B434970869C4B546A59A61B2CDEF0C0A5B23E80BB1D1E3D3D567E4C113D9E7B32D137FDEE65E51AC7B3DF5A04F9767FA7C8723140AC274E2695D939C716D9B49CCF0F1D79967CE902781BC8CB5A253E3FB39896021ABB4F2FCA01D0E138E00A8176EB2ECE5B0204597C21969C8F501A9EDE4D27694E699777BB179CD329748B3341A4BBF3085C447E2B55BE97E27D23E415C23F1A53A33A15551D9AE6B5CF255C3B8ECE038A481B8291A8EC46F0EA8730C3658DABC5BE7557C6659321677D8F4586CA79D6D5CCCB1C5687F9077A6CD96487EAEF417A1411C2F62BE6FF57DD1F515B16406CF4B0B9460EFB9BCB46F8F7E47FCB8E8CE4FAE2EB92F20DECEF2BBF1D95C80597BE935A031CD158593EFA2E446FA6FAFDD2B4E691CD8569B7D60DAD4378EBD6A138E23F0F616FD01443647D9A6F852AEF773A69580390496748241739C0DDF2791B1C2143B7E9E976754056B70EB846DAE1D7018CC40026F862ABF613D89C8D31B2C468B81D0C18C37697E8BA5D415F8DFCA37AF2935AAD0238ED6F652E24062849EC8E0C4651C4FB8BB9DD11BE4F8639AD690C791868B8E94ADB626C9B1BD8E334F675E664A03DC; wiki_pensieve_session=j1pcf1746js1442m7p92hag9g1; wiki_pensieveUserID=5; wiki_pensieveUserName=Lziobro; wiki_pensieveToken=********************be3a3a990a8a
Connection: keep-alive
Authorization: Basic bHppb**************xMjM0   <-I've censored this line so you cannot get my password

If I'm understanding you correctly, there's no way to accomplish this currently? Or, is there some workaround we could implement?

TC Tobin-Campbell | Technical Services | Willow | Epic | (608) 271-9000

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Thursday, May 16, 2013 12:05 PM
To: user@manifoldcf.apache.org
Subject: Re: ManifoldCF and Kerberos/Basic Authentication

Hi TC,

Apparently mod-auth-kerb can be configured in a number of different ways.  But if yours will work with basic auth, we can just transmit the credentials each time.  It will be relatively slow, because mod-auth-kerb will then need to talk to the KDC on each page fetch, but it should work.  Better yet would be if Apache set a browser cookie containing your tickets, which it knew how to interpret if returned - but I don't see any Google evidence that mod-auth-kerb is capable of that.  Either of these two approaches we could readily implement.
FWIW, the standard way to work with Kerberos is for you to actually have tickets already kinit'd and installed on your machine.  Your browser then picks up those tickets and transmits them to the Wiki server (I presume in a header that mod-auth-kerb knows about), and the KDC does not need to be involved.  But initializing that kind of ticket store, and managing the associated kinit requests when necessary, are beyond the scope of any connector we've done so far, so if we had to go that way, it would effectively make this proposal a Research Project.
What would be great to know in advance is exactly how your browser interacts with your Apache server.  Are you familiar with the process of getting a packet dump?  You'd use a tool like tcpdump (Unix) or Wireshark (Windows) to capture the packet traffic between a browser session and your Apache server, to see exactly what is happening.  Start by shutting down all your browser windows, so there is no in-memory state; then start the capture and browse to a part of the wiki that is secured by mod-auth-kerb.  We'd want to see whether cookies get set, or whether any special headers get transmitted by your browser (other than the standard Basic Auth "Authorization" headers).  If the exchange is protected by SSL, you'll have to use Firefox with the LiveHeaders plugin to see what is going on instead.
Please let me know what you find.
Karl


On Thu, May 16, 2013 at 12:37 PM, Karl Wright <da...@gmail.com> wrote:
Hi TC,
Thanks, this is a big help in understanding your setup.
I don't know enough about exactly *how* mod-auth-kerb uses Basic Auth to communicate with the browser, and whether it expects the browser to cache the resulting tickets (in cookies?)  I will have to do some research and get back to you on that.
Basically, security for a Wiki is usually handled by the Wiki itself, but since you've put additional auth in front of it by going through mod-auth-kerb, it's something that the Wiki connector would have to understand (and emulate your browser) in order to implement.  So it likely does not support this right now.  It may be relatively easy to do or it may be a challenge - we'll see.  I would also be somewhat concerned that it may not be possible to actually reach the API URLs through Apache; that would make everything moot if it were true.  Could you confirm that you can visit API URLs through your Apache setup?
Karl

On Thu, May 16, 2013 at 12:21 PM, TC Tobin-Campbell <TC...@epic.com> wrote:
Hi there,
I'm trying to connect ManifoldCF to an internal wiki at my company. The ManifoldCF wiki connector supplies username and password fields for the wiki API; at my company, however, a username and password are also required to connect to the Apache server running the wiki site, and once that authentication takes place, those credentials are passed on to the wiki API.

So, essentially, I need a way to have ManifoldCF pass my Windows credentials on when making its connection. Using the API login fields does not work.

We use the Kerberos Module for Apache<http://modauthkerb.sourceforge.net/index.html> (AuthType Kerberos).  My understanding, based on that linked documentation, is that this module does use Basic Auth to communicate with the browser.

Is there anything we can do to make ManifoldCF authenticate in this scenario?

Thanks,


TC Tobin-Campbell | Technical Services | Willow | Epic | (608) 271-9000

Sherlock<https://sherlock.epic.com/> (Issue tracking)
Analyst Toolkits<https://sites.epic.com/epiclib/epicdoc/Pages/analyst/default.aspx> (Common setup and support tasks)
Report Repository<https://documentation.epic.com/DataHandbook/Reports/ReportSearch> (Epic reports documentation)
Nova<https://nova.epic.com/Login/GetOrg.aspx?returnUrl=%2fdefault.aspx> (Release note management)
Galaxy<https://documentation.epic.com/OnlineDoc/Documents.aspx> (Epic documentation)













> *Sent:* Thursday, May 16, 2013 12:05 PM
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: ManifoldCF and Kerberos/Basic Authentication****
>
>  ****
>
> Hi TC,
>
> Apparently mod-auth-kerb can be configured in a number of different ways.
> But if yours will work with basic auth, we can just transmit the
> credentials each time.  It will be relatively slow because mod-auth-kerb
> will then need to talk to the kdc on each page fetch, but it should work.
> Better yet would be if Apache set a browser cookie containing your tickets,
> which it knew how to interpret if returned - but I don't see any Google
> evidence that mod-auth-kerb is capable of that.  But either of these two
> approaches we could readily implement.****
>
> FWIW, the standard way to work with kerberos is for you to actually have
> tickets already kinit'd and installed on your machine.  Your browser then
> picks up those tickets and transmits them to the Wiki server (I presume in
> a header that mod-auth-kerb knows about), and the kdc does not need to be
> involved.  But initializing that kind of ticket store, and managing the
> associated kinit requests when necessary, are beyond the scope of any
> connector we've so far done, so if we had to go that way, that would
> effectively make this proposal a Research Project.****
>
> What would be great to know in advance is how exactly your browser
> interacts with your Apache server.  Are you familiar with the process of
> getting a packet dump?  You'd use a tool like tcpdump (Unix) or wireshark
> (windows) in order to capture the packet traffic between a browser session
> and your Apache server, to see exactly what is happening.  Start by
> shutting down all your browser windows, so there is no in-memory state, and
> then start the capture and browse to a part of the wiki that is secured by
> mod-auth-kerb.  We'd want to see if cookies get set, or if any special
> headers get transmitted by your browser (other than the standard Basic Auth
> "Authentication" headers).  If the exchange is protected by SSL, then
> you'll have to use FireFox and use a plugin called LiveHeaders to see what
> is going on instead.****
>
> Please let me know what you find.****
>
> Karl****
>
>  ****
>
>  ****
>
> On Thu, May 16, 2013 at 12:37 PM, Karl Wright <da...@gmail.com> wrote:*
> ***
>
> Hi TC,****
>
> Thanks, this is a big help in understanding your setup.****
>
> I don't know enough about exactly *how* mod-auth-kerb uses Basic Auth to
> communicate with the browser, and whether it expects the browser to cache
> the resulting tickets (in cookies?)  I will have to do some research and
> get back to you on that.****
>
> Basically, security for a Wiki is usually handled by the Wiki, but since
> you've put added auth in front of it by going through mod-auth-kerb, it's
> something that the Wiki connector would have to understand (and emulate
> your browser) in order to implement.  So it does not likely support this
> right now.  It may be relatively easy to do or it may be a challenge -
> we'll see.  I would also be somewhat concerned that it may not possible to
> actually reach the API urls through Apache; that would make everything moot
> if it were true.  Could you confirm that you can visit API urls through
> your Apache setup?****
>
> Karl****
>
>  ****
>
> On Thu, May 16, 2013 at 12:21 PM, TC Tobin-Campbell <TC...@epic.com> wrote:**
> **
>
> Hi there,****
>
> I'm trying to connect ManifoldCF to an internal wiki at my company. The
> ManifoldCF wiki connector supplies a username and password field for the
> wiki api, however, at my company, a username and password is required to
> connect to the apache server running the wiki site, and after that
> authentication takes place, those credentials are passed on to the wiki api.
> ****
>
>  ****
>
> So, essentially, I need a way to have ManifoldCF pass my windows
> credentials on when trying to make its connection. Using the api login
> fields does not work.****
>
>  ****
>
> We use Kerberos the Kerberos Module for Apache<http://modauthkerb.sourceforge.net/index.html>(AuthType Kerberos).  My understanding based on that linked documentation
> is that this module does use Basic Auth to communicate with the browser.**
> **
>
>  ****
>
> Is there anything we can to make ManifoldCF authenticate in this scenario?
> ****
>
>  ****
>
> Thanks,****
>
>  ****
>
>  ****
>
> *TC Tobin-Campbell *| Technical Services | Willow | *Epic*  | (608)
> 271-9000 ****
>
>  ****
>
> Sherlock <https://sherlock.epic.com/> (Issue tracking)****
>
> Analyst Toolkits<https://sites.epic.com/epiclib/epicdoc/Pages/analyst/default.aspx>
> (Common setup and support tasks)****
>
> Report Repository<https://documentation.epic.com/DataHandbook/Reports/ReportSearch>(Epic reports documentation)
> ****
>
> Nova <https://nova.epic.com/Login/GetOrg.aspx?returnUrl=%2fdefault.aspx>(Release note management)
> ****
>
> Galaxy <https://documentation.epic.com/OnlineDoc/Documents.aspx> (Epic
> documentation)  ****
>
>  ****
>
>  ****
>
>  ****
>
>  ****
>
>  ****
>
>  ****
>
>  ****
>
> ** **
>
> ** **
>
> ** **
>
> ** **
>

RE: ManifoldCF and Kerberos/Basic Authentication

Posted by TC Tobin-Campbell <TC...@epic.com>.
Hey Karl,
I updated my working copy, rebuilt using ant, and tried again. I'm still not getting anything to post to Elasticsearch.


I did notice this line in the logfile.

DEBUG 2013-06-13 08:25:36,976 (Worker thread '3') - WEB: Decided not to ingest 'http://wiki/main/EpicSearch/Test' because it did not match ingestability criteria

I was poking around in my setup, and still am not seeing anything configured incorrectly. Any other thoughts?

TC Tobin-Campbell | Technical Services | Willow | Epic  | (608) 271-9000

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Friday, June 07, 2013 12:29 PM
To: user@manifoldcf.apache.org
Subject: Re: ManifoldCF and Kerberos/Basic Authentication

Fix checked into trunk.
Karl

On Fri, Jun 7, 2013 at 12:42 PM, Karl Wright <da...@gmail.com> wrote:
I created the ticket: CONNECTORS-707.

On Fri, Jun 7, 2013 at 12:16 PM, Karl Wright <da...@gmail.com> wrote:
I looked at the ElasticSearch connector, and it's going to treat these extensions as being "" (empty string).  So your list of allowed extensions will have to include "" if such documents are to be ingested.
Checking now to see if in fact you can just add a blank line to the list of extensions to get this to happen... it looks like you can't:

>>>>>>
      while ((line = br.readLine()) != null)
      {
        line = line.trim();
        // Blank lines are skipped here, so the empty extension ("") can never be added.
        if (line.length() > 0)
          set.add(line);
      }
<<<<<<
So, the ElasticSearch connector in its infinite wisdom excludes all documents that have no extension.  Hmm.
Can you open a ticket for this problem?  I'm not quite sure yet how to address it, but clearly this needs to be fixed.

Karl
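(A hypothetical sketch of one possible fix -- not necessarily what CONNECTORS-707 ended up doing: treat a sentinel line, say a lone ".", as the empty extension, so extensionless documents can be allowed through.)

>>>>>>
      while ((line = br.readLine()) != null)
      {
        line = line.trim();
        if (line.equals("."))
          set.add("");          // sentinel: allow documents with no extension
        else if (line.length() > 0)
          set.add(line);
      }
<<<<<<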

On Fri, Jun 7, 2013 at 12:07 PM, Karl Wright <da...@gmail.com> wrote:
The extension of a document comes from the URL.  So the URLs listed in your previous mail don't appear to have any extension at all.
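(As an illustration only -- not the connector's actual code -- deriving an extension from a URL path might look like this; for the URLs above the result is the empty string.)

>>>>>>
    // Illustrative sketch: extension = text after the last '.' in the last path segment.
    String path = java.net.URI.create("http://wiki/main/EpicSearch/Test").getPath();
    int slash = path.lastIndexOf('/');
    int dot = path.lastIndexOf('.');
    String extension = (dot > slash) ? path.substring(dot + 1) : "";
    // "/main/EpicSearch/Test" has no '.' after the last '/', so extension == "".
<<<<<<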
The code here from the web connector rejects documents for various reasons, but does not log why:

>>>>>>
    // The fetch must have returned HTTP 200.
    if (cache.getResponseCode(documentIdentifier) != 200)
      return false;

    // The document must be within the length limit the output connector accepts.
    if (activities.checkLengthIndexable(cache.getDataLength(documentIdentifier)) == false)
      return false;

    // The URL itself (including its extension) must be acceptable to the output connector.
    if (activities.checkURLIndexable(documentIdentifier) == false)
      return false;

    // The document must pass the job's own inclusion filter.
    if (filter.isDocumentIndexable(documentIdentifier) == false)
      return false;
<<<<<<
All you would see if any one of these conditions failed would be:

          if (Logging.connectors.isDebugEnabled())
            Logging.connectors.debug("WEB: Decided not to ingest '"+documentIdentifier+"' because it did not match ingestability criteria");
Do you see that in the log?
Also, bear in mind that since the crawler is incremental, you may need to kick it to make it retry all this so you get debugging output.  You can click the "reingest all" link on your output connection to make that happen...
Karl

On Fri, Jun 7, 2013 at 11:52 AM, TC Tobin-Campbell <TC...@epic.com> wrote:
I took a look at the output connection, and didn't see anything in there that looked like it would cause any issues. I'm including all of the default MIME types and file extensions. This should just be HTML, I would think.

Here's what I'm seeing in the DEBUG output. It seems like we are starting the extraction, but then just aren't doing anything with it?? Seems weird.

DEBUG 2013-06-07 10:40:27,888 (Worker thread '24') - WEB: Waiting to start getting a connection to http://10.8.159.161:80
DEBUG 2013-06-07 10:40:27,888 (Worker thread '24') - WEB: Attempting to get connection to http://10.8.159.161:80 (0 ms)
DEBUG 2013-06-07 10:40:27,888 (Worker thread '24') - WEB: Successfully got connection to http://10.8.159.161:80 (0 ms)
DEBUG 2013-06-07 10:40:27,889 (Worker thread '20') - WEB: Waiting to start getting a connection to http://10.8.159.161:80
DEBUG 2013-06-07 10:40:27,889 (Worker thread '20') - WEB: Attempting to get connection to http://10.8.159.161:80 (0 ms)
DEBUG 2013-06-07 10:40:27,889 (Worker thread '20') - WEB: Successfully got connection to http://10.8.159.161:80 (0 ms)
DEBUG 2013-06-07 10:40:27,893 (Worker thread '20') - WEB: Waiting for an HttpClient object
DEBUG 2013-06-07 10:40:27,893 (Worker thread '20') - WEB: For http://wiki/main/EpicSearch/Test, discovered matching authentication credentials
DEBUG 2013-06-07 10:40:27,893 (Worker thread '20') - WEB: For http://wiki/main/EpicSearch/Test, setting virtual host to wiki
DEBUG 2013-06-07 10:40:27,893 (Worker thread '20') - WEB: Got an HttpClient object after 0 ms.
DEBUG 2013-06-07 10:40:27,893 (Worker thread '20') - WEB: Get method for '/main/EpicSearch/Test'
DEBUG 2013-06-07 10:40:27,896 (Worker thread '24') - WEB: Waiting for an HttpClient object
DEBUG 2013-06-07 10:40:27,896 (Worker thread '24') - WEB: For http://wiki.epic.com/main/EpicSearch/Test, discovered matching authentication credentials
DEBUG 2013-06-07 10:40:27,896 (Worker thread '24') - WEB: For http://wiki.epic.com/main/EpicSearch/Test, setting virtual host to wiki.epic.com
DEBUG 2013-06-07 10:40:27,896 (Worker thread '24') - WEB: Got an HttpClient object after 0 ms.
DEBUG 2013-06-07 10:40:27,896 (Worker thread '24') - WEB: Get method for '/main/EpicSearch/Test'
WARN 2013-06-07 10:40:27,900 (Thread-2185) - NEGOTIATE authentication error: Invalid name provided (Mechanism level: Could not load configuration file C:\Windows\krb5.ini (The system cannot find the file specified))
WARN 2013-06-07 10:40:27,900 (Thread-2188) - NEGOTIATE authentication error: Invalid name provided (Mechanism level: Could not load configuration file C:\Windows\krb5.ini (The system cannot find the file specified))
DEBUG 2013-06-07 10:40:28,378 (Thread-2185) - WEB: Performing a read wait on bin 'wiki' of 128 ms.
DEBUG 2013-06-07 10:40:28,506 (Thread-2185) - WEB: Performing a read wait on bin 'wiki' of 50 ms.
DEBUG 2013-06-07 10:40:28,556 (Thread-2185) - WEB: Performing a read wait on bin 'wiki' of 64 ms.
DEBUG 2013-06-07 10:40:28,613 (Thread-2188) - WEB: Performing a read wait on bin 'wiki.epic.com' of 126 ms.
DEBUG 2013-06-07 10:40:28,620 (Thread-2185) - WEB: Performing a read wait on bin 'wiki' of 47 ms.
INFO 2013-06-07 10:40:28,682 (Worker thread '20') - WEB: FETCH URL|http://wiki/main/EpicSearch/Test|1370619627893+787|200|14438|
DEBUG 2013-06-07 10:40:28,682 (Worker thread '20') - WEB: Document 'http://wiki/main/EpicSearch/Test' is text, with encoding 'utf-8'; link extraction starting

Followed by lots of these, which seems appropriate:
DEBUG 2013-06-07 10:40:28,683 (Worker thread '20') - WEB: Url 'http://wiki/mediawiki/main/index.php?action=edit&title=EpicSearch/Test' is illegal because no include patterns match it
DEBUG 2013-06-07 10:40:28,683 (Worker thread '20') - WEB: In html document 'http://wiki/main/EpicSearch/Test', found an unincluded URL '/mediawiki/main/index.php?title=EpicSearch/Test&action=edit'
DEBUG 2013-06-07 10:40:28,683 (Worker thread '20') - WEB: Url 'http://wiki/mediawiki/main/index.php?action=edit&title=EpicSearch/Test' is illegal because no include patterns match it
DEBUG 2013-06-07 10:40:28,683 (Worker thread '20') - WEB: In html document 'http://wiki/main/EpicSearch/Test', found an unincluded URL '/mediawiki/main/index.php?title=EpicSearch/Test&action=edit'
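(A note on the NEGOTIATE warnings in the log above: they mean the JVM could not find a Kerberos configuration file, so the Negotiate attempt fails -- the fetches still succeed here, presumably via the Basic fallback. If Negotiate were actually wanted, a minimal C:\Windows\krb5.ini would look roughly like the following; the realm and KDC names are placeholders.)

>>>>>>
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
<<<<<<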

TC Tobin-Campbell | Technical Services | Willow | Epic  | (608) 271-9000

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Friday, June 07, 2013 9:49 AM

To: user@manifoldcf.apache.org
Subject: Re: ManifoldCF and Kerberos/Basic Authentication

Hi TC,
The fact that the fetch is successful means that the URL is included (and not excluded).  The fact that it doesn't mention a robots exclusion means that robots.txt is happy with it.  But it could well be that:
(a) the mimetype is one that your ElasticSearch connection is excluding;
(b) the extension is one that your ElasticSearch connection is excluding.
I would check your output connection, and if that doesn't help turn on connector debugging (in properties.xml, set property "org.apache.manifoldcf.connectors" to "DEBUG").  Then you will see output that describes the consideration process the web connector is going through for each document.
Karl
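(Concretely, that would be a line like the following in properties.xml -- an illustrative fragment, shown with the property name from the mail above; the rest of the file will differ per install.)

>>>>>>
<property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
<<<<<<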

On Fri, Jun 7, 2013 at 10:43 AM, TC Tobin-Campbell <TC...@epic.com> wrote:
Apologies for the delay here Karl. I was able to get this up and running, and the authentication is working. Thanks for getting that in so quickly!

I do have a new issue though. I have an output connection to Elasticsearch setup for this job.

I can see that the crawler is in fact crawling the wiki, and the fetches are all working great. However, it doesn't seem to be attempting to send the pages to the index.


I'm not seeing anything in the elasticsearch logs, so it appears we're just not sending anything to Elasticsearch. Could this be related to the change you made? Or is this a completely separate problem?

TC Tobin-Campbell | Technical Services | Willow | Epic  | (608) 271-9000

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Friday, May 24, 2013 12:50 PM

To: user@manifoldcf.apache.org
Subject: Re: ManifoldCF and Kerberos/Basic Authentication

I had a second so I finished this.  Trunk now has support for basic auth.  You enter the credentials on the server tab underneath the API credentials.  Please give it a try and let me know if it works for you.

Karl

On Fri, May 24, 2013 at 11:28 AM, Karl Wright <da...@gmail.com> wrote:
CONNECTORS-692.  I will probably look at this over the weekend.
Karl

On Fri, May 24, 2013 at 11:26 AM, Karl Wright <da...@gmail.com> wrote:
Hi TC,
Unless I'm very much mistaken, there are no Apache kerberos session cookies being used on your site, so it should be a straightforward matter to include basic auth credentials to your Apache mod-auth-kerb module for all pages during crawling.
I'll create a ticket for this.

Karl

On Fri, May 24, 2013 at 11:14 AM, TC Tobin-Campbell <TC...@epic.com> wrote:
Hi Karl,
Here's what I know so far.

Our module is configured to use two auth methods: Negotiate and Basic.  In most cases, we use Negotiate, but I'm guessing you'd prefer Basic.
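(For context, a mod-auth-kerb setup offering both methods might look roughly like this -- an illustrative sketch only; the realm, path, and keytab values are placeholders, not our actual config.)

>>>>>>
<Location /main>
    AuthType Kerberos
    AuthName "Wiki"
    KrbMethodNegotiate On
    KrbMethodK5Passwd On       # this is what enables the Basic fallback
    KrbAuthRealms EXAMPLE.COM
    Krb5KeyTab /etc/apache2/http.keytab
    Require valid-user
</Location>
<<<<<<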

Here's an example header.

GET / HTTP/1.1
Host: wiki.epic.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: wooTracker=QOMVLXDIC6OGOUXMGST1O54HYW573NNC; .EPICASPXAUTHQA=FA94C945F613DACB9341384EBB1C28C52CFC52558E606FC2F880DD5BA811BE7E94301C7A0A1990FAC2E119AABB8591EC975059A2B8169BEA9FC525D0577F3C0EC56DC29C28880D23E0790AD890024FB57A338981606774259656B6971556645B095778115ADFE6B9B434970869C4B546A59A61B2CDEF0C0A5B23E80BB1D1E3D3D567E4C113D9E7B32D137FDEE65E51AC7B3DF5A04F9767FA7C8723140AC274E2695D939C716D9B49CCF0F1D79967CE902781BC8CB5A253E3FB39896021ABB4F2FCA01D0E138E00A8176EB2ECE5B0204597C21969C8F501A9EDE4D27694E699777BB179CD329748B3341A4BBF3085C447E2B55BE97E27D23E415C23F1A53A33A15551D9AE6B5CF255C3B8ECE038A481B8291A8EC46F0EA8730C3658DABC5BE7557C6659321677D8F4586CA79D6D5CCCB1C5687F9077A6CD96487EAEF417A1411C2F62BE6FF57DD1F515B16406CF4B0B9460EFB9BCB46F8F7E47FCB8E8CE4FAE2EB92F20DECEF2BBF1D95C80597BE935A031CD158593EFA2E446FA6FAFDD2B4E691CD8569B7D60DAD4378EBD6A138E23F0F616FD01443647D9A6F852AEF773A69580390496748241739C0DDF2791B1C2143B7E9E976754056B70EB846DAE1D7018CC40026F862ABF613D89C8D31B2C468B81D0C18C37697E8BA5D415F8DFCA37AF2935AAD0238ED6F652E24062849EC8E0C4651C4FB8BB9DD11BE4F8639AD690C791868B8E94ADB626C9B1BD8E334F675E664A03DC; wiki_pensieve_session=j1pcf1746js1442m7p92hag9g1; wiki_pensieveUserID=5; wiki_pensieveUserName=Lziobro; wiki_pensieveToken=********************be3a3a990a8a
Connection: keep-alive
Authorization: Basic bHppb**************xMjM0   <-I've censored this line so you cannot get my password
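(For reference, the Basic credential is just the base64 encoding of "username:password" -- e.g., with a made-up credential:)

>>>>>>
// Illustrative only; "user:secret" is invented for the example.
String cred = java.util.Base64.getEncoder()
    .encodeToString("user:secret".getBytes(java.nio.charset.StandardCharsets.UTF_8));
// cred == "dXNlcjpzZWNyZXQ=", sent as:  Authorization: Basic dXNlcjpzZWNyZXQ=
<<<<<<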

If I'm understanding you correctly, there's no way to accomplish this currently? Or, is there some workaround we could implement?

TC Tobin-Campbell | Technical Services | Willow | Epic  | (608) 271-9000

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Thursday, May 16, 2013 12:05 PM
To: user@manifoldcf.apache.org
Subject: Re: ManifoldCF and Kerberos/Basic Authentication

Hi TC,

Apparently mod-auth-kerb can be configured in a number of different ways.  But if yours will work with basic auth, we can just transmit the credentials each time.  It will be relatively slow because mod-auth-kerb will then need to talk to the kdc on each page fetch, but it should work.  Better yet would be if Apache set a browser cookie containing your tickets, which it knew how to interpret if returned - but I don't see any Google evidence that mod-auth-kerb is capable of that.  But either of these two approaches we could readily implement.
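(A rough sketch of what "transmitting the credentials each time" could look like with Apache HttpClient -- illustrative only, with placeholder host and credentials; this is not ManifoldCF's actual code.)

>>>>>>
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

// Register basic-auth credentials for the target host; HttpClient then
// answers the server's 401 challenge on every page fetch.
CredentialsProvider creds = new BasicCredentialsProvider();
creds.setCredentials(new AuthScope("wiki.epic.com", 80),
    new UsernamePasswordCredentials("user", "secret"));
CloseableHttpClient client = HttpClients.custom()
    .setDefaultCredentialsProvider(creds)
    .build();
<<<<<<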
FWIW, the standard way to work with kerberos is for you to actually have tickets already kinit'd and installed on your machine.  Your browser then picks up those tickets and transmits them to the Wiki server (I presume in a header that mod-auth-kerb knows about), and the kdc does not need to be involved.  But initializing that kind of ticket store, and managing the associated kinit requests when necessary, are beyond the scope of any connector we've so far done, so if we had to go that way, that would effectively make this proposal a Research Project.
What would be great to know in advance is how exactly your browser interacts with your Apache server.  Are you familiar with the process of getting a packet dump?  You'd use a tool like tcpdump (Unix) or Wireshark (Windows) in order to capture the packet traffic between a browser session and your Apache server, to see exactly what is happening.  Start by shutting down all your browser windows, so there is no in-memory state, and then start the capture and browse to a part of the wiki that is secured by mod-auth-kerb.  We'd want to see if cookies get set, or if any special headers get transmitted by your browser (other than the standard Basic Auth "Authorization" headers).  If the exchange is protected by SSL, then you'll have to use Firefox with a plugin called Live HTTP Headers to see what is going on instead.
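(For example -- an illustrative capture command; the interface name is a placeholder:)

>>>>>>
tcpdump -i eth0 -s 0 -w wiki-capture.pcap host wiki.epic.com and tcp port 80
<<<<<<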
Please let me know what you find.
Karl


On Thu, May 16, 2013 at 12:37 PM, Karl Wright <da...@gmail.com> wrote:
Hi TC,
Thanks, this is a big help in understanding your setup.
I don't know enough about exactly *how* mod-auth-kerb uses Basic Auth to communicate with the browser, and whether it expects the browser to cache the resulting tickets (in cookies?).  I will have to do some research and get back to you on that.
Basically, security for a Wiki is usually handled by the Wiki, but since you've added auth in front of it by going through mod-auth-kerb, it's something that the Wiki connector would have to understand (and emulate your browser) in order to implement.  So it likely does not support this right now.  It may be relatively easy to do or it may be a challenge - we'll see.  I would also be somewhat concerned that it may not be possible to actually reach the API urls through Apache; that would make everything moot if it were true.  Could you confirm that you can visit API urls through your Apache setup?
Karl

On Thu, May 16, 2013 at 12:21 PM, TC Tobin-Campbell <TC...@epic.com> wrote:
Hi there,
I'm trying to connect ManifoldCF to an internal wiki at my company. The ManifoldCF wiki connector supplies username and password fields for the wiki API; however, at my company, a username and password are required to connect to the Apache server running the wiki site, and after that authentication takes place, those credentials are passed on to the wiki API.

So, essentially, I need a way to have ManifoldCF pass my Windows credentials on when trying to make its connection. Using the API login fields does not work.

We use the Kerberos Module for Apache<http://modauthkerb.sourceforge.net/index.html> (AuthType Kerberos).  My understanding based on that linked documentation is that this module does use Basic Auth to communicate with the browser.

Is there anything we can do to make ManifoldCF authenticate in this scenario?

Thanks,


TC Tobin-Campbell | Technical Services | Willow | Epic  | (608) 271-9000

Sherlock<https://sherlock.epic.com/> (Issue tracking)
Analyst Toolkits<https://sites.epic.com/epiclib/epicdoc/Pages/analyst/default.aspx> (Common setup and support tasks)
Report Repository<https://documentation.epic.com/DataHandbook/Reports/ReportSearch> (Epic reports documentation)
Nova<https://nova.epic.com/Login/GetOrg.aspx?returnUrl=%2fdefault.aspx> (Release note management)
Galaxy<https://documentation.epic.com/OnlineDoc/Documents.aspx> (Epic documentation)

Re: ManifoldCF and Kerberos/Basic Authentication

Posted by Karl Wright <da...@gmail.com>.
Fix checked into trunk.
Karl



Re: ManifoldCF and Kerberos/Basic Authentication

Posted by Karl Wright <da...@gmail.com>.
I created the ticket: CONNECTORS-707.



>>> something that the Wiki connector would have to understand (and emulate
>>> your browser) in order to implement.  So it does not likely support this
>>> right now.  It may be relatively easy to do or it may be a challenge -
>>> we'll see.  I would also be somewhat concerned that it may not possible to
>>> actually reach the API urls through Apache; that would make everything moot
>>> if it were true.  Could you confirm that you can visit API urls through
>>> your Apache setup?****
>>>
>>> Karl****
>>>
>>>  ****
>>>
>>> On Thu, May 16, 2013 at 12:21 PM, TC Tobin-Campbell <TC...@epic.com> wrote:
>>> ****
>>>
>>> Hi there,****
>>>
>>> I'm trying to connect ManifoldCF to an internal wiki at my company. The
>>> ManifoldCF wiki connector supplies a username and password field for the
>>> wiki api, however, at my company, a username and password is required to
>>> connect to the apache server running the wiki site, and after that
>>> authentication takes place, those credentials are passed on to the wiki api.
>>> ****
>>>
>>>  ****
>>>
>>> So, essentially, I need a way to have ManifoldCF pass my windows
>>> credentials on when trying to make its connection. Using the api login
>>> fields does not work.****
>>>
>>>  ****
>>>
>>> We use Kerberos the Kerberos Module for Apache<http://modauthkerb.sourceforge.net/index.html>(AuthType Kerberos).  My understanding based on that linked documentation
>>> is that this module does use Basic Auth to communicate with the browser.
>>> ****
>>>
>>>  ****
>>>
>>> Is there anything we can to make ManifoldCF authenticate in this
>>> scenario? ****
>>>
>>>  ****
>>>
>>> Thanks,****
>>>
>>>  ****
>>>
>>>  ****
>>>
>>> *TC Tobin-Campbell *| Technical Services | Willow | *Epic*  | (608)
>>> 271-9000 ****
>>>
>>>  ****
>>>
>>> Sherlock <https://sherlock.epic.com/> (Issue tracking)****
>>>
>>> Analyst Toolkits<https://sites.epic.com/epiclib/epicdoc/Pages/analyst/default.aspx>
>>> (Common setup and support tasks)****
>>>
>>> Report Repository<https://documentation.epic.com/DataHandbook/Reports/ReportSearch>(Epic reports documentation)
>>> ****
>>>
>>> Nova <https://nova.epic.com/Login/GetOrg.aspx?returnUrl=%2fdefault.aspx>(Release note management)
>>> ****
>>>
>>> Galaxy <https://documentation.epic.com/OnlineDoc/Documents.aspx> (Epic
>>> documentation)  ****
>>>
>>>  ****
>>>
>>>  ****
>>>
>>>  ****
>>>
>>>  ****
>>>
>>>  ****
>>>
>>>  ****
>>>
>>> ** **
>>>
>>
>>
>

Re: ManifoldCF and Kerberos/Basic Authentication

Posted by Karl Wright <da...@gmail.com>.
I looked at the ElasticSearch connector, and it's going to treat the
extension of an extensionless URL as "" (the empty string).  So your list
of allowed extensions will have to include "" if such documents are to be
ingested.

Checking now to see if in fact you can just add a blank line to the list of
extensions to get this to happen... it looks like you can't:

>>>>>>
      // Each line of the allowed-extensions list becomes one set entry,
      // but blank lines are skipped - so the empty extension "" can
      // never make it into the set.
      while ((line = br.readLine()) != null)
      {
        line = line.trim();
        if (line.length() > 0)
          set.add(line);
      }
<<<<<<

So, the ElasticSearch connector in its infinite wisdom excludes all
documents that have no extension.  Hmm.
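
One way to address it might be to stop dropping blank lines, so that a
blank entry stands for the empty extension.  This is a sketch only, not
what's in trunk:

>>>>>>
      // Sketch: keep a trimmed blank line as the empty extension "",
      // so documents with no extension can be allowed through.
      while ((line = br.readLine()) != null)
      {
        set.add(line.trim());
      }
<<<<<<

The catch is that any stray blank line would then mean "allow
extensionless documents", so a dedicated token might be safer in practice.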

Can you open a ticket for this problem?  I'm not quite sure yet how to
address it, but clearly this needs to be fixed.

Karl




Re: ManifoldCF and Kerberos/Basic Authentication

Posted by Karl Wright <da...@gmail.com>.
The extension of a document comes from its URL.  So the URLs listed in
your previous mail don't appear to have any extension at all.
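
To make that concrete, the rule is roughly this: take whatever follows the
last '.' in the final path segment, or "" if there is none.  An
illustrative sketch (not the connector's actual code):

>>>>>>
// Illustrative sketch only.  For http://wiki/main/EpicSearch/Test the
// last path segment contains no '.', so the computed extension is "".
static String extensionOf(String url) throws java.net.URISyntaxException
{
  String path = new java.net.URI(url).getPath();
  int slash = path.lastIndexOf('/');
  int dot = path.lastIndexOf('.');
  return (dot > slash) ? path.substring(dot + 1) : "";
}
<<<<<<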

The code here from the web connector rejects documents for various
reasons, but does not log why:

>>>>>>
    // Only documents fetched with a 200 are eligible.
    if (cache.getResponseCode(documentIdentifier) != 200)
      return false;

    // The output connection can veto on document length...
    if (activities.checkLengthIndexable(cache.getDataLength(documentIdentifier)) == false)
      return false;

    // ...or on the URL itself (which is how extension filtering is applied)...
    if (activities.checkURLIndexable(documentIdentifier) == false)
      return false;

    // ...and the web connector's own filter gets the last word.
    if (filter.isDocumentIndexable(documentIdentifier) == false)
      return false;
<<<<<<

All you would see if any one of these conditions failed would be:

          if (Logging.connectors.isDebugEnabled())
            Logging.connectors.debug("WEB: Decided not to ingest '" + documentIdentifier +
              "' because it did not match ingestability criteria");

Do you see that in the log?

Also, bear in mind that since the crawler is incremental, you may need to
kick it to make it retry all this so you get debugging output.  You can
click the "reingest all" link on your output connection to make that
happen...

Karl



RE: ManifoldCF and Kerberos/Basic Authentication

Posted by TC Tobin-Campbell <TC...@epic.com>.
I took a look at the output connection, and didn't see anything in there that looked like it would cause any issues. I'm including all of the default MIME types and file extensions. This should just be HTML, I would think.
[inline screenshot: output connection settings]

Here's what I'm seeing in the DEBUG output. It seems like we start the extraction, but then just don't do anything with it. Seems weird.

DEBUG 2013-06-07 10:40:27,888 (Worker thread '24') - WEB: Waiting to start getting a connection to http://10.8.159.161:80
DEBUG 2013-06-07 10:40:27,888 (Worker thread '24') - WEB: Attempting to get connection to http://10.8.159.161:80 (0 ms)
DEBUG 2013-06-07 10:40:27,888 (Worker thread '24') - WEB: Successfully got connection to http://10.8.159.161:80 (0 ms)
DEBUG 2013-06-07 10:40:27,889 (Worker thread '20') - WEB: Waiting to start getting a connection to http://10.8.159.161:80
DEBUG 2013-06-07 10:40:27,889 (Worker thread '20') - WEB: Attempting to get connection to http://10.8.159.161:80 (0 ms)
DEBUG 2013-06-07 10:40:27,889 (Worker thread '20') - WEB: Successfully got connection to http://10.8.159.161:80 (0 ms)
DEBUG 2013-06-07 10:40:27,893 (Worker thread '20') - WEB: Waiting for an HttpClient object
DEBUG 2013-06-07 10:40:27,893 (Worker thread '20') - WEB: For http://wiki/main/EpicSearch/Test, discovered matching authentication credentials
DEBUG 2013-06-07 10:40:27,893 (Worker thread '20') - WEB: For http://wiki/main/EpicSearch/Test, setting virtual host to wiki
DEBUG 2013-06-07 10:40:27,893 (Worker thread '20') - WEB: Got an HttpClient object after 0 ms.
DEBUG 2013-06-07 10:40:27,893 (Worker thread '20') - WEB: Get method for '/main/EpicSearch/Test'
DEBUG 2013-06-07 10:40:27,896 (Worker thread '24') - WEB: Waiting for an HttpClient object
DEBUG 2013-06-07 10:40:27,896 (Worker thread '24') - WEB: For http://wiki.epic.com/main/EpicSearch/Test, discovered matching authentication credentials
DEBUG 2013-06-07 10:40:27,896 (Worker thread '24') - WEB: For http://wiki.epic.com/main/EpicSearch/Test, setting virtual host to wiki.epic.com
DEBUG 2013-06-07 10:40:27,896 (Worker thread '24') - WEB: Got an HttpClient object after 0 ms.
DEBUG 2013-06-07 10:40:27,896 (Worker thread '24') - WEB: Get method for '/main/EpicSearch/Test'
WARN 2013-06-07 10:40:27,900 (Thread-2185) - NEGOTIATE authentication error: Invalid name provided (Mechanism level: Could not load configuration file C:\Windows\krb5.ini (The system cannot find the file specified))
WARN 2013-06-07 10:40:27,900 (Thread-2188) - NEGOTIATE authentication error: Invalid name provided (Mechanism level: Could not load configuration file C:\Windows\krb5.ini (The system cannot find the file specified))
DEBUG 2013-06-07 10:40:28,378 (Thread-2185) - WEB: Performing a read wait on bin 'wiki' of 128 ms.
DEBUG 2013-06-07 10:40:28,506 (Thread-2185) - WEB: Performing a read wait on bin 'wiki' of 50 ms.
DEBUG 2013-06-07 10:40:28,556 (Thread-2185) - WEB: Performing a read wait on bin 'wiki' of 64 ms.
DEBUG 2013-06-07 10:40:28,613 (Thread-2188) - WEB: Performing a read wait on bin 'wiki.epic.com' of 126 ms.
DEBUG 2013-06-07 10:40:28,620 (Thread-2185) - WEB: Performing a read wait on bin 'wiki' of 47 ms.
INFO 2013-06-07 10:40:28,682 (Worker thread '20') - WEB: FETCH URL|http://wiki/main/EpicSearch/Test|1370619627893+787|200|14438|
DEBUG 2013-06-07 10:40:28,682 (Worker thread '20') - WEB: Document 'http://wiki/main/EpicSearch/Test' is text, with encoding 'utf-8'; link extraction starting

Followed by lots of these, which seems appropriate:
DEBUG 2013-06-07 10:40:28,683 (Worker thread '20') - WEB: Url 'http://wiki/mediawiki/main/index.php?action=edit&title=EpicSearch/Test' is illegal because no include patterns match it
DEBUG 2013-06-07 10:40:28,683 (Worker thread '20') - WEB: In html document 'http://wiki/main/EpicSearch/Test', found an unincluded URL '/mediawiki/main/index.php?title=EpicSearch/Test&action=edit'
DEBUG 2013-06-07 10:40:28,683 (Worker thread '20') - WEB: Url 'http://wiki/mediawiki/main/index.php?action=edit&title=EpicSearch/Test' is illegal because no include patterns match it
DEBUG 2013-06-07 10:40:28,683 (Worker thread '20') - WEB: In html document 'http://wiki/main/EpicSearch/Test', found an unincluded URL '/mediawiki/main/index.php?title=EpicSearch/Test&action=edit'
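
Side note: the NEGOTIATE warnings above look like the JVM simply has no Kerberos config on this box. If we wanted to silence them, a minimal C:\Windows\krb5.ini would be something like the sketch below (the realm and KDC names are placeholders, not our real ones) - though Basic auth is what we're actually relying on here.

>>>>>>
# Minimal krb5.ini sketch; EXAMPLE.COM and kdc.example.com are placeholders.
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
<<<<<<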

TC Tobin-Campbell | Technical Services | Willow | Epic  | (608) 271-9000


Re: ManifoldCF and Kerberos/Basic Authentication

Posted by Karl Wright <da...@gmail.com>.
Hi TC,
The fact that the fetch is successful means that the URL is included (and
not excluded).  The fact that it doesn't mention a robots exclusion means
that robots.txt is happy with it.  But it could well be that:
(a) the mimetype is one that your ElasticSearch connection is excluding;
(b) the extension is one that your ElasticSearch connection is excluding.

I would check your output connection, and if that doesn't help, turn on
connector debugging (in properties.xml, set the property
"org.apache.manifoldcf.connectors" to "DEBUG").  Then you will see output
that describes the consideration process the web connector goes through
for each document.
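
For reference, the entry would look something like this - assuming the
stock properties.xml layout, with your other properties left in place:

>>>>>>
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
  <!-- ...existing properties... -->
  <!-- Turn on connector-level debug logging -->
  <property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
</configuration>
<<<<<<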

Karl




RE: ManifoldCF and Kerberos/Basic Authentication

Posted by TC Tobin-Campbell <TC...@epic.com>.
Apologies for the delay here Karl. I was able to get this up and running, and the authentication is working. Thanks for getting that in so quickly!

I do have a new issue though. I have an output connection to Elasticsearch set up for this job.

I can see that the crawler is in fact crawling the wiki, and the fetches are all working great. However, it doesn't seem to be attempting to send the pages to the index.

[inline screenshot omitted]

I'm not seeing anything in the elasticsearch logs, so it appears we're just not sending anything to Elasticsearch. Could this be related to the change you made? Or is this a completely separate problem?
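
(A quick way to double-check on the Elasticsearch side - the host and index
name here are made up - is a document-count query against the index:

  curl "http://localhost:9200/manifoldcf/_count"

A count of zero would confirm that nothing is arriving.)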

TC Tobin-Campbell | Technical Services | Willow | Epic  | (608) 271-9000


Re: ManifoldCF and Kerberos/Basic Authentication

Posted by Karl Wright <da...@gmail.com>.
I had a second so I finished this.  Trunk now has support for basic auth.
You enter the credentials on the server tab underneath the API
credentials.  Please give it a try and let me know if it works for you.

Karl




Re: ManifoldCF and Kerberos/Basic Authentication

Posted by Karl Wright <da...@gmail.com>.
CONNECTORS-692.  I will probably look at this over the weekend.

Karl



Re: ManifoldCF and Kerberos/Basic Authentication

Posted by Karl Wright <da...@gmail.com>.
Hi TC,

Unless I'm very much mistaken, there are no Apache kerberos session cookies
being used on your site, so it should be a straightforward matter to
supply basic auth credentials to your Apache mod-auth-kerb module for all
pages during crawling.

I'll create a ticket for this.

Karl




RE: ManifoldCF and Kerberos/Basic Authentication

Posted by TC Tobin-Campbell <TC...@epic.com>.
Hi Karl,
Here's what I know so far.

Our module is configured to use two auth methods: Negotiate and Basic.  In most cases, we use Negotiate, but I'm guessing you'd prefer Basic.

Here's an example header.

GET / HTTP/1.1
Host: wiki.epic.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: wooTracker=QOMVLXDIC6OGOUXMGST1O54HYW573NNC; .EPICASPXAUTHQA=FA94C945F613DACB9341384EBB1C28C52CFC52558E606FC2F880DD5BA811BE7E94301C7A0A1990FAC2E119AABB8591EC975059A2B8169BEA9FC525D0577F3C0EC56DC29C28880D23E0790AD890024FB57A338981606774259656B6971556645B095778115ADFE6B9B434970869C4B546A59A61B2CDEF0C0A5B23E80BB1D1E3D3D567E4C113D9E7B32D137FDEE65E51AC7B3DF5A04F9767FA7C8723140AC274E2695D939C716D9B49CCF0F1D79967CE902781BC8CB5A253E3FB39896021ABB4F2FCA01D0E138E00A8176EB2ECE5B0204597C21969C8F501A9EDE4D27694E699777BB179CD329748B3341A4BBF3085C447E2B55BE97E27D23E415C23F1A53A33A15551D9AE6B5CF255C3B8ECE038A481B8291A8EC46F0EA8730C3658DABC5BE7557C6659321677D8F4586CA79D6D5CCCB1C5687F9077A6CD96487EAEF417A1411C2F62BE6FF57DD1F515B16406CF4B0B9460EFB9BCB46F8F7E47FCB8E8CE4FAE2EB92F20DECEF2BBF1D95C80597BE935A031CD158593EFA2E446FA6FAFDD2B4E691CD8569B7D60DAD4378EBD6A138E23F0F616FD01443647D9A6F852AEF773A69580390496748241739C0DDF2791B1C2143B7E9E976754056B70EB846DAE1D7018CC40026F862ABF613D89C8D31B2C468B81D0C18C37697E8BA5D415F8DFCA37AF2935AAD0238ED6F652E24062849EC8E0C4651C4FB8BB9DD11BE4F8639AD690C791868B8E94ADB626C9B1BD8E334F675E664A03DC; wiki_pensieve_session=j1pcf1746js1442m7p92hag9g1; wiki_pensieveUserID=5; wiki_pensieveUserName=Lziobro; wiki_pensieveToken=********************be3a3a990a8a
Connection: keep-alive
Authorization: Basic bHppb**************xMjM0   <-I've censored this line so you cannot get my password
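
(For reference, the Basic value is just the base64 of "username:password" -
e.g., with made-up credentials, on a Unix box:

  printf 'someuser:somepassword' | base64

The output is what goes after "Basic " in the Authorization header.)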

If I'm understanding you correctly, there's no way to accomplish this currently? Or, is there some workaround we could implement?

TC Tobin-Campbell | Technical Services | Willow | Epic  | (608) 271-9000


Re: ManifoldCF and Kerberos/Basic Authentication

Posted by Karl Wright <da...@gmail.com>.
Hi TC,

Apparently mod-auth-kerb can be configured in a number of different ways.
But if yours will work with basic auth, we can just transmit the
credentials each time.  It will be relatively slow because mod-auth-kerb
will then need to talk to the kdc on each page fetch, but it should work.
Better yet would be if Apache set a browser cookie containing your tickets,
which it knew how to interpret if returned - but I don't see any Google
evidence that mod-auth-kerb is capable of that.  But either of these two
approaches we could readily implement.

FWIW, the standard way to work with kerberos is for you to actually have
tickets already kinit'd and installed on your machine.  Your browser then
picks up those tickets and transmits them to the Wiki server (I presume in
a header that mod-auth-kerb knows about), and the kdc does not need to be
involved.  But initializing that kind of ticket store, and managing the
associated kinit requests when necessary, are beyond the scope of any
connector we've so far done, so if we had to go that way, that would
effectively make this proposal a Research Project.
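
(On a Unix machine, getting the tickets would look roughly like this - the
principal and realm here are made up and site-specific:

  kinit someuser@EXAMPLE.COM   # obtain a ticket-granting ticket from the kdc
  klist                        # list the ticket cache to verify

Windows domain machines typically acquire their tickets at logon.)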

What would be great to know in advance is how exactly your browser
interacts with your Apache server.  Are you familiar with the process of
getting a packet dump?  You'd use a tool like tcpdump (Unix) or wireshark
(windows) in order to capture the packet traffic between a browser session
and your Apache server, to see exactly what is happening.  Start by
shutting down all your browser windows, so there is no in-memory state, and
then start the capture and browse to a part of the wiki that is secured by
mod-auth-kerb.  We'd want to see if cookies get set, or if any special
headers get transmitted by your browser (other than the standard Basic Auth
"Authentication" headers).  If the exchange is protected by SSL, then
you'll have to use FireFox and use a plugin called LiveHeaders to see what
is going on instead.
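
A rough tcpdump starting point on Unix - the host name here is made up, and
the interface and port may need adjusting - would be:

  tcpdump -i any -s 0 -w wiki-capture.pcap 'tcp port 80 and host wiki.example.com'

which writes the full packets to a file you can open in wireshark afterwards.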

Please let me know what you find.
Karl





Re: ManifoldCF and Kerberos/Basic Authentication

Posted by Karl Wright <da...@gmail.com>.
Hi TC,

Thanks, this is a big help in understanding your setup.

I don't know enough about exactly *how* mod-auth-kerb uses Basic Auth to
communicate with the browser, and whether it expects the browser to cache
the resulting tickets (in cookies?)  I will have to do some research and
get back to you on that.

Basically, security for a Wiki is usually handled by the Wiki, but since
you've added auth in front of it by going through mod-auth-kerb, it's
something that the Wiki connector would have to understand (and emulate
your browser) in order to implement.  So it likely does not support this
right now.  It may be relatively easy to do or it may be a challenge -
we'll see.  I would also be somewhat concerned that it may not be possible
to actually reach the API urls through Apache; that would make everything
moot if it were true.  Could you confirm that you can visit API urls
through your Apache setup?
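
(A quick way to check - the URL path and credentials here are made up, so
adjust for your wiki - would be something like:

  curl -u someuser:somepassword "http://wiki.epic.com/w/api.php?action=query&meta=siteinfo&format=json"

If that returns JSON rather than a 401 or an error page, the API is
reachable through your Apache front end.)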

Karl


