Posted to dev@archiva.apache.org by Marc Lustig <ml...@marclustig.com> on 2010/03/01 12:14:32 UTC

MRM-1351: please advise

Hi there,

I have just created MRM-1351 and have a couple of questions:

1) supported protocols
In addition to DAV, the Maven Deploy Plugin also supports FTP- and SSH-based
artifact deployment.
Which of those additional protocols does Archiva support?
Accordingly, which is the proper place for a generic implementation of the
hashcode-based artifact validation?

2) getting the hashcode of the local repo
The maven-deploy-plugin does not support specifying a parameter like
"sha-hashcode" for either the deploy or the deploy-file subgoal. Where,
then, could Archiva get the hashcode of the local repo from?
This appears to me to be a major precondition for implementing this ticket.


cheers
Marc



Re: MRM-1351: please advise

Posted by Deng Ching <oc...@apache.org>.
Hi Marc,

Marc Lustig wrote:
> >
> > What came to my mind now is that, instead of sending the checksum as an
> > additional file, we could simply add the checksum as an HTTP header entry
> > to the DAV request that sends the artifact.
> >
>
> I have looked into the deploy-plugin (trunk); apparently in the
> DefaultWagonManager.putRemoteFile() method there is already some logic
> implemented to add SHA-1 and MD5 hashes using addTransferListener() -
> although I would not bet this logic works at all until I have tested
> it.
>
> Does Archiva on the other side grab those values from the DAV-request?
>

I don't think so... IIRC, Archiva treats them as separate DAV
requests.


> What API is Archiva using to handle the request - Wagon or some plain
> WebDAV?
>

Archiva is using JackRabbit to handle DAV requests.

Thanks,
Deng

Re: MRM-1351: please advise

Posted by Marc Lustig <ml...@marclustig.com>.


Marc Lustig wrote:
> 
> What came to my mind now is that, instead of sending the checksum as an
> additional file, we could simply add the checksum as an HTTP header entry
> to the DAV request that sends the artifact.
> 

I have looked into the deploy-plugin (trunk); apparently in the
DefaultWagonManager.putRemoteFile() method there is already some logic
implemented to add SHA-1 and MD5 hashes using addTransferListener() -
although I would not bet this logic works at all until I have tested
it.
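
Roughly, the wiring there looks like this (just a sketch: ChecksumObserver,
addTransferListener() and put() are the actual Wagon API, but the helper
class and method around them are made up):

import java.io.File;

import org.apache.maven.wagon.Wagon;
import org.apache.maven.wagon.observers.ChecksumObserver;

public class ChecksummedPut {

    /**
     * Uploads a file and returns the SHA-1 that was computed while streaming,
     * roughly mirroring what DefaultWagonManager.putRemoteFile() does.
     */
    static String putWithSha1(Wagon wagon, File file, String dest) throws Exception {
        ChecksumObserver sha1 = new ChecksumObserver("SHA-1"); // digest is updated per transfer event
        wagon.addTransferListener(sha1);
        try {
            wagon.put(file, dest); // streams the artifact; the observer hashes the bytes as they go
        } finally {
            wagon.removeTransferListener(sha1);
        }
        // Wagon afterwards writes this digest to a temp file and put()s it as dest + ".sha1"
        return sha1.getActualChecksum();
    }
}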

Does Archiva on the other side grab those values from the DAV-request?
What API is Archiva using to handle the request - Wagon or some plain
WebDAV?



Re: MRM-1351: please advise

Posted by Brett Porter <br...@apache.org>.
On 05/03/2010, at 8:29 PM, Marc Lustig wrote:

>> Is this the problem you are receiving Marc?
>> http://jira.codehaus.org/browse/MRM-1356
>> 
>> 
> 
> I strongly doubt those users use Maven 3; we use Maven 2.x. So if this
> MRM-1356 is due to Maven3-specific logic, then the answer is "no" - our
> problem must have a different source.

It turned out not to be specific to Maven 3, but to affect anyone that for some reason used chunked encoding. I couldn't reproduce it either, except by using curl.

> 
> Well, the thing is we were not able to reproduce it. We had a few people
> from different projects complaining that Archiva deployed artifacts which
> proved corrupt on download. This can have severe impacts on the IT processes
> that have been defined here. (We automatically create deployment packages as
> tar files by retrieving deployment units (ear, etc.) from Archiva. Sending a
> corrupt deployment package to stage P would result in major trouble, for
> both system availability and people...)
> 
> We have some suspicion that the error was caused by the upload timeout
> configured in Tomcat.
> But what is really frightening is the fact that the deploy-plugin apparently
> did not report an error - at least the result was BUILD SUCCESSFUL (return
> code 0). Such a case must not occur.
> Apparently Archiva signaled "OK" in the DAV response, although the artifact
> was stored corrupted.
> 
> So, after all, what we need is to ensure such a case cannot happen anymore.
> 
> First of all, we should ensure that Archiva is really picking up the checksum
> that is already sent by Wagon in the DAV request (or one of them) and uses it
> to verify the artifact's integrity. Deng pointed out that this may not yet be
> the case:
> 
>>> Does Archiva on the other side grab those values from the DAV-request?
>>> 
>> I don't think so... IIRC, Archiva treats them as separate DAV
>> requests.

That's right: Maven sends the artifact, then its checksum, then the POM, then its checksums.

> 
> Secondly, I suppose the whole workflow should be checked for consistency.
> In my understanding, the process should be like this:
> 
> 1. Maven/Wagon sends the artifact along with checksums (+ POM + metafiles)
> 2. Archiva receives the artifact and places it in the managed repo
> 3. Archiva creates a checksum based on the file that already resides in the
> managed repo (not some temporary location)
> 4. Archiva compares this checksum with the checksum that came from Wagon
> 5.a if the checksums match, return OK as usual
> 5.b if the checksums do not match, return an HTTP error (400?) with some
> meaningful error string in the header, and log an ERROR message to the
> logfile. The artifact is removed from the managed repo.
> 6. the Maven deploy-plugin picks up the error and consequently reports a
> BUILD ERROR
> 
> Comments?

Yes, this would be a reasonably simple adjustment, though the artifact is temporarily present, so that might cause some issues.

- Brett

--
Brett Porter
brett@apache.org
http://brettporter.wordpress.com/





Re: MRM-1351: please advise

Posted by Marc Lustig <ml...@marclustig.com>.


brettporter wrote:
> 
> 
> On 03/03/2010, at 1:25 AM, Brett Porter wrote:
> 
>> 
>> I am interested in getting to the bottom of your core problem though.
>> Have you been able to try the extra debugging I added to see if it logs
>> the error causes?
>> 
> 
> 
> Is this the problem you are receiving Marc?
> http://jira.codehaus.org/browse/MRM-1356
> 
> 

I strongly doubt those users use Maven 3; we use Maven 2.x. So if this
MRM-1356 is due to Maven3-specific logic, then the answer is "no" - our
problem must have a different source.

Well, the thing is we were not able to reproduce it. We had a few people
from different projects complaining that Archiva deployed artifacts which
proved corrupt on download. This can have severe impacts on the IT processes
that have been defined here. (We automatically create deployment packages as
tar files by retrieving deployment units (ear, etc.) from Archiva. Sending a
corrupt deployment package to stage P would result in major trouble, for
both system availability and people...)

We have some suspicion that the error was caused by the upload timeout
configured in Tomcat.
But what is really frightening is the fact that the deploy-plugin apparently
did not report an error - at least the result was BUILD SUCCESSFUL (return
code 0). Such a case must not occur.
Apparently Archiva signaled "OK" in the DAV response, although the artifact
was stored corrupted.

So, after all, what we need is to ensure such a case cannot happen anymore.

First of all, we should ensure that Archiva is really picking up the checksum
that is already sent by Wagon in the DAV request (or one of them) and uses it
to verify the artifact's integrity. Deng pointed out that this may not yet be
the case:

>> Does Archiva on the other side grab those values from the DAV-request?
>>
> I don't think so... IIRC, Archiva treats them as separate DAV
> requests.

Secondly, I suppose the whole workflow should be checked for consistency.
In my understanding, the process should be like this (a rough sketch of the
server side follows the list):

1. Maven/Wagon sends the artifact along with checksums (+ POM + metafiles)
2. Archiva receives the artifact and places it in the managed repo
3. Archiva creates a checksum based on the file that already resides in the
managed repo (not some temporary location)
4. Archiva compares this checksum with the checksum that came from Wagon
5.a if the checksums match, return OK as usual
5.b if the checksums do not match, return an HTTP error (400?) with some
meaningful error string in the header, and log an ERROR message to the
logfile. The artifact is removed from the managed repo.
6. the Maven deploy-plugin picks up the error and consequently reports a
BUILD ERROR
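
A rough sketch of what steps 3-5.b could look like on the Archiva side
(class and method names here are illustrative only, not Archiva's actual
code):

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DeployVerifier {

    /** Returns true if the stored artifact matches the checksum sent by Wagon. */
    static boolean verify(Path storedArtifact, String clientSha1)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (InputStream in = Files.newInputStream(storedArtifact)) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                md.update(buf, 0, n); // step 3: hash the file already in the managed repo
            }
        }
        String actual = toHex(md.digest()); // step 4: compare with the client's checksum
        if (actual.equalsIgnoreCase(clientSha1.trim())) {
            return true;                      // step 5.a: respond 200/OK as usual
        }
        Files.deleteIfExists(storedArtifact); // step 5.b: remove the corrupt artifact...
        return false;                         // ...and let the caller answer 400 and log an ERROR
    }

    private static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}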

Comments?




Re: MRM-1351: please advise

Posted by Brett Porter <br...@apache.org>.
On 03/03/2010, at 1:25 AM, Brett Porter wrote:

> 
> I am interested in getting to the bottom of your core problem though. Have you been able to try the extra debugging I added to see if it logs the error causes?
> 


Is this the problem you are receiving Marc?
http://jira.codehaus.org/browse/MRM-1356

--
Brett Porter
brett@apache.org
http://brettporter.wordpress.com/





Re: MRM-1351: please advise

Posted by Brett Porter <br...@apache.org>.
On 02/03/2010, at 8:15 PM, Marc Lustig wrote:

> 
> 
> 
> brettporter wrote:
>> 
>> 
>> The main problem with this is that Wagon currently streams the upload and
>> calculates the checksum as it goes, to upload as a separate file
>> afterwards (remembering that now, I don't know why I thought the checksum
>> went first). Sending the checksum in a header would require reading the file
>> twice - not a big deal, but a fair change to the way it works right now.
>> 
>> It's not unreasonable, but it's probably not that necessary given that

...

> What came to my mind now is that, instead of sending the checksum as an
> additional file, we could simply add the checksum as an HTTP header entry to
> the DAV request that sends the artifact.
> That way, the contract of the deploy process will not be changed, and we
> avoid compatibility issues with other repository managers.
> 
> Should the checksum be created on the fly, or should it be read from the
> local repo?
> Should only one of MD5 or SHA-1 be sent, or both checksums?

The checksum is not in the local repository after 'install'. It is created on the fly by Wagon, as I described above. The problem is, you are going to end up changing the way Wagon works to do it, or calculating the checksum twice (once at the start and once on the fly), which was the problem I mentioned above. It certainly isn't a huge problem; I just think it might be best to address it purely on the server side.

> 
> Regarding Archiva, only a minor change is needed. Instead of placing the
> file in the managed repo unverified, the checksum needs to be read from the
> HTTP header and compared with a freshly generated checksum based on the file
> received. We will need to discuss the proper place to add the code.
> 
> How does that plan sound to you?

It is a cleaner solution, but I'm a bit concerned about the size of the change on the client side, and that the feature will only work with some clients and so not be very reliable.

I also don't think it will gain much over the content-length check - it is only going to discover the additional edge cases where the files are the same size but the checksum is wrong, which would be very unusual. It might help detect where a client flat-out gets the checksum wrong, but that doesn't seem to be the main concern here.

It won't help with the Maven bugs where the client sends the wrong checksum after the fact, which brings me back to your other message:

On 02/03/2010, at 9:54 PM, Marc Lustig wrote:

> 
> I have looked into the deploy-plugin (trunk); apparently in the
> DefaultWagonManager.putRemoteFile() method there is already some logic
> implemented to add SHA-1 and MD5 hashes using addTransferListener() -
> although I would not bet this logic works at all until I have tested
> it.

I'm quite certain they work, with the exception of the problems in the issues I originally pointed out where things get accidentally uploaded twice (and so the checksum is double-processed and incorrect). In that case it's the uploading twice that is wrong, not the checksum calculation. The solution above wouldn't help as you'd send the same (correct) header twice, but then the wrong checksum file afterwards.

I am interested in getting to the bottom of your core problem though. Have you been able to try the extra debugging I added to see if it logs the error causes?

Thanks,
Brett

--
Brett Porter
brett@apache.org
http://brettporter.wordpress.com/





Re: MRM-1351: please advise

Posted by Marc Lustig <ml...@marclustig.com>.


brettporter wrote:
> 
> 
> On 02/03/2010, at 1:44 AM, Marc Lustig wrote:
> 
>> 
>> What we need is a process to automatically verify the integrity of an
>> artifact by uploading a local hashcode. Ideally, the verification takes
>> place in a single transaction (HTTP-request).
>> 
>> I suppose modifying the deploy:deploy goal is not practical, as it may
>> have impacts on a wide range of Maven users, and the Maven leaders will
>> probably veto it.
> 
> The main problem with this is that Wagon currently streams the upload and
> calculates the checksum as it goes, to upload as a separate file
> afterwards (remembering that now, I don't know why I thought the checksum
> went first). Sending the checksum in a header would require reading the file
> twice - not a big deal, but a fair change to the way it works right now.
> 
> It's not unreasonable, but it's probably not that necessary given that
> other checks can find the problem. As we seem to have discovered on
> users@, the content length check is probably already triggering for you
> anyway. If it's an incorrectly uploaded checksum instead, that can be
> detected without additional goals...
> 
>> But what do you think about adding a subgoal "verify" to the deploy
>> plugin?
>> 
>> That way, the following call would deploy and verify an artifact, using -
>> well - not a single transaction, but at least a single mvn-call:
>> "deploy:deploy deploy:verify"
>> 
>> The deploy:verify subgoal could presume that Maven has been configured to
>> create hashcodes in the local repo. So what deploy:verify could basically
>> do is simply upload an ordinary artifact .md5 or .sha1 file using DAV.
>> 
>> Archiva identifies the verification task based on the file suffix.
>> What Archiva will do:
>> - compare the uploaded hashcode with the one that has been created
>> - in case the hashes match: return HTTP 200 (OK)
>> - in case the hashes do NOT match: all Archiva artifacts for the given
>> version (jar, pom, hashes, xml, etc.) will be deleted, and HTTP 400
>> (?) is returned to indicate that the hashes did not match
>> 
>> The deploy:verify subgoal then outputs corresponding messages.
>> A failed verification should result in a BUILD ERROR message, of course.
> 
> Is there a reason this needs a separate goal? Couldn't Archiva apply this
> same behaviour when the checksum is first uploaded as part of deploy?
> 
> This could be used to drive an "atomic deployment" of an artifact - all
> deployments go to a temporary location until a checksum-verified POM
> arrives, and if everything is valid it gets automatically pushed into the
> repository.
> 
> - Brett
> 

Yes, the idea of committing a deployment as a single transaction ("atomic"),
including the checksum (hashcode), should certainly be the goal, IMO.
What came to my mind now is that, instead of sending the checksum as an
additional file, we could simply add the checksum as an HTTP header entry to
the DAV request that sends the artifact.
That way, the contract of the deploy process will not be changed, and we
avoid compatibility issues with other repository managers.

Should the checksum be created on the fly, or should it be read from the
local repo?
Should only one of MD5 or SHA-1 be sent, or both checksums?

Regarding Archiva, only a minor change is needed. Instead of placing the
file in the managed repo unverified, the checksum needs to be read from the
HTTP header and compared with a freshly generated checksum based on the file
received. We will need to discuss the proper place to add the code.
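
To make that concrete, here is a sketch of the Archiva-side check (the
X-Checksum-SHA1 header name is just a placeholder we would need to agree on,
and the servlet wiring is simplified):

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

import javax.servlet.http.HttpServletRequest;

public class HeaderChecksumCheck {

    /** Hashes the upload while writing it, then compares with the header. */
    static boolean storeAndVerify(HttpServletRequest req, Path target)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (InputStream in = new DigestInputStream(req.getInputStream(), md)) {
            // single pass: the bytes reach the disk and the digest together
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        String expected = req.getHeader("X-Checksum-SHA1"); // placeholder header name
        if (expected == null) {
            return true; // no header sent: fall back to the current, unverified behaviour
        }
        String actual = toHex(md.digest());
        if (!actual.equalsIgnoreCase(expected.trim())) {
            Files.deleteIfExists(target); // reject: the caller answers 400 with an error string
            return false;
        }
        return true;
    }

    private static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}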

How does that plan sound to you?



Re: MRM-1351: please advise

Posted by Brett Porter <br...@apache.org>.
On 02/03/2010, at 1:44 AM, Marc Lustig wrote:

> 
> What we need is a process to automatically verify the integrity of an
> artifact by uploading a local hashcode. Ideally, the verification takes
> place in a single transaction (HTTP-request).
> 
> I suppose modifying the deploy:deploy goal is not practical, as it may have
> impacts for a wide range of Maven users, and the Maven leaders will probably
> veto against it.

The main problem with this is that Wagon currently streams the upload and calculates the checksum as it goes, to upload as a separate file afterwards (remembering that now, I don't know why I thought the checksum went first). Sending the checksum in a header would require reading the file twice - not a big deal, but a fair change to the way it works right now.

It's not unreasonable, but it's probably not that necessary given that other checks can find the problem. As we seem to have discovered on users@, the content length check is probably already triggering for you anyway. If it's an incorrectly uploaded checksum instead, that can be detected without additional goals...

> But what do you think about adding a subgoal "verify" to the deploy plugin?
> 
> That way, the following call would deploy and verify an artifact, using -
> well - not a single transaction, but at least a single mvn-call:
> "deploy:deploy deploy:verify"
> 
> The deploy:verify subgoal could presume that Maven has been configured to
> create hashcodes in the local repo. So what deploy:verify could basically
> do is simply upload an ordinary artifact .md5 or .sha1 file using DAV.
> 
> Archiva identifies the verification task based on the file suffix.
> What Archiva will do:
> - compare the uploaded hashcode with the one that has been created
> - in case the hashes match: return HTTP 200 (OK)
> - in case the hashes do NOT match: all Archiva artifacts for the given
> version (jar, pom, hashes, xml, etc.) will be deleted, and HTTP 400
> (?) is returned to indicate that the hashes did not match
> 
> The deploy:verify subgoal then outputs corresponding messages.
> A failed verification should result in a BUILD ERROR message, of course.

Is there a reason this needs a separate goal? Couldn't Archiva apply this same behaviour when the checksum is first uploaded as part of deploy?

This could be used to drive an "atomic deployment" of an artifact - all deployments go to a temporary location until a checksum-verified POM arrives, and if everything is valid it gets automatically pushed into the repository.
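
Roughly, the promotion step could look like this (assuming the staging area
and the managed repo are on the same filesystem so the moves can be atomic;
all names are illustrative):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StagedDeployment {

    /** Moves a fully checksum-verified staging directory into the managed repo. */
    static void promote(Path stagingDir, Path managedRepo) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(stagingDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path src : files) {
            Path dest = managedRepo.resolve(stagingDir.relativize(src));
            Files.createDirectories(dest.getParent());
            // an atomic move keeps readers from ever seeing a half-written artifact
            Files.move(src, dest, StandardCopyOption.ATOMIC_MOVE);
        }
    }
}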

- Brett

--
Brett Porter
brett@apache.org
http://brettporter.wordpress.com/





Re: MRM-1351: please advise

Posted by Marc Lustig <ml...@marclustig.com>.


brettporter wrote:
> 
> 
> On 01/03/2010, at 10:14 PM, Marc Lustig wrote:
> 
>> 
>> Hi there,
>> 
>> I have just created MRM-1351 and have a couple of questions:
>> 
>> 1) supported protocols
>> In addition to DAV, the Maven Deploy Plugin also supports FTP- and
>> SSH-based artifact deployment.
>> Which of those additional protocols does Archiva support?
>> Accordingly, which is the proper place for a generic implementation of
>> the hashcode-based artifact validation?
> 
> Neither natively - it does, however, scan the file system, but it is not
> possible to reject an artifact at that stage, so the current behaviour of
> reporting the problem is appropriate.
> 
>> 
>> 2) getting the hashcode of the local repo
>> The maven-deploy-plugin does not support specifying a parameter like
>> "sha-hashcode" for either the deploy or the deploy-file subgoal. Where,
>> then, could Archiva get the hashcode of the local repo from?
>> This appears to me to be a major precondition for implementing this
>> ticket.
> 
> Sorry, this was my mistake - I thought that Maven uploaded the checksum
> first so that it could be used as the basis for determining if the
> artifact was correct.
> 
> It seems the alternative might be needed, where, upon uploading the
> checksum, if it is incorrect both the checksum and the artifact are deleted.
> This can be considered reasonable behaviour, as there is little risk of
> deleting a previously correct artifact (snapshots always deploy to a new
> timestamp, and releases should be blocked from redeployment, so they never
> reach this option). On the downside, the artifact has already been correctly
> uploaded, and if the checksum is never sent, it will be retained in the
> repository. However, this more closely matches your problem, as it seems it
> is the checksums that are being uploaded incorrectly?
> 
> - Brett
> 

What we need is a process to automatically verify the integrity of an
artifact by uploading a local hashcode. Ideally, the verification takes
place in a single transaction (HTTP-request).

I suppose modifying the deploy:deploy goal is not practical, as it may have
impacts on a wide range of Maven users, and the Maven leaders will probably
veto it.
But what do you think about adding a subgoal "verify" to the deploy plugin?

That way, the following call would deploy and verify an artifact, using -
well - not a single transaction, but at least a single mvn-call:
"deploy:deploy deploy:verify"

The deploy:verify subgoal could presume that Maven has been configured to
create hashcodes in the local repo. So what deploy:verify could basically
do is simply upload an ordinary artifact .md5 or .sha1 file using DAV.

Archiva identifies the verification task based on the file suffix.
What Archiva will do:
- compare the uploaded hashcode with the one that has been created
- in case the hashes match: return HTTP 200 (OK)
- in case the hashes do NOT match: all Archiva artifacts for the given
version (jar, pom, hashes, xml, etc.) will be deleted, and HTTP 400
(?) is returned to indicate that the hashes did not match

The deploy:verify subgoal then outputs corresponding messages.
A failed verification should result in a BUILD ERROR message, of course.
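
The client half could stay tiny - roughly like this (plain HTTP PUT for
illustration only; a real implementation would go through Wagon and handle
authentication, and all the names here are made up):

import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

public class VerifyUpload {

    /** PUTs the .sha1 Maven wrote during install; 200 = match, 400 = mismatch. */
    static int uploadChecksum(Path localSha1, String repoUrl) throws IOException {
        byte[] body = Files.readAllBytes(localSha1); // hashcode from the local repo
        HttpURLConnection conn = (HttpURLConnection) new URL(repoUrl).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode(); // the mojo would turn a non-2xx into a BUILD ERROR
    }
}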








Re: MRM-1351: please advise

Posted by Brett Porter <br...@apache.org>.
On 01/03/2010, at 10:14 PM, Marc Lustig wrote:

> 
> Hi there,
> 
> I have just created MRM-1351 and have a couple of questions:
> 
> 1) supported protocols
> In addition to DAV, the Maven Deploy Plugin also supports FTP- and
> SSH-based artifact deployment.
> Which of those additional protocols does Archiva support?
> Accordingly, which is the proper place for a generic implementation of the
> hashcode-based artifact validation?

Neither natively - it does, however, scan the file system, but it is not possible to reject an artifact at that stage, so the current behaviour of reporting the problem is appropriate.

> 
> 2) getting the hashcode of the local repo
> The maven-deploy-plugin does not support specifying a parameter like
> "sha-hashcode" for either the deploy or the deploy-file subgoal. Where,
> then, could Archiva get the hashcode of the local repo from?
> This appears to me to be a major precondition for implementing this ticket.

Sorry, this was my mistake - I thought that Maven uploaded the checksum first so that it could be used as the basis for determining if the artifact was correct.

It seems the alternative might be needed, where, upon uploading the checksum, if it is incorrect both the checksum and the artifact are deleted. This can be considered reasonable behaviour, as there is little risk of deleting a previously correct artifact (snapshots always deploy to a new timestamp, and releases should be blocked from redeployment, so they never reach this option). On the downside, the artifact has already been correctly uploaded, and if the checksum is never sent, it will be retained in the repository. However, this more closely matches your problem, as it seems it is the checksums that are being uploaded incorrectly?

- Brett

--
Brett Porter
brett@apache.org
http://brettporter.wordpress.com/