Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3.9 alpha 3/4 regression: Certificate verification fails with GnuTLS #18715

Closed
martinpitt opened this issue Feb 22, 2018 · 33 comments · Fixed by #18837
Closed

3.9 alpha 3/4 regression: Certificate verification fails with GnuTLS #18715

martinpitt opened this issue Feb 22, 2018 · 33 comments · Fixed by #18837
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/P1 sig/security

Comments

@martinpitt
Copy link

SSL certificate based authentication fails with 3.9 alpha 3 when using GnuTLS (which Cockpit does, via glib-networking). This has worked fine so far, up to 3.7.

Things work fine with the oc client:

❱❱❱ oc whoami
system:admin

To illustrate it with OpenSSL vs GnuTLS, let's extract the certificate into standard X.509 files:

awk '/client-certificate-data:/ { print $2 }'  ~/.kube/config | base64 -d > /tmp/cert.pem
awk '/client-key-data:/ { print $2 }'  ~/.kube/config | base64 -d > /tmp/cert.key
awk '/certificate-authority-data:/ { print $2 }' ~/.kube/config | base64 -d > /tmp/ca.pem

Now, authentication with curl works:

❱❱❱ curl --cert /tmp/cert.pem --key /tmp/cert.key --cacert /tmp/ca.pem https://10.111.112.101:8443/api
{
  "kind": "APIVersions",
  "versions": [
    "v1"
  ],
  "serverAddressByClientCIDRs": [
    {
      "clientCIDR": "0.0.0.0/0",
      "serverAddress": "10.111.112.101:8443"
    }
  ]
}

It also works with OpenSSL:

❱❱❱ (printf 'GET /api HTTP/1.0\n\n'; sleep 1) | openssl s_client -CAfile /tmp/ca.pem -cert /tmp/cert.pem -key /tmp/cert.key -connect 10.111.112.101:8443
CONNECTED(00000003)
depth=1 CN = openshift-signer@1518732226
verify return:1
depth=0 CN = 10.111.112.101
verify return:1
[...]
---
HTTP/1.0 200 OK
Cache-Control: no-store
Content-Type: application/json
Date: Thu, 22 Feb 2018 09:28:46 GMT
Content-Length: 137

{"kind":"APIVersions","versions":["v1"],"serverAddressByClientCIDRs":[{"clientCIDR":"0.0.0.0/0","serverAddress":"10.111.112.101:8443"}]}
closed

But not with GnuTLS:

❱❱ (printf 'GET /api HTTP/1.0\n\n'; sleep 1)  | gnutls-cli --x509certfile=/tmp/cert.pem --x509keyfile=/tmp/cert.key --x509cafile=/tmp/ca.pem -p 8443 10.111.112.101
Processed 1 CA certificate(s).
Processed 1 client X.509 certificates...
Resolving '10.111.112.101:8443'...
Connecting to '10.111.112.101:8443'...
- Certificate type: X.509
- Got a certificate list of 2 certificates.
- Certificate[0] info:
 - subject `CN=10.111.112.101', issuer `CN=openshift-signer@1518732226', serial 0x02, RSA key 2048 bits, signed using RSA-SHA256, activated `2018-02-15 22:03:46 UTC', expires `2020-02-15 22:03:47 UTC', pin-sha256="8s1sXBqKC8Vvwk/Mj/UawpbMS83ZFJDCiaKTZTEwK2w="
[...]
HTTP/1.0 401 Unauthorized
Cache-Control: no-store
Content-Type: application/json
Date: Thu, 22 Feb 2018 09:29:24 GMT
Content-Length: 129

{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
- Peer has closed the GnuTLS connection

For that, openshift logs an error message:

Feb 22 04:30:15 f1.cockpit.lan openshift-run[814]: E0222 04:30:15.791838     816 authentication.go:64] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "openshift-signer@1518732226"), x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "openshift-signer@1518732226")]
Version

oc v3.9.0-alpha.3+78ddc10
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://10.111.112.101:8443
openshift v3.9.0-alpha.3+78ddc10
kubernetes v1.9.1+a0ce1bc657

@martinpitt
Copy link
Author

This is with GnuTLS 3.5.18 in current Fedora 27, and also happens with GnuTLS 3.6.2 in Rawhide.

@martinpitt
Copy link
Author

Confirmed with 3.9-alpha.4 aka v3.9.0 (which is on dockerhub, but not on the GitHub release page yet).

[root@f1 ~]# oc version
oc v3.9.0-alpha.4+1552371-429
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://10.111.112.101:8443
openshift v3.9.0-alpha.4+1552371-429
kubernetes v1.9.1+a0ce1bc657

@martinpitt martinpitt changed the title 3.9 alpha 3: Certificate verification fails with GnuTLS 3.9 alpha 3/4 regression: Certificate verification fails with GnuTLS Feb 22, 2018
@jwforres
Copy link
Member

@openshift/sig-security

@jwforres jwforres added kind/bug Categorizes issue or PR as related to a bug. priority/P1 labels Feb 22, 2018
@liggitt
Copy link
Contributor

liggitt commented Feb 22, 2018

possibly fixed by #18713

@enj
Copy link
Contributor

enj commented Feb 22, 2018

fixed by #18713

Cool I break and fix things without even realizing.

@martinpitt
Copy link
Author

@enj: Nice, thanks! Is there some trick how I can get a build (just the compiled binary or a docker container etc.) with that fix included? Peter Volpe mentioned that OpenShift 3.9 has some other API changes that need adjustment in Cockpit, and I'd like to start with that ASAP.

@smarterclayton
Copy link
Contributor

smarterclayton commented Feb 22, 2018 via email

@enj
Copy link
Contributor

enj commented Feb 24, 2018

@martinpitt #18713 is merged, can you see if it fixes your issue?

@martinpitt
Copy link
Author

Negative, I'm afraid. I've installed v3.9.0 (latest), image 297850a38c48, which was built 20 hours ago. The PR was merged 2 days ago, so I would assume it includes that PR. But GnuTLS still fails with the above reproducer.

@enj enj assigned mrogers950 and unassigned simo5 Feb 25, 2018
@mrogers950
Copy link
Contributor

At the moment the only difference that I'm seeing with the gnutls and openssl commands is that openssl is sending both the cert and CA and gnutls is just sending the cert. Since the x509 authenticator is seeing > 1 cert from openssl it adds the CA to the intermediates list before verifying, however since the cert is signed directly by the CA that is already in the roots pool that should not matter.

On the server we see a generic verification failure only with the gnutls client:

E0226 19:34:26.838702   21398 authentication.go:64] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority, x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "openshift-signer@1518195148")]   ```

@mrogers950
Copy link
Contributor

In the GnuTLS case, the certificate subject is being modified in-place somewhere along the way. Since the subject is modified, this leads to the signature verification failing. I can actually reproduce this outside of OpenShift with a basic HTTPS server:

package main                                          

import (                                              
        "log"                                         
        "net/http"                                    
        "crypto/tls"                                  
        "encoding/hex"                                
)                                                     

func main() {                                         
        mux := http.NewServeMux()                     
        mux.HandleFunc("/", HelloServer)              
        cfg := &tls.Config{ClientAuth: tls.RequestClientCert}                                                
        srv := &http.Server{                          
                Addr:         "127.0.0.1:8811",       
                Handler:      mux,                    
                TLSConfig:    cfg,                    
        }                                             
        log.Fatal(srv.ListenAndServeTLS("server.crt", "server.key"))                                         
}                                                     

func HelloServer(w http.ResponseWriter, req *http.Request) {                                                 
        w.Header().Set("Content-Type", "text/plain")  
        if len(req.TLS.PeerCertificates) > 0 {        
                log.Println(string(req.TLS.PeerCertificates[0].RawSubject))                                  
                log.Println(hex.EncodeToString(req.TLS.PeerCertificates[0].RawSubject))                      
        }                                             
        w.Write([]byte("hi\n"))                       
}                                                     

If you use the same certificates, with the master server certificate and key in the server above, you will be able to see that the raw subject data is different between the two commands:

openssl:

2018/02/27 21:50:15 0N150U                                                                                                                                                                                                 
system:cluster-admins0U                                                                                                                                                                                                    
system:masters10U                                                                                                                                                                                                          
                 system:admin                                                                                                                                                                                              
2018/02/27 21:50:15 304e3135301c060355040a131573797374656d3a636c75737465722d61646d696e733015060355040a130e73797374656d3a6d617374657273311530130603550403130c73797374656d3a61646d696e                                       

GnuTLS:

2018/02/27 21:50:50 0N150U                                                                                                                                                                                                 
system:masters0U                                                                                                                                                                                                           
system:cluster-admins10U                                                                                                                                                                                                   
                        system:admin                                                                                                                                                                                       
2018/02/27 21:50:50 304e31353015060355040a130e73797374656d3a6d617374657273301c060355040a131573797374656d3a636c75737465722d61646d696e73311530130603550403130c73797374656d3a61646d696e                                       

Notice that the multi-value group values (the "O" rdn) is being swapped in place. I tested with go 1.9, 1.5, and 1.4, with gnutls-utils-3.5.16-4.fc26.x86_64 and gnutls-utils-2.12.23-21.el6.x86_64 and got the same results. You are probably only seeing this now because the addition of a second "O" rdn was recently added in #17414

I suspect that GnuTLS is doing this to the subject for some odd reason, but I don't know why exactly.
@martinpitt In the meanwhile as a workaround, you should be able to issue a replacement admin certificate by the openshift signer cert that has the same attributes, but instead with the subject of "O = system:masters, CN = system:admin".

@liggitt
Copy link
Contributor

liggitt commented Feb 27, 2018

Seems like GnuTLS might not be handling multi-RDN certs correctly

@martinpitt
Copy link
Author

@mrogers950 : Nice debugging, thank you so much! It even works if you just swap the Os around, like in -subj '/O=system:masters/CN=system:admin'. Although I don't know how stable/reliable that sorting/bug is in GnuTLS.

Took me a bit to figure out all the steps to update the certificate, so noting them down here for posterity.

cd /openshift.local.config/master/
mv admin.key admin.key.orig
mv admin.crt admin.crt.orig
mv admin.kubeconfig admin.kubeconfig.orig
openssl genrsa -out admin.key 2048
openssl req -new -nodes -key admin.key -out admin.csr -subj '/O=system:masters/O=system:cluster-admins/CN=system:admin'
openssl x509 -req -in admin.csr -CA ca.crt -CAkey ca.key -CAcreateserial -days 730 -out admin.crt
rm admin.csr
oc adm create-kubeconfig --certificate-authority=ca.crt --client-certificate=admin.crt --client-key=admin.key --master="https://10.111.112.101:8443" --kubeconfig=admin.kubeconfig
cp /openshift.local.config/master/admin.kubeconfig /root/.kube/config

I'll put that into cockpit's VM creation script for the OpenShift prerelease image, so that we are unblocked on testing. This isn't a final solution, of course, as other people will stumble over this as well. While it might be a GnuTLS bug, maybe as an intermediate bandaid the O= order can be swapped?

Thanks again!

@mrogers950
Copy link
Contributor

(I opened https://gitlab.com/gnutls/gnutls/issues/403 for this)

martinpitt added a commit to martinpitt/cockpit that referenced this issue Mar 2, 2018
openshift/origin#17414 introduced a second
organization value `system:masters`. Apparently GnuTLS mis-handles
multiple RDNs by re-ordering them and then rejecting the client
certificate due to a mismatching subject.

When building the openshift-prerelease image (and only then), work
around this by regenerating the admin certificate with swapping the `O:`
fields around. This is an utter hack, but unblocks testing OpenShift 3.9
for now.

See openshift/origin#18715 for details.
larskarlitski pushed a commit to cockpit-project/cockpit that referenced this issue Mar 2, 2018
openshift/origin#17414 introduced a second
organization value `system:masters`. Apparently GnuTLS mis-handles
multiple RDNs by re-ordering them and then rejecting the client
certificate due to a mismatching subject.

When building the openshift-prerelease image (and only then), work
around this by regenerating the admin certificate with swapping the `O:`
fields around. This is an utter hack, but unblocks testing OpenShift 3.9
for now.

See openshift/origin#18715 for details.
@simo5
Copy link
Contributor

simo5 commented Mar 2, 2018

@nmav PTAL

@nmav
Copy link

nmav commented Mar 2, 2018

As I mentioned in gnutls/403, gnutls modifying the cert cannot be the issue. Certificates are immutable objects and are not modified (otherwise the signature will be invalidated). The issue seems to be on the server, but we may see if we can improve something in gnutls to avoid making it confused. Or I may have not the full picture of how does this work. Are the certificates fixed in both cases, or are generated by openssl and gnutls independently?

@martinpitt
Copy link
Author

Could it call "openssl x509" to generate the certificate then? With

openssl req -new -nodes -key admin.key -out admin.csr -subj '/O=system:cluster-admins/O=system:masters/CN=system:admin'

You avoid the + syntax and create an ordered list:

Subject: O = system:cluster-admins, O = system:masters, CN = system:admin

@liggitt
Copy link
Contributor

liggitt commented Mar 2, 2018

Could it call "openssl x509" to generate the certificate then?

I don't anticipate changing the cert generation architecture for this

@simo5
Copy link
Contributor

simo5 commented Mar 5, 2018

@martinpitt thanks to @mrogers950 and @nmav we found the cause (bug in golang certificate generation) and we may have a workaround for now.
I'll open a PR soon.

simo5 added a commit to simo5/origin that referenced this issue Mar 5, 2018
This is to workaround Bug openshift#18715, which is caused by Golang Crypto's x509
certificate generation ordering Subjects RDN incorrectly *and* GNUTLS'
bug that "fixes" client certs on read with the correct encoding.

To avoid issues until both are fixed we set the correct ordering ourself

Signed-off-by: Simo Sorce <[email protected]>
@simo5
Copy link
Contributor

simo5 commented Mar 5, 2018

Note there is also a bug in GNUTLS that "fixes" client cert encoding when it shouldn't ... so 2 bugs for one issue, nice! :-)

simo5 added a commit to simo5/origin that referenced this issue Mar 5, 2018
This is to workaround Bug openshift#18715, which is caused by Golang Crypto's x509
certificate generation ordering Subjects RDN incorrectly *and* GNUTLS'
bug that "fixes" client certs on read with the correct encoding.

To avoid issues until both are fixed we set the correct ordering on our
own...

Signed-off-by: Simo Sorce <[email protected]>
@mrogers950
Copy link
Contributor

golang/go#24254

simo5 added a commit to simo5/origin that referenced this issue Mar 5, 2018
This is to workaround Bug openshift#18715, which is caused by Golang Crypto's x509
certificate generation ordering Subjects RDN incorrectly *and* GNUTLS'
bug that "fixes" client certs on read with the correct encoding.

To avoid issues until both are fixed we set the correct ordering on our
own...

Signed-off-by: Simo Sorce <[email protected]>
@simo5
Copy link
Contributor

simo5 commented Mar 5, 2018

Just for completeness, GNUTLS fix is merged here: https://gitlab.com/gnutls/gnutls/merge_requests/608

simo5 added a commit to simo5/origin that referenced this issue Mar 6, 2018
This is to workaround Bug openshift#18715, which is caused by Golang Crypto's x509
certificate generation ordering Subjects RDN incorrectly *and* GNUTLS'
bug that "fixes" client certs on read with the correct encoding.

To avoid issues until both are fixed we set the correct ordering on our
own...

Signed-off-by: Simo Sorce <[email protected]>
openshift-merge-robot added a commit that referenced this issue Mar 7, 2018
Automatic merge from submit-queue (batch tested with PRs 18835, 18857, 18641, 18656, 18837).

Reorder groups in cert Subjects

This is to workaround Bug #18715, which is caused by Golang Crypto's x509
certificate generation ordering Subjects RDN incorrectly *and* GNUTLS'
bug that "fixes" client certs on read with the correct encoding.

To avoid issues until both are fixed we set the correct ordering ourself

Fixes #18715
xref golang/go#24254 https://gitlab.com/gnutls/gnutls/issues/403#note_61687722
@martinpitt
Copy link
Author

This still fails with the recently released v3.9.0. Indeed it seems PR #18837 only landed in master, not in the 3.9 branch? At least it's definitively not applied in https://github.com/openshift/origin/blob/v3.9.0/pkg/cmd/server/crypto/crypto.go , only in https://github.com/openshift/origin/blob/master/pkg/cmd/server/crypto/crypto.go .

Can this please get cherry-picked into 3.9.1, so that this works with Cockpit and other GnuTLS clients?

martinpitt added a commit to martinpitt/cockpit that referenced this issue Apr 6, 2018
Unfortunately the fix for openshift/origin#18715
did not make it into the latest v3.9.0 release, so apply it to the
stable "openshift" image as well. It is already conditionally done on
broken certificates.
martinpitt added a commit to martinpitt/cockpit that referenced this issue Apr 6, 2018
Unfortunately the fix for openshift/origin#18715
did not make it into the latest v3.9.0 release, so apply it to the
stable "openshift" image as well. It is already conditionally done on
broken certificates.
mareklibra pushed a commit to mareklibra/cockpit that referenced this issue Apr 23, 2018
Unfortunately the fix for openshift/origin#18715
did not make it into the latest v3.9.0 release, so apply it to the
stable "openshift" image as well. It is already conditionally done on
broken certificates.
martinpitt added a commit to mareklibra/cockpit that referenced this issue Apr 25, 2018
Unfortunately the fix for openshift/origin#18715
did not make it into the latest v3.9.0 release, so apply it to the
stable "openshift" image as well. It is already conditionally done on
broken certificates.
mareklibra pushed a commit to mareklibra/cockpit that referenced this issue Apr 26, 2018
Unfortunately the fix for openshift/origin#18715
did not make it into the latest v3.9.0 release, so apply it to the
stable "openshift" image as well. It is already conditionally done on
broken certificates.
mareklibra pushed a commit to mareklibra/cockpit that referenced this issue Apr 27, 2018
Unfortunately the fix for openshift/origin#18715
did not make it into the latest v3.9.0 release, so apply it to the
stable "openshift" image as well. It is already conditionally done on
broken certificates.
martinpitt added a commit to cockpit-project/cockpit that referenced this issue Apr 27, 2018
Unfortunately the fix for openshift/origin#18715
did not make it into the latest v3.9.0 release, so apply it to the
stable "openshift" image as well. It is already conditionally done on
broken certificates.
martinpitt added a commit to martinpitt/cockpit that referenced this issue Sep 4, 2018
 - Drop kubevirt for now. It has been broken for many months now (cockpit-project#9479,
   cockpit-project#9638), and we must rebuild openshift to finally unbreak our cockpit/ws
   container (cockpit-project#9941). kubevirt tests continue to run on the
   openshift-prerelease image, which is sufficient for nw.

 - Move image to fedora-28, so that we can move cockpit/ws and cockpit/base to
   fedora-28.

 - Drop workaround for [certificate bug](openshift/origin#18715),
   the fix is in latest OpenShift 3.9.0 now.

Fixes cockpit-project#9479
martinpitt added a commit to martinpitt/cockpit that referenced this issue Sep 4, 2018
 - Drop kubevirt for now. It has been broken for many months now (cockpit-project#9479,
   cockpit-project#9638), and we must rebuild openshift to finally unbreak our cockpit/ws
   container (cockpit-project#9941). kubevirt tests continue to run on the
   openshift-prerelease image.

 - Move image to fedora-28, so that we can move cockpit/ws and
   cockpit/base to fedora-28.

 - Update workaround for openshift/origin#18715
   to install "openssl", which is not in the default install any more on
   Fedora 28.

Fixes cockpit-project#9479
Closes cockpit-project#9975
martinpitt added a commit to croissanne/cockpit that referenced this issue Sep 4, 2018
 - Drop kubevirt for now. It has been broken for many months now (cockpit-project#9479,
   cockpit-project#9638), and we must rebuild openshift to finally unbreak our cockpit/ws
   container (cockpit-project#9941). kubevirt tests continue to run on the
   openshift-prerelease image.

 - Move image to fedora-28, so that we can move cockpit/ws and
   cockpit/base to fedora-28.

 - Drop workaround for openshift/origin#18715,
   the fix is in latest OpenShift 3.9.0 now.

Fixes cockpit-project#9479
Closes cockpit-project#9975
martinpitt added a commit to cockpit-project/cockpit that referenced this issue Sep 4, 2018
 - Drop kubevirt for now. It has been broken for many months now (#9479,
   #9638), and we must rebuild openshift to finally unbreak our cockpit/ws
   container (#9941). kubevirt tests continue to run on the
   openshift-prerelease image.

 - Move image to fedora-28, so that we can move cockpit/ws and
   cockpit/base to fedora-28.

 - Update workaround for openshift/origin#18715
   to install "openssl", which is not in the default install any more on
   Fedora 28.

Fixes #9479
Closes #9975
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/P1 sig/security
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants