*: add CRI-O handler #1741

Merged: 1 commit into google:master on Sep 6, 2017

runcom
Contributor

runcom commented Aug 31, 2017

This patch adds native support for the CRI-O runtime to cAdvisor.
Tested by integrating it with Kubernetes and verifying that the necessary stats are now returned properly.

/cc @derekwaynecarr @mrunalp

Signed-off-by: Antonio Murdaca [email protected]

@runcom
Contributor Author

runcom commented Aug 31, 2017

The failure is because there's no CRI-O in the CI to test this with:

W0831 16:13:03.580] W0831 16:13:03.498099    2569 manager.go:166] unable to connect to CRI-O api service: Get http://localhost:7373/info: dial tcp 127.0.0.1:7373: getsockopt: connection refused

@runcom
Contributor Author

runcom commented Aug 31, 2017

/retest

@derekwaynecarr derekwaynecarr self-assigned this Aug 31, 2017
@derekwaynecarr
Collaborator

@dashpole @dchen1107: this is the follow-up from our discussion in this week's sig-node meeting. I can handle primary review on this.

@derekwaynecarr
Collaborator

FYI @sjenning

@runcom
Contributor Author

runcom commented Aug 31, 2017

The failure doesn't seem related (?)

@dashpole
Collaborator

#1742 builds without error. I haven't seen this before, so it is likely something in this PR.

@sjenning
Contributor

Looks similar to #1481 again ☹️ Maybe my netgo flag PR didn't fix it.

@sjenning
Contributor

/retest

@runcom
Contributor Author

runcom commented Sep 1, 2017

Updated to make calls over the CRI-O socket.
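The switch from TCP to the socket is visible by comparison with the `http://localhost:7373/info` request in the CI log above. Making the calls over a Unix socket mostly comes down to giving `net/http` a transport whose dialer ignores the URL's host and always connects to the socket. The following is a standalone sketch using only the standard library, not the code from this PR. `newUnixSocketClient`, `fetchInfo` and `runDemo` are hypothetical names, and the demo server's JSON payload is made up.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"path/filepath"
	"time"
)

// newUnixSocketClient (hypothetical) returns an HTTP client that sends every
// request to the Unix socket at socketPath, whatever host the URL names.
func newUnixSocketClient(socketPath string, dialTimeout time.Duration) *http.Client {
	tr := &http.Transport{
		// No need for compression in local communications.
		DisableCompression: true,
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: dialTimeout}
			return d.DialContext(ctx, "unix", socketPath)
		},
	}
	return &http.Client{Transport: tr}
}

// fetchInfo (hypothetical) issues GET /info over the socket and returns the body.
func fetchInfo(socketPath string) (string, error) {
	client := newUnixSocketClient(socketPath, 2*time.Second)
	// "unix" is only a placeholder host; the custom dialer never resolves it.
	resp, err := client.Get("http://unix/info")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// runDemo serves a fake /info endpoint on a temporary socket and queries it.
func runDemo() (string, error) {
	dir, err := os.MkdirTemp("", "crio-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	sock := filepath.Join(dir, "crio.sock")
	ln, err := net.Listen("unix", sock)
	if err != nil {
		return "", err
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/info", func(w http.ResponseWriter, _ *http.Request) {
		// Made-up payload; the real CRI-O /info schema is not shown in this thread.
		io.WriteString(w, `{"storage_root":"/var/lib/containers/storage"}`)
	})
	srv := &http.Server{Handler: mux}
	go srv.Serve(ln)
	defer srv.Close()

	return fetchInfo(sock)
}

func main() {
	body, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```

The `unix` host in the URL is a placeholder. Every connection is redirected to the socket, so any syntactically valid host would work.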

@runcom runcom force-pushed the crio-handler branch 4 times, most recently from c0cfa4b to e70a9de on September 2, 2017 00:13
Collaborator

@derekwaynecarr derekwaynecarr left a comment

Getting close!

)

const (
crioSocket = "/var/run/crio.sock"
Collaborator

Can you make this public? I would want to reference it in the kubelet.
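In Go, making the constant public just means capitalizing its name (for example `CrioSocket`), so that packages such as the kubelet can import it. The hypothetical `isCrioEndpoint` below is a hedged illustration of how a caller might then use it: it compares a runtime endpoint against the exported path. Accepting both bare paths and `unix://` URLs is an assumption, not the kubelet's actual logic.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// CrioSocket is exported (capitalized) so that other packages can reference it.
const CrioSocket = "/var/run/crio.sock"

// isCrioEndpoint (hypothetical) reports whether a runtime endpoint points at
// the CRI-O socket. It accepts either a bare path or a unix:// URL.
func isCrioEndpoint(endpoint string) bool {
	socketPath := strings.TrimPrefix(endpoint, "unix://")
	return filepath.Clean(socketPath) == CrioSocket
}

func main() {
	for _, ep := range []string{
		"unix:///var/run/crio.sock",
		"/var/run/crio.sock",
		"unix:///var/run/dockershim.sock",
	} {
		fmt.Printf("%s -> %v\n", ep, isCrioEndpoint(ep))
	}
}
```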

// The namespace under which crio aliases are unique.
const CrioNamespace = "crio"

// Regexp that identifies docker cgroups, containers started with
Collaborator

Nit from my port. s/docker/cri-o


// crio handles all containers under /crio
func (self *crioFactory) CanHandleAndAccept(name string) (bool, bool, error) {
glog.Infof("CRIO CAN HANDLE AND ACCEPT: %v", name)
Collaborator

Please clean up this logging.

if !strings.HasPrefix(path.Base(name), CrioNamespace) {
return false, false, nil
}
// if the container is not associated with docker, we can't handle it or accept it.
Collaborator

s/docker/crio

return false, false, nil
}
glog.Infof("CRIO HANDLE AND ACCEPT: %v", name)
// TODO should we call equivalent of a crio info to be sure its really ours
Collaborator

I know I wrote this comment, but I would prefer we not do this, for performance reasons.

// Manager of this container's cgroups.
cgroupManager cgroups.Manager

// the docker storage driver
Collaborator

Same here

context := fs.Context{
	Docker: fs.DockerContext{
		Root:         docker.RootDir(),
		Driver:       dockerStatus.Driver,
		DriverStatus: dockerStatus.DriverStatus,
	},
	RktPath:  rktPath,
	CrioPath: crioInfo.StorageRoot,
Collaborator

I think we should have this be a struct similar to docker
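One way to follow that suggestion is to give CRI-O its own nested struct, mirroring `fs.DockerContext`, instead of a bare `CrioPath` string. That way more CRI-O settings can be added later without widening the top-level context. The types below are simplified local stand-ins for the `fs` package. `CrioContext`, its `Root` field, the `DriverStatus` field type and the example paths are assumptions.

```go
package main

import "fmt"

// DockerContext is a simplified stand-in for fs.DockerContext.
type DockerContext struct {
	Root         string
	Driver       string
	DriverStatus map[string]string
}

// CrioContext (hypothetical) groups CRI-O-specific settings, so fields such
// as a storage driver could be added later without touching Context itself.
type CrioContext struct {
	Root string
}

// Context is a simplified stand-in for fs.Context.
type Context struct {
	Docker  DockerContext
	RktPath string
	Crio    CrioContext
}

func main() {
	ctx := Context{
		Docker:  DockerContext{Root: "/var/lib/docker"},
		RktPath: "/var/lib/rkt",
		Crio:    CrioContext{Root: "/var/lib/containers/storage"},
	}
	fmt.Println(ctx.Crio.Root)
}
```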

@runcom runcom force-pushed the crio-handler branch 2 times, most recently from 559cbac to 46c9379 on September 2, 2017 09:34
@runcom
Contributor Author

runcom commented Sep 2, 2017

@derekwaynecarr addressed your comments

@runcom runcom force-pushed the crio-handler branch 2 times, most recently from cde0f22 to 3c40183 on September 4, 2017 16:11
@runcom runcom changed the title [WIP] *: add CRI-O handler *: add CRI-O handler Sep 4, 2017
@runcom
Contributor Author

runcom commented Sep 4, 2017

Removing the WIP label; I think it's fully ready for review (I also added some unit tests):

kubectl get --raw /api/v1/nodes/127.0.0.1/proxy/stats/summary

{
  "node": {
   "nodeName": "127.0.0.1",
   "startTime": "2017-09-04T11:40:43Z",
   "cpu": {
    "time": "2017-09-04T16:13:10Z",
    "usageNanoCores": 454768832,
    "usageCoreNanoSeconds": 4424417531376
   },
   "memory": {
    "time": "2017-09-04T16:13:10Z",
    "availableBytes": 11485138944,
    "usageBytes": 10475511808,
    "workingSetBytes": 9253203968,
    "rssBytes": 3624992768,
    "pageFaults": 61345034,
    "majorPageFaults": 14271
   },
   "fs": {
    "time": "2017-09-04T16:13:10Z",
    "availableBytes": 13987569664,
    "capacityBytes": 52710469632,
    "usedBytes": 36021768192,
    "inodesFree": 2681895,
    "inodes": 3276800,
    "inodesUsed": 594905
   },
   "runtime": {
    "imageFs": {
     "time": "2017-09-04T16:13:10Z",
     "availableBytes": 13987569664,
     "capacityBytes": 52710469632,
     "usedBytes": 560776649,
     "inodesFree": 2681895,
     "inodes": 3276800,
     "inodesUsed": 594905
    }
   }
  },
  "pods": [
   {
    "podRef": {
     "name": "kube-dns-4124969034-6gqh6",
     "namespace": "kube-system",
     "uid": "fc25928b-918a-11e7-b392-507b9d4141fa"
    },
    "startTime": "2017-09-04T16:06:33Z",
    "containers": [
     {
      "name": "sidecar",
      "startTime": "2017-09-04T16:06:34Z",
      "cpu": {
       "time": "2017-09-04T16:13:09Z",
       "usageNanoCores": 1040646,
       "usageCoreNanoSeconds": 627473919
      },
      "memory": {
       "time": "2017-09-04T16:13:09Z",
       "usageBytes": 11571200,
       "workingSetBytes": 11444224,
       "rssBytes": 10747904,
       "pageFaults": 4191,
       "majorPageFaults": 1
      },
      "rootfs": {
       "time": "2017-09-04T16:13:09Z",
       "availableBytes": 13987569664,
       "capacityBytes": 52710469632,
       "usedBytes": 42471424,
       "inodesFree": 2681895,
       "inodes": 3276800,
       "inodesUsed": 14
      },
      "logs": {
       "time": "2017-09-04T16:13:09Z",
       "availableBytes": 13987569664,
       "capacityBytes": 52710469632,
       "usedBytes": 20480,
       "inodesFree": 2681895,
       "inodes": 3276800,
       "inodesUsed": 594905
      },
      "userDefinedMetrics": null
     },
     {
      "name": "dnsmasq",
      "startTime": "2017-09-04T16:06:34Z",
      "cpu": {
       "time": "2017-09-04T16:13:07Z",
       "usageNanoCores": 848354,
       "usageCoreNanoSeconds": 169743690
      },
      "memory": {
       "time": "2017-09-04T16:13:07Z",
       "usageBytes": 7319552,
       "workingSetBytes": 7315456,
       "rssBytes": 6520832,
       "pageFaults": 2759,
       "majorPageFaults": 0
      },
      "rootfs": {
       "time": "2017-09-04T16:13:07Z",
       "availableBytes": 13987569664,
       "capacityBytes": 52710469632,
       "usedBytes": 42090496,
       "inodesFree": 2681895,
       "inodes": 3276800,
       "inodesUsed": 15
      },
      "logs": {
       "time": "2017-09-04T16:13:07Z",
       "availableBytes": 13987569664,
       "capacityBytes": 52710469632,
       "usedBytes": 20480,
       "inodesFree": 2681895,
       "inodes": 3276800,
       "inodesUsed": 594905
      },
      "userDefinedMetrics": null
     },
     {
      "name": "kubedns",
      "startTime": "2017-09-04T16:06:33Z",
      "cpu": {
       "time": "2017-09-04T16:13:10Z",
       "usageNanoCores": 609739,
       "usageCoreNanoSeconds": 348014134
      },
      "memory": {
       "time": "2017-09-04T16:13:10Z",
       "availableBytes": 170975232,
       "usageBytes": 7282688,
       "workingSetBytes": 7282688,
       "rssBytes": 6557696,
       "pageFaults": 2524,
       "majorPageFaults": 0
      },
      "rootfs": {
       "time": "2017-09-04T16:13:10Z",
       "availableBytes": 13987569664,
       "capacityBytes": 52710469632,
       "usedBytes": 50057216,
       "inodesFree": 2681895,
       "inodes": 3276800,
       "inodesUsed": 15
      },
      "logs": {
       "time": "2017-09-04T16:13:10Z",
       "availableBytes": 13987569664,
       "capacityBytes": 52710469632,
       "usedBytes": 20480,
       "inodesFree": 2681895,
       "inodes": 3276800,
       "inodesUsed": 594905
      },
      "userDefinedMetrics": null
     }
    ],
    "network": {
     "time": "2017-09-04T16:13:03Z",
     "rxBytes": 290526,
     "rxErrors": 0,
     "txBytes": 266470,
     "txErrors": 0
    },
    "volume": [
     {
      "time": "2017-09-04T16:07:15Z",
      "availableBytes": 10369159168,
      "capacityBytes": 10369171456,
      "usedBytes": 12288,
      "inodesFree": 2531527,
      "inodes": 2531536,
      "inodesUsed": 9,
      "name": "kube-dns-token-8b24g"
     }
    ]
   }
  ]
 }

@runcom runcom force-pushed the crio-handler branch 2 times, most recently from 0da90e4 to fe9ddf3 on September 5, 2017 09:34
@runcom
Contributor Author

runcom commented Sep 5, 2017

I'm pretty sure the test failure is unrelated. I also added some unit tests.

@derekwaynecarr
Collaborator

/test pull-cadvisor-e2e

@derekwaynecarr
Collaborator

@runcom: looks like a gofmt error in handler_test.go, I think.

Signed-off-by: Antonio Murdaca <[email protected]>
@runcom
Contributor Author

runcom commented Sep 5, 2017

> @runcom -- looks like a gofmt error on handler_test.go i think.

@derekwaynecarr fixed

// No need for compression in local communications.
tr.DisableCompression = true
tr.Dial = func(_, _ string) (net.Conn, error) {
return net.DialTimeout(proto, addr, 32*time.Second)
Collaborator

Can this be shorter? rkt uses 2s.

Contributor Author

Docker uses exactly the same value when connecting over a socket.

Collaborator

OK, thanks for the detail.

@derekwaynecarr
Collaborator

@runcom: just the one question about the timeout looking rather high. Once that's addressed, this is LGTM.

@derekwaynecarr
Collaborator

LGTM

@derekwaynecarr derekwaynecarr merged commit e8dbd50 into google:master Sep 6, 2017
@runcom runcom deleted the crio-handler branch September 6, 2017 19:49
vdemeester pushed a commit to vdemeester/kubernetes that referenced this pull request Sep 7, 2017
Automatic merge from submit-queue (batch tested with PRs 51728, 49202)

Enable CRI-O stats from cAdvisor

**What this PR does / why we need it**:
cAdvisor may support multiple container runtimes (docker, rkt, cri-o, systemd, etc.)

As long as the kubelet continues to run cAdvisor, runtimes with native cAdvisor support may not want to run multiple monitoring agents, to avoid a performance regression in production. Until the kubelet runs a more lightweight monitoring solution, this PR allows remote runtimes to have their stats pulled from cAdvisor when cAdvisor is the registered stats provider, as determined by introspecting the runtime endpoint.

See issue kubernetes#51798

**Special notes for your reviewer**:
cAdvisor will be bumped to pick up google/cadvisor#1741

At that time, CRI-O will support fetching stats from cAdvisor.

**Release note**:
```release-note
NONE
```
openshift-merge-robot added a commit to openshift/origin that referenced this pull request Sep 19, 2017
Automatic merge from submit-queue

cadvisor/runc updates

Support CRI-O:
google/cadvisor#1741

Fix memory stats:
google/cadvisor#1728
opencontainers/runc#1378

4 participants