Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Diagnostics] Fix AnalyzeLogs to provide more clear debug message #18654

Merged
merged 4 commits into from
Feb 21, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -147,7 +147,7 @@ func resolveServerIP(serverUrl string, fn dnsResolver) ([]string, error) {

func searchNodesForIP(nodes []kapi.Node, ips []string) types.DiagnosticResult {
r := types.NewDiagnosticResult(MasterNodeName)
r.Debug("DClu3005", fmt.Sprintf("Seaching for a node with master IP: %s", ips))
r.Debug("DClu3005", fmt.Sprintf("Searching for a node with master IP: %s", ips))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This debug message currently shows:

debug: map[origin-node:{origin-node false false false 0} openvswitch:{openvswitch true false true 0} iptables:{iptables true true true 0} docker:{docker true true true 0} etcd:{etcd true true true 0} kubernetes:{kubernetes false false false 0} origin-master:{origin-master false false false 0} atomic-openshift-master:{atomic-openshift-master false false false 0} atomic-openshift-node:{atomic-openshift-node true true true 0}]

Actually I wanted to delete it, but if I deleted it, the number DClu3005 will be absent in master_node.go

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no special reason to keep an identifier.

But, I think you are talking about the debug statement in locate_units.go

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to keep the message just to preserve DClu3005 tag. If the tag is present then it should point to unique error/debug but if the error/debug is not relevant any more then we should delete the error/debug. Don't keep stale code just to maintain these tags (we have done the same thing for networkcheck and aggregatedlogging diagnostics).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always pinpoint the error/debug/warning without these tags, we just need ose release (oc version) and filepath:line-number which are available. I think these tags are more confusing than useful. May be we should get rid of them?
@sosiouxme

Copy link
Member

@sosiouxme sosiouxme Feb 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pravisankar The tags certainly serve no purpose for debug/info level at this time. The purpose in warn/error is to have something stable and semantically meaningless that's safe to grep in automation, as there is no machine-readable output at this time. The tags were originally intended for machine-readable output, which is a requested and long-neglected feature.

I doubt anyone's using the tags for that purpose now. I suppose overall it may be better to get rid of tags entirely. It would not be hard to retrofit them when fleshing out the machine-readable feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I made a comment under wrong line. As @sosiouxme, I meant logger.Debug("DS1003", fmt.Sprintf("%v", systemdUnits)). (And I removed it now.)

The message DClu3005 is really necessary until this bug was fixed https://bugzilla.redhat.com/show_bug.cgi?id=1368865


// Loops = # of nodes * number of IPs per node (2 commonly) * # of IPs the
// server hostname resolves to. (should usually be 1)
Expand Down
17 changes: 7 additions & 10 deletions pkg/oc/admin/diagnostics/diagnostics/systemd/analyze_logs.go
Original file line number Diff line number Diff line change
Expand Up @@ -99,9 +99,13 @@ func (d AnalyzeLogs) Check() types.DiagnosticResult {
if err := json.Unmarshal(bytes, &entry); err != nil {
r.Debug("DS0003", fmt.Sprintf("Couldn't read the JSON for this log message:\n%s\nGot error %s", string(bytes), errStr(err)))
} else {
if lineCount > 500 && stampTooOld(entry.TimeStamp, timeLimit) {
r.Debug("DS0004", fmt.Sprintf("Stopped reading %s log: timestamp %s too old", unitName, entry.TimeStamp))
break // if we've analyzed at least 500 entries, stop when age limit reached (don't scan days of logs)
epochns, err := strconv.ParseInt(entry.TimeStamp, 10, 64)
if err == nil && time.Unix(epochns/1000000, 0).Before(timeLimit) && lineCount > 500 {
r.Debug("DS0005", fmt.Sprintf("Stopped reading %s log: timestamp %s more than 1 hour ago", unitName, time.Unix(epochns/1000000, 0)))
break
} else if err != nil {
r.Warn("DS0004", err, fmt.Sprintf("Find invalid timestamp %s in %s log", entry.TimeStamp, unitName))
continue
}
if unit.StartMatch.MatchString(entry.Message) {
break // saw log message for unit startup; don't analyze previous logs
Expand Down Expand Up @@ -142,10 +146,3 @@ func (d AnalyzeLogs) Check() types.DiagnosticResult {

return r
}

func stampTooOld(stamp string, timeLimit time.Time) bool {
if epochns, err := strconv.ParseInt(stamp, 10, 64); err == nil {
return time.Unix(epochns/1000000, 0).Before(timeLimit)
}
return true // something went wrong, stop looking...
}
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,6 @@ func GetSystemdUnits(logger *log.Logger) map[string]types.SystemdUnit {
}
}

logger.Debug("DS1003", fmt.Sprintf("%v", systemdUnits))
return systemdUnits
}

Expand Down