Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Backtrace Screen #1270

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Add Backtrace Screen #1270

wants to merge 5 commits into from

Conversation

MrCirdo
Copy link
Contributor

@MrCirdo MrCirdo commented Jul 19, 2023

Hello everyone!

This is my first big pull request 😃

The goal of this PR is to add a backtrace screen for process or thread.
Here's what it looks like for a thread :
image

And for a process:
image

Behind, I use the tool called eu-stack provided by elf-utils.
The standard output is parsed and printed to the screen.
Currently, I have implemented only the Refresh button. And my world is inspired by TraceScreen and OpenFilesScreen.

I still have some work to do before my work is ready (Formatting, bug fixes, ...).
Currently, this is more of a demonstration than a cool feature 😄 .

What do you think about my work? Is it a feature that can be added?

@BenBE BenBE added the new feature Completely new feature label Jul 19, 2023
Copy link
Member

@BenBE BenBE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for your PR. The feature itself looks interesting and fits quite nicely with existing functionality.

But, the use of external tools is a bit problematic and is best to be avoided. For the case of retrieving stack traces, there are several libraries available, that might be worth a look (e.g. libunwind). Also I'm not sure if the code as-is would properly run on e.g. FreeBSD or Darwin. Thus I strongly prefer a solution that allows to split out these platform-dependent parts where necessary (in the case of lsof, things are portable enough across all our target platforms so it's not an issue there; not sure about eu-stack though).

While reading through your PR I noticed that this seems to handle debug information. As such, it would be nice to have the module, source file and line be available separately (where available). Also the setting of hiding path names should be respected for module filenames in order to be consistent with the rest of the UI. Taking the highlight basename setting into account would be nice too.

Another code refactoring task is related to a recent addition of the generalized columns code that @natoscott recently worked on. Please have a look there if you like.

Also there's a few further notes regarding the code which you can find below. Please feel free to rebase the fixes to those issues directly into the existing commits as you see fit.

If you need further assistance feel free to ask.

@MrCirdo
Copy link
Contributor Author

MrCirdo commented Jul 20, 2023

Thank you for your reactivity and your review.

I agree with you when you say eu-stack is not a good idea. I only used it to make a quick demo of my idea.
I'm also unsure if eu-stack is supported on the BSD platforms.

Moreover, I think also the library unwind is more appropriate. I just have to deal with DWARF information.
I will probably close all suggestions regarding the execution/parsing of eu-stack.

I saw very quickly the work of @natoscott and I will wait until his work is finished.

@BenBE
Copy link
Member

BenBE commented Jul 20, 2023

Thank you for your reactivity and your review.

You're welcome.

I agree with you when you say eu-stack is not a good idea. I only used it to make a quick demo of my idea. I'm also unsure if eu-stack is supported on the BSD platforms.

There seems to be some patches re FreeBSD, but I'd not actually call this support … ;-)

That's with Darwin aside entirely … ;-)

Moreover, I think also the library unwind is more appropriate. I just have to deal with DWARF information. I will probably close all suggestions regarding the execution/parsing of eu-stack.

Sure, go ahead. Maybe check for similar places elsewhere in case something similar was missed in other places of this PR.

I saw very quickly the work of @natoscott and I will wait until his work is finished.

If you prepare your PR to anticipate that change, e.g. by preparing the data structures in a way that makes this move easy, you could even prepare things now. Given the remaining aspects in that other PR will take some time to sort out, don't feel obliged to wait for those changes to land. If the structures introduced in this PR are clean enough there's not too much work later to move things over to this new interface. Most of the work actually has already been done.

@Explorer09
Copy link
Contributor

Please don't merge the htop-dev:main branch. Rebase it instead. You know what git rebase is, don't you?

@MrCirdo
Copy link
Contributor Author

MrCirdo commented Sep 5, 2023

The flake commit is just for me. I will remove it at the end.
I don't start my work. However, I keep updating my branch. So it's not ready to review.

@MrCirdo
Copy link
Contributor Author

MrCirdo commented Nov 16, 2023

Hey,

I didn't have a lot of time recently for this PR. But I have some questions.

If you prepare your PR to anticipate that change, e.g. by preparing the data structures in a way that makes this move easy, you could even prepare things now. Given the remaining aspects in that other PR will take some time to sort out, don't feel obliged to wait for those changes to land. If the structures introduced in this PR are clean enough there's not too much work later to move things over to this new interface. Most of the work actually has already been done.

I'm looking for Row workflow. I don't know if it's a good idea to use it. Row_display uses the object Settings to get fields. Does it mean I have to make my own Settings object? With all process settings (like highlightThreads)? And do I have to make my ScreenSettings? Is it okay?

@BenBE
Copy link
Member

BenBE commented Nov 16, 2023

I didn't have a lot of time recently for this PR. But I have some questions.

If you prepare your PR to anticipate that change, e.g. by preparing the data structures in a way that makes this move easy, you could even prepare things now. Given the remaining aspects in that other PR will take some time to sort out, don't feel obliged to wait for those changes to land. If the structures introduced in this PR are clean enough there's not too much work later to move things over to this new interface. Most of the work actually has already been done.

I'm looking for Row workflow. I don't know if it's a good idea to use it. Row_display uses the object Settings to get fields. Does it mean I have to make my own Settings object? With all process settings (like highlightThreads)? And do I have to make my ScreenSettings? Is it okay?

The Settings and ScreenSettings objects are shared between all tabs. The Settings provides with the global settings, applicable to all screens, while ScreenSettings is applicable to only to the current tab only. As you are basically inheriting the settings from the tab you are opening the backtrace details on, you can just copy the pointer to those settings. These are held in the Screen mostly for convenience and to reduce dependency scope. I.e. if you don't need any of the values from the settings (and none of the functions you're relying on) you could even skip these objects entirely.

@MrCirdo
Copy link
Contributor Author

MrCirdo commented Nov 17, 2023

I didn't have a lot of time recently for this PR. But I have some questions.

If you prepare your PR to anticipate that change, e.g. by preparing the data structures in a way that makes this move easy, you could even prepare things now. Given the remaining aspects in that other PR will take some time to sort out, don't feel obliged to wait for those changes to land. If the structures introduced in this PR are clean enough there's not too much work later to move things over to this new interface. Most of the work actually has already been done.

I'm looking for Row workflow. I don't know if it's a good idea to use it. Row_display uses the object Settings to get fields. Does it mean I have to make my own Settings object? With all process settings (like highlightThreads)? And do I have to make my ScreenSettings? Is it okay?

The Settings and ScreenSettings objects are shared between all tabs. The Settings provides with the global settings, applicable to all screens, while ScreenSettings is applicable to only to the current tab only. As you are basically inheriting the settings from the tab you are opening the backtrace details on, you can just copy the pointer to those settings. These are held in the Screen mostly for convenience and to reduce dependency scope. I.e. if you don't need any of the values from the settings (and none of the functions you're relying on) you could even skip these objects entirely.

Thank you very much for the clarification. I was not sure I would be able to modify it (I mean If my changes would be accepted).

Copy link
Member

@BenBE BenBE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please no custom build system files from e.g. Flake …

@MrCirdo
Copy link
Contributor Author

MrCirdo commented Jan 9, 2024

Hi @BenBE ,
I completely forgot to say to not review my code. I just updated my branch.
I deeply apologize.

@BenBE
Copy link
Member

BenBE commented Jan 9, 2024

Hi @BenBE , I completely forgot to say to not review my code. I just updated my branch. I deeply apologize.

Don't worry. No need to apologize. The PR is marked as draft anyway, thus the only things I remarked upon was what I noticed immediately. I also ticked off some of the previous comments that no longer apply to the current state of this PR. Basically some maintenance.

NB: This PR would be scheduled for 3.4.x earliest anyway given that 3.3.0 is about to ship anytime soon.

@MrCirdo
Copy link
Contributor Author

MrCirdo commented Mar 16, 2024

Hi,

This PR is ready for review.
I closed all previous conversations.

Currently, I add only the support of Linux.

@MrCirdo MrCirdo marked this pull request as ready for review March 16, 2024 17:35
@MrCirdo MrCirdo force-pushed the main branch 2 times, most recently from c53af06 to 792bdba Compare March 16, 2024 20:46
@BenBE
Copy link
Member

BenBE commented Mar 17, 2024

Given that this feature will be available on multiple platforms and requires changes in different places I'd suggest to have a top-level feature flag for backtrace support and additional flags for the various libraries (iberty, eu-stack, …) The actual implementation for the stack traces should go in the platform code for each system, leaving the actual screen implementation mostly platform independent.

@MrCirdo
Copy link
Contributor Author

MrCirdo commented Mar 19, 2024

Given that this feature will be available on multiple platforms and requires changes in different places I'd suggest to have a top-level feature flag for backtrace support and additional flags for the various libraries (iberty, eu-stack, …) The actual implementation for the stack traces should go in the platform code for each system, leaving the actual screen implementation mostly platform independent.

Ok I see what you mean, is it a good idea to be inspired by for example Machine_new?

@BenBE
Copy link
Member

BenBE commented Mar 19, 2024

Given that this feature will be available on multiple platforms and requires changes in different places I'd suggest to have a top-level feature flag for backtrace support and additional flags for the various libraries (iberty, eu-stack, …) The actual implementation for the stack traces should go in the platform code for each system, leaving the actual screen implementation mostly platform independent.

Ok I see what you mean, is it a good idea to be inspired by for example Machine_new?

Maybe a better candidate may be the way in which Platform_getNetworkIO and Platform_getProcessLocks are implemented in <platform>/*Platform.[ch]

@MrCirdo
Copy link
Contributor Author

MrCirdo commented Mar 20, 2024

Given that this feature will be available on multiple platforms and requires changes in different places I'd suggest to have a top-level feature flag for backtrace support and additional flags for the various libraries (iberty, eu-stack, …) The actual implementation for the stack traces should go in the platform code for each system, leaving the actual screen implementation mostly platform independent.

Ok I see what you mean, is it a good idea to be inspired by for example Machine_new?

Maybe a better candidate may be the way in which Platform_getNetworkIO and Platform_getProcessLocks are implemented in <platform>/*Platform.[ch]

Okay thank you, I'll take a look

@MrCirdo
Copy link
Contributor Author

MrCirdo commented Jan 7, 2025

Hello,

I've improved the option used by cplus_demangling and added some safety guards to avoid exploding the memory.
Unfortunately, I wasn't able to reproduce the crash you encountered @BenBE. I tested in Ubuntu with teamviewerd but nothing. The safeguard should work.
Also, I add some messages during the configuration.

@MrCirdo MrCirdo force-pushed the main branch 3 times, most recently from c9b2068 to 5e79cdb Compare January 14, 2025 16:37
@MrCirdo MrCirdo force-pushed the main branch 2 times, most recently from 2fc69d5 to 05dd74b Compare January 28, 2025 10:18
@BenBE BenBE linked an issue Feb 8, 2025 that may be closed by this pull request
@MrCirdo MrCirdo requested review from BenBE and Explorer09 February 11, 2025 18:13
@@ -681,6 +695,115 @@ bool Platform_getNetworkIO(NetworkIOData* data) {
return true;
}

void Platform_getBacktrace(Vector* frames, const BacktracePanel* panel, char** error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't hurt to initialize *error at the start of the function.

Suggested change
void Platform_getBacktrace(Vector* frames, const BacktracePanel* panel, char** error) {
void Platform_getBacktrace(Vector* frames, const BacktracePanel* panel, char** error) {
*error = NULL;

Copy link
Contributor Author

@MrCirdo MrCirdo Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding an assert that checks if *error is not null? Is it a good idea?

assert(*error != NULL)

The goal is to prevent an error on another error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with output parameters is always how you handle pre-initialized values. As there is just a single caller, an assert(error && !*error); should be fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is, you are using the error parameter for output only, not for input. It should be a good practice for the callee to initialize the output value to something stable. Expecting the caller to do it is more prone to errors. And notice that you only call this function in one place, that extra line of initialization shouldn't add any extra code from the compiler when it inlines it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Explorer09 I agree with adding the initializer inside the function. My point with the assert covering both error and *error is about catching a mal-behaved caller that passes some non-NULL value for the error return value.

An alternative would be change the API for void Platform_getBacktrace(Vector* frames, const BacktracePanel* panel, char** error); to char* Platform_getBacktrace(Vector* frames, const BacktracePanel* panel); (ignoring my comment below regarding the BacktracePanel* for a moment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another note. I believe the whole function should be wrapped in a #ifdef HAVE_BACKTRACE ... #endif conditional.

Also we can replace the "The backtrace screen is not implemented" dummy screen and dummy message with a compule time error (#error "Backtrace is enabled without engine selected.").

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point with the assert covering both error and *error is about catching a mal-behaved caller that passes some non-NULL value for the error return value.

I was looking at asprintf(3) that would return the output string in a similar way. If asprintf doesn't need an assertion for this, we won't need it either.

assert(error) can be covered by ATTR_NONNULL to the whole function. assert(!*error) won't do really much here, because whether *error is a malloc'd buffer should be tracked by the compiler through other means.

Copy link
Contributor Author

@MrCirdo MrCirdo Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be change the API for void Platform_getBacktrace(Vector* frames, const BacktracePanel* panel, char** error); to char* Platform_getBacktrace(Vector* frames, const BacktracePanel* panel); (ignoring my comment below regarding the BacktracePanel* for a moment).

I'm not a fan of this alternative. I chose char **error because that indicates to the developer this variable is for error. I found this pattern when I learned Objective-C and NSError. Returning char * that returns an error, I think, it's not very clear on the data returned.

Also we can replace the "The backtrace screen is not implemented" dummy screen and dummy message with a compule time error (#error "Backtrace is enabled without engine selected.").

I don't understand the compute time error. If during the configuration, the backtrace feature is disabled, Does it produce a compute time error? However, it's a good suggestion to change the message...

I just realized that I prefer that error should be an optional parameter. If the developer that uses Plateform_getBacktrace doesn't want to check the error, it's his choice. But it involves checking if error is null.
Currently, it's easier to add an assert, it will make the code more readable than adding an if to every error set.

MrCirdo and others added 5 commits February 14, 2025 11:47
Co-authored-by: Benny Baumann <[email protected]>
Co-authored-by: Kang-Che Sung <[email protected]>
Co-authored-by: Benny Baumann <[email protected]>
Co-authored-by: Kang-Che Sung <[email protected]>
Co-authored-by: Benny Baumann <[email protected]>
Co-authored-by: Kang-Che Sung <[email protected]>
Co-authored-by: Benny Baumann <[email protected]>
Co-authored-by: Kang-Che Sung <[email protected]>
Co-authored-by: Benny Baumann <[email protected]>
Co-authored-by: Kang-Che Sung <[email protected]>
@BenBE
Copy link
Member

BenBE commented Feb 14, 2025

Looking at the source a bit closer, I noticed something that is kinda bad from an architecture PoV:

For void Platform_getBacktrace(Vector* frames, const BacktracePanel* panel, char** error) you are passing the BacktracePanel* panel to the platform code, which should not have any idea about any of the GUI stuff. I see this was added as BacktraceFrame* frame = BacktraceFrame_new(panel); wants the panel, when all it should require is the panel->process instead. Please update the BacktraceFrame structure to only reference the Process * it belongs to. Functions that need access to the actual panel or displayOptions can easily be provided with these parameters directly (AFAICS this should only involve code, that is called by the BacktracePanel itself anyway).

frame->offset = offset;
frame->isSignalFrame = unw_is_signal_frame(&cursor);

frame->functionName = xStrndup(procName, 2048);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to be verified: Does unw_get_proc_name always NUL-terminate the string even when the length of the procName string to print is exactly 2048 bytes?

This is to prevent a potential off-by-one error. And I don't like length coded as a magic number here (should have been using sizeof()).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to be verified: Does unw_get_proc_name always NUL-terminate the string even when the length of the procName string to print is exactly 2048 bytes?

It's a good question. And I just checked. Yes, the unw_get_proc_name null terminates the string.

Moreover, writing a sizeof here is a good suggestion. Thank you!

Copy link
Contributor

@Explorer09 Explorer09 Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MrCirdo According to my tracing just now:

  1. src/mi/Gget_proc_name.c: line 100 unw_get_proc_name calls
  2. src/mi/Gget_proc_name.c: line 49 unw_get_proc_name_by_ip

unw_get_proc_name_by_ip has two paths. If it goes to intern_string(src/mi/Gget_proc_name.c line 30), then we can ensure the string is null terminated. But there is another path that calls *(unw_accessors_t *)a->get_proc_name (a function pointer), and the path would go to...

  1. src/coredump/_UCD_accessors.c: line 35 _UCD_accessors.get_proc_name = _UCD_get_proc_name
  2. src/coredump/_UCD_get_proc_name.c: line 131 _UCD_get_proc_name calls
  3. src/coredump/_UCD_get_proc_name.c: line 83 elf_w (CD_get_proc_name) calls
  4. src/elfxx.c: line 593 elf_w (get_proc_name_in_image) calls
  5. src/elfxx.c: line 337 elf_w (lookup_symbol) calls
  6. src/elfxx.c: line 313 elf_w (lookup_symbol_callback) calls
  7. strncpy from libc (strncpy does not guarantee the string is always null terminated.)

EDIT: My mistake. I missed the line d->buf[d->buf_len - 1] = '\0'; just below elf_w (lookup_symbol_callback). So the string can be safely assumed to be null terminated.

if (unw_get_elf_filename(&cursor, elfFileName, sizeof(elfFileName), &offsetElfFileName) == 0) {
frame->objectPath = xStrndup(elfFileName, 2048);

char *lastSplash = strrchr(frame->objectPath, '/');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you mean "slash".

Suggested change
char *lastSplash = strrchr(frame->objectPath, '/');
char* lastSlash = strrchr(frame->objectPath, '/');

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes 😄 . Thank you!

@MrCirdo
Copy link
Contributor Author

MrCirdo commented Feb 21, 2025

For void Platform_getBacktrace(Vector* frames, const BacktracePanel* panel, char** error) you are passing the BacktracePanel* panel to the platform code, which should not have any idea about any of the GUI stuff. I see this was added as BacktraceFrame* frame = BacktraceFrame_new(panel); wants the panel, when all it should require is the panel->process instead. Please update the BacktraceFrame structure to only reference the Process * it belongs to. Functions that need access to the actual panel or displayOptions can easily be provided with these parameters directly (AFAICS this should only involve code, that is called by the BacktracePanel itself anyway).

The goal of passing a BacktracePanel into a BactraceFrame was to avoid using a static variable for display options (static variables can easily become a nightmare). My concern is how I can use some variables from BacktracePanel like printingHelper into BacktraceFrame_display. Do I have to use a static variable?

@BenBE
Copy link
Member

BenBE commented Feb 21, 2025

For void Platform_getBacktrace(Vector* frames, const BacktracePanel* panel, char** error) you are passing the BacktracePanel* panel to the platform code, which should not have any idea about any of the GUI stuff. I see this was added as BacktraceFrame* frame = BacktraceFrame_new(panel); wants the panel, when all it should require is the panel->process instead. Please update the BacktraceFrame structure to only reference the Process * it belongs to. Functions that need access to the actual panel or displayOptions can easily be provided with these parameters directly (AFAICS this should only involve code, that is called by the BacktracePanel itself anyway).

The goal of passing a BacktracePanel into a BactraceFrame was to avoid using a static variable for display options (static variables can easily become a nightmare). But here it should avoid a bad design.

Oversight on my side. The prototype for Object_display somewhat forces the frame to contain a reference to all additional data. So placing the reference into the frame itself is not the problem.

My point regarding architecture was publishing the BacktracePanel type to the platform code, thereby introducing some dependency of the platform code on the BacktracePanel implementation. All that the platform code should ever have to handle is platform objects like processes. As we are already casting quite a few things, passing the BacktracePanel* as Object* would reduce exposing this implementation detail inside the platform code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature Completely new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Backtrace feature similar to strace
4 participants