fix(libsinsp): enable metrics collector on all platforms #1870

Draft
wants to merge 1 commit into
base: master

@mrgian mrgian commented May 16, 2024

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

fix(libsinsp): enable metrics collector on all platforms

@FedeDP FedeDP left a comment

I was also wondering whether we should tie the available sinsp_stats_v2_collectors to build options such as MINIMAL_BUILD (for example, container-related ones will always be 0 on MINIMAL_BUILD builds).
This should be as simple as adding a compilation guard around collector entries.
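
A minimal sketch of what such a compilation guard could look like, assuming a hypothetical table of collector entries. The `collector_entry` struct, the `k_collectors` table, and the entry names are illustrative only and are not the actual libsinsp definitions; `MINIMAL_BUILD` is the macro mentioned above.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for a stats collector entry (not the real libsinsp type).
struct collector_entry {
    std::string_view name;
};

// Container-related entries are compiled out when MINIMAL_BUILD is defined,
// so they never show up as metrics that are always 0.
inline constexpr collector_entry k_collectors[] = {
    {"n_threads"},
    {"n_fds"},
#ifndef MINIMAL_BUILD
    {"n_containers"},
    {"n_missing_container_images"},
#endif
};

// Number of entries that survived the guard for this build configuration.
inline constexpr std::size_t k_num_collectors =
    sizeof(k_collectors) / sizeof(k_collectors[0]);
```

Consumers that iterate over `k_collectors` would then see only the entries that are meaningful for the current build, without any runtime check.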

@@ -274,9 +272,11 @@ class libs_metrics_collector
uint32_t m_metrics_flags = METRICS_V2_KERNEL_COUNTERS | METRICS_V2_LIBBPF_STATS | METRICS_V2_RESOURCE_UTILIZATION | METRICS_V2_STATE_COUNTERS | METRICS_V2_PLUGINS;
std::vector<metrics_v2> m_metrics;

#ifdef __linux__

I think we might want to move these into the scap_platform vtable, likely as a struct scap_metrics_vtable (embedded in each scap_foo_platform), so that we could get platform-dependent metrics from the scap handle. Again, this might be an idea for a future refactor.
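
A rough sketch of the vtable idea, written as C-compatible C++ and assuming hypothetical names throughout: the `get_memory_rss_kb` entry, `scap_example_platform`, and `platform_get_memory_rss_kb` do not exist in libscap, and the real `scap_platform` layout is not reproduced here. The sketch references the table through a pointer rather than embedding it by value, which is just one possible layout.

```cpp
#include <cstdint>

// Hypothetical function table for platform-dependent metrics.
struct scap_metrics_vtable {
    // Writes the resident set size in KiB to *out; returns 0 on success.
    int32_t (*get_memory_rss_kb)(void* platform, uint64_t* out);
};

// Hypothetical platform struct carrying a metrics table.
struct scap_example_platform {
    const scap_metrics_vtable* metrics;  // null when the platform has no metrics
    uint64_t fake_rss_kb;                // stand-in for real platform state
};

// Example implementation of the table entry for this fake platform.
static int32_t example_get_memory_rss_kb(void* platform, uint64_t* out) {
    auto* p = static_cast<scap_example_platform*>(platform);
    *out = p->fake_rss_kb;
    return 0;
}

static const scap_metrics_vtable k_example_metrics_vtable = {
    example_get_memory_rss_kb,
};

// Generic entry point: callers do not need to know which platform they hold.
// Returns -1 when the platform does not provide this metric.
inline int32_t platform_get_memory_rss_kb(scap_example_platform* p, uint64_t* out) {
    if (p == nullptr || p->metrics == nullptr ||
        p->metrics->get_memory_rss_kb == nullptr) {
        return -1;
    }
    return p->metrics->get_memory_rss_kb(p, out);
}
```

With this shape, a platform without metrics support leaves the table unset and the generic entry point reports the metric as unavailable instead of requiring an `#ifdef` in the caller.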

I haven't had time to check out this PR, but reading your comment, @FedeDP, I would like that. Especially because, since the scap refactor, the CPU usage calculation has been broken when there is only a plugin source, even on Linux: in that case we do not instantiate the agent info, which the CPU usage calculation relies on.

@mrgian mrgian changed the title [WIP] fix(libsinsp): enable metrics collector on all platforms fix(libsinsp): enable metrics collector on all platforms May 16, 2024
@mrgian mrgian marked this pull request as ready for review May 16, 2024 12:55
@poiana poiana requested a review from incertum May 16, 2024 12:56
@FedeDP
Contributor

FedeDP commented May 16, 2024

Since we don't need this for the next release, I'd put this in the
/milestone 0.18.0

@poiana poiana added this to the 0.18.0 milestone May 16, 2024
@mrgian
Contributor Author

mrgian commented May 16, 2024

I think we might want to move these into the scap_platform vtable, likely as a struct scap_metrics_vtable (embedded in each scap_foo_platform), so that we could get platform-dependent metrics from the scap handle. Again, this might be an idea for a future refactor.

Hey @FedeDP, makes sense!
Since you moved this to the next milestone and we are not in a hurry, I can take care of this :)

@incertum
Contributor

I think we might want to move these into the scap_platform vtable, likely as a struct scap_metrics_vtable (embedded in each scap_foo_platform), so that we could get platform-dependent metrics from the scap handle. Again, this might be an idea for a future refactor.

Hey @FedeDP, makes sense! Since you moved this to the next milestone and we are not in a hurry, I can take care of this :)

Added this as an item to falcosecurity/falco#3194 (comment).
Just to reiterate: if we could fix the agent info initialization on Linux for the plugin platform (see falcosecurity/falco#2821), it would be fantastic. For macOS and Windows, the CPU utilization and memory usage calculations would need new code; not sure if that is truly needed, WDYT?

@FedeDP
Contributor

FedeDP commented May 17, 2024

If we could fix the agent info initialization for Linux for the plugin platform (see falcosecurity/falco#2821) -- it would be fantastic.

Agree!

For macOS and Windows CPU utilization and memory usage calculation would need to be new code, not sure if truly needed, WDYT?

I think it is interesting to expose those metrics for macOS and Windows too, but yes, it's not high priority.

@incertum
Contributor

@mrgian hope all is well, just wanted to kindly check in and ask what our current plan is to get out of the regression in our scap platforms approach? (falcosecurity/falco#2821) If we can have a proper refactor -- amazing. Else I would also support something more intermediate to ensure the next Falco release does not have this regression anymore. CC @FedeDP @leogr

Thanks in advance!

@mrgian
Contributor Author

mrgian commented Jul 16, 2024

Hey @incertum
For now I'm just moving the Linux-specific metrics collection logic to the scap_platform vtable, so that we can use the scap handle to gather platform-dependent metrics. This will make libs_metrics_collector platform-agnostic.
I'm not working on a proper refactor that will solve the regression, but if you have any idea for that please let me know!

@mrgian mrgian force-pushed the plugin-api-metrics-win-test branch from 7b2e258 to 73a97af Compare July 17, 2024 08:50
@poiana poiana added size/XXL and removed size/M labels Jul 17, 2024
@poiana
Contributor

poiana commented Jul 17, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mrgian
Once this PR has been reviewed and has the lgtm label, please assign andreagit97 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mrgian mrgian marked this pull request as ready for review July 17, 2024 08:52
@poiana poiana requested a review from LucaGuerra July 17, 2024 08:54
@incertum
Contributor

Hey @incertum For now I'm just moving linux-specific metrics collection logic to the scap_platform vtable. So that we can use the scap handle to gather platform-dependent metrics. This will make libs_metrics_collector platform agnostic. I'm not working on a proper refactor that will solve the regression, but if you have any idea for that please let me know!

Posted here falcosecurity/falco#2821 (comment)

@gnosek
Contributor

gnosek commented Jul 23, 2024

Seems like this one and #2821 are intertwined :)

@mrgian, please take a look at my comment #1969 (comment) for some ideas about the future direction of libscap/libsinsp and scap_platform. IMO, let's move stuff out of libscap, not into (especially here: libscap doesn't care one bit about these metrics, they're purely for libsinsp use).

If you agree with that, then I think it's not a good idea to add more stuff to scap_platform. Instead, we can make metrics_collector a virtual base class (this is effectively what a scap_platform is) and move the concrete implementation to e.g. userspace/libsinsp/linux/metrics_collector.cpp.

Then, we have two options for the consumers of the metrics:

  • provide a no-op userspace/libsinsp/generic/metrics_collector.cpp for other platforms so that we always have some metrics collector, or
  • add a #define (presumably via cmake) that says we do have a metrics_collector for this platform and #ifdef on that, rather than __linux__

(I'd rather go for the first option, personally)
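
To make the first option concrete, here is a minimal single-file sketch, assuming hypothetical names: `resource_snapshot`, `metrics_collector_base`, `generic_metrics_collector`, `linux_metrics_collector`, and `make_metrics_collector` are illustrative and not part of libsinsp. In the real layout each implementation would live in its own source file selected by the build system; the `#ifdef __linux__` in the factory below only stands in for that selection.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical snapshot type; a real collector would expose more metrics.
struct resource_snapshot {
    double cpu_usage_perc = 0.0;
    uint64_t memory_rss_kb = 0;
};

// Platform-agnostic interface: the virtual base class of the proposal.
class metrics_collector_base {
public:
    virtual ~metrics_collector_base() = default;
    virtual bool supported() const = 0;
    virtual resource_snapshot snapshot() = 0;
};

// No-op implementation for platforms without a real collector
// (would live in something like generic/metrics_collector.cpp).
class generic_metrics_collector final : public metrics_collector_base {
public:
    bool supported() const override { return false; }
    resource_snapshot snapshot() override { return {}; }
};

#ifdef __linux__
// Linux implementation (would live in something like linux/metrics_collector.cpp).
class linux_metrics_collector final : public metrics_collector_base {
public:
    bool supported() const override { return true; }
    resource_snapshot snapshot() override {
        resource_snapshot s;
        // Placeholder: real code would read process statistics here.
        return s;
    }
};
#endif

// Consumers call the factory and never test __linux__ themselves.
inline std::unique_ptr<metrics_collector_base> make_metrics_collector() {
#ifdef __linux__
    return std::make_unique<linux_metrics_collector>();
#else
    return std::make_unique<generic_metrics_collector>();
#endif
}
```

Because a collector always exists, callers can use it unconditionally and check `supported()` only if they want to skip reporting empty values.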

One thing to bikeshed would be the directory structure (it's trivial here, but it will set precedent for future per-platform components). I see two approaches:

libsinsp/linux/metrics_collector.cpp:

  • (good) we can build e.g. sinsp_linux.a from the whole libsinsp/linux directory, simplifying the build system a little
  • (bad) the API header would have to live directly in libsinsp/

libsinsp/metrics_collector/linux_metrics_collector.cpp:

  • (good) provides a nice place for a platform-agnostic header with the base class definition
  • (bad) platform-specific code is spread across directories, making it a bit less convenient to create common per-platform helpers (would have to live in something like libsinsp/linux/common.h)

I don't have a strong opinion on this either way tbh.

@mrgian
Contributor Author

mrgian commented Jul 23, 2024

Hey @gnosek
I see now. I agree on keeping the metrics collection logic out of the scap_platform.
Also, scap_platform is plain C, which can make collecting other kinds of metrics harder.

Instead, we can make metrics_collector a virtual base class

If I'm not wrong, currently libs_resource_utilization (https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/metrics_collector.h#L271-L300) is the only class with linux-only code.
A similar solution would be making libs_resource_utilization a virtual class (with platform-specific implementations).

As you said, deciding on the directory naming will influence the development of future components, so I'll wait to hear what the maintainers think.

@mrgian mrgian marked this pull request as draft July 23, 2024 09:57
Perf diff from master - unit tests

     1.43%     +2.44%  [.] std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
     5.31%     -1.20%  [.] next
     9.09%     +0.92%  [.] sinsp_parser::reset
     0.91%     +0.91%  [.] sinsp::fetch_next_event
     8.07%     -0.84%  [.] sinsp::next
     4.53%     -0.74%  [.] sinsp_evt::load_params
     0.75%     -0.65%  [.] sinsp_split[abi:cxx11]
     1.26%     -0.58%  [.] sinsp_parser::event_cleanup
     4.81%     -0.56%  [.] sinsp_parser::process_event
     0.92%     +0.46%  [.] scap_event_encode_params_v

Perf diff from master - scap file

    10.09%     -4.18%  [.] sinsp_filter_check_thread::extract_single
    10.04%     -2.73%  [.] next
     6.73%     +2.46%  [.] sinsp_filter_check::rawval_to_string
     9.63%     -2.34%  [.] 0x00000000000a76b4
    13.34%     +1.76%  [.] sinsp_filter_check::extract_nocache
     3.17%     +1.31%  [.] sinsp_filter_check::tostring
     6.72%     -0.77%  [.] sinsp_filter_check::get_transformed_field_info
     3.35%     -0.48%  [.] libsinsp::runc::match_container_id
     3.37%     -0.35%  [.] sinsp_evt::get_param_as_str
     3.37%     -0.34%  [.] std::_Hashtable<long, std::pair<long const, std::shared_ptr<sinsp_threadinfo> >, std::allocator<std::pair<long const, std::shared_ptr<sinsp_threadinfo> > >, std::__detail::_Select1st, std::equal_to<long>, std::hash<long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_before_node

Heap diff from master - unit tests

total runtime: -0.13s.
calls to allocation functions: -33334 (258403/s)
temporary memory allocations: 247 (-1914/s)
peak heap memory consumption: -36B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

total runtime: -0.01s.
calls to allocation functions: -1906 (272285/s)
temporary memory allocations: 7 (-1000/s)
peak heap memory consumption: 152B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

codecov bot commented Jul 23, 2024

Codecov Report

Attention: Patch coverage is 5.30973% with 107 lines in your changes missing coverage. Please review.

Project coverage is 50.76%. Comparing base (4e3aebe) to head (cd54b89).
Report is 10 commits behind head on master.

Files                                            Patch %   Lines
userspace/libscap/linux/scap_linux_platform.c    1.09%     90 Missing ⚠️
userspace/libscap/scap_platform.c                7.14%     13 Missing ⚠️
userspace/libsinsp/metrics_collector.cpp         42.85%    3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1870      +/-   ##
==========================================
- Coverage   50.95%   50.76%   -0.19%     
==========================================
  Files         310      310              
  Lines       39540    39552      +12     
  Branches    17208    17324     +116     
==========================================
- Hits        20146    20078      -68     
- Misses      14354    14450      +96     
+ Partials     5040     5024      -16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@incertum
Contributor

If I'm not wrong, currently libs_resource_utilization (https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/metrics_collector.h#L271-L300) is the only class with linux-only code.

Confirmed.

As you said, deciding on the directory naming will influence the development of future components, so I'll wait to hear what the maintainers think.

Also don't have any preference. Maybe go with what @gnosek deems slightly better, because Grzeg has been around the block for some time and I get all the callouts. The ifdefs were a good solution to get these metrics going. Now we can finally get it right. By now 4+ folks already refactored the libs metrics collector, so there is hope that we will stabilize that code at some point 🙃 .

@FedeDP
Contributor

FedeDP commented Aug 27, 2024

Any news on this @mrgian ?

@mrgian
Contributor Author

mrgian commented Aug 27, 2024

Hey @FedeDP
Not yet!
We decided to refactor this again :( and I'm currently busy with other tasks.
So I don't think this will make it into the next release, but I will start working on this as soon as I can.

@FedeDP
Contributor

FedeDP commented Aug 27, 2024

Ok! Moving to next milestone then :)
/milestone 0.19.0

@poiana poiana modified the milestones: 0.18.0, 0.19.0 Aug 27, 2024
Signed-off-by: Gianmatteo Palmieri <[email protected]>
@mrgian mrgian force-pushed the plugin-api-metrics-win-test branch from cd54b89 to 5a9a8c0 Compare October 2, 2024 14:52