-
Notifications
You must be signed in to change notification settings - Fork 1.1k
core,server: report defrag metrics #5560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
af4a388
to
26b1361
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, small comments
src/core/page_usage_stats.h
Outdated
#include <mimalloc/types.h> | ||
|
||
#include <vector> | ||
#include "core/bloom.h" | ||
#include "server/tx_base.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what specifically do you need from tx_base? It just creates a two-way dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed the ShardId
typedef, but I have changed the map to just use the underlying type now.
src/core/page_usage_stats.cc
Outdated
UniquePages::UniquePages() | ||
: pages_scanned{MakeSBF()}, | ||
pages_marked_for_realloc{MakeSBF()}, | ||
pages_full{MakeSBF()}, | ||
pages_reserved_for_malloc{MakeSBF()}, | ||
pages_with_heap_mismatch{MakeSBF()}, | ||
pages_above_threshold{MakeSBF()} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be just a default constructor on FilterWithSize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
src/core/page_usage_stats.h
Outdated
absl::btree_map<ShardId, ShardUsageSummary> shard_wide_summary; | ||
|
||
void Merge(CollectedPageStats&& other, ShardId shard_id); | ||
static CollectedPageStats Merge(CollectedPageStats* stats, size_t size, float threshold); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge(absl::Span<CollectedPageStats>, float...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to use span
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed it to take vector by rvalue ref to address #5560 (comment)
src/core/page_usage_stats.h
Outdated
std::string ToString() const; | ||
}; | ||
|
||
struct UniquePages { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this type can be hidden in PageUsage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to private type
src/core/page_usage_stats.cc
Outdated
std::vector<std::string> rows; | ||
rows.push_back(absl::StrFormat("Page usage threshold: %f", threshold * 100)); | ||
rows.push_back(absl::StrFormat("Pages scanned: %d", pages_scanned)); | ||
rows.push_back(absl::StrFormat("Pages marked for reallocation: %d", pages_marked_for_realloc)); | ||
rows.push_back(absl::StrFormat("Pages full: %d", pages_full)); | ||
rows.push_back(absl::StrFormat("Pages reserved for malloc: %d", pages_reserved_for_malloc)); | ||
rows.push_back( | ||
absl::StrFormat("Pages skipped due to heap mismatch: %d", pages_with_heap_mismatch)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could've been made a little simpler:
absl::StrAppend(&out, "Page scanned: ", pages_scanned, "\n");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
src/server/memory_cmd.cc
Outdated
if (auto* shard = EngineShard::tlocal(); shard) | ||
shard->DoDefrag(CollectPageStats::YES, threshold); | ||
|
||
CollectedPageStats results[shard_set->size()]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually use heap arrays to not overflow the fiber stack (small) with large structures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to vector with initial size
The histogram dependency is used in page usage to store percentage of page usage in a histogram. Signed-off-by: Abhijat Malviya <abhijat@dragonflydb.io>
Store the page addresses in bloom filters to count unique pages per category. Also store the page usage (used block/capacity) in histogram, where the page usage is above threshold but would otherwise be fit for reallocation, ie it is not full, not a malloc target etc. Signed-off-by: Abhijat Malviya <abhijat@dragonflydb.io>
Collect and summarize defrag results from shards when command is run interactively. Signed-off-by: Abhijat Malviya <abhijat@dragonflydb.io>
Signed-off-by: Abhijat Malviya <abhijat@dragonflydb.io>
09c1a94
to
4fbcbfb
Compare
Signed-off-by: Abhijat Malviya <abhijat@dragonflydb.io>
4fbcbfb
to
a75a525
Compare
The stats collected during defragmentation are reported in this PR. The following counts are reported (summed up over all shards):
Additionally per shard, we report a histogram of pages which were not full but above threshold, the histogram is for p50, p90 and p99 (ie 50% were below or equal to X threshold and so on). This can guide the next defragment attempt, if the current attempt did not touch enough pages.
Internally hdr histogram is used to store the histogram, we also use bloom filter (SBF) to collect counts, because accurate counts are not necessary and storing hash sets of
uintptr_t
takes around 8 bytes per item whereas sbf uses much lower amount of memory with a sufficiently small false positive rate.Scanning/reporting is done only for the interactive
memory defragment
command, for the scheduled task we do not collect stats.