-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise allocations in AssertEx.NormalizeWhitespace
and OperationTreeVerifier.Verify
#75202
base: main
Are you sure you want to change the base?
Conversation
while (true) | ||
{ | ||
// Go to next preceding \r or \n (if any) | ||
i = buffer.AsSpan().Slice(0, i).LastIndexOfAny("\r\n".AsSpan()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: AsSpan(0, i)
(and other places)
expectedOperationTree = expectedOperationTree.Trim(newLineChars); | ||
expectedOperationTree = expectedOperationTree.Replace("\r\n", "\n").Replace(" \n", "\n").Replace("\n", Environment.NewLine); | ||
// Finds spaces before newlines and places them in the specified toRemove buffer in reverse index order | ||
static void FindSpacesBeforeNewlineHelper(char[] buffer, int bufferSize, ref int[] toRemove, ref int toRemoveCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really wonder whether the complex dance is measurably better than doing manipulation in StringBuilder
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaredpar does this PR measurably improve your scenario?
What would the allocations for StringBuilder.Replace
on .NET Framework be like @huoyaoyuan (which @jaredpar was concerned about iirc)? I'm not familiar with how it performs, as I usually am just .Append
ing on StringBuilder
s when I use them.
It seems like you'd at least have ~2x total string length in total, to store StringBuilder buffer & result string, whereas the current one is 3x for actual, and 4x for expected - so it seems like it could be at most ~1.8x better in the best scenario (scenario being the impl of .Replace
). Whereas this should be amortised allocation free (but obviously has a bit more complexity in implementation as a result). Also, if @jaredpar wanted to optimise other methods in the future for allocations too, a large amount of the code here could be extracted & re-used / minorly adjusted to make them amortised alloc-free too.
Edit: I suppose using StringBuilder could actually theoretically be more like ~3.5x better if you copy into an ArrayPool buffer instead of ToString
(again, depending on impl. of Replace).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaredpar does this PR measurably improve your scenario?
Going to be profiling this probably Tuesday. Monday tends to be pretty meeting heavy for me. Hoping to have time to profile this out tomorrow when I'm back working on tests.
if (pooled1 != null) | ||
{ | ||
ArrayPool<char>.Shared.Return(pooled1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand this pattern but it's also feels a bit much to me. Too much manual work. I'm wondering if we should instead approach it with something like the following:
struct ArrayPoolValue<T> : IDisposable
{
private T[] _array;
public T[] T Array
{
get => _array;
set
{
if (_array is not null)
ArrayPool<T>.Shared.Return(_array);
_array = value;
}
}
public void Dispose()
{
if (_array is not null)
{
ArrayPool<T>.Shared.Return(_array);
_array = null;
}
}
}
Then we could change the signatures to take ref ArrayPoolValue<T> pool
. Consumption becomes simply:
using var pooled1 = new ArrayPool<char>;
@333fred for thoughts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MemoryPool<T>
and IMemoryOwner<T>
does exactly that, but with overhead of extra abstraction through Memory<T>
.
Maybe we should introduce such helper in BCL because ArrayPool
usage with finally
is more and more now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I care that much. These helpers are presumably not touched much, and I don't think we care about ensuring it's always returned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and I don't think we care about ensuring it's always returned
No, it's not a correctness issue to not return it. Unless assertion failure is expected (e.g., if you have a bunch of tests like bool didError = false; try { ... calls Assert(); } catch { didError = true; } Assert.True(didError);
), there's not really any reason to worry about ensuring it imo.
Then we could change the signatures to take
ref ArrayPoolValue<T>
pool. Consumption becomes simply:
using var pooled1 = new ArrayPool<char>;
You can't pass using
variables by reference, we will very quickly run into this issue I think.
} | ||
|
||
internal static string NormalizeWhitespace(string input) | ||
internal static ReadOnlyMemory<char> NormalizeWhitespaceROM(ReadOnlySpan<char> input, out char[] pooledAlloc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do suffix on the name we'd prefer the longer form vs. the abbreviation. Overall though I think I'd prefer we just have the one overload that returns ReadOnlyMemory<char>
and then let the consumers do .Span
as needed. That is a bit of ceremony but it's one that we've already done in a few other places.
} | ||
|
||
return output.ToString(); | ||
// Return the valid part of our allocation | ||
if (pooledAlloc == null) return default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: the style in roslyn requires that this be put on separate lines. There are a few instances of this in the PR that need attention.
int prev = 0; | ||
int next = input.IndexOfAny("\r\n".AsSpan()); | ||
if (next < 0) next = input.Length; | ||
while (true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading through this I'm wondering if it should be broken up a bit. Basically should we have one method that iterates through a ReadOnlySpan<char>
and gives us back ReadOnlyMemory<char>
for the lines within it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not exactly sure what you mean by this?
char[] newArr = ArrayPool<char>.Shared.Rent(allocSize + newChars); | ||
pooledAlloc.AsSpan().Slice(0, allocSize).CopyTo(newArr.AsSpan().Slice(0, allocSize)); | ||
ArrayPool<char>.Shared.Return(pooledAlloc); | ||
pooledAlloc = newArr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern of basically increasing the size of the pooled array is repeated a few times in the PR. Think we should factor this out into a helper method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean another inline method, or in some helper type? If you mean in a helper type, which one, or where would I create it?
} | ||
|
||
// Copy new text into buffer | ||
pooledAlloc.AsSpan().Slice(allocSize, spacesToAppend).Fill(' '); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pooledAlloc.AsSpan().Slice(allocSize, spacesToAppend).Fill(' '); | |
pooledAlloc.AsSpan(allocSize, spacesToAppend).Fill(' '); |
actual = actual.Replace(" \n", "\n").Replace(" \r", "\r"); | ||
expectedOperationTree = expectedOperationTree.Trim(newLineChars); | ||
expectedOperationTree = expectedOperationTree.Replace("\r\n", "\n").Replace(" \n", "\n").Replace("\n", Environment.NewLine); | ||
// Finds spaces before newlines and places them in the specified toRemove buffer in reverse index order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Local functions go at the end of the method for our style.
} | ||
} | ||
|
||
// Emulate actualOperationTree.Trim(newLineChars).Replace(" \n", "\n").Replace(" \r", "\r") into actualBuffer with actualBufferSize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I have the brain power to review this today, but I'm going to insist that we actually spell out the general algorithm going on here in more plain english.
They should now be amortised allocation free (in the assert succeed case). I have also removed pointless allocations for indentation I noticed in
OperationTreeVerifier
.Works by using char[] buffers from
ArrayPool<char>.Shared
(along withint[]
ones in some parts too), and checking the test condition manually (instead of via theAssert
type), then calling intoAssert
if it fails with a string from.ToString
. Buffers are returned after the assertion condition does not fail.Please let me know if there's any parts which don't follow style rules that I missed, or need more comments, etc. :)
/cc @jaredpar who wanted these optimised for allocations.