-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Perf. #933
base: cli_perf
Are you sure you want to change the base?
[WIP] Perf. #933
Conversation
Old Lion's Court Combat Replay
Generic updates
Moved Settings into PArserCommon to share it between cli and windowed. Changed the test dependencies to ParserCommon wich allows the tests to also target non windows frameworks. Updated the language version of subprojects to 10 for upcoming changes. Had to change the .editorconfig, because var declaraion was in violation of an arror rule. Probably a change in what is considered for the specific rule between versions. I really tried to make this work as one codebase, but VS _really_ doesnt wnat you to build two different executables form the same codebase. I again almost got it working, but in the end the Forms designer wouldnt load. If you ever wnat to attempt the unification you need to - change it to a target framework set instead of a specific one - set different output paths - set different intermediate paths before the actual project file gets laoded. This requires adding a Build.Directory.props (iirc the name) - splitting project packages and includes based on target framework (`<Compile Remove="" />` forms and resources in the generic version) - find a way to get the forms designer working
…ystem application, and it gets rid of a windows dependency.
Weissman: 34177
Weissman: 28670
probably not worth the improvement in itself, but this might give some foundataion also i probabaly missed a bunch taht wil lcome later Weissman: 13020
Weissman got worse somehow however: 14108
Weissman: 14099
No performance improvements to note, while this hould have more impact it might not be worth it.
Weissman is at 12889
weissman is at 11623
weissman: 11589
Just a short remark on how I arrived at the performance improvements: Some of them are obvious, but i never pull them out of thin air. If i'm uncertain about a change I first test out the generated code in sharplab. Is a comparison between the initialization code that gets generated for several comparable methods that just return a list of numbers, showing that for the readonly case the readonly span is vastly superior to all other variants |
…ead of List's Hard to say if this is worth it tbh. The improvement is highly dependent on the log.
GW2EIEvtcParser/EIData/CombatReplay/Decorations/Forms/DoughnutDecoration.cs
Outdated
Show resolved
Hide resolved
/// </summary> | ||
public static void CreateJSON(JsonLog jsonLog, Stream stream, bool indent) | ||
{ | ||
JsonSerializer.Serialize(stream, jsonLog, indent ? SerializerSettings.Indentent : SerializerSettings.Default); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the serializer here also take into account the custom tuple converter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not, as the generator needs separate instructions per root type.
I did indeed forget to add converters for the JsonLog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a new Tuple2ConverterFactory that replaces the old converter and added it to the JsonLog.
Quick question: why are there even two different json representations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One if officially documented and stable, the other one is unstable and is tailored in a way that matches the components of the html (it also does not contain data not exploited by the html, contrary to the documented JSON), so it changes anytime the components are moved around or reorganized.
Converters = [ typeof(IntTupleArrayConverter) ] | ||
)] | ||
[JsonSerializable(typeof(LogDataDto))] | ||
[JsonSerializable(typeof(AgentConnector.AgentConnectorDescriptor))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing we would not need those AgentConnector serializables if the GetConnected to API were to return a generic ConnectorDescriptor class with the others declared as JsonDerived? Just as a note to myself, trying to get used to System.Text.Json useage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. In the process of clearing up the visibility question im about to add the following comment to that code :D :
[JsonSerializable(typeof(LogDataDto))]
//NOTE(Rennorb): The following types need to be manually added to the generator, because they are used as opaque object fields.
// The generator cannot know the types of those fields, and we want to avoid having to use slow reflection to find out the members of those arbitrary types.
// Specifying them here tells the generator to generate switches for these specific types whenever an opaque Object needs to be serialized.
[JsonSerializable(typeof(AgentConnector.AgentConnectorDescriptor))]
[JsonSerializable(typeof(PositionConnector.PositionConnectorDescriptor))]
[JsonSerializable(typeof(AgentFacingConnector.AgentFacingConnectorDescriptor))]
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should also mention that this is not required to use the System.Text.Json serializer, only if you use the generated (de)serializers. These are quite a bit faster because they don't use reflection to find fields, but you need to specify more data in cases that use inheritance (which using "object" as the base is just a special case of).
…e not needed and changed concrete decorations back to internal visibility baaron4#933 (review)
…phas added preprocesor macro /define (`DEBUG_EFFECTS`) from the debug functions to avoid building them into production binaries or normal debug builds baaron4#933 (review)
{ | ||
foreach (Segment segment in value.BuffChart) | ||
//TODO(Rennorb) @correctnes: there was a null check here, i have no clue why. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most likely a very agressive defensive programming, I don't see how BuffChart could contain null values.
Added it to JsonLog, and replaced the old converter on LogDataDto baaron4#933 (comment)
} | ||
return new List<BreakbarDamageEvent>(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the inconsistent behavior on the "out" of TryGets? Some put null when nothing is found, others a empty list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you are talking about the TryGetEffectEventsByGUID function; this did not change from the original implementation, which is here: https://github.com/baaron4/GW2-Elite-Insights-Parser/pull/933/files/8c9b50cc509cebc9413ebbdafee850160fe0c526#diff-d5e44689367f787448b7f0dfe2eee8beb2da1421aabd9d2e78d663e33736ba32L1306-L1320
and also returns null on failure.
Ah, i understand what you are talking about.
This is the result of me going back and forth between how to handle these functions:
On one hand the semantics of returning null are consistent to other Try functions, on the other hand its frankly incompetent to throw away a list thats already allocated just because "try functions should set the out param to null in the failure case", since most of the time you want to then create some new list and populate it with initial values. I will have to go over this again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion on wether it should be null or an empty list but I would prefer the behavior to be consistent. If your preference is "we have already created the list, just put it on the out", that is perfectly acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea but this is exactly why they behave differently in the first place: one of them creates the list no matter what, and the other one filters an enumerable that may or may not exist to obtain the result.
Maybe some renaming in in order here, because "success" doesn't even make sense in the context of merging multiple lists. I will look into just dropping the "Try" prefix from some of these and see if that cleans it up.
/// <summary> | ||
/// Appends effect events by the given agent and effect GUID. | ||
/// </summary> | ||
public void AppendEffectEventsBySrcWithGUID(AgentItem agent, GUID effect, List<EffectEvent> effectEvents) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the Append methods public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be completely honest the reason is because i don't believe in the OOP paradigm of encapsulation and therefore declare almost everything public in my own work. I tried doing a reasonable job with this, but that one must have slipped through. We can certainly make it private if that is preferable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so after having a second look at this:
Why shouldn't the append methods be public? The TryGet methods that currently use them are are just wrappers around a loop of append calls, so why shouldn't they share the same visibility? Sometimes one or the other method might be more convenient to call, why take away that choice from the consumer and pretend to know which one will be more usefull?
This is also the main reason why I have such issues with encapsulation: It assumes you as the library author know better what the consumer wants than the consumer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is quite the tangent, "To be completely honest the reason is because i don't believe in the OOP paradigm of encapsulation and therefore declare almost everything public in my own work" was a perfectly acceptable answer =D. The API does not expose something that is private nor allow something private to be modified. I saw that the method was only used locally, thus mu curiosity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will have another look and find out if it would be sensible to use the append version in other places aswell, then report back before we resolve this.
} | ||
|
||
public override void SetBuffDistributionItem(BuffDistribution distribs, long start, long end, long buffID) | ||
if (_buffStackItem.IsExtension) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work properly, StackID and SeedSrc are the only attributes that are constant during a BuffStackItem's lifecycle, for the same reason Start, Duration, Src and TotalDuration have to be stored, so does IsExtension.
private readonly AgentItem _seedSrc; | ||
private readonly bool _isExtension; | ||
private long _totalDuration { get; } | ||
_buffStackItem = buffStackItem; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really worth keeping the whole buff stack item alive for the whole duration of the processing of the log just to have a pointer on SeedSrc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some argument to be made for keeping it around, because it might reduce the amount of individual small deallocations, but this is a remnant from me trying to just keep the item around and not copy all the fields (see comments below).
I reverted it to store individual fields.
public override long GetActualDuration() | ||
public override IEnumerable<AgentItem> GetSources() | ||
{ | ||
return [ _buffStackItem.Src ]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should return _src
{ | ||
internal abstract class BuffSimulationItem : AbstractSimulationItem | ||
//NOTE(Rennorb): I changed the element to have start + end instead of start + duration to fix a bug. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be fully honest don't quite remember. It had something to do with negative durations, but this section got chewed on quite a bit. Ive mainly kept it this way because the end time is the thing getting modified and read back for most operations, so it being represented by s+e makes it easier to reason about changes.
As the comment states it should not matter much which way this is represented, but most operations are simpler in this version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind the change, end being the attribute matches better the useage indeed. As you have observed, it was initially following arc's start + duration logic.
Without context, it seemed that you changed it to this in order to fix a bug that is currently present in master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea i get the concern. I don't think it is something on master, I'm confident it was something i introduced and subsequently fixed. I will look into it again before this gets finished.
https://github.com/baaron4/GW2-Elite-Insights-Parser/pull/933/files/8c9b50cc509cebc9413ebbdafee850160fe0c526#r1835685093 https://github.com/baaron4/GW2-Elite-Insights-Parser/pull/933/files/8c9b50cc509cebc9413ebbdafee850160fe0c526#r1835685586 https://github.com/baaron4/GW2-Elite-Insights-Parser/pull/933/files/8c9b50cc509cebc9413ebbdafee850160fe0c526#r1835684634
Weismann sais 11053
Weissman at 10897
|
||
//TODO(Rennorb) @cleanup: So ... why are these parametric points, but we still add null values between them? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convenience mainly. It guarentees the same delta between elements and easily find start offsets.
In GetDistanceToTarget, the number of skipped elements on processed actor gives us immediately the start index on the stackCenter/commanderPosition lists.
Ok, this is taking way longer than expected. I'm also somewhat busy with other things right now, so I'll just open this pr for scrutinization.
This pr also:
GW2EIParser.tst\ParsingErrors.cs
andGW2EIParser.tst\ParsesWithoutError.cs
.I have not fully tested everything after the final rebase, so more commits might follow fixing changes from that.
This means it touches basically every file, but don't be afraid: just tell git to ignore whitespaces, that already cuts out 85% of the changes.
Regarding the numbers:
I should also mention that there are some changes like the the explicit naming of variables that can easily be changed back in case you prefer it that way, but for now i left everything the way i was working with it.
I will also paste some notes i wrote down during the whole process, and while you didn't ask for a code review i necessarily read a lot of the codebase so it also has some recommendations at the end:
Some notes on genral changes and preformance optimizations:
Liberal use of the
var
keyword:This allows easier itteraton, because of the following scenario:
If calculate needs to be changed to return a float, all usages also need to be changed because even just storing the temporary result now fails (on in this case even worse; truncates the result).
If the code was written as
Then
calculate
's return type can be changed freely without causing type errors on every usage. This ofcorse entails thatdo_somethign_else
also works with the new type, for for things like numerics and enumerable types this is usually the case.The only conceivable downside comes up in the case where the new return type does convert into the next input type, but does so unexpectedly. In practice this happens so rarely that it is vastly outweight by the flexibility gained for refactoring.
Replacing
List<T> myList = new List<T>();
(or similar) with justList<T> myList = new()
or preferablyvar myList = new List<T>()
in local contexts:This just makes hte code more readable. If hte type is already stated in the same line, there is no need to repeat it again.
Also again this makes it easier to refactor, because only one change is needed instead of two.
File level namespaces:
Namespaces can now be global for a file without needing to wrap the classes etc in braces and add an indentation level.
Replacing
new List<T>()
,new T[]
and similar with just[ ]
(and to a lesser extend[ ..other]
):This is a new C#12 feature that allows collection initialization regardless of the collection type. This has two main upsides, but unfortunately also one downside:
Array<T>.Empty
, which is a very efficient type to use in this scenario, as it only gets allocated once for the whole program. Using=> new List<T>()
instead allocates every time the function is called.List<T> newList = [..oldList];
specifically when oldList is the plainList<T>
type, but it does not yet work for evenIReadOnlyList
. For al other types this spread initialization will enumerate the collection without reserving space in the target first, so if you know the oldList length it is always better to explicitly reserve first before adding the elements.Replaced
Newtonsoft.Json
withSystem.Text.Json
:Dotnet now has its own json library, and its really good. It causes almost no allocations and is therefore very fast and memory efficient.
It also is a bit smarter with how numeric output is produced, and I would have expected this to also reduce the output size, but it somehow increased the final output by ~0.5% (after compression) for some reason.
Changed several interfaces from
IReadOnlyList
toIEnumerable
:This is relevant for collections that get filtered and transformed without being used as the list itself. In almost all cases it is better to not create an intermediate list, since it needs to be reserved and reallocated a bunch just to be consumed with another linq query again.
For all cases where the result is used multiple times however the enumerable should be collected first.
This is also quite dependent on the code that generates such a result. If that codereturns a reference to an existing list, we ofcourse just return that list. But if it generates a sequence we return the generator and don't create superfluous copies where it is not necessary.
Changed several abstract output types to concrete ones, specifically changed
IReadOnlyList<T>
to justList<T>
:This is relevant because in many cases functions were returning a readonly list that was created from a list, just to then be consumed by another function that would turn the ro list back into a normal list.
This "turning back" requires a complete copy of the whole list, and just plainly wastes cpu cycles. I only did this in cases where the List is constructed as part of the first function call. I chose different approaches in cases where it is part of an internally stored cache and therefore "shared" between all consumers.
Used structs instead of classes:
If there are lots of instances that get created of a specific type (e.g. Segment / Point) that are mostly readonly you want to have that be a struct. Structs don't need to be individually allocated but can live on the stack or in other datatypes, so it saves a lot of garbage collection and memory to have them be a struct.
Unfortunately there are som cases where a struct cannot be used, or is actually degrading performance. The biggest example are modifying list elements, as
list[3].value = 3
does not work if the generic type of the list is a struct type. The whole struct needs to be replaced.For such use cases there unfortunately is no good solution in net standard except implementing a list type with ref indexing (which im not going to do).
Implemented own sort algorithm (kinda):
I added a port of fluxsort to the repo, a stable sorting algorithm with restrictable and predictable memory usage. The implementation uses ArrayPooling where possible, but I went for speed over memory. The pooling is not ideal, but the best solution i could come up with for the port.
Initialized lists to known or estimated lengths to prevent reallocations.
Changed properties to fields wherever possible:
This is very easy. Properties bad. There is no point in them existing. They add an additional layer of function calls to everything and refactoring code is trivially easy. If you don't really need them, don't use them.
the only reason to have them is if you want to use the binding system of Winforms or WPF, but those are limited and should not be the default.
Recommendations and future work:
Code style / architecture:
//
instead of empty lines breaks git diffing and merging. I had to do multiple 300 lines+ manual diffs where only a few liens changed during the final rebase, just because they were all glued together.I also don't quite understand where this even comes form, as empty liens are used for visual and mental separation of code blocks. Gluing the blocks together with empty comments not only introduces more noise, but it doesn't serve the original purpose of spacing out conceptually different segments since your eye glaze over the "spaces" because they are not empty.
Abstract..
types to just the name. So instead ofAbstractDecoration
just call itDecoration
. There is no value in this prefix. Same goes forGeneric...
.Tests:
Performance:
Replace other small structures like Point3d with structs.