
[WIP] Perf. #933

Draft: wants to merge 82 commits into base branch cli_perf
Conversation

@SaculRennorb (Contributor) commented Nov 5, 2024

Ok, this is taking way longer than expected. I'm also somewhat busy with other things right now, so I'll just open this PR for scrutiny.

This PR also:

  • Adds some tests (without any logs; see GW2EIParser.tst\ParsingErrors.cs and GW2EIParser.tst\ParsesWithoutError.cs).
    I have not fully tested everything after the final rebase, so more commits might follow fixing changes from that.
  • Introduces nullability. In general it's a very good idea to use it, but I'm not sure if I will find the time to resolve the remaining related issues correctly.
  • Updates to C# 12.
  • Switches to file-scoped namespaces.
  • Enables global default usings.

This means it touches basically every file, but don't be afraid: just tell git to ignore whitespace; that already cuts out 85% of the changes.

Regarding the numbers:

  • This improves overall processing speed by 15% on average, and by up to 30% for simulation-heavy logs (Cerus). I'm still working on better benchmarks.
  • I have not measured memory usage yet, but it should be equal to or less than before, since I mostly just removed copies and streamed iterations.
  • For some unknown reason the output files are ~0.5% larger. I have not figured out why, but it could very well just be a slight change in compression efficiency caused by changes in numeric output (we now omit trailing zeros after the decimal point).

I should also mention that there are some changes, like the explicit naming of variables, that can easily be changed back in case you prefer it that way; for now I left everything the way I was working with it.

I will also paste some notes I wrote down during the whole process. While you didn't ask for a code review, I necessarily read a lot of the codebase, so the notes also include some recommendations at the end:

Some notes on general changes and performance optimizations:

  • Liberal use of the var keyword:
    This allows easier iteration, because of the following scenario:

    int calculate(int a, int b) { /**/ }
    
    int result = calculate(1, 3);
    do_something_else(result);

    If calculate needs to be changed to return a float, all usages also need to be changed, because even just storing the temporary result now fails to compile.
    If the code was instead written as

    int calculate(int a, int b) { /**/ }
    
    var result = calculate(1, 3); // <-  var use here
    do_something_else(result);

    Then calculate's return type can be changed freely without causing type errors on every usage. This of course requires that
    do_something_else also works with the new type, but for things like numerics and enumerable types this is usually the case.
    The only conceivable downside comes up when the new return type does convert into the next input type, but does so unexpectedly. In practice this happens so rarely that it is vastly outweighed by the flexibility gained for refactoring.

  • Replacing List<T> myList = new List<T>(); (or similar) with just List<T> myList = new(), or preferably var myList = new List<T>() in local contexts:
    This just makes the code more readable. If the type is already stated in the same line, there is no need to repeat it.
    Again, this also makes refactoring easier, because only one change is needed instead of two.

  • File level namespaces:
    Namespaces can now be global for a file without needing to wrap the classes etc in braces and add an indentation level.

  • Replacing new List<T>(), new T[] and similar with just [ ] (and to a lesser extent [ ..other ]):
    This is a new C# 12 feature that allows collection initialization regardless of the collection type. It has two main upsides, but unfortunately also one downside:

    • This can generate more specific types. For example, doing things like
      IEnumerable<T> get_elements() => [ ];
      generates access to Array.Empty<T>(), which is very efficient in this scenario, as the empty array only gets allocated once for the whole program. Using => new List<T>() instead allocates every time the function is called.
    • It makes it easier to refactor, since the target type can change without the semantics having to change.
    • The downside is that
      var list = new List<T>();
      list.AddRange(readOnlyList);
      is still more efficient for any source that is not a List<T>. There is an optimization specifically for List<T> newList = [ ..oldList ]; when oldList is the plain List<T> type, but it does not yet work for even IReadOnlyList<T>. For all other types this spread initialization will enumerate the collection without reserving space in the target first, so if you know the source length it is always better to explicitly reserve first before adding the elements.
    • I also didn't use collection expressions in places where a capacity probably should be set, but where I either don't know it or haven't gotten to setting it.
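The tradeoff above can be sketched as follows (a minimal sketch; the names are illustrative, not taken from the parser):

```csharp
using System;
using System.Collections.Generic;

// [] on an IEnumerable<T> return lowers to the shared Array.Empty<int>().
static IEnumerable<int> GetElementsEmpty() => [];

// Spread from a plain List<T> takes the optimized copy path.
static List<int> CopyViaSpread(List<int> old) => [.. old];

// For any other source type, reserving the known capacity first avoids
// the incremental reallocations a plain spread would incur.
static List<int> CopyReserved(IReadOnlyList<int> old)
{
    var result = new List<int>(old.Count);
    for (int i = 0; i < old.Count; i++)
        result.Add(old[i]);
    return result;
}

var source = new List<int> { 1, 2, 3 };
Console.WriteLine(CopyViaSpread(source).Count);  // 3
Console.WriteLine(CopyReserved(source).Count);   // 3
```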
  • Replaced Newtonsoft.Json with System.Text.Json:
    Dotnet now has its own JSON library, and it's really good. It causes almost no allocations and is therefore very fast and memory efficient.
    It is also a bit smarter about how numeric output is produced, and I would have expected this to also reduce the output size, but it somehow increased the final output by ~0.5% (after compression) for some reason.
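As an illustration of the switch (a sketch; the payload shape is an assumption, not the parser's actual DTOs), note how the trailing zero of the double is omitted in the output:

```csharp
using System;
using System.Text.Json;

// System.Text.Json writes directly to the output with almost no
// intermediate allocations, unlike Newtonsoft's object-model approach.
var compact = new JsonSerializerOptions { WriteIndented = false };

var payload = new { Boss = "Cerus", DurationMs = 123456, Dps = 41250.50 };
string json = JsonSerializer.Serialize(payload, compact);

Console.WriteLine(json);  // Dps serializes as 41250.5, trailing zero dropped
```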

  • Changed several interfaces from IReadOnlyList to IEnumerable:
    This is relevant for collections that get filtered and transformed without ever being used as a list themselves. In almost all cases it is better to not create an intermediate list, since it needs to be reserved and reallocated a bunch just to be consumed by another LINQ query again.
    For all cases where the result is used multiple times, however, the enumerable should be collected first.
    This is also quite dependent on the code that generates such a result. If that code returns a reference to an existing list, we of course just return that list. But if it generates a sequence, we return the generator and don't create superfluous copies where they are not necessary.
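A sketch of the distinction (illustrative names, not the parser's API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A generator: elements are produced lazily, no intermediate list exists.
static IEnumerable<int> PositiveIds(IEnumerable<int> ids)
{
    foreach (var id in ids)
        if (id > 0)
            yield return id;
}

var raw = new List<int> { -1, 2, 3, -4 };

// Consumed once: stream it straight into the next LINQ query.
int sum = PositiveIds(raw).Sum();

// Consumed multiple times: collect first, so the filter runs only once.
var collected = PositiveIds(raw).ToList();

Console.WriteLine($"{sum} {collected.Count}");  // 5 2
```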

  • Changed several abstract output types to concrete ones, specifically IReadOnlyList<T> to just List<T>:
    This is relevant because in many cases functions were returning a read-only list that was created from a list, just to then be consumed by another function that would turn the read-only list back into a normal list.
    This "turning back" requires a complete copy of the whole list and plainly wastes CPU cycles. I only did this in cases where the list is constructed as part of the first function call. I chose different approaches in cases where it is part of an internally stored cache and therefore "shared" between all consumers.
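The copy in question looks like this (a sketch with illustrative names):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Concrete return type: the caller can consume or extend the list directly.
static List<int> BuildTicks() => Enumerable.Range(1, 5).ToList();

// Abstract return type: a caller that needs a mutable list has to copy.
static IReadOnlyList<int> BuildTicksReadOnly() => Enumerable.Range(1, 5).ToList();

var direct = BuildTicks();                   // no copy
var copied = BuildTicksReadOnly().ToList();  // full copy just to get a List<T> back

direct.Add(6);
Console.WriteLine($"{direct.Count} {copied.Count}");  // 6 5
```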

  • Used structs instead of classes:
    If lots of instances of a specific type get created (e.g. Segment / Point) and they are mostly read-only, you want that type to be a struct. Structs don't need to be individually allocated but can live on the stack or inside other data structures, so making them structs saves a lot of garbage collection and memory.
    Unfortunately there are some cases where a struct cannot be used, or actually degrades performance. The biggest example is modifying list elements, as list[3].value = 3 does not compile if the generic type of the list is a struct type; the whole struct needs to be replaced.
    For such use cases there unfortunately is no good solution in .NET Standard, short of implementing a list type with ref indexing (which I'm not going to do).
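The limitation can be demonstrated with System.Drawing.Point, a mutable struct standing in here for the parser's own point/segment types:

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;  // Point is a mutable struct

var points = new List<Point> { new(1, 1), new(2, 2) };

// points[0].X = 5;  // CS1612: the List<T> indexer returns a *copy* of the struct

// The whole element has to be read, modified, and written back:
var p = points[0];
p.X = 5;
points[0] = p;

// Arrays don't have this problem: their indexer yields a reference to the element.
var arr = new[] { new Point(1, 1) };
arr[0].X = 5;

Console.WriteLine($"{points[0].X} {arr[0].X}");  // 5 5
```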

  • Implemented own sort algorithm (kind of):
    I added a port of fluxsort to the repo, a stable sorting algorithm with restrictable and predictable memory usage. The implementation uses ArrayPool where possible, but I went for speed over memory. The pooling is not ideal, but it is the best solution I could come up with for the port.

  • Initialized lists to known or estimated lengths to prevent reallocations.
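A minimal sketch of what pre-sizing buys: the backing array is allocated once at the known size instead of growing by doubling.

```csharp
using System;
using System.Collections.Generic;

int[] source = { 3, 1, 4, 1, 5, 9 };

// Capacity is known up front: one backing-array allocation, no growth doubling.
var sized = new List<int>(source.Length);
sized.AddRange(source);

Console.WriteLine($"{sized.Count} {sized.Capacity}");  // 6 6
```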

  • Changed properties to fields wherever possible:
    This is very easy. Properties add an additional layer of function calls to everything, and refactoring a field into a property later is trivially easy. If you don't really need them, don't use them.
    The only reason to have them is if you want to use the binding system of WinForms or WPF, but those cases are limited and should not be the default.

Recommendations and future work:

Code style / architecture:

  • Move graph generation out of the parser. In general the architecture of the library is very questionable. Generating a graph is not a parser task.
  • Clean up the comments: a lot of comments are just empty or pointless. Just remove empty ones, they are only noise when reading code.
  • Leave empty lines! Using // instead of empty lines breaks git diffing and merging. I had to do multiple 300+ line manual diffs where only a few lines had changed during the final rebase, just because they were all glued together.
    I also don't quite understand where this even comes from, as empty lines are used for visual and mental separation of code blocks. Gluing the blocks together with empty comments not only introduces more noise, it also doesn't serve the original purpose of spacing out conceptually different segments, since your eyes glaze over the "spaces" because they are not empty.
  • Rename Abstract.. types to just the name. So instead of AbstractDecoration just call it Decoration. There is no value in this prefix. Same goes for Generic....

Tests:

  • Add actual tests. At least one test for each encounter that gets special treatment, preferably additional ones for known edge cases. Each time a bug is reported and fixed, add a regression test for that bug.
  • Add fuzz testing for the parser. This is a bit more complicated, but the evtc input should be easy to simulate using something like https://github.com/nathaniel-bennett/winsharpfuzz.

Performance:

  • Compress the HTML CSS, and the HTML output in general.
  • Stream the html output.
  • Replace other small structures like Point3d with structs.

EliphasNUIT and others added 30 commits October 25, 2024 20:51
Old Lion's Court Combat Replay
Moved Settings into ParserCommon to share it between cli and windowed.
Changed the test dependencies to ParserCommon, which allows the tests to also target non-Windows frameworks.

Updated the language version of subprojects to 10 for upcoming changes.
Had to change the .editorconfig, because var declaration was in violation of an error rule. Probably a change in what is considered for the specific rule between versions.

I really tried to make this work as one codebase, but VS _really_ doesn't want you to build two different executables from the same codebase.
I again almost got it working, but in the end the Forms designer wouldn't load.
If you ever want to attempt the unification you need to
 - change it to a target framework set instead of a specific one
 - set different output paths
 - set different intermediate paths before the actual project file gets loaded. This requires adding a Build.Directory.props (iirc the name)
 - splitting project packages and includes based on target framework (`<Compile Remove="" />` forms and resources in the generic version)
 - find a way to get the forms designer working
…ystem application, and it gets rid of a windows dependency.
Weissman: 28670
probably not worth the improvement in itself, but this might give some foundation
also I probably missed a bunch that will come later

Weissman: 13020
Weissman got worse somehow however: 14108
Weissman: 14099
No performance improvements to note; while this should have more impact, it might not be worth it.
@EliphasNUIT EliphasNUIT changed the base branch from master to cli November 5, 2024 08:42
@EliphasNUIT EliphasNUIT changed the base branch from cli to cli_perf November 5, 2024 09:36
@SaculRennorb (Contributor, Author):

Just a short remark on how I arrived at the performance improvements:

Some of them are obvious, but I never pull them out of thin air. If I'm uncertain about a change, I first test the generated code in sharplab.

For example https://sharplab.io/#v2:D4AQTAjAsAUCAMACEEAsBuWIDMyyIGFEBvWRcxAGQEsBnAFwB5qA7egPkQHEBTegVRbUAjgFceAOQAKBAJIARWgAoAlGQqkYFbcgDs6nYgDaBwxQgAaRGCvYrqU9oC6mLRQC+p2QCUeAQwATAHkWABsATxoGZjZOXgEhMUkZBVoICFVTTTM9RwoTNxzyS2tbezzyF1NPQsRWeiMnbj5BEXFpOUUwTNrssxB9Wu0CovMrG0Q7RAchiiramu1fQJCIgGUABz8WGI5mhLbkztpsHu0+wwGK42uKAFYrADYrXSsADmv57Rr3IA==

This is a comparison of the initialization code generated for several comparable methods that just return a list of numbers, showing that for the read-only case the read-only span is vastly superior to all other variants.
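For reference, the read-only-span variant from that comparison looks roughly like this (a sketch, not the parser's actual code): a collection expression targeting ReadOnlySpan<T> with constant elements is baked into the binary's data section, so no allocation happens at runtime.

```csharp
using System;

// Lowers to a span over a static data blob; allocated zero times at runtime.
static ReadOnlySpan<int> GetNumbers() => [1, 2, 3, 4];

Console.WriteLine(GetNumbers().Length);  // 4
```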

…ead of List's

Hard to say if this is worth it tbh. The improvement is highly dependent on the log.
/// </summary>
public static void CreateJSON(JsonLog jsonLog, Stream stream, bool indent)
{
JsonSerializer.Serialize(stream, jsonLog, indent ? SerializerSettings.Indentent : SerializerSettings.Default);
Collaborator:

Does the serializer here also take into account the custom tuple converter?

Contributor Author:

It does not, as the generator needs separate instructions per root type.
I did indeed forget to add converters for the JsonLog.

Contributor Author:

Added a new Tuple2ConverterFactory that replaces the old converter and added it to the JsonLog.

Quick question: why are there even two different json representations?

Collaborator:

One is officially documented and stable; the other is unstable and tailored to match the components of the HTML (it also does not contain data not exploited by the HTML, contrary to the documented JSON), so it changes any time the components are moved around or reorganized.

Converters = [ typeof(IntTupleArrayConverter) ]
)]
[JsonSerializable(typeof(LogDataDto))]
[JsonSerializable(typeof(AgentConnector.AgentConnectorDescriptor))]
Collaborator:

I'm guessing we would not need those AgentConnector serializables if the GetConnectedTo API were to return a generic ConnectorDescriptor class with the others declared as JsonDerivedType? Just as a note to myself, trying to get used to System.Text.Json usage.

Contributor Author:

Yes. In the process of clearing up the visibility question I'm about to add the following comment to that code :D :

[JsonSerializable(typeof(LogDataDto))]
//NOTE(Rennorb): The following types need to be manually added to the generator, because they are used as opaque object fields.
// The generator cannot know the types of those fields, and we want to avoid having to use slow reflection to find out the members of those arbitrary types.
// Specifying them here tells the generator to generate switches for these specific types whenever an opaque Object needs to be serialized.
[JsonSerializable(typeof(AgentConnector.AgentConnectorDescriptor))]
[JsonSerializable(typeof(PositionConnector.PositionConnectorDescriptor))]
[JsonSerializable(typeof(AgentFacingConnector.AgentFacingConnectorDescriptor))]
...

Contributor Author:

I should also mention that this is not required to use the System.Text.Json serializer, only if you use the generated (de)serializers. Those are quite a bit faster because they don't use reflection to find fields, but you need to specify more data in cases that use inheritance (of which using object as the base is just a special case).
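A minimal sketch of the source-generated pattern under discussion (the DTO and context names here are illustrative, not the parser's real types):

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// The generator emits a reflection-free (de)serializer per listed root type.
var dto = new LogDataDto { Name = "test", Success = true };
string json = JsonSerializer.Serialize(dto, LogJsonContext.Default.LogDataDto);
Console.WriteLine(json);

public class LogDataDto
{
    public string? Name { get; set; }
    public bool Success { get; set; }
}

// Opaque `object` fields elsewhere would need their concrete types listed
// here too, as discussed above.
[JsonSerializable(typeof(LogDataDto))]
public partial class LogJsonContext : JsonSerializerContext { }
```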

…e not needed

and changed concrete decorations back to internal visibility

baaron4#933 (review)
…phas

added a preprocessor define (`DEBUG_EFFECTS`) for the debug functions to avoid building them into production binaries or normal debug builds

baaron4#933 (review)
{
foreach (Segment segment in value.BuffChart)
//TODO(Rennorb) @correctnes: there was a null check here, i have no clue why.
Collaborator:

Most likely very aggressive defensive programming; I don't see how BuffChart could contain null values.

}
return new List<BreakbarDamageEvent>();
}
Collaborator:

Why the inconsistent behavior of the "out" of the TryGets? Some put null when nothing is found, others an empty list.

Contributor Author (@SaculRennorb, Nov 10, 2024):

Assuming you are talking about the TryGetEffectEventsByGUID function; this did not change from the original implementation, which is here: https://github.com/baaron4/GW2-Elite-Insights-Parser/pull/933/files/8c9b50cc509cebc9413ebbdafee850160fe0c526#diff-d5e44689367f787448b7f0dfe2eee8beb2da1421aabd9d2e78d663e33736ba32L1306-L1320

and also returns null on failure.

Ah, I understand what you are talking about.
This is the result of me going back and forth on how to handle these functions:
On one hand, the semantics of returning null are consistent with other Try functions; on the other hand, it's frankly wasteful to throw away a list that's already allocated just because "try functions should set the out param to null in the failure case", since most of the time you then want to create some new list and populate it with initial values. I will have to go over this again.
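The two shapes under discussion, sketched with illustrative names (not the parser's actual TryGet functions):

```csharp
using System;
using System.Collections.Generic;

// Classic Try shape: the out param is null on failure.
static bool TryGetEventsNullable(Dictionary<int, List<string>> index, int key,
                                 out List<string>? events)
    => index.TryGetValue(key, out events);

// Alternative shape: always hand back a usable (possibly empty) list, so the
// caller never has to allocate a fresh one before populating it.
static bool TryGetEventsNonNull(Dictionary<int, List<string>> index, int key,
                                out List<string> events)
{
    events = index.TryGetValue(key, out var found) ? found : [];
    return events.Count > 0;
}

var index = new Dictionary<int, List<string>> { [1] = ["hit"] };

bool foundA = TryGetEventsNullable(index, 2, out var a);  // false, a is null
bool foundB = TryGetEventsNonNull(index, 2, out var b);   // false, b is an empty list

Console.WriteLine($"{foundA} {a is null} {foundB} {b.Count}");
```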

Collaborator:

I don't have a strong opinion on whether it should be null or an empty list, but I would prefer the behavior to be consistent. If your preference is "we have already created the list, just put it on the out", that is perfectly acceptable.

Contributor Author (@SaculRennorb, Nov 10, 2024):

Yea, but this is exactly why they behave differently in the first place: one of them creates the list no matter what, and the other one filters an enumerable that may or may not exist to obtain the result.

Maybe some renaming is in order here, because "success" doesn't even make sense in the context of merging multiple lists. I will look into just dropping the "Try" prefix from some of these and see if that cleans it up.

/// <summary>
/// Appends effect events by the given agent and effect GUID.
/// </summary>
public void AppendEffectEventsBySrcWithGUID(AgentItem agent, GUID effect, List<EffectEvent> effectEvents)
Collaborator:

Why are the Append methods public?

Contributor Author:

To be completely honest, the reason is that I don't believe in the OOP paradigm of encapsulation and therefore declare almost everything public in my own work. I tried doing a reasonable job with this, but that one must have slipped through. We can certainly make it private if that is preferable.

Contributor Author:

Ok, so after having a second look at this:
Why shouldn't the append methods be public? The TryGet methods that currently use them are just wrappers around a loop of append calls, so why shouldn't they share the same visibility? Sometimes one or the other method might be more convenient to call; why take away that choice from the consumer and pretend to know which one will be more useful?

This is also the main reason why I have such issues with encapsulation: it assumes you as the library author know better than the consumer what the consumer wants.

Collaborator:

That is quite the tangent; "To be completely honest the reason is because i don't believe in the OOP paradigm of encapsulation and therefore declare almost everything public in my own work" was a perfectly acceptable answer =D. The API does not expose something that is private, nor allow something private to be modified. I saw that the method was only used locally, thus my curiosity.

Contributor Author:

I will have another look and find out whether it would be sensible to use the append version in other places as well, then report back before we resolve this.

}

public override void SetBuffDistributionItem(BuffDistribution distribs, long start, long end, long buffID)
if (_buffStackItem.IsExtension)
Collaborator:

This won't work properly: StackID and SeedSrc are the only attributes that are constant during a BuffStackItem's lifecycle. For the same reason that Start, Duration, Src and TotalDuration have to be stored, so does IsExtension.

private readonly AgentItem _seedSrc;
private readonly bool _isExtension;
private long _totalDuration { get; }
_buffStackItem = buffStackItem;
Collaborator:

Is it really worth keeping the whole buff stack item alive for the whole duration of the processing of the log just to have a pointer on SeedSrc?

Contributor Author:

There is some argument for keeping it around, because it might reduce the number of individual small deallocations, but this was a remnant from me trying to just keep the item around and not copy all the fields (see comments below).
I reverted it to store individual fields.

public override long GetActualDuration()
public override IEnumerable<AgentItem> GetSources()
{
return [ _buffStackItem.Src ];
Collaborator:

should return _src

{
internal abstract class BuffSimulationItem : AbstractSimulationItem
//NOTE(Rennorb): I changed the element to have start + end instead of start + duration to fix a bug.
Collaborator:

What was the bug?

Contributor Author:

To be fully honest, I don't quite remember. It had something to do with negative durations, but this section got chewed on quite a bit. I've mainly kept it this way because the end time is the thing that gets modified and read back in most operations, so representing it as start + end makes it easier to reason about changes.
As the comment states, it should not matter much which way this is represented, but most operations are simpler in this version.

Collaborator:

I don't mind the change; end being the stored attribute indeed matches the usage better. As you have observed, it was initially following arc's start + duration logic.
Without context, it seemed that you changed it in order to fix a bug that is currently present in master.

Contributor Author:

Yea, I get the concern. I don't think it is something on master; I'm confident it was something I introduced and subsequently fixed. I will look into it again before this gets finished.


//TODO(Rennorb) @cleanup: So ... why are these parametric points, but we still add null values between them?
Collaborator:

Convenience, mainly. It guarantees the same delta between elements and makes it easy to find start offsets.
In GetDistanceToTarget, the number of skipped elements on the processed actor immediately gives us the start index into the stackCenter/commanderPosition lists.
