Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DisposeAsync hangs within DistributedApplicationTestingBuilder #7139

Closed
afscrome opened this issue Jan 17, 2025 · 5 comments · Fixed by #7474
Closed

DisposeAsync hangs within DistributedApplicationTestingBuilder #7139

afscrome opened this issue Jan 17, 2025 · 5 comments · Fixed by #7474
Assignees
Labels
area-app-testing Issues pertaining to the APIs in Aspire.Hosting.Testing
Milestone

Comments

@afscrome
Copy link
Contributor

I've been encountering a number of cases where our Tests appear to be done, yet the tests are still running in the test runner. Further investigation suggests this hanging is happening when disposing the app built through DistributedApplicationTestingBuilder. I've sometimes seen delays of several minutes here.

The criteria seem to be

  • Running within DistributedApplicationTestingBuilder (doesn't replicate using plain DistributedApplication.CreateBuilder)
  • Needs to have a resource with waits for another resource AND has it's own health check

A minimal repro for this is below . I expect the time between the two beeps to be negligible, however it is 10+ seconds.

AppHost:

var builder = DistributedApplication.CreateBuilder();

var failsToStart = builder.AddContainer("failsToStart", "does/not/exist");

var dependency = builder.AddExecutable("app", "pwsh", ".")
    .WaitFor(failsToStart)
    .WithHttpEndpoint()
    .WithHttpHealthCheck();

using var app = builder.Build();
app.Run();

Test:

    [Test]
    public async Task Test()
    {
        var appHost = await DistributedApplicationTestingBuilder.CreateAsync<Projects.AspireApp2_AppHost>();

        await using var app = await appHost.BuildAsync();

        try
        {
            await app.StartAsync().WaitAsync(TimeSpan.FromSeconds(1));
        }
        catch
        {
            Console.Beep();
            await app.DisposeAsync();
            Console.Beep();

        }
    }

Looking at console output from the test and you can see there is a ~13 second gap between Dependency resource 'initial' failed to start. and Failed to create resource app. Given the 1 second timeout in the above code, I'd expect this to be no more than that.

info: Aspire.Hosting.DistributedApplication[0]
      Aspire version: 9.0.0+01ed51919f8df692ececce51048a140615dc759d
info: Aspire.Hosting.DistributedApplication[0]
      Distributed application starting.
info: Aspire.Hosting.DistributedApplication[0]
      Application host directory is: S:\AspireApp2\AspireApp2.AppHost
fail: Aspire.Hosting.Dcp.dcpctrl.ContainerReconciler[0]
      could not create the container	{"Container": {"name":"initial-danpsbmy-621867ae"}, "Reconciliation": 3, "error": "object not found\ncontainer not found\ndocker command 'CreateContainer' returned with non-zero exit code 1: command output: Stdout: '' Stderr: 'invalid reference format\n'"}

info: AspireApp2.AppHost.Resources.app[0]
      1: 2025-01-17T13:22:01.1537805Z Waiting for resource 'initial' to enter the 'Running' state.
fail: AspireApp2.AppHost.Resources.app[0]
      2: 2025-01-17T13:22:14.9408991Z Dependency resource 'initial' failed to start.
fail: AspireApp2.AppHost.Resources.app[0]
      3: 2025-01-17T13:22:14.9432384Z Failed to create resource app
Aspire.Hosting.DistributedApplicationException: Dependency resource 'initial' failed to start.
         at Aspire.Hosting.ApplicationModel.ResourceNotificationService.WaitUntilHealthyAsync(IResource resource, IResource dependency, CancellationToken cancellationToken) in /_/src/Aspire.Hosting/ApplicationModel/ResourceNotificationService.cs:line 139
         at Aspire.Hosting.ApplicationModel.ResourceNotificationService.WaitForDependenciesAsync(IResource resource, CancellationToken cancellationToken) in /_/src/Aspire.Hosting/ApplicationModel/ResourceNotificationService.cs:line 257
         at Aspire.Hosting.DistributedApplicationBuilder.<>c.<<-ctor>b__32_0>d.MoveNext() in /_/src/Aspire.Hosting/DistributedApplicationBuilder.cs:line 174
      --- End of stack trace from previous location ---
         at Aspire.Hosting.Eventing.DistributedApplicationEventing.<>c__DisplayClass4_0`1.<<Subscribe>b__0>d.MoveNext() in /_/src/Aspire.Hosting/Eventing/DistributedApplicationEventing.cs:line 82
      --- End of stack trace from previous location ---
         at Aspire.Hosting.Eventing.DistributedApplicationEventing.PublishAsync[T](T event, EventDispatchBehavior dispatchBehavior, CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Eventing/DistributedApplicationEventing.cs:line 69
         at Aspire.Hosting.Dcp.ApplicationExecutor.<CreateExecutablesAsync>g__CreateResourceExecutablesAsyncCore|62_0(IResource resource, IEnumerable`1 executables, CancellationToken cancellationToken) in /_/src/Aspire.Hosting/Dcp/ApplicationExecutor.cs:line 1216
info: Aspire.Hosting.DistributedApplication[0]
      Distributed application started. Press Ctrl+C to shut down.

Interestingly this only happens through the test host - replace the app.Run() call in Program.cs with the following, and DisposeAsync is near instant

try
{
    await app.StartAsync().WaitAsync(TimeSpan.FromSeconds(1));
}
catch
{
    Console.Beep();
    await app.DisposeAsync();
    Console.Beep();
}

May also be some relationship with #7009 as they both share the criteria of mixing health checks with waitfor, although this issue does only reproduce under the test host.

@davidfowl davidfowl added the area-app-testing Issues pertaining to the APIs in Aspire.Hosting.Testing label Jan 17, 2025
@davidfowl davidfowl added this to the 9.1 milestone Jan 17, 2025
@ReubenBond
Copy link
Member

ReubenBond commented Jan 29, 2025

The testing host waits for the AppHost's thread to exit before DisposeAsync will complete, so if the AppHost's thread is stuck (eg, on StartAsync) then it will hang.

There is an internal race in lifecycle which can prevent container notifications from DCP reaching the AppHost once shutdown has begun.

We have some non-mutually-exclusive options to address this:

  • Add an internal timeout to DisposeAsync
  • Attempt to cancel the AppHost thread's StartAsync/StopAsync calls when the testing host's StopAsync/Dispose/DisposeAsync methods are called

That being said, does this arise in real-world scenarios? If so, could you elaborate for me?

@eerhardt eerhardt assigned eerhardt and unassigned eerhardt Jan 29, 2025
@ReubenBond ReubenBond added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jan 29, 2025
@ReubenBond ReubenBond self-assigned this Jan 29, 2025
@ReubenBond
Copy link
Member

@afscrome what do you think?

@afscrome
Copy link
Contributor Author

This error was uncovered by a dev on another team who asked for my help. So the tests they have likely better represent what a "real" developer would do, not the shienagans I've been known to get up to. This minimal repo was derived from that real world use case. Report from my colleague that I was asked to help investigate:

What they were experiencing was tests passing, but Visual Studio / Rider's test runners were hanging for minutes after the test passed/failed before you could run another test.

From debugging, it’s clear that the end of the test is reached after a short time (~10s) but then it takes minutes to actually finish and report a pass/failure.

Let me try and pull the code version up from when this was reported and more fully remind myself of the scenario.

@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jan 31, 2025
@ReubenBond
Copy link
Member

Much appreciated, @afscrome!

@afscrome
Copy link
Contributor Author

Was a while ago now so I can't remember exactly which project and version it was in and I'm mixing up the various different issues we've encountered with the test host. I'll keep digging to remind myself of that case.

I have found one of the failure cases, although this behaves slightly differently in that it hangs forever:

Tests:

   private DistributedApplication? _app;
   private ResourceNotificationService? _resourceNotificationService;

   [SetUp]
   public async Task SetUp()
   {
      // Arrange
      var appHost = await DistributedApplicationTestingBuilder.CreateAsync<Projects.My_AppHost>();

      appHost.Services.ConfigureHttpClientDefaults(clientBuilder => { clientBuilder.AddStandardResilienceHandler(); });
      appHost.Services.AddLogging(x => x
         .AddNUnit()
         .AddFilter("Default", LogLevel.Information)
         .AddFilter("Microsoft.AspNetCore", LogLevel.Warning)
         .AddFilter("Aspire.Hosting.Dcp", LogLevel.Warning)
      );

      _app = await appHost.BuildAsync();
      _resourceNotificationService = _app.Services.GetRequiredService<ResourceNotificationService>();

      await _app.StartAsync()
         .WaitAsync(TimeSpan.FromMinutes(1));
   }

   [TearDown]
   public async Task TearDown()
   {
      if (_app != null)
      {
         await _app.StopAsync();
         await _app.DisposeAsync();
      }
   }

   [TestCase("resource")]
   [CancelAfter(120_000)]
   public async Task HttpResourceRootReturnsOkStatusCode(string resourceName, CancellationToken ct)
   {
      // Frankly contents of this test are irrelevant - it can be a no-op and the SetUp / TearDown combination will still hang foever

      // Arrange
      await _resourceNotificationService!
         .WaitForResourceAsync(resourceName, KnownResourceStates.Running, ct)
         .WaitAsync(TimeSpan.FromSeconds(30), ct);
      using var httpClient = _app!.CreateHttpClient(resourceName);

      // Act
      var response = await httpClient.GetAsync("/", ct);

      // Assert
      Assert.That(response.StatusCode, Is.EqualTo(HttpStatusCode.OK));
   }

App: I can't really share my app in a way that is usable externally, but essentially it boiled down to

  1. A resource with a health check that didn't go healthy
  2. Something depending on that resource

The following roughly simulates that:

var failsToGoHealthy = builder.AddRedis("failsToGoHealthy")
   .WithHttpEndpoint(targetPort: 1234)
   .WithHttpHealthCheck();

var dependency = builder.AddExecutable("dependency", "pwsh.exe", ".")
   .WaitFor(failsToGoHealthy);

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-app-testing Issues pertaining to the APIs in Aspire.Hosting.Testing
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants