Degraded Performance for Long-Lived Containers #889

Open
Kokotewa opened this issue Jul 6, 2024 · 1 comment

Kokotewa commented Jul 6, 2024

I've observed this issue for some time, but have only recently decided to post about it, as I've exhausted all of my leads. As the title summarizes, after a period of time my containerized MSSQL instance degrades in performance. Ideally I'd like to identify and correct this behavior; at worst, I'm hopeful I can find others who have encountered it.

My Deployment/Configuration

I've deployed the mssql container on an Azure Standard_B1ms VM (Ubuntu 20.04) using Docker. The current image is a historical version of 2019-latest; however, I've reproduced the issue with 2017-latest as well. There is nothing "special" about the deployment; it is "out of the box".

docker image ls
mcr.microsoft.com/mssql/server   2019-latest    b5316516906d   7 months ago    1.47GB
mcr.microsoft.com/mssql/server   2017-latest    19d34f261156   11 months ago   1.33GB

The compose file is fairly minimal. All database connections are local through the overlay network.

Minimal Compose File

  mssql:
    image: mcr.microsoft.com/mssql/server:2019-latest
    logging:
      driver: "local"
      options:
        max-size: "10m"
        max-file: "1"
        compress: "false"
    environment:
      ACCEPT_EULA: 'Y'
    volumes:
      - "./volumes/mssql:/var/opt/mssql"

I acknowledge that this is a small VM SKU, but it is "bored", with CPU credits and memory to spare. The entirety of the MSSQL docker volume could fit in memory with room left over.
MSSQL Volume Disk Usage

du -d1 -h volumes/
501M    volumes/mssql

htop (CPU and memory within limits)

  CPU[|||||||||||||||||||||||||||||                                                                                           22.5%]   Tasks: 61, 321 thr; 1 running
  Mem[||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||1.47G/1.86G]   Load average: 0.77 0.55 0.43
  Swp[                                                                                                                        0K/0K]   Uptime: 38 days, 22:28:28

    PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
1618939 10001      20   0 5162M  667M 12204 S 21.2 35.1  2h41:11 /opt/mssql/bin/sqlservr
1619033 10001      20   0 5162M  667M 12204 S 13.9 35.1  1h44:21 /opt/mssql/bin/sqlservr
1619053 10001      20   0 5162M  667M 12204 S  4.0 35.1 20:18.70 /opt/mssql/bin/sqlservr
1619058 10001      20   0 5162M  667M 12204 S  2.6 35.1  0:40.83 /opt/mssql/bin/sqlservr
1639066 10001      20   0 5162M  667M 12204 S  2.0 35.1  0:03.08 /opt/mssql/bin/sqlservr

Excess Azure CPU Credits
(screenshot: graph of excess Azure CPU credits)
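
Since all of the numbers above are from the OS/VM side, the same picture can be confirmed from inside the engine. A minimal sketch of the DMV queries I'd use for this (standard DMVs in 2017/2019, run from any client, e.g. sqlcmd):

-- Physical memory held by the sqlservr process, as the engine sees it.
SELECT physical_memory_in_use_kb / 1024 AS process_mb,
       memory_utilization_percentage,
       process_physical_memory_low
FROM sys.dm_os_process_memory;

-- Approximate buffer pool size: count of cached 8 KB pages.
SELECT COUNT(*) * 8 / 1024 AS buffer_pool_mb
FROM sys.dm_os_buffer_descriptors;

If buffer_pool_mb comes back near the ~500 MB volume size, the hot data really is fully cached.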

My Symptoms

My (web) application interacts with MSSQL exclusively through stored procedures. The web stack consists of nginx and php-fpm containers, and the performance tests built into my framework are a convenient way to illustrate the performance drop. Each row below indicates the end-to-end execution time for an individual stored procedure (SP), with the exception of msql_connect, which indicates the time taken to retrieve a connection from the connection pool.

Nginx+PHPFPM; uptime=13 hours

all	725.5 ms
msql_connect	0.1 ms
eft_names	15.3 ms
skl_names	14.6 ms
itm_parameters	56.0 ms
itm_catalyst	9.1 ms
itm_combil	12.2 ms
itm_healinc_to_itm	4.7 ms
itm_healinc_from_itm	3.2 ms
itm_healinc_from_combil	13.2 ms
itm_merc	4.0 ms
itm_make	0.0 ms
itm_proc	6.3 ms
itm_summon	0.0 ms
itm_cutepet_armor	6.5 ms
itm_cutepet_food	3.5 ms
itm_cutepet_tame	12.5 ms
itm_cutepet_egg	19.9 ms
itm_craft_enchant	7.8 ms
itm_craft_archer	15.2 ms
itm_craft_alchemist	23.1 ms
itm_craft_blacksmith	15.9 ms
itm_craft_other	27.8 ms
itm_craft_chef	34.1 ms
itm_craft_npc	40.1 ms
itm_shops	5.2 ms
itm_from_proc_itm	4.5 ms
itm_from_proc_combil	13.6 ms
itm_from_itm_make	72.4 ms
itm_from_mon	161.4 ms
itm_from_mvp	123.3 ms

Nginx+PHPFPM; uptime=5 minutes

all	31.0 ms
msql_connect	0.1 ms
eft_names	0.9 ms
skl_names	7.6 ms
itm_parameters	6.2 ms
itm_catalyst	0.4 ms
itm_combil	0.5 ms
itm_healinc_to_itm	0.4 ms
itm_healinc_from_itm	0.3 ms
itm_healinc_from_combil	0.4 ms
itm_merc	0.8 ms
itm_make	0.0 ms
itm_proc	0.4 ms
itm_summon	0.0 ms
itm_cutepet_armor	0.3 ms
itm_cutepet_food	0.4 ms
itm_cutepet_tame	0.3 ms
itm_cutepet_egg	0.3 ms
itm_craft_enchant	0.3 ms
itm_craft_archer	0.4 ms
itm_craft_alchemist	0.4 ms
itm_craft_blacksmith	0.4 ms
itm_craft_other	0.5 ms
itm_craft_chef	0.5 ms
itm_craft_npc	2.5 ms
itm_shops	0.3 ms
itm_from_proc_itm	0.4 ms
itm_from_proc_combil	0.4 ms
itm_from_itm_make	4.3 ms
itm_from_mon	0.6 ms
itm_from_mvp	0.4 ms

These results are reproducible through the SQL Server Management Studio (SSMS) client, as captured with set statistics time. Since this removes nginx and php-fpm from the equation, I'm fairly confident the issue lies with MSSQL alone.
SSMS; uptime=13 hours

set statistics time on 
EXEC	[itm_from_mon]
...
set statistics time off

SQL Server parse and compile time: 
   CPU time = 86 ms, elapsed time = 104 ms.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
 SQL Server Execution Times:
   CPU time = 5 ms,  elapsed time = 11 ms.
 SQL Server Execution Times:
   CPU time = 91 ms,  elapsed time = 115 ms.
Completion time: 2024-07-05T21:49:11.8869818-04:00

SSMS; uptime=5 minutes

...
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.
Completion time: 2024-07-05T21:56:58.8489065-04:00
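
One detail worth calling out in the slow capture: nearly all of the time is parse and compile (86 ms CPU) rather than execution. I can't say for certain, but that could mean the proc's plan is being evicted and recompiled on every call, or that compilation itself is slowing down over time. A rough sketch of how I'd check plan reuse and the compilation counters (standard DMVs; itm_from_mon is the proc from the example above):

-- Is there a cached plan for the proc, and is it being reused (usecounts)?
SELECT cp.objtype, cp.usecounts, cp.size_in_bytes, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%itm_from_mon%';

-- Cumulative compilation counters since instance start.
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%SQL Statistics%'
  AND counter_name IN ('SQL Compilations/sec', 'SQL Re-Compilations/sec');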

Kicking (restarting) the MSSQL container, and no other applications/services, temporarily resolves the issue. Eventually, however, the slowdown resurfaces until MSSQL is kicked again. I'd really like to know what is causing this and whether it can be resolved, as I've tried (and failed) and I need closure. This is what I've investigated so far...

  • MSSQL plans - as far as I can tell, the execution plans for the "slow" queries and the "fast" queries are identical
  • DBCC DROPCLEANBUFFERS - has no effect
  • DBCC FREEPROCCACHE - has no effect
  • potentially a few others, although the methods immediately escape me; a wait-stats snapshot (sketched below) is what I'd capture next time the slowdown appears
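
For anyone who wants to dig further: a wait-stats snapshot taken while the instance is in its degraded state is probably the most informative capture missing from the list above. A minimal sketch (top waits, filtering a few common idle wait types):

-- Top non-idle waits accumulated since the last restart.
SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'CHECKPOINT_QUEUE',
                        'XE_TIMER_EVENT', 'SQLTRACE_BUFFER_FLUSH',
                        'REQUEST_FOR_DEADLOCK_SEARCH', 'BROKER_TASK_STOP')
ORDER BY wait_time_ms DESC;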

Kokotewa commented Aug 8, 2024

I've adopted a workaround: I've migrated from MSSQL to Azure SQL Edge. I've been unable to reproduce the issue on the new deployment, and it has a lower resource footprint.
