Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[percona] update ps-80 version #17310

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

adivinho
Copy link
Contributor

@adivinho adivinho commented Aug 6, 2024

No description provided.

@adivinho adivinho requested a review from a team as a code owner August 6, 2024 13:58
Copy link

github-actions bot commented Aug 6, 2024

Diff for 659fade:
diff --git a/_bashbrew-cat b/_bashbrew-cat
index 35d4c8b..dac347a 100644
--- a/_bashbrew-cat
+++ b/_bashbrew-cat
@@ -12,8 +12,8 @@ GitCommit: b89efa5f100edacc0ef660cef37975acfaf67326
 Directory: percona-server-5.7
 File: Dockerfile-dockerhub
 
-Tags: 8.0.36-28-centos, 8.0-centos, 8-centos, 8.0.36-28, 8.0, 8, ps-8.0.36-28, ps-8.0, ps-8
-GitCommit: 02252c71b5fcf2d2235f7ec0d81940d4f2c45b64
+Tags: 8.0.37-29-centos, 8.0-centos, 8-centos, 8.0.37-29, 8.0, 8, ps-8.0.37-29, ps-8.0, ps-8
+GitCommit: efe9f2a1ebaf673d199c5322c4abf69b782008f6
 Directory: percona-server-8.0
 
 Tags: psmdb-4.2.24, psmdb-4.2
diff --git a/_bashbrew-list b/_bashbrew-list
index b19b65a..c16bc85 100644
--- a/_bashbrew-list
+++ b/_bashbrew-list
@@ -12,8 +12,8 @@ percona:8
 percona:8-centos
 percona:8.0
 percona:8.0-centos
-percona:8.0.36-28
-percona:8.0.36-28-centos
+percona:8.0.37-29
+percona:8.0.37-29-centos
 percona:centos
 percona:psmdb-4.2
 percona:psmdb-4.2.24
@@ -30,4 +30,4 @@ percona:ps-5.7
 percona:ps-5.7.44
 percona:ps-8
 percona:ps-8.0
-percona:ps-8.0.36-28
+percona:ps-8.0.37-29
diff --git a/percona_ps-8/Dockerfile b/percona_ps-8/Dockerfile
index 3ccf519..20279c3 100644
--- a/percona_ps-8/Dockerfile
+++ b/percona_ps-8/Dockerfile
@@ -16,16 +16,15 @@ RUN set -ex; \
     useradd -u 1001 -r -g 1001 -s /sbin/nologin \
         -m -c "Default Application User" mysql
 
-ENV PS_VERSION 8.0.36-28.1
-ENV MYSQL_SHELL_VERSION 8.0.36-1
+ENV PS_VERSION 8.0.37-29.1
+ENV MYSQL_SHELL_VERSION 8.0.37-1
 ENV OS_VER el9
 ENV FULL_PERCONA_VERSION "$PS_VERSION.$OS_VER"
 ENV FULL_MYSQL_SHELL_VERSION "$MYSQL_SHELL_VERSION.$OS_VER"
 ENV PS_REPO release
-ENV PS_TELEMETRY_VERSION 8.0.36-28-1
+ENV PS_TELEMETRY_VERSION 8.0.37-29-1
 ENV CALL_HOME_DOWNLOAD_SHA256 5e84d2f1a5d57f44c46e6a1f16794d649d3de09fe8021f0294bc321c89e51068
 ENV CALL_HOME_VERSION 0.1
-
 # Do not report during Docker image creation.
 # Note that doing so, would create telemetry config file
 # which would prevent reporting when new container is started.
@@ -36,10 +35,10 @@ ARG PERCONA_TELEMETRY_DISABLE=1
 # check repository package signature in secure way
 RUN set -ex; \
     export GNUPGHOME="$(mktemp -d)"; \
-    gpg --batch --keyserver keyserver.ubuntu.com --recv-keys 430BDF5C56E7C94E848EE60C1C4CBDCDCD2EFD2A 99DB70FAE1D7CE227FB6488205B555B38483C65D; \
-    gpg --batch --export --armor 430BDF5C56E7C94E848EE60C1C4CBDCDCD2EFD2A > ${GNUPGHOME}/RPM-GPG-KEY-Percona; \
+    gpg --batch --keyserver keyserver.ubuntu.com --recv-keys 4D1BB29D63D98E422B2113B19334A25F8507EFA5 99DB70FAE1D7CE227FB6488205B555B38483C65D; \
+    gpg --batch --export --armor 4D1BB29D63D98E422B2113B19334A25F8507EFA5 > ${GNUPGHOME}/PERCONA-PACKAGING-KEY; \
     gpg --batch --export --armor 99DB70FAE1D7CE227FB6488205B555B38483C65D > ${GNUPGHOME}/RPM-GPG-KEY-centosofficial; \
-    rpmkeys --import ${GNUPGHOME}/RPM-GPG-KEY-Percona ${GNUPGHOME}/RPM-GPG-KEY-centosofficial; \
+    rpmkeys --import ${GNUPGHOME}/PERCONA-PACKAGING-KEY ${GNUPGHOME}/RPM-GPG-KEY-centosofficial; \
     curl -Lf -o /tmp/percona-release.rpm https://repo.percona.com/yum/percona-release-latest.noarch.rpm; \
     rpmkeys --checksig /tmp/percona-release.rpm; \
     rpm -i /tmp/percona-release.rpm; \
@@ -47,6 +46,7 @@ RUN set -ex; \
     rpm --import /etc/pki/rpm-gpg/PERCONA-PACKAGING-KEY; \
     percona-release disable all; \
     percona-release enable ps-80 ${PS_REPO}; \
+    percona-release enable telemetry ${PS_REPO}; \
     percona-release enable mysql-shell ${PS_REPO}
 
 RUN set -ex; \
@@ -64,6 +64,7 @@ RUN set -ex; \
         curl \
         glibc \
         libnghttp2 \
+        openssh \
         python3; \
     \
     dnf -y install \
@@ -108,6 +109,17 @@ RUN set -eux; \
 ENV CALL_HOME_OPTIONAL_PARAMS=" -s ${OS_VER}"
 
 COPY ps-entry.sh /docker-entrypoint.sh
+COPY telemetry-agent-supervisor.sh /usr/bin/
+RUN set -ex; \
+    chown  mysql /usr/bin/telemetry-agent-supervisor.sh; \
+    chown  mysql /usr/bin/percona-telemetry-agent; \
+    chown  mysql /usr/local/percona/telemetry/history; \
+    chmod ug+rwx /usr/bin/telemetry-agent-supervisor.sh; \
+    chmod -R go+w /var/log/percona
+ENV PERCONA_TELEMETRY_CHECK_INTERVAL=86400
+ENV PERCONA_TELEMETRY_HISTORY_KEEP_INTERVAL=604800
+ENV PERCONA_TELEMETRY_RESEND_INTERVAL=60
+ENV PERCONA_TELEMETRY_URL=https://check.percona.com/v1/telemetry/GenericReport
 ENTRYPOINT ["/docker-entrypoint.sh"]
 
 USER mysql
diff --git a/percona_ps-8/ps-entry.sh b/percona_ps-8/ps-entry.sh
index 8b92cf5..a7fe81b 100755
--- a/percona_ps-8/ps-entry.sh
+++ b/percona_ps-8/ps-entry.sh
@@ -248,7 +245,9 @@ else
   CALL_HOME_OPTIONAL_PARAMS+=" -c 2"
 fi
 
-# PERCONA_TELEMETRY_DISABLE is handled at the very beginning of call-home.sh
-/call-home.sh -f "PRODUCT_FAMILY_PS" -v "${PS_TELEMETRY_VERSION}" -d "DOCKER" ${CALL_HOME_OPTIONAL_PARAMS} &> /dev/null || :
-
-exec "$@"
+if [[ ${PERCONA_TELEMETRY_DISABLE} -ne "0" ]]; then
+  exec "$@" --percona_telemetry_disable=1
+else
+  /usr/bin/telemetry-agent-supervisor.sh &
+  exec "$@"
+fi
diff --git a/percona_ps-8/telemetry-agent-supervisor.sh b/percona_ps-8/telemetry-agent-supervisor.sh
new file mode 100644
index 0000000..6ad026e
--- /dev/null
+++ b/percona_ps-8/telemetry-agent-supervisor.sh
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+# phase-0 telemetry
+/call-home.sh -f "PRODUCT_FAMILY_PS" -v "${PS_TELEMETRY_VERSION}" -d "DOCKER" ${CALL_HOME_OPTIONAL_PARAMS} &> /dev/null || :
+
+# phase-1 telemetry
+for i in {1..3}; do
+    /usr/bin/percona-telemetry-agent >> /var/log/percona/telemetry-agent.log 2>> /var/log/percona/telemetry-agent-error.log
+    if [ $? -eq 0 ]; then
+      break
+    fi
+    sleep 5
+done
+sleep infinity

Relevant Maintainers:

@whalelines
Copy link
Contributor

What sets PERCONA_TELEMETRY_DISABLE to 0 such that the test in this added conditional would work?

-# PERCONA_TELEMETRY_DISABLE is handled at the very beginning of call-home.sh
-/call-home.sh -f "PRODUCT_FAMILY_PS" -v "${PS_TELEMETRY_VERSION}" -d "DOCKER" ${CALL_HOME_OPTIONAL_PARAMS} &> /dev/null || :
-
-exec "$@"
+if [[ ${PERCONA_TELEMETRY_DISABLE} -ne "0" ]]; then
+  exec "$@" --percona_telemetry_disable=1
+else
+  /usr/bin/telemetry-agent-supervisor.sh &
+  exec "$@"
+fi

Why is it necessary for the telemetry script to sleep infinity rather than just exit?

@adivinho
Copy link
Contributor Author

adivinho commented Aug 8, 2024

What sets PERCONA_TELEMETRY_DISABLE to 0 such that the test in this added conditional would work?

The data collection may be disabled via setting an environment variable PERCONA_TELEMETRY_DISABLE=1 during Linux package installation or Docker container startup.

For example, on a Docker container (this action disables the continuous telemetry part as well):
docker run -e PERCONA_TELEMETRY_DISABLE=1 percona/percona-server

@adivinho
Copy link
Contributor Author

adivinho commented Aug 8, 2024

Why is it necessary for the telemetry script to sleep infinity rather than just exit?

https://github.com/percona/telemetry-agent?tab=readme-ov-file#continuous-telemetry

@whalelines
Copy link
Contributor

To clarify my questions a bit more:

The telemetry-agent-supervisor.sh calls two commands, redirecting the stdout and stderr to files. Since these commands are not started in the background, presumably they are either short-lived processes or fork and exit on their own. With that being the case, why is it necessary for the telemetry-agent-supervisor.sh script to remain running?

The conditional for the value of the PERCONA_TELEMETRY_DISABLE variable explicitly checks if the value is "0". I see nothing that sets the value explicitly to "0". As such, when will that conditional ever be true, short of the user setting it explicitly to zero? The documentation indicates you should set it to one to disable it, but as I read it, it seems by default that conditional would not evaluate to true, so perhaps the docs should say that telemetry is disabled by default and you should set PERCONA_TELEMETRY_DISABLE to zero to enable it.

@yosifkit
Copy link
Member

yosifkit commented Aug 8, 2024

Errors from GitHub Actions tests:

  1. percona:5.6.51-2-centos should be removed or updated to be FROM a supported image; centos:7 has reached end of life.
  2. percona:5.7.44-centos and percona:psmdb-* are failing to build with SIGNATURES NOT OK; they probably need updated keys?

Review of changes in percona:8.0.37-29:

  1. + /usr/bin/telemetry-agent-supervisor.sh &

    Running unsupervised background processes in the container is not generally allowed in Official Images since that then requires a third process to supervise them: handling and sending signals, reaping zombies, and deciding what to do if one or more of them exit (or don't exit when asked to). Unsupervised background processes are processes that don't have a responsible parent process. Even if their PID relationship puts them in a parent/child relationship (like because of an exec), the parent does not take responsibility for the child (tracking/restarting, forwarding signals, reaping, etc). Most of this is instead better handled at a higher level in functionality provided by dockerd or in an orchestration of multiple containers like docker-compose and kubernetes (though neither seem a great fit for a "sidecar" telemetry process).
    See also https://github.com/docker-library/official-images/blob/e070cf74d7f816287583930d2eeb9df1a93ad06e/README.md#init.

  2. + chown mysql /usr/bin/telemetry-agent-supervisor.sh; \

    Files from a previous layer (like COPY) should not be modified in a later layer as it duplicates content and increases the final image size. Permissions/ownership should be done via values tracked in git (like executability) and/or --chown and --chmod flags on COPY (docs.docker).

@surbhat1595
Copy link

surbhat1595 commented Aug 26, 2024

The telemetry-agent-supervisor.sh calls two commands, redirecting the stdout and stderr to files. Since these commands are not started in the background, presumably they are either short-lived processes or fork and exit on their own. With that being the case, why is it necessary for the telemetry-agent-supervisor.sh script to remain running?

The second command, i.e. /usr/bin/percona-telemetry-agent runs in the background and has to be always running and will exit only in case the container is killed. In case of any issues, it will try re-starting the percona-telemetry-agent process 3 times. Since percona-telemetry-agent process has to always be running and to avoid any zombie processes being created when the supervisor script exits, we have the supervisor script always running.

The conditional for the value of the PERCONA_TELEMETRY_DISABLE variable explicitly checks if the value is "0". I see nothing that sets the value explicitly to "0". As such, when will that conditional ever be true, short of the user setting it explicitly to zero? The documentation indicates you should set it to one to disable it, but as I read it, it seems by default that conditional would not evaluate to true, so perhaps the docs should say that telemetry is disabled by default and you should set PERCONA_TELEMETRY_DISABLE to zero to enable it.

No, the percona-telemetry-agent process runs in both conditions, when PERCONA_TELEMETRY_DISABLE is set to "0" and even when PERCONA_TELEMETRY_DISABLE is unintialized. In the condition if [[ ${PERCONA_TELEMETRY_DISABLE} -ne "0" ]]; -ne is an arithmetic operator, in an arithmetic context the empty string evaluates to zero since an unset variable expands to the empty string. All these conditions have been tested before publishing the docker images.

@surbhat1595
Copy link

  • /usr/bin/telemetry-agent-supervisor.sh &

Running unsupervised background processes in the container is not generally allowed in Official Images since that then requires a third process to supervise them: handling and sending signals, reaping zombies, and deciding what to do if one or more of them exit (or don't exit when asked to). Unsupervised background processes are processes that don't have a responsible parent process. Even if their PID relationship puts them in a parent/child relationship (like because of an exec), the parent does not take responsibility for the child (tracking/restarting, forwarding signals, reaping, etc). Most of this is instead better handled at a higher level in functionality provided by dockerd or in an orchestration of multiple containers like docker-compose and kubernetes (though neither seem a great fit for a "sidecar" telemetry process).
See also https://github.com/docker-library/official-images/blob/e070cf74d7f816287583930d2eeb9df1a93ad06e/README.md#init.

The reason we added a sleep infinity command in the telemetry-agent-supervisor.sh is because the process might turn into a zombie process if the script exits. We agree that it can be handled better using supervisord etc, but it seemed to be too heavy considering our requirement. Nevertheless, we shall check if there are ways to improve this

  • chown mysql /usr/bin/telemetry-agent-supervisor.sh; \

Files from a previous layer (like COPY) should not be modified in a later layer as it duplicates content and increases the final image size. Permissions/ownership should be done via values tracked in git (like executability) and/or --chown and --chmod flags on COPY (docs.docker).

Sure, Thank you, we shall try replacing chmod/chown commands with the above mentioned ways.

@whalelines
Copy link
Contributor

The telemetry-agent-supervisor.sh calls two commands, redirecting the stdout and stderr to files. Since these commands are not started in the background, presumably they are either short-lived processes or fork and exit on their own. With that being the case, why is it necessary for the telemetry-agent-supervisor.sh script to remain running?

The second command, i.e. /usr/bin/percona-telemetry-agent runs in the background and has to be always running and will exit only in case the container is killed. In case of any issues, it will try re-starting the percona-telemetry-agent process 3 times. Since percona-telemetry-agent process has to always be running and to avoid any zombie processes being created when the supervisor script exits, we have the supervisor script always running.

I am confused by your response. Does /usr/bin/percona-telemetry-agent fork itself and run in the background or does it stay in the foreground and the telemetry-agent-supervisor.sh monitors it? In other words, are those three retries done because of transient startup errors or to recover from any issue the agent may have during the lifetime of the container?

Here is the script in question:

#!/bin/bash

# phase-0 telemetry
/call-home.sh -f "PRODUCT_FAMILY_PS" -v "${PS_TELEMETRY_VERSION}" -d "DOCKER" ${CALL_HOME_OPTIONAL_PARAMS} &> /dev/null || :

# phase-1 telemetry
for i in {1..3}; do
    /usr/bin/percona-telemetry-agent >> /var/log/percona/telemetry-agent.log 2>> /var/log/percona/telemetry-agent-error.log
    if [ $? -eq 0 ]; then
      break
    fi
    sleep 5
done
sleep infinity

@tianon
Copy link
Member

tianon commented Aug 26, 2024

Either way, and to try and make this more explicitly clear, a generic process supervisor (either in shell or an explicit supervisor tool like supervisord) is unfortunately not going to be acceptable here.

@tianon
Copy link
Member

tianon commented Sep 10, 2024

All images specified in library/percona are now failing to build -- if this is not rectified soon, we will be disabling them. ❤️

@EvgeniyPatlan
Copy link
Contributor

EvgeniyPatlan commented Sep 11, 2024

Either way, and to try and make this more explicitly clear, a generic process supervisor (either in shell or an explicit supervisor tool like supervisord) is unfortunately not going to be acceptable here.

@tianon and what would be the suggested way? What is the best practice for such situation?

@tianon
Copy link
Member

tianon commented Sep 11, 2024

If both processes are necessary and must stay resident/running, a separate container for each or a proper manager/worker model as seen in Apache, HAProxy, NGINX, etc.

@surbhat1595
Copy link

@tianon , We followed the official documentation from Docker here which suggests using either a wrapper script or supervisord for managing multiple processes in a container. Our implementation uses a wrapper script ( telemetry-agent-supervisor.sh) as per the guidance, structured like this:

percona_bash_script
├── mysqld
└── telemetry_agent

Alternatively, we also explored supervisord which is also mentioned in the above doc,

supervisord
├── mysqld
└── telemetry_agent

Also, the docker images you mentioned as a reference, i.e. Apache, HAProxy, NGINX, etc. run one process in entrypoint which forks sub-processes. Since you mentioned that supervisord or a wrapper script is not acceptable, could you kindly clarify what exactly needs to be changed in our approach or any alternative method you would recommend to run multiple processes in a container?
Thank you.

@yosifkit
Copy link
Member

Also, the docker images you mentioned as a reference, i.e. Apache, HAProxy, NGINX, etc. run one process in entrypoint which forks sub-processes. Since you mentioned that supervisord or a wrapper script is not acceptable, could you kindly clarify what exactly needs to be changed in our approach or any alternative method you would recommend to run multiple processes in a container?

Without making the telemetry process a child managed by mysqld (similar to NGINX and its worker processes), there isn't really an alternative within Docker Official Images. I guess a "hacky" supervisor configuration could be hidden from our review by making it and the telemetry script part of a percona-server-* packaging. However, that should also make it more directly supported/tested within Percona (and bugs would not be considered just a quirk of containers) and then used by all users of Percona Server whether or not they use the Docker images.

Running multiple processes side by side within a single container is complex and there are many edge cases that we can't always remember and review in PRs to official images. To ensure that all images meet the high quality users expect from Docker Official Images, we restrict them to a single process (aside from those that spawn and manage their own worker processes).


Just for clarification as to why multi-process containers are so complex. Here are just a few edges and considerations that are needed for a multi-process container; this list is neither exhaustive nor an acceptance checklist:

  • minimum of 3 processes
    • supervisor process: one process responsible for managing the other processes, signal handling/forwarding, and exiting
    • main processes: the child processes that if any of them exits should cause the container to exit
    • sidecar processes: the children that if they exit, should just be restarted by the supervisor (or ignored?)
      • it could be zero sidecar processes and more than one main process
  • when should the supervisor exit (causing the whole container to stop and cause all other processes to be forcibly killed)
  • what exit code should the supervisor use?
  • supervisor reaping (the supervisor is expected to handle pid 1 responsibilities, like reaping zombies, as well as notice when the main or sidecar processes exit)
  • forwarding signals
    • which signals are sent to which children?
    • which signals should cause the specific child to exit?
    • is there a timeout on signals that expect the child to exit (and when should that cause the supervisor to exit)
      • how does that interact with user-configured Docker or Kubernetes stop timeouts?
  • what to do if a sidecar process exits unexpectedly (immediately or after a long time)
    • does the sidecar need a startup grace period (like initialDelaySeconds in Kubernetes liveness probes)
    • is the sidecar required to be ready before starting the main process? Or the other way around?
      • what is "ready"
      • how does a user reflect the sidecar processes in a health check (or Kubernetes liveness/readiness probes)?
    • is the response different if the sidecar has failed 10 times, or 100 times?
  • docker users expect most logs will be available on stdout and stderr
    • does each process (supervisor, main, and sidecar) need a prefix on their output to clarify where it came from
    • changing logging configuration now can be done in multiple, possibly conflicting, places (e.g., my.cnf, supervisor config, and supervisor child-specific config)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants