Skip to content

Commit

Permalink
[#25900] YSQL: Capture stdout and stderr from the pg_upgrade and emit…
Browse files Browse the repository at this point in the history
… them in master logs

Summary:
- Redirect the stdout and stderr of the `pg_upgrade` and `initdb` process to the master logs. This will help in debugging errors from these processes. This uses the same logic currently in use by `ExternalMiniCluster` tests.
- Improve the upgrade tests to check for exact errors strings in the logs.
- Running the YB-specific checks in `pg_upgrade` at the start, as these catch issues like missing upgrade compatibility GUCs. This issue was found with the logging changes.
- Use a separate pg data directory (`pg_upgrade_data`) for the upgrade. This helps avoid conflict with the `pg_data` folder used by the yb-tserver. All `pg_upgrade` logs and dump data will are placed in this directory, which also makes it simpler for post upgrade cleanup operations (#25907).

Example output from yb-master logs

```
I0205 16:40:10.333971 1836642304 subprocess.cc:888] [pg_upgrade] Performing Consistency Checks
I0205 16:40:10.334014 1836642304 subprocess.cc:888] [pg_upgrade] -----------------------------
I0205 16:40:10.439098 1836642304 subprocess.cc:888] [pg_upgrade] Checking expression pushdown is disabled                    ok
I0205 16:40:10.447584 1836642304 subprocess.cc:888] [pg_upgrade] Checking attributes of the user used to access the old clusterChecking 'yugabyte' user attributes                         ok
I0205 16:40:10.448410 1836642304 subprocess.cc:888] [pg_upgrade] Checking 'yugabyte' user attributes                         ok
I0205 16:40:10.464574 1836642304 subprocess.cc:888] [pg_upgrade] Checking for all 3 system databases                         ok
I0205 16:40:10.464596 1836642304 subprocess.cc:888] [pg_upgrade] ok
I0205 16:40:11.043160 1836642304 subprocess.cc:888] [pg_upgrade] Checking database connection settings                       ok
I0205 16:40:11.281297 1825746944 ysql_manager.cc:138] Checking if ysql major catalog upgrade is done
I0205 16:40:11.286499 1836642304 subprocess.cc:888] [pg_upgrade] Checking for system-defined composite types in user tables  ok
I0205 16:40:11.473160 1836642304 subprocess.cc:888] [pg_upgrade] Checking for reg* data types in user tables                 ok
I0205 16:40:11.473188 1836642304 subprocess.cc:888] [pg_upgrade] Checking for contrib/isn with bigint-passing mismatch       ok
I0205 16:40:11.678638 1836642304 subprocess.cc:888] [pg_upgrade] Checking for user-defined postfix operators                 ok
I0205 16:40:11.963297 1836642304 subprocess.cc:888] [pg_upgrade] Checking for incompatible polymorphic functions             ok
I0205 16:40:12.090222 1836642304 subprocess.cc:888] [pg_upgrade] Checking for tables WITH OIDS                               ok
I0205 16:40:12.281961 1825173504 ysql_manager.cc:138] Checking if ysql major catalog upgrade is done
I0205 16:40:12.348526 1836642304 subprocess.cc:888] [pg_upgrade] Checking for invalid "sql_identifier" user columns          ok
I0205 16:40:12.472633 1836642304 subprocess.cc:888] [pg_upgrade] Creating dump of global objects                             ok
I0205 16:40:13.284899 1825173504 ysql_manager.cc:138] Checking if ysql major catalog upgrade is done
I0205 16:40:14.286425 1825173504 ysql_manager.cc:138] Checking if ysql major catalog upgrade is done
I0205 16:40:15.288862 1834921984 ysql_manager.cc:138] Checking if ysql major catalog upgrade is done
I0205 16:40:15.831990 1836642304 subprocess.cc:888] [pg_upgrade] Creating dump of database schemas                           ok
I0205 16:40:15.884467 1826320384 master.cc:702] resp: entries { db_oid: 1 shm_index: 0 }
I0205 16:40:15.903304 1834921984 master.cc:702] resp: entries { db_oid: 1 shm_index: 0 }
I0205 16:40:16.010345 1840656384 master.cc:702] resp: entries { db_oid: 13260 shm_index: 3 }
I0205 16:40:16.051797 1844670464 master.cc:702] resp: entries { db_oid: 1 shm_index: 0 }
I0205 16:40:16.076397 1836642304 subprocess.cc:888] [pg_upgrade] Checking for prepared transactions                          ok
I0205 16:40:16.076427 1836642304 subprocess.cc:888] [pg_upgrade] Checking for new cluster tablespace directories             ok
I0205 16:40:16.076438 1836642304 subprocess.cc:888] [pg_upgrade]
I0205 16:40:16.076444 1836642304 subprocess.cc:888] [pg_upgrade] If pg_upgrade fails after this point, you must re-initdb the
I0205 16:40:16.076449 1836642304 subprocess.cc:888] [pg_upgrade] new cluster before continuing.
I0205 16:40:16.076454 1836642304 subprocess.cc:888] [pg_upgrade]
I0205 16:40:16.076459 1836642304 subprocess.cc:888] [pg_upgrade] Performing Upgrade
I0205 16:40:16.076464 1836642304 subprocess.cc:888] [pg_upgrade] ------------------
```

#25900
Jira: DB-15212

Test Plan:
Pg15UpgradeTest.UsersAndRoles
YsqlMajorUpgradeDdlBlockingTest.CreateAndDropDBs
Pg15UpgradeTest.CheckUpgradeCompatibilityGuc

Reviewers: telgersma, asrivastava

Reviewed By: asrivastava

Subscribers: svc_phabricator, ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D41702
  • Loading branch information
hari90 committed Feb 7, 2025
1 parent 8712768 commit 18bd8b6
Show file tree
Hide file tree
Showing 14 changed files with 303 additions and 134 deletions.
53 changes: 25 additions & 28 deletions src/postgres/src/bin/pg_upgrade/check.c
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@ static char *get_canonical_locale_name(int category, const char *locale);

/* Yugabyte-specific checks */

static void yb_check_pushdown_is_disabled(PGconn *old_cluster_conn);
static void yb_check_upgrade_compatibility_guc(PGconn *old_cluster_conn);

static void yb_check_system_databases_exist(PGconn *old_cluster_conn);

Expand Down Expand Up @@ -102,6 +102,25 @@ check_and_dump_old_cluster(bool live_check)
if (!is_yugabyte_enabled() && !live_check)
start_postmaster(&old_cluster, true);

if (is_yugabyte_enabled())
{
PGconn *old_cluster_conn = connectToServer(&old_cluster, "template1");

/*
* --check can be run any time before the upgrade. Upgrade Compatibility
* GUC is only required to be set during the actual upgrade.
*/
if (!user_opts.check)
yb_check_upgrade_compatibility_guc(old_cluster_conn);

yb_check_old_cluster_user(old_cluster_conn);
yb_check_yugabyte_user(old_cluster_conn);
yb_check_system_databases_exist(old_cluster_conn);

PQfinish(old_cluster_conn);
check_ok();
}

/* Extract a list of databases and tables from the old cluster */
get_db_and_rel_infos(&old_cluster);

Expand Down Expand Up @@ -190,28 +209,6 @@ check_and_dump_old_cluster(bool live_check)
if (!is_yugabyte_enabled() && GET_MAJOR_VERSION(old_cluster.major_version) <= 903)
old_9_3_check_for_line_data_type_usage(&old_cluster);

if (is_yugabyte_enabled())
{
PGconn *old_cluster_conn = connectToServer(&old_cluster, "template1");

/*
* Yugabyte does not support expression pushdown during major upgrades.
* Only check this when we are ready to actually upgrade the cluster,
* because users may want to run this check long before the upgrade.
*/
if (!user_opts.check)
{
yb_check_pushdown_is_disabled(old_cluster_conn);
yb_check_old_cluster_user(old_cluster_conn);
}

yb_check_yugabyte_user(old_cluster_conn);
yb_check_system_databases_exist(old_cluster_conn);

PQfinish(old_cluster_conn);
check_ok();
}

/*
* While not a check option, we do this now because this is the only time
* the old server is running.
Expand Down Expand Up @@ -1596,13 +1593,13 @@ get_canonical_locale_name(int category, const char *locale)
}

/*
* yb_check_pushdown_is_disabled()
* yb_check_upgrade_compatibility_guc()
*
* Check we are the install user, and that the new cluster
* has no other users.
* Make sure the yb_major_version_upgrade_compatibility GUC is set to the
* correct value.
*/
static void
yb_check_pushdown_is_disabled(PGconn *old_cluster_conn)
yb_check_upgrade_compatibility_guc(PGconn *old_cluster_conn)
{
PGresult *res;

Expand Down Expand Up @@ -1634,7 +1631,7 @@ yb_check_system_databases_exist(PGconn *old_cluster_conn)
"VALUES ('template0'), ('template1'), ('yugabyte') EXCEPT SELECT datname FROM pg_database;");

if (PQntuples(res) != 0)
pg_fatal("Missing system database %s\n", PQgetvalue(res, 0, 0));
pg_fatal("Missing system database '%s'\n", PQgetvalue(res, 0, 0));

PQclear(res);

Expand Down
43 changes: 34 additions & 9 deletions src/yb/integration-tests/external_mini_cluster.cc
Original file line number Diff line number Diff line change
Expand Up @@ -2233,11 +2233,6 @@ void ExternalMiniCluster::SetMaxGracefulShutdownWaitSec(int max_graceful_shutdow
}
}

LogWaiter::LogWaiter(ExternalDaemon* daemon, const std::string& string_to_wait) :
daemon_(daemon), string_to_wait_(string_to_wait) {
daemon_->SetLogListener(this);
}

Status ExternalMiniCluster::CallYbAdmin(
const std::vector<std::string>& args, MonoDelta timeout, std::string* output) {
auto command = ToStringVector(
Expand All @@ -2258,6 +2253,28 @@ Status ExternalMiniCluster::CallYbAdmin(
return status;
}

LogWaiter ExternalMiniCluster::GetMasterLogWaiter(const std::string& log_message) const {
std::vector<ExternalDaemon*> masters;
for (const scoped_refptr<ExternalMaster>& master : masters_) {
masters.push_back(master.get());
}
return LogWaiter(masters, log_message);
}

//------------------------------------------------------------
// LogWaiter
//------------------------------------------------------------

LogWaiter::LogWaiter(ExternalDaemon* daemon, const std::string& string_to_wait)
: LogWaiter(std::vector<ExternalDaemon*>({daemon}), string_to_wait) {}

LogWaiter::LogWaiter(std::vector<ExternalDaemon*> daemons, const std::string& string_to_wait)
: daemons_(std::move(daemons)), string_to_wait_(string_to_wait) {
for (auto daemon : daemons_) {
daemon->SetLogListener(this);
}
}

void LogWaiter::Handle(const GStringPiece& s) {
if (s.contains(string_to_wait_)) {
event_occurred_ = true;
Expand All @@ -2266,14 +2283,22 @@ void LogWaiter::Handle(const GStringPiece& s) {

Status LogWaiter::WaitFor(const MonoDelta timeout) {
constexpr auto kInitialWaitPeriod = 100ms;
return ::yb::WaitFor(
[this]{ return event_occurred_.load(); }, timeout,
Format("Waiting for log record '$0' on $1...", string_to_wait_, daemon_->id()),

std::vector<std::string> daemons_ids;
std::transform(
daemons_.begin(), daemons_.end(), std::back_inserter(daemons_ids),
[](const auto* daemon) { return daemon->id(); });

return ::yb::LoggedWaitFor(
[this] { return event_occurred_.load(); }, timeout,
Format("Waiting for log record '$0' on $1...", string_to_wait_, ToString(daemons_ids)),
kInitialWaitPeriod);
}

LogWaiter::~LogWaiter() {
daemon_->RemoveLogListener(this);
for (auto daemon : daemons_) {
daemon->RemoveLogListener(this);
}
}

//------------------------------------------------------------
Expand Down
10 changes: 8 additions & 2 deletions src/yb/integration-tests/external_mini_cluster.h
Original file line number Diff line number Diff line change
Expand Up @@ -219,6 +219,8 @@ struct ExternalMiniClusterOptions {

YB_STRONGLY_TYPED_BOOL(RequireExitCode0);

class LogWaiter;

// A mini-cluster made up of subprocesses running each of the daemons separately. This is useful for
// black-box or grey-box failure testing purposes -- it provides the ability to forcibly kill or
// stop particular cluster participants, which isn't feasible in the normal MiniCluster. On the
Expand Down Expand Up @@ -586,6 +588,9 @@ class ExternalMiniCluster : public MiniClusterBase {
const std::vector<std::string>& args, MonoDelta timeout = MonoDelta::FromSeconds(60),
std::string* output = nullptr);

// Get a LogWaiter that waits for the given log message across all masters.
LogWaiter GetMasterLogWaiter(const std::string& log_message) const;

protected:
friend class UpgradeTestBase;
FRIEND_TEST(MasterFailoverTest, TestKillAnyMaster);
Expand Down Expand Up @@ -668,6 +673,7 @@ YB_STRONGLY_TYPED_BOOL(SafeShutdown);
class LogWaiter : public ExternalDaemon::StringListener {
public:
LogWaiter(ExternalDaemon* daemon, const std::string& string_to_wait);
LogWaiter(std::vector<ExternalDaemon*> daemons, const std::string& string_to_wait);

Status WaitFor(MonoDelta timeout);
bool IsEventOccurred() { return event_occurred_; }
Expand All @@ -677,9 +683,9 @@ class LogWaiter : public ExternalDaemon::StringListener {
private:
void Handle(const GStringPiece& s) override;

ExternalDaemon* daemon_;
std::vector<ExternalDaemon*> daemons_;
std::atomic<bool> event_occurred_{false};
std::string string_to_wait_;
const std::string string_to_wait_;
};

// Resumes a daemon that was stopped with ExteranlDaemon::Pause() upon
Expand Down
18 changes: 13 additions & 5 deletions src/yb/integration-tests/upgrade-tests/pg15_upgrade-test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -115,7 +115,9 @@ TEST_F(Pg15UpgradeTest, CheckVersion) {
ASSERT_STR_CONTAINS(ysql_catalog_config, "catalog_version: 15");

// Running validation on the upgraded cluster should fail since its already on the higher version.
ASSERT_NOK_STR_CONTAINS(ValidateUpgradeCompatibility(), kPgUpgradeFailedError);
ASSERT_OK(ValidateUpgradeCompatibilityFailure(
"This version of the utility can only be used for checking YSQL version 11. The cluster is "
"currently on YSQL version 15"));
}

TEST_F(Pg15UpgradeTest, SimpleTableUpgrade) { ASSERT_OK(TestUpgradeWithSimpleTable()); }
Expand Down Expand Up @@ -1440,7 +1442,7 @@ TEST_F(Pg15UpgradeTest, YbGinIndex) {
}
}

TEST_F(Pg15UpgradeTest, CheckPushdownIsDisabled) {
TEST_F(Pg15UpgradeTest, CheckUpgradeCompatibilityGuc) {
// Whether or not yb_major_version_upgrade_compatibility is enabled, pg_upgrade --check will not
// error.
ASSERT_OK(cluster_->AddAndSetExtraFlag("ysql_yb_major_version_upgrade_compatibility", "11"));
Expand All @@ -1449,8 +1451,12 @@ TEST_F(Pg15UpgradeTest, CheckPushdownIsDisabled) {
ASSERT_OK(cluster_->AddAndSetExtraFlag("ysql_yb_major_version_upgrade_compatibility", "0"));
ASSERT_OK(ValidateUpgradeCompatibility());

// However, when we actually run the YSQL upgrade, pg_upgrade will error if pushdown is enabled.
// However, when we actually run the YSQL upgrade, pg_upgrade will error since now
// ysql_yb_major_version_upgrade_compatibility is not set.
auto log_waiter =
cluster_->GetMasterLogWaiter("yb_major_version_upgrade_compatibility must be set to 11");
ASSERT_NOK_STR_CONTAINS(UpgradeClusterToMixedMode(), kPgUpgradeFailedError);
ASSERT_TRUE(log_waiter.IsEventOccurred());
}

TEST_F(Pg15UpgradeSequenceTest, Sequences) {
Expand Down Expand Up @@ -1540,10 +1546,12 @@ TEST_F(Pg15UpgradeTest, UsersAndRoles) {

auto pg_conn = ASSERT_RESULT(pgwrapper::PGConnBuilder(pg_conn_settings).Connect());
ASSERT_OK(pg_conn.Execute("DROP USER yugabyte"));
ASSERT_NOK_STR_CONTAINS(ValidateUpgradeCompatibility(postgres_user), kPgUpgradeFailedError);
ASSERT_OK(ValidateUpgradeCompatibilityFailure("The 'yugabyte' user is missing", postgres_user));

ASSERT_OK(pg_conn.Execute("CREATE USER yugabyte"));
ASSERT_NOK_STR_CONTAINS(ValidateUpgradeCompatibility(postgres_user), kPgUpgradeFailedError);

ASSERT_OK(ValidateUpgradeCompatibilityFailure(
"The 'yugabyte' user is missing the 'rolsuper' attribute", postgres_user));

ASSERT_OK(pg_conn.Execute("DROP USER yugabyte"));
ASSERT_OK(pg_conn.Execute(
Expand Down
20 changes: 19 additions & 1 deletion src/yb/integration-tests/upgrade-tests/pg15_upgrade_test_base.cc
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
#include "yb/master/master_admin.proxy.h"
#include "yb/master/master_defaults.h"
#include "yb/util/backoff_waiter.h"
#include "yb/util/logging_test_util.h"
#include "yb/yql/pgwrapper/libpq_utils.h"
#include "yb/util/env_util.h"

Expand Down Expand Up @@ -56,7 +57,24 @@ Status Pg15UpgradeTestBase::ValidateUpgradeCompatibility(const std::string& user

LOG(INFO) << "Running " << AsString(args);

return Subprocess::Call(args);
return Subprocess::Call(args, /*log_stdout_and_stderr=*/true);
}

Status Pg15UpgradeTestBase::ValidateUpgradeCompatibilityFailure(
const std::string& expected_error, const std::string& user_name) {
auto log_waiter = StringWaiterLogSink(expected_error);
auto status = ValidateUpgradeCompatibility(user_name);
if (status.ok()) {
return STATUS_FORMAT(
IllegalState, "Expected pg_upgrade to fail with error: $0", expected_error);
}
SCHECK(
status.message().Contains(kPgUpgradeFailedError), IllegalState, "Unexpected status: $0",
status);
SCHECK_FORMAT(
log_waiter.IsEventOccurred(), IllegalState, "Expected pg_upgrade to fail with error: $0",
expected_error);
return Status::OK();
}

Status Pg15UpgradeTestBase::UpgradeClusterToMixedMode() {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,8 @@ class Pg15UpgradeTestBase : public UpgradeTestBase {

// Run pg_upgrade --check
virtual Status ValidateUpgradeCompatibility(const std::string& user_name = "yugabyte");
Status ValidateUpgradeCompatibilityFailure(
const std::string& expected_error, const std::string& user_name = "yugabyte");

// Restarts all masters in the current version, runs ysql major version upgrade, and restarts
// tserver kMixedModeTserverPg15 in the current version. Other tservers are kept in the pg11
Expand Down
3 changes: 2 additions & 1 deletion src/yb/integration-tests/upgrade-tests/upgrade_test_base.cc
Original file line number Diff line number Diff line change
Expand Up @@ -457,7 +457,8 @@ Status UpgradeTestBase::WaitForYsqlMajorCatalogUpgradeToFinish() {
};

return LoggedWaitFor(
is_upgrade_done, 10min, "Waiting for ysql major catalog upgrade to complete");
is_upgrade_done, 10min, "Waiting for ysql major catalog upgrade to complete",
/*initial_delay*/ 1s);
}

Status UpgradeTestBase::PromoteAutoFlags(AutoFlagClass flag_class) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -210,7 +210,7 @@ TEST_F(YsqlMajorUpgradeDdlBlockingTest, CreateAndDropDBs) {
ASSERT_OK(template1_conn.ExecuteFormat("DROP DATABASE system_platform"));
ASSERT_OK(template1_conn.ExecuteFormat("DROP DATABASE yugabyte"));

ASSERT_NOK_STR_CONTAINS(ValidateUpgradeCompatibility(), kPgUpgradeFailedError);
ASSERT_OK(ValidateUpgradeCompatibilityFailure("Missing system database 'yugabyte'"));

ASSERT_OK(template1_conn.ExecuteFormat("CREATE DATABASE yugabyte"));
ASSERT_OK(ValidateUpgradeCompatibility());
Expand Down
6 changes: 4 additions & 2 deletions src/yb/master/ysql/ysql_initdb_major_upgrade_handler.cc
Original file line number Diff line number Diff line change
Expand Up @@ -352,10 +352,12 @@ YsqlInitDBAndMajorUpgradeHandler::GetDbNameToOidListForMajorUpgrade() {
Status YsqlInitDBAndMajorUpgradeHandler::PerformPgUpgrade(const LeaderEpoch& epoch) {
const auto& master_opts = master_.opts();

const auto pg_upgrade_data_dir =
JoinPathSegments(master_opts.fs_opts.data_paths.front(), "pg_upgrade_data");

// Run local initdb to prepare the node for starting postgres.
auto pg_conf = VERIFY_RESULT(pgwrapper::PgProcessConf::CreateValidateAndRunInitDb(
FLAGS_rpc_bind_addresses, master_opts.fs_opts.data_paths.front() + "/pg_data",
master_.GetSharedMemoryFd()));
FLAGS_rpc_bind_addresses, pg_upgrade_data_dir, master_.GetSharedMemoryFd()));

pg_conf.master_addresses = master_opts.master_addresses_flag;
pg_conf.pg_port = FLAGS_ysql_upgrade_postgres_port;
Expand Down
Loading

0 comments on commit 18bd8b6

Please sign in to comment.