Action required by 4 March 2025 ci-release #3852

Open
richardlau opened this issue Aug 6, 2024 · 16 comments

Comments

@richardlau
Member

https://cloud.ibm.com/classic/support/event/details/162467655

Event Description
IMS 2024 Announcement Closures: DAL09 - POD3 and POD4

Subject: Time-sensitive action required: Datacenter modernization announcement

Thank you for your business and trust in IBM as your valued business partner and cloud provider. We’re committed to your success and prioritizing your experience within our datacenter infrastructure.

We have made significant investments in our new IBM Cloud datacenters and Multizone Regions (MZRs) designed to deliver a more resilient architecture with higher levels of network throughput and redundancy with our latest generation cloud technologies.

As part of this modernization strategy, we have made the decision to consolidate select data centers and help our customers shift operations to our newer and higher-capacity facilities, including the decision to close the following datacenter on March 04, 2025. This will not impact any other PODs within DAL09.

• DAL09 - POD3 and POD4

This means you will need to migrate the workloads running in these locations to one of our newer IBM Cloud datacenters before this date – but don’t worry, we can help ensure you pay the same or a lower price than you pay today for the same or a better configuration!

For our valued IBM Cloud® platform customers:

We recognize transferring datacenters can be complex and costly, so we are offering:
• Two months free on replacement servers or services in our new datacenters (Promo Code: DCMIGRATE2024) or Four months free when you optimize your IT infrastructure by migrating from Bare Metal to VSI or VPC (Promo Code: UPGRADE2VSI2024).
• The same or a lower price with a same or better configuration.
• Migration assistance. This includes a free architectural consultation with guidance on recommended configurations to help you transition and maximize solution performance. We have the support of a third-party partner available to help with your data migration at no charge. Additionally, we may be able to help with other migration requirements, depending on your needs.

Your action needed:
• Identify impacted servers/services. Contact us through your IBM Cloud Portal (https://cloud.ibm.com/login) or reach out to the Customer Success team via live chat (https://www.ibm.com/cloud/data-centers?focusArea=WCP%20-%20Cloud%20services%20-%20all%20other&contactmodule) or by phone: (US) 866-597-9687; (EMEA) +31 20 308 0540; (APAC) +65 6622 2231.
• Migrate your workloads currently running in the impacted datacenters/PODs.
• Free migration assistance is available through our partner Wanclouds (https://www.wanclouds.net/ibm-request).
• Cancel your servers after migration. After you complete the migration to your new servers, make sure to cancel your existing servers / services. Existing services will continue to be invoiced until cancelled.

Key Milestones:
Between now and the final reclaim date, there are several key milestones to be aware of:
• August 06, 2024: General Announcement Date
• August 06, 2024: No New Account provisioning in impacted data centers.
• October 14, 2024: No provisioning on existing accounts in impacted data centers.
• February 05, 2025: network maintenance: Remaining services in DAL09 PODs 3 and 4 will experience network disruption during the network maintenance. Customers will need to contact IBM Cloud to restore service.
• February 10, 2025: Final date to submit migration assistance request
• March 04, 2025: DATACENTER CONSOLIDATION DATE: final day to migrate data in DAL09 PODs 3 and 4

Where can I get more information?

To identify your impacted servers, take advantage of our special offers, or learn about recommended configurations or datacenters, contact our IBM Customer Success team via:
• Live chat (https://www.ibm.com/cloud/data-centers/?focusArea=WCP%20-%20Pooled%20CSM&contactmodule)
• Phone: (US) 866-597-9687; (EMEA) +31 20 308 0540; (APAC) +65 6622 2231
• About datacenter closures on: https://cloud.ibm.com/docs/get-support?topic=get-support-dc-closure

Thank you for your continued partnership with IBM. If you have additional questions or would like help during this migration, please let us know.

IBM Cloud, Customer Success Team
Devices Affected
infra-ibm-ubuntu1804-x64-1.nodejs.private

The affected machine is ci-release (FWIW despite its infra-ibm-ubuntu1804-x64-1.nodejs.private name, it is running Ubuntu 20.04). We don't have to migrate it urgently, but we should plan to avoid September/October (Node.js 23) and March (Node.js 24).

@targos
Member

targos commented Oct 18, 2024

I'd like to try tackling it this weekend.
Would it be doable to create a new machine with similar specs (on Ubuntu 24.04) and migrate the data and config to it?

@richardlau
Member Author

I'd like to try tackling it this weekend. Would it be doable to create a new machine with similar specs (on Ubuntu 24.04) and migrate the data and config to it?

Yes, I think so. We rebuilt ci-release back in 2021 so there's some history for reference: #2626 (comment)

@targos
Member

targos commented Oct 20, 2024

  • VM created: https://cloud.ibm.com/gen1/infrastructure/virtual-server/146903884/details#main
  • Install and enable nginx
    • apt install nginx
    • systemctl enable nginx
    • systemctl start nginx
  • Install jenkins
  • Configure nginx
    • cd /etc/nginx
    • copy conf.d/jenkins-static.conf
    • install additional modules: apt install libnginx-mod-http-image-filter libnginx-mod-http-xslt-filter libnginx-mod-mail libnginx-mod-stream
    • copy sites-available/jenkins-iojs
    • ln -s ../sites-available/jenkins-iojs sites-enabled/jenkins-iojs
    • unlink sites-enabled/default
    • copy ssl dir
    • systemctl restart nginx
  • Configure iptables
    • apt install iptables-persistent
    • Run iptables-save > /etc/iptables/rules.v4 on old server
    • Copy /etc/iptables/rules.v4
    • systemctl restart netfilter-persistent
  • Setup and mount jenkins data disk (already attached to /dev/xvdc)
    • fdisk /dev/xvdc
    • n, p, 1, default, default, p, w
    • mkfs.xfs /dev/xvdc1
    • systemctl stop jenkins
    • cd /var/lib
    • mv jenkins jenkins-old
    • mkdir jenkins
    • chown jenkins:jenkins jenkins
    • more /etc/mtab and copy line to /etc/fstab
    • mv jenkins-old/* jenkins/
    • mv jenkins-old/.* jenkins/
    • rmdir jenkins-old
  • Stop old server and rsync Jenkins data from it
  • Change DNS entry
  • Update ansible inventory: anisble: update ci-release server #3937
  • Convince all of the release nodes to connect to this new server?
  • Update backup scripts/config?
  • SSL certificate renewal?
  • ... ?
  • Decommission old server
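
A minimal sketch of the "more /etc/mtab and copy line to /etc/fstab" step in the checklist above, assuming the data disk is already mounted at /var/lib/jenkins. The function names `mtab_entry_for` and `append_fstab_entry` are hypothetical helpers for illustration, not part of any existing script mentioned in this thread. The file paths are passed as arguments so the helpers can be tried against copies before touching the real /etc/fstab.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the original checklist).

# Print the mtab entry for a mount point so it can be reviewed.
# $1 = mount point (e.g. /var/lib/jenkins)
# $2 = mtab-style file to read (defaults to /etc/mtab)
mtab_entry_for() {
  local mount_point="$1" mtab_file="${2:-/etc/mtab}"
  awk -v mp="$mount_point" '$2 == mp { print }' "$mtab_file"
}

# Append that entry to an fstab-style file, unless the file already
# has an entry for the same mount point.
# $1 = mount point, $2 = mtab-style file, $3 = fstab-style file
append_fstab_entry() {
  local mount_point="$1" mtab_file="$2" fstab_file="$3" entry
  entry="$(mtab_entry_for "$mount_point" "$mtab_file")"
  if [ -z "$entry" ]; then
    echo "nothing mounted at $mount_point" >&2
    return 1
  fi
  # awk exits 0 only if some line already uses this mount point.
  if awk -v mp="$mount_point" '$2 == mp { found = 1 } END { exit !found }' "$fstab_file"; then
    return 0
  fi
  printf '%s\n' "$entry" >> "$fstab_file"
}
```

For example, `append_fstab_entry /var/lib/jenkins /etc/mtab /etc/fstab` run as root would add the line once and do nothing on later runs. The copied line refers to the device by path (such as /dev/xvdc1), so it is worth checking that the entry still makes sense before rebooting.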

targos added a commit to targos/nodejs-build that referenced this issue Oct 20, 2024
@targos
Member

targos commented Oct 20, 2024

I did everything I could. https://ci-release.nodejs.org now points to the new server. There are a few open questions/tasks:

  • Do we need to do something on the release nodes? https://ci-release.nodejs.org/computer/ seems to suggest they connected to the new machine without issues.
  • I don't know what needs to be done for backups
  • What about the SSL certificate?
  • Anything else?

I'm leaving for holiday tomorrow. Anyone else feel free to finish the migration while I'm away.

Test build: https://ci-release.nodejs.org/job/iojs+release/10554/

@richardlau
Member Author

💚 Thanks for doing this.

I did everything I could. https://ci-release.nodejs.org now points to the new server. There are a few open questions/tasks:

* Do we need to do something on the release nodes? https://ci-release.nodejs.org/computer/ seems to suggest they connected to the new machine without issues.

I don't believe we need to do anything on the release nodes so long as ci-release.nodejs.org points to the correct server.

* I don't know what needs to be done for backups

Again I think this should just work so long as ci-release.nodejs.org was updated. I'll look at what's on the backup machine tomorrow.

* What about the SSL certificate?

ci-release.nodejs.org seems to be behind the expected certificate -- I can't remember if this is being served from nginx or Cloudflare for the Jenkins servers.

targos added a commit that referenced this issue Oct 20, 2024
@targos
Member

targos commented Oct 20, 2024

  • What about the SSL certificate?

ci-release.nodejs.org seems to be behind the expected certificate -- I can't remember if this is being served from nginx or Cloudflare for the Jenkins servers.

It's being served from nginx. My question is about renewal. Do we need to do something on the server so it is automatically renewed when necessary?

@targos
Member

targos commented Oct 20, 2024

Test build: ci-release.nodejs.org/job/iojs+release/10554

macOS jobs haven't started:
[Screenshot: CleanShot 2024-10-20 at 20 19 42]

@richardlau
Member Author

Not seeing any osx13 machines in https://ci-release.nodejs.org/computer/ although I'm not sure if we expect to with the ephemeral VM setup. @UlisesGascon @ryanaslett

@richardlau
Member Author

My question is about renewal. Do we need to do something on the server so it is automatically renewed when necessary?

No, at least at the moment the certificates have been manually updated yearly.
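
Since renewal is manual, a small expiry check could serve as a reminder. This is only a sketch: `days_until` and `cert_days_left` are hypothetical helper names, the certificate path is left as a parameter because this thread does not say where the file lives inside the copied ssl dir, and the date parsing assumes GNU `date`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the thread).

# Whole days from "now" until a date in the form printed by
# `openssl x509 -enddate`, e.g. "Oct 30 00:00:00 2024 GMT".
# $1 = expiry date string
# $2 = current time as epoch seconds (defaults to the real current time)
days_until() {
  local not_after="$1" now="${2:-$(date +%s)}" expiry
  expiry="$(date -u -d "$not_after" +%s)" || return 1
  echo $(( (expiry - now) / 86400 ))
}

# Days left on a PEM certificate file.
# $1 = path to the certificate (the real path is an assumption to fill in)
cert_days_left() {
  local not_after
  # openssl prints "notAfter=<date>"; keep only the date part.
  not_after="$(openssl x509 -in "$1" -noout -enddate | cut -d= -f2)" || return 1
  days_until "$not_after"
}
```

Something like `cert_days_left /path/to/cert.pem` could then be run from cron or a monitoring check, with a warning raised when the result drops below a chosen threshold.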

@richardlau
Member Author

* I don't know what needs to be done for backups

Again I think this should just work so long as ci-release.nodejs.org was updated. I'll look at what's on the backup machine tomorrow.

Well I was wrong -- it doesn't look like the backups worked (last update in /data/backup/periodic/daily.0/ci-release.nodejs.org/jobs/iojs+release/builds/ on the backup server is from 19 Oct). This is because the server needs to have the public key for backup (found in the infra section of the secrets repo) added to authorized_keys on the Jenkins servers so that the backup machine can ssh into them. Will fix.
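
The fix described above amounts to appending one public key line to authorized_keys on the Jenkins server. A sketch of doing that idempotently follows. The helper name `add_authorized_key` is hypothetical, and the key in the usage note is a placeholder, since the real key lives in the secrets repo and is not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an existing script from the build repo).

# Add a public key line to an authorized_keys file if it is not
# already present, creating the file with strict permissions.
# $1 = full public key line, $2 = path to the authorized_keys file
add_authorized_key() {
  local key_line="$1" auth_file="$2" ssh_dir
  ssh_dir="$(dirname "$auth_file")"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  touch "$auth_file" && chmod 600 "$auth_file" || return 1
  # -x matches whole lines, -F treats the key as a fixed string.
  if ! grep -qxF -- "$key_line" "$auth_file"; then
    printf '%s\n' "$key_line" >> "$auth_file"
  fi
}
```

Usage would look like `add_authorized_key "ssh-ed25519 AAAA...placeholder backup" /root/.ssh/authorized_keys`, where both the key and the target account are assumptions. Running it twice leaves a single entry.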

@richardlau
Member Author

I checked that I could successfully ssh into ci-release from the backup machine (after removing the known host as the server has changed). I also tried to run remove_old.sh ci-release.nodejs.org but this failed at the end trying to trigger a reload -- possibly due to the credential being used? @ryanaslett it looks like backup is using your credential for Jenkins -- I have a vague recollection you may have asked/mentioned this before when setting up the backup server but I've forgotten the context (possibly it was using the credential of a former Build WG member who was removed from one of the Node.js org teams?).

# /root/backup_scripts/remove_old.sh ci-release.nodejs.org
<html>
<head><title>400 Bad Request</title></head>
<body>
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx/1.24.0 (Ubuntu)</center>
</body>
</html>

@ryanaslett
Contributor

Yes, backup was using a different contributor's credentials, and we didn't have a good mechanism for a service account. I'll open a separate issue to investigate how to address that.

@ryanaslett
Contributor

Unrelated, but I can't update my ssh config using ansible, which I need in order to get onto the new server and investigate the vpn connectivity.

The secrets/build/test/inventory.yml file wasn't encrypted with my key for some reason.

@ryanaslett
Contributor

For new jenkins hosts we'll need to add https://github.com/nodejs/build/blob/main/doc/orka-vpn.md as another set of steps (until it's automated).

The vpn is now connected again, and jobs are running.

@richardlau
Member Author

Backups now appear to be working.

root@infra-mnx-ubuntu2204-x64-1:~# ls -al /data/backup/periodic/daily.0/ci-release.nodejs.org/jobs/iojs+release/builds/
...
drwxr-xr-x  3 root root 4096 Oct 19 09:51 10550
drwxr-xr-x  3 root root 4096 Oct 19 12:10 10551
drwxr-xr-x  3 root root 4096 Oct 20 09:52 10552
drwxr-xr-x  3 root root 4096 Oct 20 12:14 10553
drwxr-xr-x  3 root root 4096 Oct 20 16:12 10554
drwxr-xr-x  3 root root 4096 Oct 21 06:00 10555
drwxr-xr-x  3 root root 4096 Oct 21 10:00 10556
drwxr-xr-x  3 root root 4096 Oct 21 23:10 10557
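
The manual check above (looking at the newest build directory and its timestamp) could be scripted roughly as follows. This is a sketch under assumptions: `latest_build` and `backup_is_fresh` are hypothetical helper names, and the 24 hour threshold in the usage note is an arbitrary choice rather than a documented backup interval.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the backup scripts in this thread).

# Print the highest-numbered build directory under a builds dir.
# $1 = builds directory
latest_build() {
  local builds_dir="$1" entry name latest=""
  for entry in "$builds_dir"/*/; do
    name="$(basename "$entry")"
    case "$name" in
      ''|*[!0-9]*) continue ;;  # skip anything that is not a build number
    esac
    if [ -z "$latest" ] || [ "$name" -gt "$latest" ]; then
      latest="$name"
    fi
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Succeed if the newest build directory was modified within the window.
# $1 = builds directory, $2 = maximum age in hours
backup_is_fresh() {
  local builds_dir="$1" max_age_hours="$2" latest
  latest="$(latest_build "$builds_dir")" || return 1
  # find prints the directory only if its mtime is inside the window.
  [ -n "$(find "$builds_dir/$latest" -maxdepth 0 -mmin "-$(( max_age_hours * 60 ))")" ]
}
```

On the backup machine this might be called as `backup_is_fresh /data/backup/periodic/daily.0/ci-release.nodejs.org/jobs/iojs+release/builds 24`. It only reflects how recently a release build was backed up, so a quiet period with no builds would also report stale.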

@ryanaslett
Contributor

I also opened a stub issue so we don't forget to look at that "individual contributor access token" for backups: #3939
