Move GOA to K8s #1996

Closed
artntek opened this issue Oct 22, 2024 · 8 comments

@artntek
Contributor

artntek commented Oct 22, 2024

Tracking progress for moving goa.nceas.ucsb.edu to k8s prod cluster.

Add any notes to this issue, and follow checklist in sub-issue #2061

@artntek artntek self-assigned this Oct 22, 2024
@artntek artntek converted this from a draft issue Oct 22, 2024
@artntek
Contributor Author

artntek commented Oct 22, 2024

Initial rsync done.

Notes

  • ceph is not mounted on GOA host (mn-ucsb-2.dataone.org), and I'm rsyncing across hosts to datateam using brooke login (see commands below).
  • Therefore, need to log into datateam, and chown -R brooke:brooke on /mnt/ceph/repos/goa/metacat and /mnt/ceph/repos/goa/postgresql destination ceph directories, before running the rsync on mn-ucsb-2
  • After completing the rsync, need to log into datateam, and chown back to 59997 and 59996

Commands

$ time sudo rsync -aHAX --delete /var/goa/data/ [email protected]:/mnt/ceph/repos/goa/metacat/data/
real	1m14.854s
user	0m0.159s
sys	0m0.104s

brooke@mn-ucsb-2:~$ time sudo rsync -aHAX --delete /var/goa/documents/ [email protected]:/mnt/ceph/repos/goa/metacat/documents/
real	0m5.177s
user	0m0.144s
sys	0m0.081s

brooke@mn-ucsb-2:~$ time sudo rsync -aHAX --delete /var/goa/logs/ [email protected]:/mnt/ceph/repos/goa/metacat/logs/
real	0m2.261s
user	0m0.114s
sys	0m0.052s

brooke@mn-ucsb-2:~$ time sudo rsync -aHAX --delete /var/lib/postgresql/14/ [email protected]:/mnt/ceph/repos/goa/postgresql/14/
real	1m8.735s
user	0m10.161s
sys	0m24.387s
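
The ownership changes described in the notes are not shown in the commands above. A minimal sketch of them, to be run on datateam, might look like the following. The helper names are hypothetical, and mapping 59997 to the metacat tree and 59996 to the postgresql tree (and using the same number for the group) is an assumption based only on the order in which they are listed in the notes. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the chown commands unless DRY_RUN=0 is set.
# Assumption: 59997 owns the metacat tree and 59996 the postgresql tree,
# with matching group ids.

GOA_ROOT=/mnt/ceph/repos/goa

# Run a command, or just print it when DRY_RUN is unset or 1.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Before the rsync: let the login user write to both destinations.
open_for_rsync() {
    run sudo chown -R "$1:$1" "$GOA_ROOT/metacat" "$GOA_ROOT/postgresql"
}

# After the rsync: hand each tree back to its numeric owner.
restore_owners() {
    run sudo chown -R 59997:59997 "$GOA_ROOT/metacat"
    run sudo chown -R 59996:59996 "$GOA_ROOT/postgresql"
}
```

Calling `open_for_rsync brooke` before the rsync and `restore_owners` afterwards prints the commands for review; setting `DRY_RUN=0` would execute them.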

@artntek artntek added this to the 3.1.0 milestone Oct 22, 2024
@artntek artntek moved this from In Progress to Todo in Metacat Releases Nov 28, 2024
@artntek artntek removed the status in Metacat Releases Nov 28, 2024
@artntek artntek modified the milestones: 3.1.0, 3.1.0-deployment Dec 10, 2024
@artntek artntek moved this to Todo in Metacat Releases Feb 4, 2025
@artntek artntek moved this from Todo to In Progress in Metacat Releases Feb 13, 2025
@artntek
Contributor Author

artntek commented Feb 14, 2025

ok - all finished and hashstore conversion errors taken care of. Changed values (external URL etc) to reflect live goa site, and pointed metacat at the prod CN, but at this point I could not get it to start up due to the following error:

metacat 20250213-23:38:41: [INFO]: nodeIdMatchesClientCert(): TRUE! (nodeId: urn:node:GOA; Common Name (CN) from client cert: urn:node:GOA [edu.ucsb.nceas.metacat.admin.D1AdminCNUpdater:nodeIdMatchesClientCert:268]
metacat 20250213-23:38:41: [INFO]: Sending updated node capabilities to DataONE CN: https://cn.dataone.org/cn/v2 [edu.ucsb.nceas.metacat.admin.D1AdminCNUpdater:updateCN:403]
metacat 20250213-23:38:41: [INFO]: temp outputFile is: /usr/local/tomcat/temp/mmp.output.6491576582861615276.tmp [org.dataone.mimemultipart.SimpleMultipartEntity:generateTempFile:211]
[...]
metacat 20250213-23:38:42: [DEBUG]: Starting handshake [org.apache.http.conn.ssl.SSLConnectionSocketFactory:createLayeredSocket:435]
metacat 20250213-23:38:42: [DEBUG]: Secure session established [org.apache.http.conn.ssl.SSLConnectionSocketFactory:verifyHostname:465]
metacat 20250213-23:38:42: [DEBUG]:  negotiated protocol: TLSv1.2 [org.apache.http.conn.ssl.SSLConnectionSocketFactory:verifyHostname:466]
metacat 20250213-23:38:42: [DEBUG]:  negotiated cipher suite: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 [org.apache.http.conn.ssl.SSLConnectionSocketFactory:verifyHostname:467]
metacat 20250213-23:38:42: [DEBUG]:  peer principal: CN=cn.dataone.org [org.apache.http.conn.ssl.SSLConnectionSocketFactory:verifyHostname:475]
metacat 20250213-23:38:42: [DEBUG]:  peer alternative names: [cn-orc-1.dataone.org, cn-secondary.dataone.org, cn-ucsb-1.dataone.org, cn.dataone.org] [org.apache.http.conn.ssl.SSLConnectionSocketFactory:verifyHostname:484]
metacat 20250213-23:38:42: [DEBUG]:  issuer principal: CN=R10, O=Let's Encrypt, C=US [org.apache.http.conn.ssl.SSLConnectionSocketFactory:verifyHostname:488]
metacat 20250213-23:38:42: [DEBUG]: Connection established 192.168.197.198:45274<->128.111.85.180:443 [org.apache.http.impl.conn.DefaultHttpClientConnectionOperator:connect:146]
metacat 20250213-23:38:42: [DEBUG]: http-outgoing-2: set socket timeout to 30000 [org.apache.http.impl.conn.LoggingManagedHttpClientConnection:setSocketTimeout:88]
metacat 20250213-23:38:42: [DEBUG]: Executing request PUT /cn/v2/node/urn:node:GOA HTTP/1.1 [org.apache.http.impl.execchain.MainClientExec:execute:255]
metacat 20250213-23:38:42: [DEBUG]: Target auth state: UNCHALLENGED [org.apache.http.impl.execchain.MainClientExec:execute:260]
metacat 20250213-23:38:42: [DEBUG]: Proxy auth state: UNCHALLENGED [org.apache.http.impl.execchain.MainClientExec:execute:266]
metacat 20250213-23:38:42: [DEBUG]: http-outgoing-2 >> PUT /cn/v2/node/urn:node:GOA HTTP/1.1 [org.apache.http.impl.conn.LoggingManagedHttpClientConnection:onRequestSubmitted:133]
metacat 20250213-23:38:42: [DEBUG]: http-outgoing-2 >> Content-Length: 2882 [org.apache.http.impl.conn.LoggingManagedHttpClientConnection:onRequestSubmitted:136]
metacat 20250213-23:38:42: [DEBUG]: http-outgoing-2 >> Content-Type: multipart/form-data; boundary=ATO-ok0tCzQ3O1jM4FOQRGXxvFwGbiZ5wh2SbP; charset=US-ASCII [org.apache.http.impl.conn.LoggingManagedHttpClientConnection:onRequestSubmitted:136]
metacat 20250213-23:34:45: [DEBUG]: http-outgoing-2 >> Host: cn.dataone.org [org.apache.http.impl.conn.LoggingManagedHttpClientConnection:onRequestSubmitted:136]
metacat 20250213-23:34:45: [DEBUG]: http-outgoing-2 >> Connection: Keep-Alive [org.apache.http.impl.conn.LoggingManagedHttpClientConnection:onRequestSubmitted:136]
metacat 20250213-23:34:45: [DEBUG]: http-outgoing-2 >> User-Agent: Apache-HttpClient/4.5.13 (Java/17.0.13) [org.apache.http.impl.conn.LoggingManagedHttpClientConnection:onRequestSubmitted:136]
metacat 20250213-23:34:45: [DEBUG]: http-outgoing-2 >> Accept-Encoding: gzip,deflate [org.apache.http.impl.conn.LoggingManagedHttpClientConnection:onRequestSubmitted:136]
metacat 20250213-23:34:45: [DEBUG]: http-outgoing-2 >> Via: 1.1 localhost (Apache-HttpClient/4.5.13 (cache)) [org.apache.http.impl.conn.LoggingManagedHttpClientConnection:onRequestSubmitted:136]
metacat 20250213-23:34:45: [DEBUG]: http-outgoing-2 >> "PUT /cn/v2/node/urn:node:GOA HTTP/1.1[\r][\n]" [org.apache.http.impl.conn.Wire:wire:73]
metacat 20250213-23:34:45: [DEBUG]: http-outgoing-2 >> "Content-Length: 2881[\r][\n]" [org.apache.http.impl.conn.Wire:wire:73]
metacat 20250213-23:34:45: [DEBUG]: http-outgoing-2 >> "Content-Type: multipart/form-data; boundary=P-2RPzJHT3IpmFRQvMc2TkBHV89dhCJkk0RaL; charset=US-ASCII[\r][\n]" [org.apache.http.impl.conn.Wire:wire:73]
[...]
metacat 20250213-23:34:45: [DEBUG]: http-outgoing-2 >> "</ns3:node>[\n]" [org.apache.http.impl.conn.Wire:wire:73]
metacat 20250213-23:34:45: [DEBUG]: http-outgoing-2 >> "[\r][\n]" [org.apache.http.impl.conn.Wire:wire:73]
metacat 20250213-23:34:45: [DEBUG]: http-outgoing-2 >> "--P-2RPzJHT3IpmFRQvMc2TkBHV89dhCJkk0RaL--[\r][\n]" [org.apache.http.impl.conn.Wire:wire:73]
metacat 20250213-23:34:53: [DEBUG]: Connection [id:1][route:{s}->https://cn.dataone.org:443][state:CN=urn:node:GOA, DC=dataone, DC=org] expired @ Thu Feb 13 23:34:50 UTC 2025 [org.apache.http.impl.conn.CPoolEntry:isExpired:82]
metacat 20250213-23:34:53: [DEBUG]: http-outgoing-1: Close connection [org.apache.http.impl.conn.LoggingManagedHttpClientConnection:close:79]
metacat 20250213-23:34:53: [DEBUG]: Connection [id:0][route:{s}->https://cn.dataone.org:443][state:CN=urn:node:GOA, DC=dataone, DC=org] expired @ Thu Feb 13 23:34:49 UTC 2025 [org.apache.http.impl.conn.CPoolEntry:isExpired:82]
metacat 20250213-23:34:53: [DEBUG]: http-outgoing-0: Close connection [org.apache.http.impl.conn.LoggingManagedHttpClientConnection:close:79]
metacat 20250213-23:35:15: [DEBUG]: http-outgoing-2 << "[read] I/O error: Read timed out" [org.apache.http.impl.conn.Wire:wire:87]
metacat 20250213-23:35:15: [DEBUG]: http-outgoing-2: Close connection [org.apache.http.impl.conn.LoggingManagedHttpClientConnection:close:79]
metacat 20250213-23:35:15: [DEBUG]: http-outgoing-2: Shutdown connection [org.apache.http.impl.conn.LoggingManagedHttpClientConnection:shutdown:96]
metacat 20250213-23:35:15: [DEBUG]: Connection discarded [org.apache.http.impl.execchain.ConnectionHolder:abortConnection:129]
metacat 20250213-23:35:15: [ERROR]: Calling CNode.updateNodeCapabilities() with CN URL: https://cn.dataone.org/cn/v2, and nodeId: urn:node:GOA; error message was: class org.dataone.client.exception.ClientSideException: /Read timed out [edu.ucsb.nceas.metacat.admin.D1AdminCNUpdater:updateCN:411]
org.dataone.service.exceptions.ServiceFailure: class org.dataone.client.exception.ClientSideException: /Read timed out
	at org.dataone.client.rest.HttpMultipartRestClient.doPutRequest(HttpMultipartRestClient.java:448) ~[d1_libclient_java-2.3.1.jar:?]
	at org.dataone.client.v2.impl.MultipartCNode.updateNodeCapabilities(MultipartCNode.java:1585) ~[d1_libclient_java-2.3.1.jar:?]
	at edu.ucsb.nceas.metacat.admin.D1AdminCNUpdater.updateCN(D1AdminCNUpdater.java:406) [metacat.jar:?]
	at edu.ucsb.nceas.metacat.admin.D1AdminCNUpdater.configPreregisteredMN(D1AdminCNUpdater.java:182) [metacat.jar:?]
	at edu.ucsb.nceas.metacat.admin.D1Admin.upRegD1MemberNode(D1Admin.java:433) [metacat.jar:?]
	at edu.ucsb.nceas.metacat.startup.K8sAdminInitializer.initK8sD1Admin(K8sAdminInitializer.java:127) [metacat.jar:?]
	at edu.ucsb.nceas.metacat.startup.K8sAdminInitializer.initializeK8sInstance(K8sAdminInitializer.java:53) [metacat.jar:?]
	at edu.ucsb.nceas.metacat.startup.MetacatInitializer.initAfterMetacatConfig(MetacatInitializer.java:162) [metacat.jar:?]
	at edu.ucsb.nceas.metacat.startup.MetacatInitializer.contextInitialized(MetacatInitializer.java:103) [metacat.jar:?]
	at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4005) [catalina.jar:9.0.96]
[...]
Caused by: org.dataone.client.exception.ClientSideException: /Read timed out
	... 50 more
Caused by: java.net.SocketTimeoutException: Read timed out
	at sun.nio.ch.NioSocketImpl.timedRead(Unknown Source) ~[?:?]
[...]
metacat 20250213-23:39:12: [ERROR]: configPreregisteredMN: *** FAILED *** to push an update of Member Node settings to the CN (nodeId unchanged) [edu.ucsb.nceas.metacat.admin.D1AdminCNUpdater:configPreregisteredMN:187]
metacat 20250213-23:39:12: [ERROR]: initializeContainerizedD1Admin(): error calling D1Admin.getInstance().upRegD1MemberNode: configPreregisteredMN: *** FAILED *** to push an update of Member Node settings to the CN (nodeId unchanged) [edu.ucsb.nceas.metacat.startup.K8sAdminInitializer:initK8sD1Admin:131]
edu.ucsb.nceas.metacat.admin.AdminException: configPreregisteredMN: *** FAILED *** to push an update of Member Node settings to the CN (nodeId unchanged)
	at edu.ucsb.nceas.metacat.admin.D1AdminCNUpdater.configPreregisteredMN(D1AdminCNUpdater.java:188) ~[metacat.jar:?]
	at edu.ucsb.nceas.metacat.admin.D1Admin.upRegD1MemberNode(D1Admin.java:433) ~[metacat.jar:?]
	at edu.ucsb.nceas.metacat.startup.K8sAdminInitializer.initK8sD1Admin(K8sAdminInitializer.java:127) [metacat.jar:?]
[...]

This implies that the attempted PUT to https://cn.dataone.org/cn/v2/node/urn:node:GOA completes the connection and SSL handshake, but then times out waiting for the response to the PUT.
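
As a quick check on the timing, the wire log shows the end of the request body at 20250213-23:34:45 and the "Read timed out" error at 20250213-23:35:15. A small sketch (helper names are hypothetical) to compute the gap between two such timestamps on the same day:

```shell
#!/bin/sh
# Seconds since midnight for a Metacat log timestamp like 20250213-23:34:45.
to_seconds() {
    clock=${1#*-}            # drop the date part, keep HH:MM:SS
    h=${clock%%:*}
    rest=${clock#*:}
    m=${rest%%:*}
    s=${rest#*:}
    # Strip one leading zero so values such as "08" are not read as octal.
    h=${h#0}; m=${m#0}; s=${s#0}
    echo $(( h * 3600 + m * 60 + s ))
}

# Elapsed seconds between two timestamps on the same day.
elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

elapsed 20250213-23:34:45 20250213-23:35:15   # prints 30
```

The 30 second gap matches the "set socket timeout to 30000" line earlier in the same log, which is consistent with the client giving up while the CN was still working on the request.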

@artntek
Contributor Author

artntek commented Feb 14, 2025

I am able to curl (GET) the cn from a container in the same namespace as the metacat pod:

root@busybox-netutils:/# curl https://cn.dataone.org/cn/v2/node/urn:node:GOA

<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl"
  href="/cn/xslt/dataone.types.v2.xsl" ?>
<ns3:node xmlns:ns2="http://ns.dataone.org/service/types/v1"
  xmlns:ns3="http://ns.dataone.org/service/types/v2.0" replicate="false" synchronize="true"
  type="mn" state="up">
    <identifier>urn:node:GOA</identifier>
    <name>Gulf of Alaska Data Portal</name>
[...etc]

@taojing2002
Contributor

taojing2002 commented Feb 14, 2025 via email

@artntek
Contributor Author

artntek commented Feb 19, 2025

Quoting the emailed suggestion from @taojing2002: "May you try to update the node document by the curl command like this:"

curl --cert your-node-certificate -X PUT -F @.***' "https://cn-stage.test.dataone.org/cn/v2/node"

Not sure about this - let's discuss in person:

  • the form clause is incorrectly formatted: PUT -F ***@***.***'. I think curl is looking for key=value pairs? Assuming it should be contactSubject=...

  • I tried with the GOA cert, but I got a CA error, because cn-stage.test.dataone.org uses the test CA.

    tomcat@metacatgoa-0:~$ curl --cert _DELETEME_GOA_node.crt  -X PUT -F 'contactSubject=***@***.***' "https://cn-stage.test.dataone.org/cn/v2/node"
    curl: (56) OpenSSL SSL_read: error:0A000418:SSL routines::tlsv1 alert unknown ca, errno 0
  • I tried with the test.arcticdata.io node cert, which uses the test CA, but got another error:

    tomcat@metacatgoa-0:~$ curl --cert _DELETEME_TESTARCTIC_node.crt  -X PUT -F '[email protected]' "https://cn-stage.test.dataone.org/cn/v2/node"
    <?xml version="1.0" encoding="UTF-8"?>
    <error detailCode="500" errorCode="500" name="ServiceFailure">
      <description>Internal Server Error: The server encountered an unexpected condition which
                   prevented it from fulfilling the request.</description>
    </error>

(...and in reading the CN API docs, it appears I was lucky it failed, because if you update just one field like this, it would overwrite all the other non-specified fields and make them empty 😅)

@artntek
Contributor Author

artntek commented Feb 19, 2025

One change since the above thread, though: The pod is now up and running, because the CN check failed in a different way:

metacat 20250218-14:10:45: [ERROR]: Could not check for node with DataONE (500/000): Could
    not get CNode from the underlying context (D1Client.CN_URL)
        [edu.ucsb.nceas.metacat.admin.D1AdminCNUpdater:isNodeRegistered:497]
metacat 20250218-14:10:45: [INFO]: * * * Handling config changes: UNREGISTERED D1 MEMBER
    NODE... [edu.ucsb.nceas.metacat.admin.D1Admin:upRegD1MemberNode:435]

I double-checked that D1Client.CN_URL was correctly set to https://cn.dataone.org/cn, so perhaps this was just some kind of network glitch or something? I'm guessing that if I recreate the pod, it will go back to the startup error listed above, so for now, I've kept the pod intact for testing.

More context from logs:

metacat 20250218-14:10:14: [INFO]: Running in a container; calling D1Admin::upRegD1MemberNode [edu.ucsb.nceas.metacat.startup.K8sAdminInitializer:initK8sD1Admin:125]

metacat 20250218-14:10:14: [DEBUG]: Get the Node description. [edu.ucsb.nceas.metacat.admin.D1Admin:upRegD1MemberNode:412]

metacat 20250218-14:10:14: [DEBUG]: SystemUtil.getServer - goa.nceas.ucsb.edu [edu.ucsb.nceas.metacat.util.SystemUtil:getServer:157]

metacat 20250218-14:10:14: [DEBUG]: SystemUtil.getServerURL - https://goa.nceas.ucsb.edu [edu.ucsb.nceas.metacat.util.SystemUtil:getServerURL:140]

metacat 20250218-14:10:14: [DEBUG]: SystemUtil.getServer - goa.nceas.ucsb.edu [edu.ucsb.nceas.metacat.util.SystemUtil:getServer:157]

metacat 20250218-14:10:14: [DEBUG]: SystemUtil.getServerURL - https://goa.nceas.ucsb.edu [edu.ucsb.nceas.metacat.util.SystemUtil:getServerURL:140]

metacat 20250218-14:10:14: [DEBUG]: DataONE MN Client certificate set: /var/metacat/certs/d1client.crt [edu.ucsb.nceas.metacat.admin.D1Admin:upRegD1MemberNode:428]

metacat 20250218-14:10:15: [DEBUG]: CookieSpec selected: default [org.apache.http.client.protocol.RequestAddCookies:process:123]

metacat 20250218-14:10:15: [DEBUG]: Auth cache not set in the context [org.apache.http.client.protocol.RequestAuthCache:process:77]

metacat 20250218-14:10:15: [DEBUG]: Cache miss [org.apache.http.impl.client.cache.CachingExec:execute:264]

metacat 20250218-14:10:15: [DEBUG]: Opening connection {s}->https://cn.dataone.org:443 [org.apache.http.impl.execchain.MainClientExec:execute:234]

metacat 20250218-14:10:15: [DEBUG]: Connecting to cn.dataone.org/128.111.85.180:443 [org.apache.http.impl.conn.DefaultHttpClientConnectionOperator:connect:139]

metacat 20250218-14:10:15: [DEBUG]: Connecting socket to cn.dataone.org/128.111.85.180:443 with timeout 30000 [org.apache.http.conn.ssl.SSLConnectionSocketFactory:connectSocket:366]

metacat 20250218-14:10:15: [DEBUG]: Enabled protocols: [TLSv1.2] [org.apache.http.conn.ssl.SSLConnectionSocketFactory:createLayeredSocket:430]

metacat 20250218-14:10:15: [DEBUG]: Enabled cipher suites:[TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_RSA_WITH_CHACHA20_POLY1305_SHA256, TLS_DHE_DSS_WITH_AES_256_GCM_SHA384, TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_DSS_WITH_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_RSA_WITH_AES_256_CBC_SHA256, TLS_DHE_DSS_WITH_AES_256_CBC_SHA256, TLS_DHE_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_DSS_WITH_AES_128_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_DSS_WITH_AES_256_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_DSS_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_GCM_SHA384, TLS_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_256_CBC_SHA256, TLS_RSA_WITH_AES_128_CBC_SHA256, TLS_RSA_WITH_AES_256_CBC_SHA, TLS_RSA_WITH_AES_128_CBC_SHA, TLS_EMPTY_RENEGOTIATION_INFO_SCSV] [org.apache.http.conn.ssl.SSLConnectionSocketFactory:createLayeredSocket:431]

metacat 20250218-14:10:15: [DEBUG]: Starting handshake [org.apache.http.conn.ssl.SSLConnectionSocketFactory:createLayeredSocket:435]

metacat 20250218-14:10:45: [DEBUG]: http-outgoing-0: Shutdown connection [org.apache.http.impl.conn.LoggingManagedHttpClientConnection:shutdown:96]

metacat 20250218-14:10:45: [DEBUG]: Connection discarded [org.apache.http.impl.execchain.ConnectionHolder:abortConnection:129]

metacat 20250218-14:10:45: [ERROR]: Could not check for node with DataONE (500/000): Could not get CNode from the underlying context (D1Client.CN_URL) [edu.ucsb.nceas.metacat.admin.D1AdminCNUpdater:isNodeRegistered:497]

metacat 20250218-14:10:45: [INFO]: * * * Handling config changes: UNREGISTERED D1 MEMBER NODE... [edu.ucsb.nceas.metacat.admin.D1Admin:upRegD1MemberNode:435]

metacat 20250218-14:10:45: [DEBUG]: getMostRecentNodeId() returning: urn:node:GOA [edu.ucsb.nceas.metacat.admin.D1AdminCNUpdater:getMostRecentNodeId:474]

metacat 20250218-14:10:45: [DEBUG]: configUnregisteredMN(): called with nodeId: urn:node:GOA. Most recent previous nodeId was: urn:node:GOA [edu.ucsb.nceas.metacat.admin.D1AdminCNUpdater:configUnregisteredMN:80]

metacat 20250218-14:10:45: [DEBUG]: canChangeNodeId(): Containerized/Kubernetes deployment detected [edu.ucsb.nceas.metacat.admin.D1AdminCNUpdater:canChangeNodeId:295]

metacat 20250218-14:10:45: [DEBUG]: canChangeNodeId(): returning false, since '.Values.metacat.dataone.autoRegisterMemberNode'=2525-02-13, and today's date in UTC timezone is: 2025-02-18 [edu.ucsb.nceas.metacat.admin.D1AdminCNUpdater:canChangeNodeId:302]

metacat 20250218-14:10:45: [INFO]: configUnregisteredMN(): Only a LOCAL nodeId change will be performed, since operator consent to registered with the CN was not provided. If you wish to register Metacat as a DataONE Member Node, you must set the Property: 'metacat.dataone.autoRegisterMemberNode' to match today's date (in UTC timezone) in `values.yaml`. [edu.ucsb.nceas.metacat.admin.D1AdminCNUpdater:configUnregisteredMN:84]
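
The last two log lines describe a date check: registration with the CN is only attempted when 'metacat.dataone.autoRegisterMemberNode' equals today's date in UTC. A rough shell sketch of that comparison follows. It is not Metacat's own code (which is Java), and the function names are hypothetical. The two dates are the ones from the log above.

```shell
#!/bin/sh
# Today's date in UTC, formatted as YYYY-MM-DD like the log message.
today_utc() {
    date -u +%Y-%m-%d
}

# Succeeds only when the configured value equals the given date.
consent_matches() {
    [ "$1" = "$2" ]
}

# Values taken from the log: configured 2525-02-13, actual 2025-02-18.
if consent_matches "2525-02-13" "2025-02-18"; then
    echo "would register with the CN"
else
    echo "local nodeId change only"
fi
```

`today_utc` prints the value that would need to go in values.yaml on the day of deployment. A far-future date such as 2525-02-13 never matches, so registration is effectively disabled.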

@artntek
Contributor Author

artntek commented Feb 19, 2025

SOLVED

Jing's a genius, yet again...

We tried two curl updates from my laptop to the prod CN:

  1. using the ESA node cert to update ESA node details. Took 2 seconds
  2. using the GOA node cert to update GOA node details. Took around 55 seconds

example:

time curl --cert GOA.pem -X PUT -F '[email protected]' "https://cn.test.dataone.org/cn/v2/node"

# where ./node.xml contains the xml doc from https://goa.nceas.ucsb.edu/goa/d1/mn/v2/node

My hypothesis is that the metacat/dataone code timeout is shorter than this (the logs above show a socket timeout of 30000 ms, i.e. 30 seconds), which is why it's failing.

The explanation?

<contactSubject>CN=Matt Jones A729,O=Google,C=US,DC=cilogon,DC=org</contactSubject>

The CN took a l-o-n-g time to do a lookup of all Matt's details, probably because he is a member of so many groups etc(??). Anyway, it generates a ton of log errors etc, and takes a long time. Once we replaced this with Chris' details instead, the PUT took 2 seconds, just like ESA...
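
For reference, one way to make that kind of substitution in a saved copy of the node document is sketched below. The helper name and the replacement subject in the usage comment are placeholders, not the values actually used. The sketch assumes each contactSubject element sits on a single line, and it rewrites every such element it finds.

```shell
#!/bin/sh
# Print FILE ($1) with the text of each <contactSubject> element replaced by $2.
# Assumes the element is on one line and that $2 contains no '|', '&' or '\',
# since those characters are special in the sed expression below.
replace_contact_subject() {
    sed "s|<contactSubject>[^<]*</contactSubject>|<contactSubject>$2</contactSubject>|" "$1"
}

# Hypothetical usage:
#   replace_contact_subject node.xml 'CN=Example Person,DC=example,DC=org' > node-updated.xml
```

The output is still the complete node document, which matters here because, as noted earlier in the thread, sending only a single field would blank out all the others.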

@artntek
Contributor Author

artntek commented Feb 25, 2025

All up and working, but currently based on an unreleased version of the helm chart in the develop branch (see changelog).

Need to redeploy with the official version after it is released - see #2083

@artntek artntek moved this from In Progress to Blocked in Metacat Releases Feb 25, 2025
@artntek artntek moved this from Blocked to Todo in Metacat Releases Feb 25, 2025
@artntek artntek closed this as completed Feb 26, 2025