Move ESA to K8s #2062
Notes on
Indexer startup issue: on a fresh install, HashStore is not yet initialized and the indexers come up before metacat, so the indexer's HashStore library tries to initialize it but doesn't have write access. This should self-resolve once the metacat pod is up and running and has initialized HashStore.
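If relying on the self-resolve ever becomes a problem, the indexer start could be gated on HashStore having been initialized by metacat. This is only a sketch of that idea, not something done in this migration: it assumes an initialized store can be recognized by a marker file, and both the marker path and the `wait_for_marker` helper are hypothetical.

```python
import time
from pathlib import Path


def wait_for_marker(marker: Path, timeout_s: float = 300.0, poll_s: float = 5.0) -> bool:
    """Poll until `marker` exists; return False if `timeout_s` elapses first.

    `marker` is a hypothetical file that metacat is assumed to create
    once it has initialized HashStore.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if marker.exists():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_s, remaining))
```

An entrypoint script could call this before launching the indexer and exit non-zero on `False`, leaving the retry to Kubernetes.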
Database related issues
Metacat startup error. Note that the database name is "esa", not "metacat", although database.connectionURI=jdbc:postgresql://metacatesa-postgresql-hl/esa
Next step - check debug output for correct props init
Error:
This was because pg_hba.conf didn't have the right permissions (expected db name to be
Metacat ran the 2.19.0 -> 2.19.1 DB script and the 2.19.1 -> 3.0.0 script successfully, but then failed on the 3.0.0 -> 3.1.0 script:
Discovered this is because ESA has the default value for
esa=> \d db_version;
Table "public.db_version"
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+----------------------------------------------
db_version_id | bigint | | not null | nextval('db_version_id_seq'::text::regclass)
(note the
evos=> \d db_version;
Table "public.db_version"
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+----------------------------------------
db_version_id | bigint | | not null | nextval('db_version_id_seq'::regclass)
(note
Fixed this by doing:
ALTER TABLE db_version
ALTER COLUMN db_version_id
SET DEFAULT nextval('db_version_id_seq'::regclass);
and then the conversions ran as expected.
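The two defaults differ only by a `::text` cast, which is easy to miss by eye. A minimal sketch of a pre-upgrade check, using only the two default expressions shown in the `\d db_version` output above. `needs_default_fix` and `fix_statement` are hypothetical helpers, not part of Metacat, and the string PostgreSQL reports through other interfaces may be formatted differently from the psql output.

```python
# Default expression seen on the database where the upgrade worked (evos).
EXPECTED_DEFAULT = "nextval('db_version_id_seq'::regclass)"


def needs_default_fix(column_default: str) -> bool:
    """True when the default differs from the form seen on the working database."""
    return column_default.strip() != EXPECTED_DEFAULT


def fix_statement() -> str:
    """Rebuild the ALTER statement that was used in this issue."""
    return (
        "ALTER TABLE db_version "
        "ALTER COLUMN db_version_id "
        f"SET DEFAULT {EXPECTED_DEFAULT};"
    )


# Default expression seen on the ESA database before the fix.
esa_default = "nextval('db_version_id_seq'::text::regclass)"
if needs_default_fix(esa_default):
    print(fix_statement())
```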
MetacatUI
Startup error - can't mount PVC due to permissions:
Solved: Incorrect
hashstore conversion errors
201 failures, with this error:
metacat 20250212-16:59:13: [ERROR]: Cannot move the object esa.34.1 to hashstore since null [edu.ucsb.nceas.metacat.admin.upgrade.HashStoreUpgrader:convert:541]
org.dataone.exceptions.MarshallingException: null
at org.dataone.service.util.TypeMarshaller.marshalTypeToOutputStream(TypeMarshaller.java:232) ~[d1_common_java-2.3.0.jar:?]
at org.dataone.service.util.TypeMarshaller.marshalTypeToOutputStream(TypeMarshaller.java:202) ~[d1_common_java-2.3.0.jar:?]
at edu.ucsb.nceas.metacat.admin.upgrade.HashStoreUpgrader.convertSystemMetadata(HashStoreUpgrader.java:491) ~[metacat.jar:?]
at edu.ucsb.nceas.metacat.admin.upgrade.HashStoreUpgrader.convert(HashStoreUpgrader.java:519) ~[metacat.jar:?]
at edu.ucsb.nceas.metacat.admin.upgrade.HashStoreUpgrader.lambda$upgrade$0(HashStoreUpgrader.java:258) ~[metacat.jar:?]
[...]
Caused by: javax.xml.bind.MarshalException
at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:301) ~[jaxb-runtime-2.3.2.jar:2.3.2]
at com.sun.xml.bind.v2.runtime.MarshallerImpl.marshal(MarshallerImpl.java:226) ~[jaxb-runtime-2.3.2.jar:2.3.2]
at javax.xml.bind.helpers.AbstractMarshallerImpl.marshal(AbstractMarshallerImpl.java:80) ~[jakarta.xml.bind-api-2.3.2.jar:2.3.2]
at org.dataone.service.util.TypeMarshaller.marshalTypeToOutputStream(TypeMarshaller.java:229) ~[d1_common_java-2.3.0.jar:?]
... 9 more
Caused by: org.xml.sax.SAXParseException: cvc-pattern-valid: Value '' is not facet-valid with respect to pattern '[\s]*[\S][\s\S]*' for type 'NonEmptyString'.
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source) ~[xercesImpl-2.12.2.jar:?]
[...]
at javax.xml.bind.helpers.AbstractMarshallerImpl.marshal(AbstractMarshallerImpl.java:80) ~[jakarta.xml.bind-api-2.3.2.jar:2.3.2]
at org.dataone.service.util.TypeMarshaller.marshalTypeToOutputStream(TypeMarshaller.java:229) ~[d1_common_java-2.3.0.jar:?]
... 9 more
With lots of help from Jing, we checked:
esa=> select * from systemmetadata where guid='esa.34.1';
-- 1 record; looked fine - nothing missing
esa=> select * from xml_access where guid='esa.34.1';
-- 3 records; looked fine - nothing missing
Tried getting the system metadata from the URL, on the original VM host:
...which showed an error:
...so then we checked the
esa=> \x
Expanded display is on.
esa=> select * from smreplicationpolicy where guid='esa.34.1';
-[ RECORD 1 ]-------------
guid | esa.34.1
member_node | urn:node:KNB
policy | preferred
policy_id | 449
-[ RECORD 2 ]-------------
guid | esa.34.1
member_node |
policy | blocked
policy_id | 695
this is the problem: for
esa=> select count(*) from smreplicationpolicy where member_node='';
count | 201
-- 201 conversion errors, and 201 blank fields!
esa=> select * from smreplicationpolicy where policy='blocked' and not member_node='';
(0 rows)
-- there are no blocked entries with a node id instead of being blank
esa=> select distinct member_node from smreplicationpolicy where policy='preferred';
-[ RECORD 1 ]-------------
member_node | urn:node:KNB
...and there were no restrictions set in
# The default replication policy
dataone.replicationpolicy.default.numreplicas=0
dataone.replicationpolicy.default.preferredNodeList=
dataone.replicationpolicy.default.blockedNodeList=
...so we deleted the troublesome entries:
esa=> delete from smreplicationpolicy where member_node='' and policy='blocked';
DELETE 201
esa=> COMMIT;
Finally, set the status back to 'pending':
esa=> update version_history set storage_upgrade_status='pending' where status='1';
UPDATE 1
esa=> COMMIT;
...and restarted the pod. It converted those 201 with no problems. System metadata from the URL works fine on the new k8s host: https://esa-prod.test.dataone.org/esa/d1/mn/v2/meta/esa.34.1
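For future migrations, rows like these could be caught before the conversion runs. A minimal sketch, assuming the only rule that matters is the `NonEmptyString` pattern quoted in the SAXParseException above. The helpers `is_valid_node_id` and `invalid_policy_rows` are hypothetical, and the in-memory SQLite table is a stand-in for the real PostgreSQL `smreplicationpolicy` table, seeded with the two esa.34.1 rows shown earlier (the column types are guesses).

```python
import re
import sqlite3

# Pattern quoted in the SAXParseException for type 'NonEmptyString'.
# XSD patterns are implicitly anchored, hence fullmatch below.
NON_EMPTY = re.compile(r"[\s]*[\S][\s\S]*")


def is_valid_node_id(value) -> bool:
    """True if `value` would satisfy the NonEmptyString pattern."""
    return value is not None and NON_EMPTY.fullmatch(value) is not None


def invalid_policy_rows(conn):
    """Return (guid, policy, policy_id) for rows whose member_node fails the pattern."""
    rows = conn.execute(
        "SELECT guid, member_node, policy, policy_id FROM smreplicationpolicy"
    )
    return [(g, p, pid) for g, node, p, pid in rows if not is_valid_node_id(node)]


# Stand-in data only: SQLite instead of PostgreSQL, two rows copied from above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE smreplicationpolicy "
    "(guid TEXT, member_node TEXT, policy TEXT, policy_id INTEGER)"
)
conn.executemany(
    "INSERT INTO smreplicationpolicy VALUES (?, ?, ?, ?)",
    [
        ("esa.34.1", "urn:node:KNB", "preferred", 449),
        ("esa.34.1", "", "blocked", 695),
    ],
)
print(invalid_policy_rows(conn))
```

Filtering in Python rather than with `member_node=''` also flags NULL and whitespace-only values, which the pattern would reject as well.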
Final step: deployed and all set up to point at prod CN. Nick sent an email to ESA to ask them to change the DNS to point to k8s. When that happens, it should switch over seamlessly, and we can take down the old version.
Tracking progress for moving https://data.esa.org/ from mn-ucsb-2.dataone.org to k8s prod cluster.
Add any notes to this issue, and follow checklist in sub-issue #2063