-
Notifications
You must be signed in to change notification settings - Fork 585
OCPEDGE-2084: Add PacemakerStatus CRD for two-node fencing #2544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
@jaypoulz: This pull request references OCPEDGE-2084 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Hello @jaypoulz! Some important instructions when contributing to openshift/api: |
|
@jaypoulz: This pull request references OCPEDGE-2084 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
58218ce to
96e327f
Compare
2ba442d to
29b9fec
Compare
|
@jaypoulz thank you for the PR, do you mind making the CI happy? |
29b9fec to
26f7821
Compare
|
Hi @saschagrunert :) Working on it! :D |
|
A few open questions I have:
That said, it doesn't work like a normal config - there's no spec and it shouldn't be created during bootstrap. The CRD just needs to be present when the CEO runs an cronjob to post an update to it.
|
b0ff230 to
1b57b09
Compare
b9b727f to
fdd53e9
Compare
|
Yeah, I'll ignore the CI failures for now, running
I'm new to API review, but my gut feeling tells me that a dedicated
You can also try to run it in a container by
Do you mind elaborating on that? Do you mean generating the code for the unions? API docs ref: https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md#writing-a-union-in-go @jaypoulz is there an OpenShift enhancement available for this change? |
etcd/tnf/v1alpha1/tests/pacemakerstatuses.tnf.etcd.openshift.io/DualReplica.yaml
Outdated
Show resolved
Hide resolved
|
@jaypoulz: This pull request references OCPEDGE-2084 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@jaypoulz: This pull request references OCPEDGE-2084 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
f4b651f to
1d19aca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@etcd/README.md`:
- Around line 85-89: Update the comment for fencingAgents -> conditions to match
the earlier clarification for resources: in the fencingAgents entry (symbol:
fencingAgents and the nested key conditions) replace the terse comment ("#
Required: Same 8 conditions as resources") with the expanded clarification used
for resources explaining that the conditions array must contain the same eight
named condition entries (not an omitted/empty placeholder) and show that the
empty "[]" in the example is only an abbreviated/placeholder form in the README;
mirror the exact wording/format used for resources so readers understand they
must supply the full condition list for each fencing agent.
- Around line 83-84: The example shows the Etcd resource with an empty
conditions array which contradicts the documented requirement of a minimum of 8
items; update the example entry for the "Etcd" resource so its "conditions"
field either contains a representative list of the required 8 (or more)
condition objects or replace the empty array with a clear inline comment like "#
abbreviated; show full 8+ conditions in real manifest" to avoid implying an
empty list is valid.
♻️ Duplicate comments (2)
etcd/README.md (1)
1-9: Address documentation duplication with v1alpha1 directory.A past reviewer noted this README may duplicate content that exists in the v1alpha1 directory. Consider consolidating to a single location to avoid maintenance drift—either keep detailed docs here and reference from v1alpha1, or vice versa.
openapi/generated_openapi/zz_generated.openapi.go (1)
26692-26756: Outstanding concern: CEL validation forlastUpdatedmonotonicity andMaxItemsfornodes.The schema description documents that
lastUpdatedcannot go backwards andnodesis limited to 32, but the generated schema doesn't show enforcement viax-kubernetes-validations(CEL) orMaxItems. This was flagged in a previous review and remains unaddressed.
🧹 Nitpick comments (7)
etcd/v1alpha1/tests/pacemakerclusters.etcd.openshift.io/DualReplica.yaml (1)
682-1102: IP validation tests cover key cases; consider expanding IPv6 coverage.The loopback and invalid format rejection tests are good. For completeness, consider adding tests for:
- Valid IPv6 address (e.g.,
2001:db8::1)- IPv6 loopback (
::1)- Link-local addresses (
169.254.x.x,fe80::)- Non-canonical IPv6 (e.g.,
2001:0db8::1with leading zeros)These can be added in a follow-up if the validation rules for these cases are already implemented in the CRD schema.
openapi/openapi.json (5)
4575-4587: Add length validation forAcceptRisk.name.The description mandates non-empty and ≤256 chars, but the schema only has
type: string. Consider enforcingminLength/maxLengthvia kubebuilder validation on the source type and regenerate the OpenAPI.Proposed OpenAPI shape (generated)
"name": { "description": "name is the name of the acceptable risk. It must be a non-empty string and must not exceed 256 characters.", - "type": "string" + "type": "string", + "minLength": 1, + "maxLength": 256 }
5875-5886: Enforce the stated 500-entry limit forconditionalUpdateRisks.The description says the list must not exceed 500 entries, but there’s no
maxItems. Consider adding it in the source type and regenerating.Proposed OpenAPI shape (generated)
"conditionalUpdateRisks": { "description": "conditionalUpdateRisks contains the list of risks associated with conditionalUpdates. ... conditionalUpdateRisks must not contain more than 500 entries.", "type": "array", + "maxItems": 500, "items": { "default": {}, "$ref": "#/definitions/com.github.openshift.api.config.v1.ConditionalUpdateRisk" },
6094-6102: AlignriskNamesconstraints with the doc.The description states max 500 entries and each name ≤256 chars, but the schema doesn’t enforce either. Consider adding
maxItemsandmaxLength(on items) in the source type.Proposed OpenAPI shape (generated)
"riskNames": { "description": "riskNames represents the set of the names ... Entries must be unique and must not exceed 256 characters. riskNames must not contain more than 500 entries.", "type": "array", + "maxItems": 500, "items": { "type": "string", - "default": "" + "default": "", + "maxLength": 256 }, "x-kubernetes-list-type": "set" },
6129-6140: Capconditionsto a single entry as described.The doc says “must not contain more than one entry,” but there’s no
maxItems. Consider enforcing this in the source type and regenerating.Proposed OpenAPI shape (generated)
"conditions": { "description": "conditions represents the observations ... conditions must not contain more than one entry.", "type": "array", + "maxItems": 1, "items": { "default": {}, "$ref": "#/definitions/io.k8s.apimachinery.pkg.apis.meta.v1.Condition" },
11639-11650: Enforce the 1000-entry limit foracceptRisks.The description requires ≤1000 entries, but the schema doesn’t specify
maxItems. Consider adding it in the source type and regenerating.Proposed OpenAPI shape (generated)
"acceptRisks": { "description": "acceptRisks is an optional set ... acceptRisks must not contain more than 1000 entries. Entries in this list must be unique.", "type": "array", + "maxItems": 1000, "items": { "default": {}, "$ref": "#/definitions/com.github.openshift.api.config.v1.AcceptRisk" },openapi/generated_openapi/zz_generated.openapi.go (1)
26758-26784: Consider adding an enum constraint for thetypefield.The description states only
InternalIPis currently supported, but there's no enum constraint to enforce this. For a status-only API written by a trusted controller this is low risk, but adding an enum in the source types would provide schema-level documentation and future-proof the API.💡 Suggested fix in source type (etcd/v1alpha1/types_pacemakercluster.go)
// +kubebuilder:validation:Enum=InternalIP Type string `json:"type"`Then regenerate with
./hack/update-codegen.sh.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
⛔ Files ignored due to path filters (5)
etcd/v1alpha1/zz_generated.crd-manifests/0000_25_etcd_01_pacemakerclusters-CustomNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/*etcd/v1alpha1/zz_generated.crd-manifests/0000_25_etcd_01_pacemakerclusters-DevPreviewNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/*etcd/v1alpha1/zz_generated.crd-manifests/0000_25_etcd_01_pacemakerclusters-TechPreviewNoUpgrade.crd.yamlis excluded by!**/zz_generated.crd-manifests/*etcd/v1alpha1/zz_generated.crd-manifests/doc.gois excluded by!**/zz_generated.crd-manifests/*etcd/v1alpha1/zz_generated.featuregated-crd-manifests/pacemakerclusters.etcd.openshift.io/DualReplica.yamlis excluded by!**/zz_generated.featuregated-crd-manifests/**
📒 Files selected for processing (18)
.golangci.yamletcd/README.mdetcd/install.goetcd/v1alpha1/Makefileetcd/v1alpha1/doc.goetcd/v1alpha1/register.goetcd/v1alpha1/tests/pacemakerclusters.etcd.openshift.io/DualReplica.yamletcd/v1alpha1/types_pacemakercluster.goetcd/v1alpha1/zz_generated.deepcopy.goetcd/v1alpha1/zz_generated.featuregated-crd-manifests.yamletcd/v1alpha1/zz_generated.swagger_doc_generated.gohack/update-payload-crds.shinstall.goopenapi/generated_openapi/zz_generated.openapi.goopenapi/openapi.jsonpayload-manifests/crds/0000_25_etcd_01_pacemakerclusters-CustomNoUpgrade.crd.yamlpayload-manifests/crds/0000_25_etcd_01_pacemakerclusters-DevPreviewNoUpgrade.crd.yamlpayload-manifests/crds/0000_25_etcd_01_pacemakerclusters-TechPreviewNoUpgrade.crd.yaml
🚧 Files skipped from review as they are similar to previous changes (6)
- hack/update-payload-crds.sh
- etcd/v1alpha1/doc.go
- etcd/install.go
- etcd/v1alpha1/zz_generated.featuregated-crd-manifests.yaml
- .golangci.yaml
- etcd/v1alpha1/register.go
🧰 Additional context used
🧬 Code graph analysis (4)
install.go (4)
etcd/install.go (1)
Install(17-17)etcd/v1alpha1/register.go (1)
Install(14-14)apiextensions/install.go (1)
Install(17-17)apiextensions/v1alpha1/register.go (1)
Install(14-14)
etcd/v1alpha1/zz_generated.deepcopy.go (1)
etcd/v1alpha1/types_pacemakercluster.go (7)
PacemakerCluster(465-480)PacemakerClusterStatus(486-522)PacemakerClusterFencingAgentStatus(610-651)PacemakerClusterList(706-715)PacemakerClusterNodeStatus(526-604)PacemakerNodeAddress(414-430)PacemakerClusterResourceStatus(662-697)
etcd/v1alpha1/zz_generated.swagger_doc_generated.go (1)
etcd/v1alpha1/types_pacemakercluster.go (7)
PacemakerCluster(465-480)PacemakerClusterFencingAgentStatus(610-651)PacemakerClusterList(706-715)PacemakerClusterNodeStatus(526-604)PacemakerClusterResourceStatus(662-697)PacemakerClusterStatus(486-522)PacemakerNodeAddress(414-430)
etcd/v1alpha1/types_pacemakercluster.go (1)
etcd/v1alpha1/zz_generated.swagger_doc_generated.go (6)
PacemakerClusterStatus(75-77)PacemakerClusterNodeStatus(54-56)PacemakerNodeAddress(85-87)PacemakerClusterResourceStatus(64-66)PacemakerClusterFencingAgentStatus(31-33)PacemakerCluster(20-22)
🪛 checkmake (0.2.2)
etcd/v1alpha1/Makefile
[warning] 1-1: Missing required phony target "all"
(minphony)
[warning] 1-1: Missing required phony target "clean"
(minphony)
🔇 Additional comments (45)
install.go (1)
58-58: LGTM!The
etcdpackage import andetcd.Installregistration follow the established pattern for API group wiring. Alphabetical ordering is correctly maintained in both the import block and the schemeBuilder list.Also applies to: 95-95
etcd/v1alpha1/Makefile (1)
1-3: LGTM - minimal test-focused Makefile is appropriate.The focused test target correctly delegates to the central test suite. The static analysis warnings about missing
allandcleantargets are false positives—this Makefile intentionally provides only a test entry point for the v1alpha1 API.Ensure a trailing newline is present at end of file to avoid editors auto-adding it (per past review feedback).
etcd/v1alpha1/tests/pacemakerclusters.etcd.openshift.io/DualReplica.yaml (2)
1-24: Solid test fixture foundation with good negative test coverage.The onCreate tests properly validate the singleton naming constraint. The metadata and feature gate configuration are correct.
27-438: Comprehensive positive test case validates full status structure.The full healthy cluster test exercises all required conditions at cluster, node, resource, and fencing agent levels. The expected block correctly mirrors the updated block, confirming the validation accepts properly structured status.
etcd/README.md (2)
21-36: Excellent documentation of status subresource design rationale.The explanation of why atomic create-with-status wasn't feasible, along with the two-step operation flow, provides valuable context for future maintainers. This addresses the design questions raised in past review discussions.
107-142: Well-structured condition reference tables.The condition tables clearly document the expected True/False states and corresponding reasons for cluster, node, and resource levels. This will be valuable for operators and developers working with the healthcheck controller.
openapi/openapi.json (8)
6351-6353: LGTM for the minTLSVersion example update.
8537-8538: LGTM for the Intermediate TLS profile description tweak.
8815-8816: LGTM for the Modern TLS profile description tweak.
9745-9746: LGTM for the Old TLS profile description tweak.
11323-11325: LGTM for the minTLSVersion example update.
11334-11352: LGTM for the TLS profile documentation expansion.
11685-11686: LGTM for the acceptedRisks description update.
14627-14858: LGTM for the new PacemakerCluster schema additions and detailed field docs.openapi/generated_openapi/zz_generated.openapi.go (6)
547-553: LGTM!The type registrations for the new etcd/v1alpha1 Pacemaker types follow the established pattern and maintain alphabetical ordering with adjacent entries.
26393-26434: LGTM!The
PacemakerClusterschema correctly defines a status-only resource with no spec field, aligning with the design for a singleton CRD populated via the status subresource. The compatibility level 4 designation appropriately signals alpha-level stability.
26436-26486: LGTM!The fencing agent status schema properly models STONITH device status with conditions as a list-map keyed by type. The required fields (conditions, name, method) ensure complete status reporting.
26488-26537: LGTM!Standard list type implementation following Kubernetes conventions.
26539-26645: LGTM!The node status schema comprehensively models pacemaker node state including conditions, addresses, resources, and fencing agents. The use of atomic list for addresses (where the first address is used for etcd peer URLs) and list-maps for resources/fencing agents (keyed by name) follows appropriate Kubernetes patterns.
26647-26690: LGTM!The resource status schema correctly models pacemaker resource state for Kubelet and Etcd resources.
etcd/v1alpha1/zz_generated.deepcopy.go (7)
13-42: Deepcopy logic for PacemakerCluster looks correct.
44-65: Deepcopy for PacemakerClusterFencingAgentStatus is correct.
67-98: PacemakerClusterList deepcopy methods look good.
100-140: PacemakerClusterNodeStatus deep copy correctly handles slices.
142-163: PacemakerClusterResourceStatus deepcopy looks correct.
165-194: PacemakerClusterStatus deepcopy aligns with fields.
196-209: PacemakerNodeAddress deepcopy is straightforward and correct.etcd/v1alpha1/zz_generated.swagger_doc_generated.go (7)
14-22: PacemakerCluster Swagger docs match the type description.
24-33: Fencing agent Swagger docs align with fields.
35-43: PacemakerClusterList Swagger docs are consistent.
45-56: Node status Swagger docs look accurate.
58-66: Resource status Swagger docs look accurate.
68-76: Cluster status Swagger docs align with the schema.
79-87: Node address Swagger docs are consistent with the type.etcd/v1alpha1/types_pacemakercluster.go (11)
10-73: Cluster condition types and reasons are consistent.
75-243: Node condition types and reasons are consistent.
245-399: Resource condition types and reasons are consistent.
401-430: Node address enum and validation look coherent with the docs.
432-446: Resource name enum values look correct.
448-480: PacemakerCluster singleton and status subresource annotations look good.
482-522: PacemakerClusterStatus schema and validations look aligned.
524-604: PacemakerClusterNodeStatus schema and validations look solid.
606-651: PacemakerClusterFencingAgentStatus schema looks correct.
653-697: PacemakerClusterResourceStatus schema looks correct.
699-715: PacemakerClusterList follows standard list conventions.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
72fdc82 to
715f512
Compare
| // NoDataTypeChange is disabled because controller-tools now simplifies | ||
| // allOf enum structures to simple enums, changing the type field from | ||
| // "" (unset in allOf entries) to "string". | ||
| toRemove := []string{"NoBools", "NoFloats", "NoUints", "NoMaps", "ConditionsMustHaveProperSSATags", "NoDataTypeChange"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@everettraven Are you happy that data type change is sufficiently covered by crdify now?
| // +kubebuilder:validation:XValidation:rule="self.exists(c, c.type == 'Healthy')",message="conditions must contain a condition of type Healthy" | ||
| // +kubebuilder:validation:XValidation:rule="self.exists(c, c.type == 'InService')",message="conditions must contain a condition of type InService" | ||
| // +kubebuilder:validation:XValidation:rule="self.exists(c, c.type == 'NodeCountAsExpected')",message="conditions must contain a condition of type NodeCountAsExpected" | ||
| // +required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is fine, and you will trigger a linter warning, which we will override. But this being required means that {} is not valid, so the linter should be telling you that you don't need a pointer. Can you try removing those linter exceptions and see if it does actually tell you to remove the pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I made the change above before checking if the linter would correct me on it. Let me revert that and get you a definitive answer.
So - yes - I can confirm that the linter gave the right guidance as I purposely messed with PacemakerClusterStatus which eventually led to the line you suggested above.
| // +required | ||
| Conditions []metav1.Condition `json:"conditions,omitempty"` | ||
|
|
||
| // name is the pacemaker resource name of the fencing agent (e.g., "master-0_redfish", "master-1_ipmi"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a while, but, remind me, did we discuss validating this string? Is it always going to be a mixture of alphanumeric characters and some other punctuation? Would be good to limit it if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember if we discussed this one.
So the good news is that we (cluster-etcd-operator) are the ones creating these fencing resources, so we control the name. The name structure is "_"
Right now, the only supported fencing device type is redfish. In the future, we'll likely add support for ipmi, and potential other, more hardware specific options.
All that said, let me look up the API for k8s node names to see what validations are in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nodes are named as DNS 1123 subdomain, so alphanumerics, hyphens and periods. If this has underscores too, there is a variation that allows those, though we'd have to use a CEL matches with the regex. Perhaps we start with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'll see if I can smash together some validation for {valid-node-name}_{valid-fencing-type}. We can always relax that validation down the line, but as long as we can support _redfish we should be good for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I err'ed on the side of explicitly checking for _redfish in the current design. Since other fencing device support is a future feature, we can look to relax this requirement in a future iteration.
715f512 to
91cb132
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@openapi/openapi.json`:
- Around line 5875-5886: The schema for conditionalUpdateRisks currently states
a 500-entry cap in its description but lacks an actual maxItems constraint;
update the OpenAPI schema by adding "maxItems": 500 to the
conditionalUpdateRisks array definition (the property named
conditionalUpdateRisks that references
com.github.openshift.api.config.v1.ConditionalUpdateRisk) or, if the cap is not
required, soften the description to remove the hard limit; after changing the
source type (where ConditionalUpdateRisk is defined) regenerate the openapi.json
so the validation is enforced.
- Around line 6129-6140: The "conditions" array schema currently states it "must
not contain more than one entry" but doesn't enforce this; update the OpenAPI
schema for the conditions property to add "maxItems: 1" (i.e., add maxItems: 1
alongside type/items/x-kubernetes-list-* entries for the "conditions" property
that references io.k8s.apimachinery.pkg.apis.meta.v1.Condition) so the
single-entry constraint is validated by tooling rather than only in the
description.
- Around line 4575-4586: The schema for AcceptRisk.name lacks an enforced
256-char limit; update the Go type for AcceptRisk by adding the kubebuilder
validation tag to the Name field (e.g., +kubebuilder:validation:MaxLength=256 on
the Name field in the AcceptRisk struct) and then run the OpenAPI generator
(rebuild the OpenAPI JSON) so openapi.json includes "maxLength": 256 for
AcceptRisk.name.
- Around line 11639-11649: The OpenAPI schema for the acceptRisks array does not
enforce the documented 1000-entry cap; update the OpenAPI source so the
acceptRisks property includes "maxItems": 1000 (in the same object that defines
"type": "array" and "items") for the definition that references
com.github.openshift.api.config.v1.AcceptRisk, then regenerate the openapi.json
so the maxItems constraint appears in the output.
- Around line 6094-6102: The OpenAPI schema for the riskNames array lacks the
limits described in its description; update the schema for the riskNames field
(the array named "riskNames" used by conditionalUpdateRisks /
spec.desiredUpdate.acceptRisks) to add "maxItems": 500 on the array and add
"maxLength": 256 on each string item (i.e., the items object for riskNames),
then regenerate the OpenAPI artifact so the documented constraints are enforced
in the JSON.
In
`@payload-manifests/crds/0000_10_config-operator_01_authentications-DevPreviewNoUpgrade.crd.yaml`:
- Around line 472-475: Replace the brittle regex validation on discoveryURL with
the same user-info check used by issuerURL: change the x-kubernetes-validations
rule that currently uses '!self.matches(''^https://.+:.+@.+/.*$'')' to use the
same expression as issuerURL (i.e., ensure the rule enforces self.find('@') ==
'' so any '@' in the URL is rejected); keep the existing "must be a valid URL"
validation intact.
In
`@payload-manifests/crds/0000_10_config-operator_01_authentications-TechPreviewNoUpgrade.crd.yaml`:
- Around line 473-482: The current rule with message "discoveryURL must not
contain user info" uses a regex that requires a trailing path and misses URLs
without one; replace that rule value so it detects userinfo in any HTTPS URL
(i.e. match any presence of user:pass@ or any '@' after the scheme) — update the
rule associated with the "discoveryURL must not contain user info" message to a
pattern that checks for user info regardless of path (e.g. match an '@'
following the https:// authority) so URLs like "https://user:pass@host.com" are
rejected.
In
`@payload-manifests/crds/0000_25_etcd_01_pacemakerclusters-DevPreviewNoUpgrade.crd.yaml`:
- Around line 141-142: The CRD's node list documentation and maxItems (the
descriptive text "up to 32 nodes" and the YAML field maxItems: 32) do not match
the Go types/Swagger cap of 5 (max control-plane); to fix, decide which limit is
correct and make them consistent: either update the CRD YAML to set maxItems: 5
and change the human-readable comment to reflect "up to 5 nodes" or update the
Go types/Swagger and regenerate CRDs if 32 is intended; ensure you update the
field that defines the node list (maxItems) and regenerate CRD manifests (or
update the Go struct/tag and swagger annotations) so the runtime and schema
match.
♻️ Duplicate comments (3)
payload-manifests/crds/0000_25_etcd_01_pacemakerclusters-CustomNoUpgrade.crd.yaml (1)
173-176: Fix invalid CEL ip.isCanonical call (schema will fail validation).The rule uses
ip.isCanonical(self), which is invalid CEL; it should parse the string first and then callisCanonical()on the IP object. Please fix in the source type and regenerate all pacemakercluster CRD variants.✅ Suggested fix
- rule: isIP(self) && ip.isCanonical(self) && ip(self).isGlobalUnicast() + rule: isIP(self) && ip(self).isCanonical() && ip(self).isGlobalUnicast()payload-manifests/crds/0000_10_config-operator_01_authentications-CustomNoUpgrade.crd.yaml (1)
472-475: Same discoveryURL user-info gap as DevPreview variant.Please apply the same hardened check here (and any other generated variants) to keep schemas consistent.
openapi/openapi.json (1)
14667-14678: Cardinality limits are described but not enforced in schema.Several fields (e.g., Line 14668 for fencing‑agent conditions, Line 14740 for node conditions, Line 14768 for resources, Line 14821 for cluster conditions, Line 14837 for nodes) describe required min/max counts, but the schema doesn’t include
minItems/maxItems. If those limits are intended, please add the corresponding validation tags in the source types and regenerate; otherwise soften the wording.Also applies to: 14739-14778, 14820-14837
payload-manifests/crds/0000_10_config-operator_01_authentications-DevPreviewNoUpgrade.crd.yaml
Show resolved
Hide resolved
payload-manifests/crds/0000_10_config-operator_01_authentications-TechPreviewNoUpgrade.crd.yaml
Outdated
Show resolved
Hide resolved
payload-manifests/crds/0000_25_etcd_01_pacemakerclusters-DevPreviewNoUpgrade.crd.yaml
Outdated
Show resolved
Hide resolved
91cb132 to
52ada6c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@etcd/README.md`:
- Around line 50-83: The conditions shown under status, nodes, and resources
(e.g., the arrays named conditions under status, nodes->conditions, and
resources->conditions, and resource names like Kubelet and Etcd) are abbreviated
and may imply only the type field is required; update the README example to add
a short clarifying note immediately above or beside each conditions block
stating that condition objects are abbreviated for brevity and that full
condition objects must include required fields such as status,
lastTransitionTime/lastUpdated, reason, and message (and any cluster-specific
required fields), and explicitly call out that the example only shows the type
field for readability while the real schema requires the full condition
structure.
In
`@payload-manifests/crds/0000_10_config-operator_01_authentications-CustomNoUpgrade.crd.yaml`:
- Around line 479-482: The CEL rules for discoveryURL currently call url(self)
directly which can throw on invalid strings; update both rules to guard with
isURL(self) by changing the expressions to use isURL(self) &&
url(self).getQuery().size() == 0 and isURL(self) && url(self).getScheme() ==
'https' so the url() calls only run when isURL(self) is true; locate these
checks tied to the discoveryURL validation messages and update the rule
expressions accordingly.
In
`@payload-manifests/crds/0000_10_config-operator_01_authentications-TechPreviewNoUpgrade.crd.yaml`:
- Around line 479-482: The CEL rules for discoveryURL currently call url(self)
directly which can throw on invalid values; update the two rules to guard url()
with isURL(self) (e.g., replace rule: url(self).getQuery().size() == 0 and rule:
url(self).getScheme() == 'https' with guarded forms using isURL(self) &&
url(self)...). Locate the discoveryURL validation rules in this CRD and prepend
each url(...) call with isURL(self) && so evaluation errors are avoided.
♻️ Duplicate comments (4)
payload-manifests/crds/0000_10_config-operator_01_authentications-TechPreviewNoUpgrade.crd.yaml (1)
473-475: Fix discoveryURL user‑info detection to catch URLs without a path.The current regex only matches when a trailing path exists, so values like
https://user@hostcan bypass. Align with the issuerURL rule (or an authority-only pattern) to catch all user‑info cases.🐛 Proposed fix
- - message: discoveryURL must not contain user info - rule: '!self.matches(''^https://.+:.+@.+/.*$'')' + - message: discoveryURL must not contain user info + rule: self.find('@') == ''payload-manifests/crds/0000_10_config-operator_01_authentications-CustomNoUpgrade.crd.yaml (1)
473-475: Fix discoveryURL user‑info detection to catch URLs without a path.Same pattern as the TechPreview variant—
https://user@hostcan slip through. Align with the issuerURL rule (or an authority-only pattern) to catch all user‑info cases.🐛 Proposed fix
- - message: discoveryURL must not contain user info - rule: '!self.matches(''^https://.+:.+@.+/.*$'')' + - message: discoveryURL must not contain user info + rule: self.find('@') == ''payload-manifests/crds/0000_25_etcd_01_pacemakerclusters-CustomNoUpgrade.crd.yaml (1)
171-175: Fix CEL IP canonical check syntax.CEL requires
ip(self).isCanonical();ip.isCanonical(self)will fail validation. Since this manifest is generated, update the source Go marker and regenerate the CRDs.🔧 Proposed fix
- rule: isIP(self) && ip.isCanonical(self) && ip(self).isGlobalUnicast() + rule: isIP(self) && ip(self).isCanonical() && ip(self).isGlobalUnicast()payload-manifests/crds/0000_10_config-operator_01_authentications-DevPreviewNoUpgrade.crd.yaml (1)
472-474: Discovery URL user-info check still missesuser@hostcases.The regex only blocks
user:pass@host/path; use the sameself.find('@') == ''check as issuerURL.🔧 Proposed fix
- - message: discoveryURL must not contain user info - rule: '!self.matches(''^https://.+:.+@.+/.*$'')' + - message: discoveryURL must not contain user info + rule: self.find('@') == ''
🧹 Nitpick comments (2)
openapi/generated_openapi/zz_generated.openapi.go (1)
26730-26749: Minor inconsistency:nodesarray items lack theDefaultproperty.Other array items in this file (e.g.,
resourcesat line 26611,fencingAgentsat line 26633) includeDefault: map[string]interface{}{}, butnodesitems do not. This is likely a code generation quirk based on how the source type is defined.Since
nodesis a required field and this is a generated file, this has no functional impact, but if you want consistency, verify the source type definition matches other similar fields.etcd/v1alpha1/types_pacemakercluster.go (1)
519-522: Consider using value slice instead of pointer slice.Line 521 defines
Nodesas[]*PacemakerClusterNodeStatus(pointer slice), but typically Kubernetes APIs use value slices[]PacemakerClusterNodeStatus. Pointer slices have several drawbacks:
- Individual slice items can be
nil, which has no clear semantic meaning- Adds complexity to DeepCopy implementations
- The CRD schema doesn't indicate items can be null
If the intent is to allow the slice itself to be empty (which
MinItems=0already permits), a value slice withomitemptyachieves the same goal.Suggested change
- Nodes []*PacemakerClusterNodeStatus `json:"nodes,omitempty"` + Nodes []PacemakerClusterNodeStatus `json:"nodes,omitempty"`
payload-manifests/crds/0000_10_config-operator_01_authentications-CustomNoUpgrade.crd.yaml
Outdated
Show resolved
Hide resolved
payload-manifests/crds/0000_10_config-operator_01_authentications-TechPreviewNoUpgrade.crd.yaml
Outdated
Show resolved
Hide resolved
52ada6c to
e6479a3
Compare
|
Should be ready for another review. I had to fix some integration tests due to the update to validation ordering. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@openapi/openapi.json`:
- Around line 24952-24955: The schema for the property synchronizedAPI currently
only documents allowed values but doesn't enforce them; update the source type
that generates this OpenAPI (the struct/field that maps to synchronizedAPI) to
constrain it to an enum of "MachineAPI" and "ClusterAPI" and then regenerate the
OpenAPI spec, or if you prefer a softer change, update the description text to
say "expected values" instead of "Valid values"; ensure you apply this fix to
both occurrences of synchronizedAPI in the generated spec so the enforcement or
wording is consistent.
In
`@payload-manifests/crds/0000_10_config-operator_01_infrastructures-CustomNoUpgrade.crd.yaml`:
- Around line 275-279: The current CEL path validation rule using
url(self).getEscapedPath() is too restrictive (regex '^/(api/)?v[0-9]+/{0,1}$')
and rejects valid IBM Cloud endpoints with deeper path segments; update the rule
(the one paired with message: url path must match ...) to relax the regex so it
accepts an optional slash and any additional path segments after the version
(i.e., allow "/vN" or "/api/vN" followed by further path components), leaving
the scheme check url(self).getScheme() == "https" intact.
♻️ Duplicate comments (5)
payload-manifests/crds/0000_10_config-operator_01_authentications-TechPreviewNoUpgrade.crd.yaml (1)
473-482: Fix discoveryURL user-info detection and guardurl()calls.The current regex requires a path, so URLs like
https://user:pass@host.comcan slip through. Also,url(self)can raise on invalid values because CEL rules don’t short‑circuit. Consider a simpler@check and guard theurl()rules.🛠️ Suggested adjustment
- - message: discoveryURL must not contain user info - rule: '!self.matches(''^https://.+:.+@.+/.*$'')' + - message: discoveryURL must not contain user info + rule: self.find('@') == '' ... - - message: discoveryURL must not contain query parameters - rule: url(self).getQuery().size() == 0 - - message: discoveryURL must be a valid https URL - rule: url(self).getScheme() == 'https' + - message: discoveryURL must not contain query parameters + rule: isURL(self) && url(self).getQuery().size() == 0 + - message: discoveryURL must be a valid https URL + rule: isURL(self) && url(self).getScheme() == 'https'payload-manifests/crds/0000_10_config-operator_01_authentications-DevPreviewNoUpgrade.crd.yaml (1)
472-482: DiscoveryURL user-info validation still too narrow.The regex only blocks
user:pass@host/pathand missesuser@hostor URLs without a path. Consider using the sameself.find('@') == ''check asissuerURL.Suggested fix
- - message: discoveryURL must not contain user info - rule: '!self.matches(''^https://.+:.+@.+/.*$'')' + - message: discoveryURL must not contain user info + rule: self.find('@') == ''payload-manifests/crds/0000_10_config-operator_01_authentications-CustomNoUpgrade.crd.yaml (1)
473-482: Guard url(self) with isURL(self) to avoid CEL eval errors.
url(self)throws on invalid strings; each rule is evaluated independently. Please guard both rules withisURL(self).🛠️ Proposed fix
- - message: discoveryURL must not contain query parameters - rule: url(self).getQuery().size() == 0 + - message: discoveryURL must not contain query parameters + rule: isURL(self) && url(self).getQuery().size() == 0 - - message: discoveryURL must be a valid https URL - rule: url(self).getScheme() == 'https' + - message: discoveryURL must be a valid https URL + rule: isURL(self) && url(self).getScheme() == 'https'payload-manifests/crds/0000_25_etcd_01_pacemakerclusters-CustomNoUpgrade.crd.yaml (1)
171-174: Fix CEL syntax:ip.isCanonical(self)is invalid.
Useip(self).isCanonical()(parse first, then call method). Also update the source markers and regenerate CRDs so all variants stay consistent.🩹 Proposed fix
- rule: isIP(self) && ip.isCanonical(self) && ip(self).isGlobalUnicast() + rule: isIP(self) && ip(self).isCanonical() && ip(self).isGlobalUnicast()openapi/openapi.json (1)
14667-14846: Schema doesn’t enforce several documented list constraints.
Multiple fields describe required min/max counts (conditions/resources/fencingAgents/nodes/addresses) but the schema lacks correspondingminItems/maxItems(and IP/pattern constraints). Please add kubebuilder validation tags in the source Go types and regenerate, or soften the wording to avoid implying enforcement.
🧹 Nitpick comments (1)
etcd/v1alpha1/Makefile (1)
1-3: Consider adding minimalall/cleanphony targets to satisfy checkmake.
If the linter gates CI, stub targets or a lint config exception would silence warnings.
openapi/openapi.json
Outdated
| "synchronizedAPI": { | ||
| "description": "synchronizedAPI holds the last stable value of authoritativeAPI. It is used to detect migration cancellation requests and to restore the resource to its previous state. Valid values are \"MachineAPI\" and \"ClusterAPI\". When omitted, the resource has not yet been reconciled by the migration controller.", | ||
| "type": "string" | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constrain synchronizedAPI to the documented values (or soften wording).
The description says the valid values are "MachineAPI" and "ClusterAPI", but the schema doesn’t enforce it. Consider adding an enum in the source types and regenerate, or change the text to “expected values.”
🔧 Expected OpenAPI shape (apply to both occurrences)
"synchronizedAPI": {
"description": "synchronizedAPI holds the last stable value of authoritativeAPI. It is used to detect migration cancellation requests and to restore the resource to its previous state. Valid values are \"MachineAPI\" and \"ClusterAPI\". When omitted, the resource has not yet been reconciled by the migration controller.",
- "type": "string"
+ "type": "string",
+ "enum": [
+ "MachineAPI",
+ "ClusterAPI"
+ ]
},Also applies to: 25059-25062
🤖 Prompt for AI Agents
In `@openapi/openapi.json` around lines 24952 - 24955, The schema for the property
synchronizedAPI currently only documents allowed values but doesn't enforce
them; update the source type that generates this OpenAPI (the struct/field that
maps to synchronizedAPI) to constrain it to an enum of "MachineAPI" and
"ClusterAPI" and then regenerate the OpenAPI spec, or if you prefer a softer
change, update the description text to say "expected values" instead of "Valid
values"; ensure you apply this fix to both occurrences of synchronizedAPI in the
generated spec so the enforcement or wording is consistent.
| - message: url path must match /v[0,9]+ or /api/v[0,9]+ | ||
| rule: matches((url(self).getEscapedPath()), '^/(api/)?v[0-9]+/{0,1}$') | ||
| - message: url must use https scheme | ||
| rule: url(self).getScheme() == "https" | ||
| required: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, examine the CRD file structure around the lines in question
wc -l payload-manifests/crds/0000_10_config-operator_01_infrastructures-CustomNoUpgrade.crd.yamlRepository: openshift/api
Length of output: 152
🏁 Script executed:
# Get context around lines 275-279
sed -n '260,290p' payload-manifests/crds/0000_10_config-operator_01_infrastructures-CustomNoUpgrade.crd.yamlRepository: openshift/api
Length of output: 1737
🏁 Script executed:
# Search for IBMCloud or IBMCloudServiceEndpoint references in the codebase
rg -i 'ibmcloud|ibmcloudserviceendpoint' --type yaml -A 3 -B 3Repository: openshift/api
Length of output: 50370
🏁 Script executed:
# Check for any existing endpoint URL examples or test data
rg -i 'serviceendpoint|endpoint.*url' --type yaml -i | head -30Repository: openshift/api
Length of output: 5027
🏁 Script executed:
# Search for actual URL examples in test files or documentation
find . -type f \( -name "*.yaml" -o -name "*.json" -o -name "*.go" \) | head -20 | xargs grep -l "serviceEndpoint" 2>/dev/null | head -5Repository: openshift/api
Length of output: 39
🏁 Script executed:
# Look for Go test files or examples that might have IBMCloud endpoint examples
fd -t f "test.*\.go$|example.*\.go$" | xargs grep -l "IBMCloud\|serviceEndpoint" 2>/dev/null | head -5Repository: openshift/api
Length of output: 39
🏁 Script executed:
# Check CEL validation syntax by searching for other CEL rules in the CRDs
grep -A 2 "getEscapedPath\|getScheme\|url(self)" payload-manifests/crds/0000_10_config-operator_01_infrastructures-CustomNoUpgrade.crd.yamlRepository: openshift/api
Length of output: 366
🏁 Script executed:
# Search for any documentation or comments about URL patterns in the repo
rg -i "api/v|/v[0-9]" --type yaml -B 2 -A 2 | grep -i "ibmcloud\|endpoint" | head -20Repository: openshift/api
Length of output: 3396
🏁 Script executed:
# Search for other CEL validation examples in the CRDs to understand common patterns
rg "x-kubernetes-validations" -A 10 payload-manifests/crds/ | grep -A 5 "url\|path" | head -40Repository: openshift/api
Length of output: 5453
🏁 Script executed:
# Check if there's any version specification or documentation about CEL support
grep -r "kubernetes-validations\|CEL" --include="*.md" --include="*.txt" . 2>/dev/null | head -10Repository: openshift/api
Length of output: 1460
🏁 Script executed:
# Look for test cases or examples with actual endpoint URLs
rg "https://" --type yaml | grep -i "ibm\|endpoint" | head -20Repository: openshift/api
Length of output: 4051
🏁 Script executed:
# Extract the full context around the IBMCloud URL validation to understand the intent
sed -n '250,300p' payload-manifests/crds/0000_10_config-operator_01_infrastructures-CustomNoUpgrade.crd.yamlRepository: openshift/api
Length of output: 2716
🏁 Script executed:
# Search for specific examples of IBM Cloud service endpoint URLs in documentation
grep -B 5 -A 5 "private\.iam\.cloud\.ibm\.com\|us\.south\.private\.iaas\.cloud\.ibm\.com" payload-manifests/crds/0000_10_config-operator_01_infrastructures-CustomNoUpgrade.crd.yamlRepository: openshift/api
Length of output: 2077
🏁 Script executed:
# Let me test the regex against the documented example URLs
python3 << 'EOF'
import re
import urllib.parse
# The regex from the CRD
regex_pattern = r'^/(api/)?v[0-9]+/{0,1}$'
# Example URLs from the documentation
example_urls = [
"https://private.iam.cloud.ibm.com",
"https://us.south.private.iaas.cloud.ibm.com",
"https://private.iam.cloud.ibm.com/",
"https://private.iam.cloud.ibm.com/v1",
"https://private.iam.cloud.ibm.com/v1/",
"https://private.iam.cloud.ibm.com/api/v1",
"https://private.iam.cloud.ibm.com/identity/token",
"https://api.cloud.ibm.com/v2/resources"
]
regex = re.compile(regex_pattern)
print("Testing regex pattern:", regex_pattern)
print()
for url in example_urls:
parsed = urllib.parse.urlparse(url)
path = parsed.path if parsed.path else "/"
matches = bool(regex.match(path))
print(f"URL: {url}")
print(f" Path: '{path}'")
print(f" Matches: {matches}")
print()
EOFRepository: openshift/api
Length of output: 735
🏁 Script executed:
# Search for Kubernetes CEL documentation or version constraints
grep -r "apiVersion\|spec:" payload-manifests/crds/0000_10_config-operator_01_infrastructures-CustomNoUpgrade.crd.yaml | head -20Repository: openshift/api
Length of output: 136
🏁 Script executed:
# Check if url() and getEscapedPath/getScheme are used elsewhere in the codebase
rg "url\(self\)|getEscapedPath|getScheme" --type yaml | head -20Repository: openshift/api
Length of output: 3499
🏁 Script executed:
# Verify the suggested regex fix works better
python3 << 'EOF'
import re
import urllib.parse
# Original regex from the CRD
original_regex = r'^/(api/)?v[0-9]+/{0,1}$'
# Suggested fix from review
suggested_regex = r'^/(api/)?v[0-9]+(/.*)?$'
# Test cases
test_cases = [
"/v1",
"/v1/",
"/api/v1",
"/api/v1/",
"/v2/resources",
"/api/v2/identity/token",
"/v1/some/deep/path",
"/v",
"/api/v",
"/v1/v1/v1" # edge case
]
original = re.compile(original_regex)
suggested = re.compile(suggested_regex)
print("Comparing regexes:")
print(f"Original: {original_regex}")
print(f"Suggested: {suggested_regex}")
print()
for path in test_cases:
orig_match = bool(original.match(path))
sugg_match = bool(suggested.match(path))
diff = "🔄" if orig_match != sugg_match else "✓"
print(f"{diff} {path:30} | Original: {str(orig_match):5} | Suggested: {str(sugg_match):5}")
EOFRepository: openshift/api
Length of output: 850
The regex is too restrictive for IBM Cloud endpoints with deeper path segments.
The path rule '^/(api/)?v[0-9]+/{0,1}$' rejects valid URLs with additional path components (e.g., /v2/resources, /api/v2/identity/token). Real IBM Cloud service endpoints may include deeper paths beyond the version prefix. The CEL functions url(self).getEscapedPath() and getScheme() are available in Kubernetes validation.
Consider relaxing the regex to allow optional path segments after the version:
Suggested fix
- rule: matches((url(self).getEscapedPath()), '^/(api/)?v[0-9]+/{0,1}$')
+ rule: matches((url(self).getEscapedPath()), '^/(api/)?v[0-9]+(/.*)?$')📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - message: url path must match /v[0,9]+ or /api/v[0,9]+ | |
| rule: matches((url(self).getEscapedPath()), '^/(api/)?v[0-9]+/{0,1}$') | |
| - message: url must use https scheme | |
| rule: url(self).getScheme() == "https" | |
| required: | |
| - message: url path must match /v[0,9]+ or /api/v[0,9]+ | |
| rule: matches((url(self).getEscapedPath()), '^/(api/)?v[0-9]+(/.*)?$') | |
| - message: url must use https scheme | |
| rule: url(self).getScheme() == "https" | |
| required: |
🤖 Prompt for AI Agents
In
`@payload-manifests/crds/0000_10_config-operator_01_infrastructures-CustomNoUpgrade.crd.yaml`
around lines 275 - 279, The current CEL path validation rule using
url(self).getEscapedPath() is too restrictive (regex '^/(api/)?v[0-9]+/{0,1}$')
and rejects valid IBM Cloud endpoints with deeper path segments; update the rule
(the one paired with message: url path must match ...) to relax the regex so it
accepts an optional slash and any additional path segments after the version
(i.e., allow "/vN" or "/api/vN" followed by further path components), leaving
the scheme check url(self).getScheme() == "https" intact.
e6479a3 to
42ff6d3
Compare
|
@jaypoulz: This pull request references OCPEDGE-2084 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
42ff6d3 to
d2afa1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@etcd/v1alpha1/Makefile`:
- Around line 1-3: The Makefile only defines a test target and is missing
minimal all and clean targets required by checkmake; update the Makefile to add
an all target (e.g., make all depends on test) and a clean target (no-op or
removing generated artifacts), and include both "all" and "clean" in the .PHONY
declaration alongside "test" so .PHONY lists all three targets and the new
targets are implemented (referencing the Makefile's existing "test" target when
wiring "all").
♻️ Duplicate comments (3)
payload-manifests/crds/0000_25_etcd_01_pacemakerclusters-CustomNoUpgrade.crd.yaml (1)
171-174: Fix invalid CEL call for canonical IP check (still present).
Same issue as previously noted:ip.isCanonical(self)is invalid; parse first withip(self). Regenerate from source so all variants stay aligned.🐛 Proposed fix
- rule: isIP(self) && ip.isCanonical(self) && ip(self).isGlobalUnicast() + rule: isIP(self) && ip(self).isCanonical() && ip(self).isGlobalUnicast()openapi/openapi.json (1)
14663-14842: Documented list-size constraints aren’t enforced in the schema.Lines 14663–14842 describe minimum/maximum item counts for several arrays (conditions, resources, fencingAgents, nodes), but the OpenAPI definitions don’t include corresponding
minItems/maxItems. This leaves the constraints unenforced for API consumers and validation tooling. Please add the appropriate kubebuilder validation markers on the source Go types (e.g.,+kubebuilder:validation:MinItems/MaxItems) and regenerate the OpenAPI.payload-manifests/crds/0000_25_etcd_01_pacemakerclusters-TechPreviewNoUpgrade.crd.yaml (1)
171-175: Verify CELip.isCanonicalusage against the target K8s CEL library.The rule uses
ip.isCanonical(self). Please confirm this is supported in the target Kubernetes/OCP version (some documentation and past reviews suggestip(self).isCanonical()instead).Kubernetes CEL IP library ip.isCanonical(self) versus ip(self).isCanonical() supported syntax
🧹 Nitpick comments (2)
tools/go.mod (1)
38-38: Document and time‑box the controller‑tools fork.The replace to a personal fork is a supply‑chain and maintainability risk unless clearly justified and tracked. Please add a short rationale (and upstream issue/rollback plan) near this replace and verify the fork still builds with the Go toolchain used here.
📌 Suggested inline note
-replace sigs.k8s.io/controller-tools => github.com/jaypoulz/controller-tools v0.0.0-20260119185850-e534bd0a65da +// TEMP: forked controller-tools for validation ordering fix; revert after upstream merges <issue/link> +replace sigs.k8s.io/controller-tools => github.com/jaypoulz/controller-tools v0.0.0-20260119185850-e534bd0a65dapayload-manifests/crds/0000_25_etcd_01_pacemakerclusters-TechPreviewNoUpgrade.crd.yaml (1)
401-425: Alignmethodvalidation with the “redfish-only” constraint.
nameis enforced to end with_redfish, butmethodis an unconstrained string. If the API truly only supports redfish today, consider constrainingmethodto match; otherwise, update the description to avoid implying a restriction.♻️ Possible schema tightening
method: description: |- method is the fencing method used by this agent (e.g., "redfish", "ipmi", "fence_aws"). This is extracted from the pacemaker resource agent type. maxLength: 63 minLength: 1 type: string + enum: + - redfish
| .PHONY: test | ||
| test: | ||
| make -C ../../tests test GINKGO_EXTRA_ARGS=--focus="etcd.openshift.io/v1alpha1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add minimal all/clean targets to satisfy checkmake.
Static analysis flags missing required phony targets; this can break linting if enforced.
🧹 Suggested Makefile tweaks
-.PHONY: test
+.PHONY: all clean test
+all: test
+clean:
+ `@true`
test:
make -C ../../tests test GINKGO_EXTRA_ARGS=--focus="etcd.openshift.io/v1alpha1"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| .PHONY: test | |
| test: | |
| make -C ../../tests test GINKGO_EXTRA_ARGS=--focus="etcd.openshift.io/v1alpha1" | |
| .PHONY: all clean test | |
| all: test | |
| clean: | |
| `@true` | |
| test: | |
| make -C ../../tests test GINKGO_EXTRA_ARGS=--focus="etcd.openshift.io/v1alpha1" |
🧰 Tools
🪛 checkmake (0.2.2)
[warning] 1-1: Missing required phony target "all"
(minphony)
[warning] 1-1: Missing required phony target "clean"
(minphony)
🤖 Prompt for AI Agents
In `@etcd/v1alpha1/Makefile` around lines 1 - 3, The Makefile only defines a test
target and is missing minimal all and clean targets required by checkmake;
update the Makefile to add an all target (e.g., make all depends on test) and a
clean target (no-op or removing generated artifacts), and include both "all" and
"clean" in the .PHONY declaration alongside "test" so .PHONY lists all three
targets and the new targets are implemented (referencing the Makefile's existing
"test" target when wiring "all").
| enum: | ||
| - "" | ||
| - RetentionNumber | ||
| - RetentionSize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is actually allowing more values than previously, should remove the ""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any remaining changes referencing these files won't be mine - they belong in the patch that bumps the controller code. I can drop the current commit from here once we've merged the stable sort fix into openshift/k8s-sig-controller-tools, and we can clean up config/v1alpha/backup in that commit (and drop it from here).
config/v1alpha1/types_backup.go
Outdated
| // +kubebuilder:validation:Enum:="";"RetentionNumber";"RetentionSize" | ||
| // +enum | ||
| // +kubebuilder:validation:Enum:="RetentionNumber";"RetentionSize" | ||
| type RetentionType string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we put this back we reduce the diff here and maintain the previous behaviour
| // +kubebuilder:validation:Enum:="";"RetentionNumber";"RetentionSize" | |
| // +enum | |
| // +kubebuilder:validation:Enum:="RetentionNumber";"RetentionSize" | |
| type RetentionType string | |
| // +enum | |
| // +kubebuilder:validation:Enum:="RetentionNumber";"RetentionSize" | |
| type RetentionType string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #2544 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I no longer see the code change resulting in this generation change, out of date?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just re-gened fresh.
| // +kubebuilder:validation:MinItems=0 | ||
| // +kubebuilder:validation:MaxItems=5 | ||
| // +required | ||
| Nodes []*PacemakerClusterNodeStatus `json:"nodes,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the pointer?
| Nodes []*PacemakerClusterNodeStatus `json:"nodes,omitempty"` | |
| Nodes []PacemakerClusterNodeStatus `json:"nodes,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought if we allowed the zero value (empty list), we needed to be a pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, for that you need
| Nodes []*PacemakerClusterNodeStatus `json:"nodes,omitempty"` | |
| Nodes *[]PacemakerClusterNodeStatus `json:"nodes,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops!
| // +kubebuilder:validation:MinLength=1 | ||
| // +kubebuilder:validation:MaxLength=63 | ||
| // +required | ||
| Method string `json:"method,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the only option for this is redfish right? Should we validate that? How pluggable are these that we wouldn't want to be coming back here to add new possible values in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redfish is currently the only allowed option.
I figured we'd treat these the same as topologies - each addition is a meaningful change that requires an update to the API.
They're easy to plug in from a pacemaker perspective, but require changes in CEO and the installer each time a new option is introduced.
I'm fine with being more flexible, but that was my thought process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's the case, we should probably turn this into an enum right?
5d8f959 to
ae78d1e
Compare
…hift with Fencing Introduces etcd.openshift.io/v1alpha1 API group with a PacemakerCluster custom resource. This provides visibility into pacemaker cluster health for Two Node OpenShift with Fencing deployments. This status-only resource reports the health of pacemaker nodes and their managed resources (Kubelet, Etcd) along with fencing agent status. Each node tracks conditions, IP addresses for etcd peer URLs, and per-resource health using positive polarity conditions. The cluster-etcd-operator is responsible for interpreting this status and degrading appropriately when etcd is unhealthy or at risk of being unable to recover automatically in a quorum-loss event. Created with support from Claude Opus 4 (Anthropic)
ae78d1e to
1141715
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@etcd/README.md`:
- Around line 86-88: Update the fencing example in the README to match the
current CRD validation by showing redfish-only naming and method: change the
example name pattern to use the `_redfish` suffix (e.g., "master-0_redfish") and
set method to "redfish" (remove other method examples like ipmi or fence_aws);
reference the example keys `name` and `method` so reviewers can find and align
the README with the CRD's `_redfish` naming constraint.
In `@etcd/v1alpha1/types_pacemakercluster.go`:
- Around line 505-511: The `LastUpdated` field is declared as +required but its
JSON tag includes `omitempty`, which contradicts requiredness; update the
declaration for LastUpdated (metav1.Time) in types_pacemakercluster.go to be
consistent by removing `omitempty` from the
`json:"lastUpdated,omitempty,omitzero"` tag so it becomes
`json:"lastUpdated,omitzero"` (or alternatively change the kubebuilder marker to
+optional if you intend the field to be optional); apply the same pattern used
for other required metav1.Time fields like `LastTransitionTime`.
♻️ Duplicate comments (1)
openapi/generated_openapi/zz_generated.openapi.go (1)
26444-26465: Align documented invariants with enforced schema validations.Several descriptions assert minimum condition/resource counts, a node cap, and monotonic
lastUpdated, but I don’t seeminItems/maxItemsorx-kubernetes-validationsemitted here. If these constraints are intended, please add the corresponding kubebuilder markers in the Go types and regenerate; otherwise consider softening the wording to avoid implying enforcement.#!/bin/bash set -euo pipefail files=$(fd -a 'types_pacemakercluster.go' .) for file in $files; do echo "==> $file" rg -n -C2 'PacemakerCluster(FencingAgentStatus|NodeStatus|ResourceStatus|Status)' "$file" rg -n -C2 'kubebuilder:validation:(MinItems|MaxItems|Enum|XValidation)' "$file" rg -n -C2 'Conditions\s+\[\]|Resources\s+\[\]|Nodes\s+\[\]|LastUpdated' "$file" doneAlso applies to: 26547-26568, 26595-26606, 26655-26676, 26701-26749
🧹 Nitpick comments (3)
etcd/v1alpha1/tests/pacemakerclusters.etcd.openshift.io/DualReplica.yaml (1)
682-1934: Thorough IP address validation test coverage.The IP validation tests cover:
- IPv4 loopback rejection (127.0.0.1)
- Invalid format rejection ("not-an-ip")
- Valid IPv6 acceptance (2001:db8::1)
- IPv6 loopback rejection (::1)
- IPv6 link-local rejection (fe80::1)
This aligns with the
isIP(self) && ip.isCanonical(self) && ip(self).isGlobalUnicast()CEL validation rule onPacemakerNodeAddress.address.Consider adding tests for edge cases if time permits:
- Fencing agent name validation (
_redfishsuffix requirement)- Missing required resources (Kubelet/Etcd)
- Non-canonical IPv6 format (e.g.,
2001:0db8::1)etcd/README.md (1)
18-20: Make the status-only nature explicit.A short, explicit note that this CRD has no spec would prevent confusion for readers expecting a spec section.
📝 Suggested doc tweak
The `PacemakerCluster` resource provides visibility into the health and status of a Pacemaker-managed cluster. It is periodically updated by the cluster-etcd-operator's status collector. +It is a status-only resource; spec is intentionally omitted and must remain empty.payload-manifests/crds/0000_25_etcd_01_pacemakerclusters-DevPreviewNoUpgrade.crd.yaml (1)
401-421: Consider validatingmethodto match supported fencing method.
nameis validated to end with_redfish, butmethodaccepts any string. If only redfish is supported, constrainmethodtoo (preferably via Go type validation so the CRD is regenerated consistently).♻️ Suggested schema tightening
method: description: |- method is the fencing method used by this agent (e.g., "redfish", "ipmi", "fence_aws"). This is extracted from the pacemaker resource agent type. maxLength: 63 minLength: 1 + enum: + - redfish type: string
| - name: <nodename>_<method> # e.g., "master-0_redfish" | ||
| method: <method> # Fencing method: redfish, ipmi, fence_aws, etc. | ||
| conditions: [...] # Same 8 conditions as resources (abbreviated) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Align fencing method example with current validation (redfish-only).
The README lists multiple methods, but the CRD validation currently enforces _redfish naming. Either update the example to “redfish only” or relax the validation if multiple methods are intended.
📝 Suggested doc tweak
- method: <method> # Fencing method: redfish, ipmi, fence_aws, etc.
+ method: redfish # Currently only redfish is supported (aligns with CRD validation)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: <nodename>_<method> # e.g., "master-0_redfish" | |
| method: <method> # Fencing method: redfish, ipmi, fence_aws, etc. | |
| conditions: [...] # Same 8 conditions as resources (abbreviated) | |
| - name: <nodename>_<method> # e.g., "master-0_redfish" | |
| method: redfish # Currently only redfish is supported (aligns with CRD validation) | |
| conditions: [...] # Same 8 conditions as resources (abbreviated) |
🤖 Prompt for AI Agents
In `@etcd/README.md` around lines 86 - 88, Update the fencing example in the
README to match the current CRD validation by showing redfish-only naming and
method: change the example name pattern to use the `_redfish` suffix (e.g.,
"master-0_redfish") and set method to "redfish" (remove other method examples
like ipmi or fence_aws); reference the example keys `name` and `method` so
reviewers can find and align the README with the CRD's `_redfish` naming
constraint.
| // lastUpdated is the timestamp when this status was last updated. This is useful for identifying | ||
| // stale status reports. It must be a valid timestamp in RFC3339 format. Once set, this field cannot | ||
| // be removed and cannot be set to an earlier timestamp than the current value. | ||
| // +kubebuilder:validation:Format=date-time | ||
| // +required | ||
| LastUpdated metav1.Time `json:"lastUpdated,omitempty,omitzero"` | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's check if this is in the etcd package mentioned
echo "=== Looking at the specific file mentioned ==="
if [ -f "etcd/v1alpha1/types_pacemakercluster.go" ]; then
sed -n '505,511p' etcd/v1alpha1/types_pacemakercluster.go
else
echo "File not found at specified path, searching for it..."
find . -name "types_pacemakercluster.go" -type f 2>/dev/null
fi
echo ""
echo "=== Searching for metav1.Time fields with both omitempty and omitzero ==="
rg -n "metav1\.Time.*json:.*omitempty.*omitzero|metav1\.Time.*json:.*omitzero.*omitempty" --type=go -B 1 | head -30
echo ""
echo "=== Searching for metav1.Time fields with just omitzero ==="
rg -n "metav1\.Time.*json:.*omitzero" --type=go -B 1 | grep -v "omitempty" | head -30
echo ""
echo "=== Searching for metav1.Time fields with just omitempty ==="
rg -n "metav1\.Time.*json:.*omitempty" --type=go -B 1 | grep -v "omitzero" | head -30Repository: openshift/api
Length of output: 3488
🏁 Script executed:
#!/bin/bash
echo "=== Checking context around all instances of omitempty,omitzero combination ==="
rg -B 2 "metav1\.Time.*omitempty.*omitzero|metav1\.Time.*omitzero.*omitempty" --type=goRepository: openshift/api
Length of output: 929
🏁 Script executed:
#!/bin/bash
echo "=== Searching for documentation or comments about omitzero ==="
rg -i "omitzero" --type=go -B 3 -A 3 | head -40
echo ""
echo "=== Check if there are metav1.Time fields with omitzero but no omitempty ==="
rg "metav1\.Time.*\`json:.*omitzero" --type=go | grep -v "omitempty"
echo ""
echo "=== Check the actual inconsistency: +required with omitempty ==="
rg -B 3 "json:.*omitempty" --type=go | rg -B 3 "\+required"Repository: openshift/api
Length of output: 37285
🏁 Script executed:
#!/bin/bash
echo "=== Checking other +required metav1.Time fields (if any) for comparison ==="
rg -B 3 "metav1\.Time.*json:" --type=go | rg -B 3 "\+required" | head -50Repository: openshift/api
Length of output: 2807
Remove omitempty or reconsider the +required marker.
The LastUpdated field is marked as +required but uses the omitempty JSON tag, which allows the field to be omitted during JSON serialization. This is contradictory. Other similar +required metav1.Time fields in the codebase (e.g., LastTransitionTime in operatorcontrolplane and machineconfiguration packages) omit the omitempty tag entirely. Either remove omitempty to enforce presence in JSON output, or change the marker to +optional if the field is truly optional.
🤖 Prompt for AI Agents
In `@etcd/v1alpha1/types_pacemakercluster.go` around lines 505 - 511, The
`LastUpdated` field is declared as +required but its JSON tag includes
`omitempty`, which contradicts requiredness; update the declaration for
LastUpdated (metav1.Time) in types_pacemakercluster.go to be consistent by
removing `omitempty` from the `json:"lastUpdated,omitempty,omitzero"` tag so it
becomes `json:"lastUpdated,omitzero"` (or alternatively change the kubebuilder
marker to +optional if you intend the field to be optional); apply the same
pattern used for other required metav1.Time fields like `LastTransitionTime`.
|
@jaypoulz: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Introduces tnf.etcd.openshift.io/v1alpha1 API group with PacemakerStatus custom resource. This provides visibility into Pacemaker cluster health for dual-replica etcd deployments. The status-only resource is populated by a privileged controller and consumed by the cluster-etcd-operator healthcheck controller. Not gated because it's only used by CEO when two-node has transitioned.
Works in conjunction with openshift/cluster-etcd-operator#1487
Depends on kubernetes-sigs/controller-tools#1324, which in turn needs a bump for go 1.25
In the meantime, I've vendored: openshift/kubernetes-sigs-controller-tools#33
Depends on #2657 #2658