
🌱 PartitionSet e2e #2642

Merged: 4 commits merged into kcp-dev:main on Mar 1, 2023

Conversation

@fgiloux (Contributor) commented Jan 18, 2023

Summary

This PR adds end-to-end tests for PartitionSet reconciliation.

Related issue(s)

Contributes to #2334

@openshift-ci openshift-ci bot added the kind/api-change label (Categorizes issue or PR as related to adding, removing, or otherwise changing an API) on Jan 18, 2023
@fgiloux (Contributor, Author) commented Jan 18, 2023

/hold
It will need to be rebased after #2513 has merged.

@openshift-ci openshift-ci bot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command) on Jan 18, 2023
@fgiloux fgiloux force-pushed the partitionset-e2e branch 2 times, most recently from c587d0c to 4d12faa on February 8, 2023 at 21:44
@fgiloux (Contributor, Author) commented Feb 9, 2023

/retest

@fgiloux fgiloux requested a review from p0lyn0mial on February 9, 2023 at 06:32
@@ -35,6 +36,7 @@ func generatePartition(name string, matchExpressions []metav1.LabelSelectorRequi
    for _, label := range labels {
        pname = pname + "-" + strings.ToLower(matchLabels[label])
    }
    pname = pname[:min(validation.DNS1123SubdomainMaxLength-1, len(pname))]
Contributor: so, there is a limit :)

Contributor: please add a unit test

Contributor Author: I don't think it is useful but I have added it.

Contributor: What about collisions? This is why we use the hash in many other places.

Contributor Author (@fgiloux, Feb 10, 2023): GenerateName is used to avoid collisions. I was asked to have the name carry some way of identifying a partition; that information was previously set in labels.
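For illustration, a minimal sketch of that pattern (the helper name and import path are assumptions, not the PR's exact code): with GenerateName set, the API server appends a random suffix on create, so two truncated prefixes cannot collide.

```go
package partition

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	topologyv1alpha1 "github.com/kcp-dev/kcp/pkg/apis/topology/v1alpha1"
)

// newPartition is a hypothetical helper: the human-readable, possibly
// truncated prefix stays in GenerateName, and the API server guarantees
// uniqueness by appending a random suffix when the object is created.
func newPartition(pname string) *topologyv1alpha1.Partition {
	return &topologyv1alpha1.Partition{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: pname + "-",
		},
	}
}
```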

@@ -89,10 +89,11 @@ func (c *controller) reconcile(ctx context.Context, partitionSet *topologyv1alph
    }

    var matchLabelsMap map[string]map[string]string
    dimensions := removeDuplicates(partitionSet.Spec.Dimensions)
Contributor: should we put it into validation/cel? That way we wouldn't get any duplicates, right?

Contributor: @sttts wdyt?

Contributor Author: There is no easy way of doing that with CEL. CEL is cautious with costly operations like list iteration, although it would be O(n) here. The current implementation is tolerant of duplicates and still provides a predictable outcome that does not surprise the user.
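A minimal order-preserving sketch of such a tolerant de-duplication (illustrative, not necessarily the PR's exact implementation): keeping the first occurrence of each value makes the result deterministic regardless of duplicates.

```go
// removeDuplicates keeps the first occurrence of each value, so the
// outcome is predictable for the user even when dimensions repeat.
func removeDuplicates(slice []string) []string {
	seen := make(map[string]struct{}, len(slice))
	out := make([]string, 0, len(slice))
	for _, v := range slice {
		if _, ok := seen[v]; ok {
			continue
		}
		seen[v] = struct{}{}
		out = append(out, v)
	}
	return out
}
```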

limitations under the License.
*/

package partition
Contributor: should we rename test/e2e/reconciler/partition/partitionset_test.go to test/e2e/reconciler/partitionset/partitionset_test.go?

Contributor Author: I can do that

        }
        return false, spew.Sdump(partitionSet.Status.Conditions)
    }, wait.ForeverTestTimeout, 100*time.Millisecond, "expected valid partitionSet")

Contributor: should we check that no Partition was created?

Contributor Author: added

Contributor: IMHO we should focus a bit more on the happy cases in e2e; exhaustive negative checking can be fragile, as we've learned, and requires private fixtures...

        partitions, err = partitionClient.Cluster(partitionClusterPath).List(ctx, metav1.ListOptions{})
        require.NoError(t, err, "error retrieving partitions")
        if len(partitions.Items) == 1 {
            return true, ""
Contributor: should we examine Partition.Spec.Selector?

Contributor Author: added
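As a sketch, such a check can be a single assertion appended to the polling function above (expectedSelector is a hypothetical fixture value, not the PR's exact code):

```go
// Inside the Eventually callback, after the count check: verify that the
// generated Partition carries the selector derived from the dimensions.
require.Equal(t, expectedSelector, partitions.Items[0].Spec.Selector,
	"unexpected selector on the generated partition")
```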

        partitions, err = partitionClient.Cluster(partitionClusterPath).List(ctx, metav1.ListOptions{})
        require.NoError(t, err, "error retrieving partitions")
        if len(partitions.Items) == 2 {
            return true, ""
Contributor: should we check Partition.Spec.Selector?

Contributor Author: added

Contributor: We need to validate that we get the right ones here.

        return false, fmt.Sprintf("expected 1 partition, but got %d", len(partitions.Items))
    }, wait.ForeverTestTimeout, 100*time.Millisecond, "expected 1 partition")

// The following tests are focused on the admission
Contributor: I'd move it to a new separate test that simply checks admission

Contributor Author: I thought of that but did not do it due to the cost of running a private cluster.

    }
    _, err = partitionSetClient.Cluster(partitionClusterPath).Create(ctx, errorPartitionSet, metav1.CreateOptions{})
    require.Error(t, err, "error creating partitionSet expected")

Contributor: should we check that a PartitionSet wasn't created?

Contributor Author: No, this is the standard CEL validation mechanism. If the expression evaluates to false, an error is returned and the resource is not created.

    _, err = partitionSetClient.Cluster(partitionClusterPath).Create(ctx, errorPartitionSet, metav1.CreateOptions{})
    require.Error(t, err, "error creating partitionSet expected")

    t.Logf("Character not allowed at first place in matchExpressions values")
Contributor: Not sure if we need to check all incorrect values; I'd rely on k8s's validation. From our perspective the most important thing is to make sure an invalid request will be rejected and won't have any side effect (e.g. deleting a partition).

Contributor Author: There is no Kubernetes validation taking place for the admission. This is handled by the CEL expressions I crafted.
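For context, a hedged sketch of the mechanism: CRD-level CEL rules are declared via kubebuilder markers, compiled into the schema, and evaluated at admission, so an object failing a rule is rejected before it is ever persisted. The field and rule below are illustrative only, not the PR's actual expressions.

```go
// PartitionSetSpec sketch; the XValidation rule is a made-up example of
// a CEL admission check that rejects creates whose dimensions are too long.
type PartitionSetSpec struct {
	// +kubebuilder:validation:XValidation:rule="self.all(d, size(d) <= 63)",message="dimension names must be at most 63 characters"
	Dimensions []string `json:"dimensions,omitempty"`
}
```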

    _, err = partitionSetClient.Cluster(partitionClusterPath).Create(ctx, errorPartitionSet, metav1.CreateOptions{})
    require.Error(t, err, "error creating partitionSet expected")

    t.Logf("Partition name cut when the label values sum up")
Contributor: I'm okay with checking it at the unit-test level.

Contributor Author: You can't do that short of copying the CEL expressions directly into the unit test, which does not bring value in my opinion.

@stevekuznetsov (Contributor) left a comment:

IMHO we should write these tests on the shared fixture. Generate your labels and shard names with random suffixes to allow these tests to co-exist even with other instances of the same test in parallel, and mark the shards you're creating as not ready for scheduling so as not to impact any other part of the system.
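A sketch of that suggestion, assuming apimachinery's rand helper (the helper choice and names are mine, not the PR's):

```go
package partition

import "k8s.io/apimachinery/pkg/util/rand"

// uniqueShardFixture returns a shard name and labels suffixed with a
// random string, so parallel instances of the same test can share one
// kcp server without colliding.
func uniqueShardFixture() (string, map[string]string) {
	suffix := rand.String(8)
	return "partition-shard-" + suffix, map[string]string{"region": "region-" + suffix}
}
```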

@@ -215,3 +216,16 @@ func partition(shards []*corev1alpha1.Shard, dimensions []string, shardSelectorL
    }
    return matchLabelsMap
}

// removeDuplicates makes sure that a value is not found more than once in the slice.
func removeDuplicates(slice []string) []string {
Contributor: sets.NewString(slice...).List()

Contributor Author: changed
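The sets-based form is a one-liner; note the variadic spread, and that List() returns the values sorted, which keeps the outcome deterministic:

```go
import "k8s.io/apimachinery/pkg/util/sets"

// De-duplicate via the apimachinery string set; the result is sorted.
func removeDuplicates(slice []string) []string {
	return sets.NewString(slice...).List()
}
```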

@@ -35,6 +36,7 @@ func generatePartition(name string, matchExpressions []metav1.LabelSelectorRequi
    for _, label := range labels {
        pname = pname + "-" + strings.ToLower(matchLabels[label])
    }
    pname = pname[:min(validation.DNS1123SubdomainMaxLength-1, len(pname))]
Contributor: What about collisions? This is why we use the hash in many other places.

@@ -48,3 +50,10 @@ func generatePartition(name string, matchExpressions []metav1.LabelSelectorRequi
        },
    }
}

func min(a, b int) int {
Contributor: IMHO we don't need to accumulate this sort of logic. Just inline it.

Contributor Author: Inlining is done by the compiler. I find this more readable, and it seems it is not an uncommon practice.

    t.Parallel()
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    server := framework.PrivateKcpServer(t)
Contributor: Why?

    defer cancel()
    server := framework.PrivateKcpServer(t)

    // Create organization and workspace
Contributor: Do we need an org anymore? Why?

Contributor Author: We don't. It does not hurt either.

    shardClient := kcpClusterClient.CoreV1alpha1().Shards()
    _, err = shardClient.Cluster(core.RootCluster.Path()).Create(ctx, shard1a, metav1.CreateOptions{})
    require.NoError(t, err, "error creating shard")
    framework.Eventually(t, func() (bool, string) {
Contributor: framework.EventuallyCondition

Contributor Author: It did not exist when this test was written, and it does not allow checking the values of multiple conditions.
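A sketch of what checking several condition values in one poll looks like with the raw helper (the condition types and the import paths for the conditions utilities are assumptions):

```go
framework.Eventually(t, func() (bool, string) {
	ps, err := partitionSetClient.Cluster(partitionClusterPath).Get(ctx, partitionSetName, metav1.GetOptions{})
	require.NoError(t, err)
	// Both condition types here are hypothetical placeholders.
	for _, ct := range []conditionsv1alpha1.ConditionType{"PartitionsReady", "PartitionSetValid"} {
		if !conditions.IsTrue(ps, ct) {
			return false, spew.Sdump(ps.Status.Conditions)
		}
	}
	return true, ""
}, wait.ForeverTestTimeout, 100*time.Millisecond, "expected all conditions to be true")
```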

        return false, fmt.Sprintf("expected 1 partition, but got %d", partitionSet.Status.Count)
    }, wait.ForeverTestTimeout, 100*time.Millisecond, "expected the partition count to be 1")
    partitionClient := kcpClusterClient.TopologyV1alpha1().Partitions()
    var partitions *topologyv1alpha1.PartitionList
Contributor: Please continue to t.Logf() the steps

    }
    shard2, err = shardClient.Cluster(core.RootCluster.Path()).Create(ctx, shard2, metav1.CreateOptions{})
    require.NoError(t, err, "error creating shard")
    framework.Eventually(t, func() (bool, string) {
Contributor: framework.EventuallyCondition

        partitions, err = partitionClient.Cluster(partitionClusterPath).List(ctx, metav1.ListOptions{})
        require.NoError(t, err, "error retrieving partitions")
        if len(partitions.Items) == 2 {
            return true, ""
Contributor: We need to validate that we get the right ones here.

@fgiloux (Contributor, Author) commented Feb 10, 2023

> IMHO we should write these tests on the shared fixture. Generate your labels and shard names with random suffixes to allow these tests to co-exist even with other instances of the same test in parallel, and mark the shards you're creating as not ready for scheduling so as not to impact any other part of the system.

I had proposed something similar in a past PR. I see you have relaunched a discussion in Slack on that. The change in approach is out of scope of this PR.

@stevekuznetsov (Contributor) commented:

Sorry, I don't think we need to check in more tests using private fixtures. We're already at critical mass with e2e flakiness, and adding much more load there is not reasonable. How hard is it to implement some "not-ready-for-scheduling" field on the shard? We're in alpha versions; we can iterate.

@stevekuznetsov (Contributor) commented:

/hold

@fgiloux (Contributor, Author) commented Feb 10, 2023

> Sorry, I don't think we need to check in more tests using private fixtures. We're already at critical mass with e2e flakiness, and adding much more load there is not reasonable. How hard is it to implement some "not-ready-for-scheduling" field on the shard? We're in alpha versions; we can iterate.

Great! Give me a shout when you agree on something.

@stevekuznetsov (Contributor) commented:

Sounds like a field shard.spec.schedulable={true,false} would suffice, and we'd just need to update the workspace reconciler to choose only the schedulable subset when workspace.spec.location is unset.
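A hedged sketch of that shape (naming follows the comment above; the actual field landed in a separate change and may differ):

```go
// ShardSpec sketch: when Schedulable is false, the workspace scheduler
// skips this shard unless workspace.spec.location selects it explicitly.
type ShardSpec struct {
	// +kubebuilder:default=true
	Schedulable bool `json:"schedulable,omitempty"`
}
```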

@p0lyn0mial (Contributor) commented:

FYI #2782 should unblock this PR.

@fgiloux fgiloux force-pushed the partitionset-e2e branch 3 times, most recently from 57f2bbc to 41c2d83 on February 23, 2023 at 09:49
@fgiloux (Contributor, Author) commented Feb 23, 2023

> FYI #2782 should unblock this PR.

This does not seem to be sufficient:

    apibinding_test.go:348: === Testing identity wildcards
    apibinding_test.go:379: Verify "root:e2e-workspace-wqp4k:consumer-1-bound-against-1" bound to service provider 1 ("root:e2e-workspace-wqp4k:service-provider-1") wildcard list works
    apibinding_test.go:351: Get APIBinding for workspace root:e2e-workspace-wqp4k:consumer-1-bound-against-1
    apibinding_test.go:360: Doing a wildcard identity list for wildwest.dev/v1alpha1, Resource=cowboys:6440e16b668082bc273a03875cca56af91f1855ab12089d0a1d79dfdd1e8019b against root:e2e-workspace-wqp4k:consumer-1-bound-against-1 workspace on shard partition-shard-1a
    apibinding_test.go:361: kubeconfig for shard "partition-shard-1a" not found

Pushing a commit to fix it. I am not sure there are no other issues.

@fgiloux fgiloux force-pushed the partitionset-e2e branch 6 times, most recently from c5ad38b to da8db7a on February 23, 2023 at 14:14
@fgiloux (Contributor, Author) commented Feb 23, 2023

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command) on Feb 23, 2023
@fgiloux (Contributor, Author) commented Feb 23, 2023

@p0lyn0mial @stevekuznetsov I have refactored the tests to use a shared server, and I have added a commit to patch some sensitivity to non-schedulable shards within the APIBinding tests. Can you please have a look? I am hoping we can get this PR merged now.

@sttts (Member) commented Mar 1, 2023

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged) on Mar 1, 2023
@openshift-ci openshift-ci bot commented Mar 1, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: sttts

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

@openshift-ci openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files) on Mar 1, 2023
@openshift-merge-robot openshift-merge-robot merged commit 4cb7736 into kcp-dev:main Mar 1, 2023
Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
kind/api-change: Categorizes issue or PR as related to adding, removing, or otherwise changing an API.
lgtm: Indicates that a PR is ready to be merged.